metric for block coverage

Richard Wordingham via Unicode unicode at unicode.org
Mon Feb 19 14:41:01 CST 2018


On Mon, 19 Feb 2018 20:02:28 +0100
Philippe Verdy via Unicode <unicode at unicode.org> wrote:

> This pair of punctuation marks should long since have been considered
> common punctuation (independently of their assigned names), i.e.
> assigned the script property "Common" and not "Deva". I don't see why
> they could not be used in non-Indic scripts (because they are not
> semantically equivalent to Latin punctuation in their use).

They currently both have sc=Common, so common sense prevails here.
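
To make the block/script distinction concrete, here is a minimal Python
sketch.  Python's unicodedata module does not expose the Script property,
so the sc values below are hard-coded from the statement above (an
assumption for illustration, not looked up from Scripts.txt); the
ASSUMED_SCRIPT table and the describe() helper are hypothetical names.

```python
import unicodedata

# Devanagari block, U+0900..U+097F.
DEVANAGARI_BLOCK = range(0x0900, 0x0980)

# Hard-coded for illustration only: sc=Common for both dandas, as stated
# in the message.  A real tool would read Scripts.txt instead.
ASSUMED_SCRIPT = {0x0964: "Common", 0x0965: "Common"}


def describe(cp):
    """Return (character name, in Devanagari block?, assumed script)."""
    return (
        unicodedata.name(chr(cp)),
        cp in DEVANAGARI_BLOCK,
        ASSUMED_SCRIPT.get(cp, "unknown"),
    )


for cp in (0x0964, 0x0965):
    print("U+%04X" % cp, describe(cp))
```

Both characters sit in the Devanagari block and carry "DEVANAGARI" in
their names, yet the script value is Common, which is the point being
made.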

> I can easily imagine valid use cases even in Latin, Greek or
> Cyrillic to properly translate poems, religious texts, or citations...

They have had scx ∍ Latn, but no longer.  It may be because CLDR lacks
sa_Latn; perhaps someone will claim that the dandas and double
dandas I've seen in Sanskrit verses in Latin script are actually
something else.

> Their presence in fonts designed for Indic scripts should be
> mandatory or strongly recommended...

They're generally not necessary for scripts in whose encoding Michael
Everson has had a significant hand.  He defines script-specific
dandas.  Tai Tham has two such pairs!

>... (just like the mapping of SPACE,
> NBSP, dotted circle or blank square, and a few others listed in
> OpenType development documentation), meaning that given their
> "Common" script property we don't need to test their presence to
> compute a script coverage (any other font available could also be
> used by renderers to insert their own glyph if some Indic fonts are
> ever defective for forgetting to map glyphs to them, just like a
> renderer is allowed to substitute or infer a synthesized glyph for
> the dotted circle or blank square, or any whitespace variant, if ever
> they are not mapped, using only the basic font metrics to scale the
> glyph or infer a suitable advance width/height; the renderer just
> needs to look at the generic font metrics providing average width and
> heights and relative position of the baselines in the em-square).

Microsoft Word and the USE (Universal Shaping Engine) document the use
or recommendation of quite a few such shapes and special letters.  They
make ulUnicodeRange rather unreliable.

Note, however, that ulUnicodeRange works by Unicode range, not script.
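
As a rough sketch of how a range-based coverage figure differs from one
that skips Common-script characters, consider the following Python.
Everything here is illustrative: block_coverage() is a hypothetical
helper, the "font" is a toy set of code points rather than a real cmap,
every code point in the range is treated as wanted (ignoring whether it
is assigned), and the only Common-script characters listed are the two
dandas discussed in this thread.

```python
# Devanagari block, U+0900..U+097F.
DEVANAGARI_BLOCK = range(0x0900, 0x0980)

# Danda and double danda; treated as sc=Common per this thread.
COMMON_IN_BLOCK = frozenset({0x0964, 0x0965})


def block_coverage(cmap, block, ignore=frozenset()):
    """Fraction of code points in `block` (minus `ignore`) found in `cmap`."""
    wanted = [cp for cp in block if cp not in ignore]
    if not wanted:
        return 0.0
    covered = sum(1 for cp in wanted if cp in cmap)
    return covered / len(wanted)


# Toy font: maps the whole block except the two dandas.
toy_cmap = set(DEVANAGARI_BLOCK) - COMMON_IN_BLOCK

by_range = block_coverage(toy_cmap, DEVANAGARI_BLOCK)
by_script = block_coverage(toy_cmap, DEVANAGARI_BLOCK, ignore=COMMON_IN_BLOCK)
print(by_range, by_script)
```

The toy font is short of full coverage when measured by range, but
complete once the Common-script dandas are left out of the count.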

Richard.


