metric for block coverage

Philippe Verdy via Unicode unicode at
Mon Feb 19 13:02:28 CST 2018

This pair of punctuation marks should long since have been treated as common
punctuation (independently of their assigned names), i.e. assigned the
script property "Common" and not "Deva". I don't see why they could not be
used in non-Indic scripts, since they are not semantically equivalent to
any Latin punctuation mark in their use.

I can easily imagine valid use cases even in Latin, Greek or Cyrillic text,
to properly translate poems, religious texts, or citations without turning
these marks into inaccurate full stops, colons, semicolons, commas, or even
exclamation marks (such a conversion is an interpretation by the
translator). There they would typically be used with surrounding spaces and
not glued to Latin/Greek/Cyrillic words. Such use in Latin would be part of
"extended Latin", but if these punctuation marks are "Common", it is not
much of an extension, and many fonts could include these two simple marks
(which do not need any "complex" OpenType feature).

Their presence in fonts designed for Indic scripts should be mandatory or
strongly recommended (just like the mapping of SPACE, NBSP, the dotted
circle or blank square, and a few others listed in the OpenType development
documentation). Given their "Common" script property, we would then not
need to test for their presence when computing script coverage.

If some Indic font is defective and forgets to map glyphs to them, a
renderer could take the glyphs from any other available font, just as a
renderer is allowed to substitute or synthesize a glyph for the dotted
circle, the blank square, or any whitespace variant that is not mapped. To
scale the glyph or infer a suitable advance width/height, the renderer only
needs the generic font metrics, which provide the average width and height
and the relative position of the baselines in the em square.
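
As a rough illustration of the coverage point, here is a minimal Python
sketch of a script coverage metric that skips the two dandas. Everything in
it is hypothetical: the function name, the SHARED_PUNCTUATION set, and the
toy code point sets are assumptions made for the example, not data read
from any real font or taken from any existing tool.

```python
# Hypothetical sketch: script coverage that ignores shared punctuation.

# Assumption: code points treated as script-neutral for this metric.
SHARED_PUNCTUATION = {0x0964, 0x0965}  # DEVANAGARI DANDA, DOUBLE DANDA


def script_coverage(font_codepoints, script_codepoints,
                    shared=SHARED_PUNCTUATION):
    """Return the fraction of script_codepoints mapped by the font,
    leaving out every code point listed in `shared`."""
    required = set(script_codepoints) - set(shared)
    if not required:
        return 1.0  # nothing script-specific is required
    covered = required & set(font_codepoints)
    return len(covered) / len(required)


# Toy data, not a real font: eight Kannada letters plus the two dandas
# are expected, and the font maps the letters but not the dandas.
kannada_sample = set(range(0x0C85, 0x0C8D)) | {0x0964, 0x0965}
toy_font = set(range(0x0C85, 0x0C8D)) | {0x0020, 0x00A0}

dandas_ignored = script_coverage(toy_font, kannada_sample)
dandas_required = script_coverage(toy_font, kannada_sample, shared=())
print(dandas_ignored, dandas_required)
```

With the dandas excluded, the toy font counts as fully covering the sample;
with them required, the same font is penalized for two missing glyphs that
a renderer could have taken from another font.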

2018-02-19 15:58 GMT+01:00 Bobby de Vos via Unicode <unicode at>:

> On 2018-02-18 12:10, Richard Wordingham via Unicode wrote:
> It's only a single bit without a meaning beyond "range is considered
> functional".  No "basic coverage" vs "good coverage" vs "full
> coverage".
> It's worse than that when a script uses characters primarily
> associated with another script.  For example, to have any confidence
> that my Tai Tham font will be used for U+0E4A THAI CHARACTER MAI
> LETTER A, I have to set the Thai bit, even though I only have four Thai
> characters in my font.  (The other two are punctuation.)
> Indic scripts (other than Devanagari) also use a few characters from
> another block. Specifically, two punctuation characters (from the
> Devanagari block)
> are expected to be used with the non-Devanagari Indic scripts. Looking at
> the fonts Noto Sans Kannada and Noto Sans Tamil, the expected Unicode range
> bit is set for Kannada or Tamil, but not Devanagari, even though those
> fonts contain U+0964 and U+0965.
> Bobby
> --
> Bobby de Vos
> *bobby_devos at <bobby_devos at>*
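
To make the quoted observation about Noto Sans Kannada and Noto Sans Tamil
concrete, here is a small Python sketch that lists blocks in which a font
maps at least one character while the corresponding OS/2 Unicode range bit
is not set. The bit numbers (Devanagari 15, Tamil 20, Kannada 22, Thai 24)
are as I recall them from the OpenType OS/2 table description and should be
checked against the specification. The function names and the toy font data
are hypothetical; nothing here is read from a real font file.

```python
# Assumed OS/2 ulUnicodeRange bit numbers and block ranges; verify the bit
# numbers against the OpenType specification before relying on them.
BLOCKS = {
    "Devanagari": (15, 0x0900, 0x097F),
    "Tamil": (20, 0x0B80, 0x0BFF),
    "Kannada": (22, 0x0C80, 0x0CFF),
    "Thai": (24, 0x0E00, 0x0E7F),
}


def range_bit_set(ranges, bit):
    """ranges holds the four 32-bit fields ulUnicodeRange1..4 in order."""
    word, offset = divmod(bit, 32)
    return bool((ranges[word] >> offset) & 1)


def mapped_but_unflagged(ranges, codepoints):
    """Names of blocks with at least one mapped code point but no bit set."""
    result = []
    for name, (bit, first, last) in BLOCKS.items():
        has_glyphs = any(first <= cp <= last for cp in codepoints)
        if has_glyphs and not range_bit_set(ranges, bit):
            result.append(name)
    return result


# Toy font: only the Kannada bit is set, but the dandas are mapped too.
toy_ranges = (1 << 22, 0, 0, 0)
toy_cmap = {0x0C95, 0x0C96, 0x0964, 0x0965}
print(mapped_but_unflagged(toy_ranges, toy_cmap))  # ['Devanagari']
```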