metric for block coverage

Philippe Verdy via Unicode unicode at unicode.org
Tue Feb 27 13:32:58 CST 2018


I agree that 'dlng' is far better than that old legacy bitset (which was
defined at a time when all of Unicode fit in the BMP, and the envisioned
CJK extension blocks outside the BMP were assumed to be handled by the
bits defined for CJK).

At least 'dlng' is intended to indicate whether a font adequately supports
the exemplar character set needed for each language (or language-script
pair), rather than a full script.
This is still challenging for rendering arbitrary text where the language
is not identified (by metadata alongside the text itself, such as lang=""
attributes in HTML/XML and :lang() selectors in CSS, document-level
metadata, or MIME headers in HTTP or email): many documents do not properly
tag the language they use and do not identify all embedded foreign
languages in multilingual documents; some applications do not even have
such information (e.g. text fields in most SQL databases, or files with
simple structures like CSV or dBF), and renderers may need to use a
"language guesser" heuristic, which may turn out to be wrong on short text
fields, where it is simply better to check whether all characters are
covered.
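
As a minimal sketch of that last check (assuming the fontTools Python
library and a hypothetical font path), a renderer can test whether a
font's cmap covers every character of a short text field:

    from fontTools.ttLib import TTFont

    def covers_text(font_path, text):
        """True if the font's cmap maps every character in `text`.

        This checks raw character coverage only; it says nothing about
        shaping quality or whether the design suits the language.
        """
        font = TTFont(font_path)
        cmap = font.getBestCmap()  # codepoint -> glyph name
        return all(ord(ch) in cmap for ch in text)

    # Hypothetical usage: for short fields where language guessing is
    # unreliable, a plain coverage check is the safer criterion.
    if covers_text("NotoSans-Regular.ttf", "jours fériés"):
        print("font covers the field; no fallback needed")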

So there's no simple solution. What most OSes have done is provide a
better basic set of preinstalled fonts with good coverage, and use them as
fallbacks whenever there's a problem and an application did not indicate a
specific font (or just used generic font name aliases like "serif",
"sans-serif", "monospace", or "symbols"). These OSes (or independent text
rendering libraries) also ship in their renderers a database of
font-fallback rules, mapping well-known font names to other supported
fonts with "similar" characteristics and metrics.
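
A toy illustration of such a fallback database (the family names and
mappings below are invented; real engines such as fontconfig ship far
richer rule sets):

    # Invented alias table: maps requested family names (including the
    # generic aliases) to an ordered list of installed candidates.
    FALLBACK_ALIASES = {
        "serif":      ["Noto Serif", "DejaVu Serif"],
        "sans-serif": ["Noto Sans", "DejaVu Sans"],
        "monospace":  ["Noto Sans Mono", "DejaVu Sans Mono"],
        "Helvetica":  ["Arial", "Liberation Sans", "Noto Sans"],
    }

    def resolve_family(requested, installed):
        """Pick the first installed candidate for a requested family."""
        if requested in installed:
            return requested
        for candidate in FALLBACK_ALIASES.get(requested, []):
            if candidate in installed:
                return candidate
        return None  # caller falls back to the default system font

    print(resolve_family("Helvetica", {"Noto Sans", "Noto Serif"}))
    # -> "Noto Sans"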

2018-02-27 16:36 GMT+01:00 Peter Constable via Unicode <unicode at unicode.org>:

> You haven't clarified what exactly the usage is; you've only asked what
> it means to cover a script.
>
> James Kass mentioned a font's OS/2 table. That is obsolete: as Khaled
> pointed out, there has never been a clear definition of "supported" and
> practice has been inconsistent. Moreover, the available bits were exhausted
> after Unicode 5.2, and we're now working on Unicode 11. Both Apple and
> Microsoft have started to use 'dlng' and 'slng' values in the 'meta' table
> of OpenType fonts to convey what a font can and is designed to support — a
> distinction that the OS/2 table never allows for, but that is actually more
> useful. (I'd also point out that, in the upcoming Windows 10 feature
> update, the 'dlng' entries in fonts are used to determine what preview
> strings to use in the Fonts settings UI.) For scripts like Latin that have
> a large set of characters, most of which have infrequent usage, there can
> still be a challenge in characterizing the font, but the mechanism does
> provide flexibility in what is declared.
>
> But again, you haven't said whether your issue is what data to put into
> fonts. If you are trying to determine whether a given font supports a
> particular language, the OS/2 and 'meta' tables provide heuristics, with
> 'meta' being the recommended one; but the only way to know with absolute
> certainty is to compare an exemplar character list for the particular
> language with the font's cmap table. But note that this can only tell you
> that a font _is able to support_
> the language, which doesn't necessarily imply that it's actually a good
> choice for users of that language. For example, every font in Windows
> includes Basic Latin characters, but that definitely doesn't mean that the
> fonts are useful for an English speaker. This is why the 'dlng' entry in
> the 'meta' table was created.
>
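
For reference, both checks described above can be scripted with the
fontTools Python library; the font path and the (deliberately tiny)
French exemplar list below are hypothetical:

    from fontTools.ttLib import TTFont

    font = TTFont("SomeFont.ttf")  # hypothetical path

    # Heuristic: the 'meta' table's design/supported language entries,
    # stored as comma-separated ScriptLangTag values, e.g. "en-Latn, fr".
    if "meta" in font:
        meta = font["meta"].data
        print("dlng:", meta.get("dlng"))
        print("slng:", meta.get("slng"))

    # Certainty: compare an exemplar character list for the language
    # against the font's cmap table.
    exemplars = set("àâçéèêëîïôùûüÿœ")
    cmap = font.getBestCmap()
    missing = {ch for ch in exemplars if ord(ch) not in cmap}
    print("is able to support:", not missing)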