Fonts and Canonical Equivalence

Richard Wordingham via Unicode unicode at unicode.org
Sat Aug 10 09:44:04 CDT 2019


On Sat, 10 Aug 2019 11:22:01 +0100
Andrew West via Unicode <unicode at unicode.org> wrote:

> On Sat, 10 Aug 2019 at 08:29, Richard Wordingham via Unicode
> <unicode at unicode.org> wrote:
> >
> > There are similar issues with Tibetan; some fonts do not work
> > properly if a vowel below (ccc=132) is separated from the base of
> > the consonant stack by a vowel above (ccc=130).  
> 
> It's not that the fonts don't work, it's that some of the rendering
> engines do not apply the OpenType features in the font that support
> both sequences of vowels (vowel-above followed by vowel-below, and
> vowel-below followed by vowel-above).

My observation was based on a Tibetan font that failed when pre-USE
HarfBuzz added or changed the normalisation for Tibetan.

> Just retested on Windows 10 with
> a Tibetan font that supports both sequences of vowels, and both
> sequences display correctly under Harfbuzz (as expected), but only
> vowel-below followed by vowel-above displays correctly when using
> built-in Windows rendering.

Does vowel above before vowel below yield a dotted circle?

According to the documentation - and the USE may have been improved in
undocumented ways - the blwf feature will not apply across a
Tibetan sequence of vowel above (VAbv) followed by vowel below (VBlw
or CMBlw), but the blws feature will, even if a dotted circle has been
added at the boundary.

> It is very frustrating that Windows cannot correctly support the
> display of Tibetan in normalized form, yet Harfbuzz does not have any
> problems. Personally, I think USE is a failed experiment, and I wish
> Microsoft would simply adopt Harfbuzz as the default rendering engine.

From what I've seen from discussions on HarfBuzz, the USE seems to work
well for non-Indic scripts and Devanagari clones - possibly even
for Bengali clones.  It's also a definition that HarfBuzz can fall back
on.  The problem is that it doesn't address the quirks of scripts, and
its anti-spoofing measures are draconian and overdone.

There may well be an issue of funding for the USE - for all I know, it
may in part be charity work.

If Microsoft gave up on rendering engines, who would write the
rendering specifications for HarfBuzz?

I was wondering how the USE might be modified to handle canonical
equivalence.  The simplest way may be to permute the canonical
combining classes, normalise (NFD) according to these classes, and
process the rearranged string.  That's roughly what HarfBuzz does.
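
A minimal sketch of that idea in Python, purely for illustration.  The
permutation table (swapping classes 130 and 132 so that a Tibetan vowel
below sorts before a vowel above) and the function names are my own
assumptions, not anything taken from HarfBuzz or the USE specification.

```python
import unicodedata

# Hypothetical permutation of canonical combining classes: swap the
# Tibetan vowel-above class (130) with the vowel-below class (132).
CLASS_PERMUTATION = {130: 132, 132: 130}


def shaping_class(ch):
    """Return the permuted combining class used for reordering."""
    ccc = unicodedata.combining(ch)
    return CLASS_PERMUTATION.get(ccc, ccc)


def reorder_for_shaping(text):
    """NFD-normalise, then re-sort each run of marks by permuted class."""
    out = []
    run = []  # marks collected since the last starter
    for ch in unicodedata.normalize("NFD", text):
        if shaping_class(ch) == 0:
            # sorted() is stable, so marks of equal class keep NFD order
            out.extend(sorted(run, key=shaping_class))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=shaping_class))
    return "".join(out)
```

Because the input is put into NFD first, canonically equivalent inputs
yield the same rearranged string; for instance U+0F40 U+0F72 U+0F74 and
U+0F40 U+0F74 U+0F72 both come out with the vowel below first.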

Another technique would be to derive regular expressions that would
match any string canonically equivalent to a string matching the
original regular expressions and use them instead.  (It may be
simpler to derive a regular expression that finds matches from amongst
normalised strings - that's what my canonical equivalence respecting
regular expression does.) Using a different canonical equivalent to the
present one could 'break' fonts whose sets of properly handled strings
were not closed under canonical equivalence - which is why I asked the
original question.
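
For a literal base plus marks, deriving such an expression can be
sketched as below.  This is a toy under stated assumptions: it only
enumerates reorderings of the marks and ignores precomposed characters,
and the helper names are hypothetical.  It is far cruder than a real
canonical-equivalence-respecting regular expression engine.

```python
import itertools
import re
import unicodedata


def equivalent_orderings(marks):
    """List the orderings of `marks` that are canonically equivalent."""
    target = unicodedata.normalize("NFD", marks)
    found = set()
    for perm in itertools.permutations(marks):
        candidate = "".join(perm)
        # Equivalent orderings share the same NFD form
        if unicodedata.normalize("NFD", candidate) == target:
            found.add(candidate)
    return sorted(found)


def closed_pattern(base, marks):
    """Build a regex matching `base` followed by any equivalent ordering."""
    alternatives = "|".join(re.escape(m) for m in equivalent_orderings(marks))
    return re.compile(re.escape(base) + "(?:" + alternatives + ")")
```

With base U+0F40 and marks U+0F72 U+0F74 (classes 130 and 132) the
pattern accepts both orders of the vowels, whereas two marks of the same
class, such as U+0F72 and U+0F7A, yield only the order given.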

Richard.


