Multiple Preposed Marks

Richard Wordingham richard.wordingham at ntlworld.com
Tue Nov 8 02:30:25 CST 2016


TUS Section 2.11 says, "If the combining characters can interact
typographically—for example, U+0304 combining macron and U+0308
combining diaeresis—then the order of graphic display is
determined by the order of coded characters (see Table 2-5).
By default, the diacritics or other combining characters are
positioned from the base character’s glyph outward".

So, suppose I have two spacing combining marks E and O that are each
positioned to the left of the base (say X) in a left-to-right script,
so that the encodings <X, E> and <X, O> appear with the glyph orders
<gE, gX> and <gO, gX>.  Suppose also that the encodings <X, E, O> and
<X, O, E>, if not total gibberish, each represent a horizontal sequence
of the three glyphs with gX on the right.  Should <X, E, O> then render
as <gE, gO, gX> or <gO, gE, gX>?  The phonetics and collation (in so
far as it is meaningful) of the words provide no cue as to the order of
the encoded characters.  I have encountered both renderings.
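
To make the two candidates concrete, here is a minimal Python sketch.
It is only an illustration: the function name and the "inside_out"
flag are hypothetical, the "g" prefix follows the glyph naming used
above, and it does not model any real rendering engine.  With
inside_out=True it follows one reading of "from the base character's
glyph outward", in which the first coded mark sits nearest the base;
with inside_out=False the marks simply appear in coded order.

```python
def render_left_marks(base, marks, inside_out=True):
    """Return the visual left-to-right glyph order for a base character
    followed (in coded order) by marks that are displayed to its left.

    Hypothetical helper for illustration only.
    """
    glyphs = ["g" + m for m in marks]
    if inside_out:
        # The first coded mark is placed next to the base, and each
        # later mark goes further out, i.e. further to the left.
        left = list(reversed(glyphs))
    else:
        # The marks appear left to right in the order they were coded.
        left = glyphs
    return left + ["g" + base]


# The two candidate renderings of <X, E, O>.
outward = render_left_marks("X", ["E", "O"], inside_out=True)
coded = render_left_marks("X", ["E", "O"], inside_out=False)
print(outward)  # ['gO', 'gE', 'gX']
print(coded)    # ['gE', 'gO', 'gX']
```

With a single mark the two readings agree, which is why the question
only arises for <X, E, O> and <X, O, E>.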

The issue came up when I was checking, in both the Firefox and MS Edge
browsers, that my OpenType Tai Tham font Da Lekh could handle all the
headwords of two Northern Thai dictionaries. (Sparing dotted circle
deletion and orthographic syllable reunification are tricky.)  One
of the dictionaries spells a few words with a combination of the Tai and
Pali notations for the vowel /o:/ in open syllables where one might
expect to see an independent vowel.

I'm down to two other rendering engine issues - a combination of tone
mark and then vowel in 4 words, where the dictionary probably has a
misspelling, and the need for an OpenType feature (probably a cvXX) for
inconsistent handling of U+1A58 MAI KANG LAI.  The latter may be a
challenge - I couldn't persuade MS Edge to use the font's Lao shaping
for the Tai Tham script or for the Latin script in a transliteration
mode.  (That mode is triggered by feature ss02 for the Latin script, and
that works well enough in browsers.)

Richard.


