Choosing the Set of Renderable Strings

Fri May 18 17:06:18 CDT 2018

On Thu, 17 May 2018 23:38:27 -0800
James Kass via Unicode <unicode at unicode.org> wrote:

> I wrote,
> 
> > Changing the entry order to:
> > ᨽᩮᩣᨾᨶᩣᩮ
> > <LOW PHA, SIGN E, SIGN AA, MA, NA, SIGN AA, SIGN E>
> > ... forms the NAA ligature and the vowel re-ordering matches the
> > Lamphun graphic you sent.  But that kludge probably breaks the
> > preferred encoding model/order.  
> 
> On the other hand, do the script users normally input the NAA ligature
> sequence first and then add any additional signs or marks?  If the
> users consider NAA to be a distinct "letter", then that might explain
> why a font developed by a user accommodates the ligation for the string
> "NA" + "AA" only when nothing else appears between them.  If, for
> example, there's a popular input method or keyboard driver which puts
> "NAA" on its own key, then the users will be producing data which is
> "NA" plus "AA" plus anything else.

There was a keyboard map in the zip file that you may have got the font
from,
http://www.kengtung.org/font-download/Tai-Tham-Unicode-for-PC.zip .  It
has three key symbols per key - plain, shift and capslock.  All the
combinations correspond to a single character.

There's also a zip file for a non-Unicode font,
http://www.kengtung.org/font-download/Tai-Tham-Non-Unicode-for-PC.zip
and that has a corresponding keyboard.  Now, while I haven't looked at
the font, it looks like a direct key to glyph mapping, and as I would
have expected from the pre-Unicode Wat Inn hack encoding, the English
key stroke for 'o' (the key stroke for THAI CHARACTER NO NU) yields NA
and the key stroke for 'O' yields the NAA ligature.  I may be wrong
about the relationship - the top vowel + tone ligatures seem to be
missing from the keyboard.

So, the evidence is ambiguous.

The dictionaries I have seen do not treat NAA as an indivisible
character - NAA plus subscript is treated differently depending on
whether the vowel phonetically precedes or follows the subscript
consonant.  However, the rule that a homorganic subscript precedes the
vowel and others follow it works pretty well.

Now, the chanting of Pali declensions, if related to writing, should
bring home via the participles in -nt- that there is a close
relationship between <NA, subscript HIGH TA> and <NAA, subscript HIGH
TA>.  It would be interesting to see how often ligation fails in
participles.

However, I think there is a different explanation for the sequence.
There are suggestions around that aksharas should be encoded with left
matras in second place.  This makes it easier for fonts.  I think we're
seeing an encoding based on ease of font design.  Now, one doesn't need
this.  If feature ss02 is enabled, the fonts of my Da Lekh family will
convert an ASCII transliteration of Tai Tham letters, numbers and marks
back to the original Tai Tham text.  All I need is a feature
activation, which ASCII normally has the privilege of receiving.  I
believe I could do it all by ccmp, but this feature is a fallback for
when the renderer does not support Tai Tham.

At present, Tai Tham seems to be in grave danger of breaking up into a
number of font encodings - one chooses the rendering system, and that
determines the allowed sequences, even for fairly simple words.  The
Xishuangbanna News appears to be using a visual order encoding.  I
suspect this works because syllables are separated by spaces, so they
don't have to worry about Indic rearrangement being applied despite the
lack of lookups for OTL script "lana".
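
To make the dependence on sequence concrete, here is a minimal Python
sketch.  It assumes, as in the quoted observation, a font that ligates
NA + SIGN AA only when the two characters are adjacent.  The helper
naa_is_adjacent and the two test strings are hypothetical names for
illustration; the check only inspects the character sequence and says
nothing about what any particular renderer will actually do.

```python
import unicodedata

# Look characters up by Unicode name rather than hard-coding code points.
NA = unicodedata.lookup("TAI THAM LETTER NA")
SIGN_AA = unicodedata.lookup("TAI THAM VOWEL SIGN AA")
SIGN_E = unicodedata.lookup("TAI THAM VOWEL SIGN E")


def naa_is_adjacent(text: str) -> bool:
    """Return True if NA is immediately followed by SIGN AA in text."""
    return (NA + SIGN_AA) in text


# Two orderings of the same three characters (hypothetical test strings).
aa_first = NA + SIGN_AA + SIGN_E  # order that formed the ligature in the quoted test
e_first = NA + SIGN_E + SIGN_AA   # left matra in second place

for label, s in (("aa_first", aa_first), ("e_first", e_first)):
    names = [unicodedata.name(c) for c in s]
    print(label, naa_is_adjacent(s), names)
```

Both strings contain the same characters, so a font whose ligature
lookup matches only the adjacent pair would treat them differently
purely because of the encoding order.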

Richard.