Adding Experimental Control Characters for Tai Tham

Sat Jan 25 12:41:27 CST 2020

This topic is very similar to the recent topic "How to make custom
combining diacritical marks for arabic letters?".

There is a suggestion that the encoding of Tai Tham syllables be
changed
(https://www.unicode.org/L2/L2019/19365-tai-tham-structure.pdf, by
Martin Hosken), and there is a strong desire to experiment with it.
However, unless it is to proscribe good rendering, it needs at least
two extra 'control' characters, which have been suggested as:

1A8E TAI THAM SIGN INITIAL
1A8F TAI THAM SIGN FINAL

These would follow a subscript character.  In simple cases, they
would indicate whether the subscript is part of the onset or part of
the coda of a syllable.

The idea that has been floated is that the experimentation be done by
changing the renderer, which is invoked by various applications.

However, there is the problem of script runs - these characters are not
yet in the Tai Tham script, and most applications lack a mechanism
for assigning PUA characters to a script.

However, there is a set of inherited characters which in a Tai Tham
context have not yet been assigned any meaning - the variation
selectors.  I have experimented with them, and at least in the older
versions of the HarfBuzz renderer (near Version 1.2.7), they do not
cause any problems with the implementation of the USE - no dotted
circles arise, and they can interact in shaping as suggested by a
font.
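
The script-run point can be sketched with a toy itemiser in Python.
The ranges are hard-coded simplifications, not real Unicode Script
data, and the names toy_script and script_runs are hypothetical; real
itemisers differ in detail.  It only illustrates why an unassigned code
point such as the proposed 1A8F would split a Tai Tham run, whereas an
Inherited variation selector would not.

```python
# Toy script itemisation.  Assumption: hard-coded ranges stand in for the
# real Unicode Script property, which this sketch does not consult.
TAI_THAM_BLOCK = range(0x1A20, 0x1AB0)       # whole block, treated as Tai Tham
PROPOSED_CONTROLS = {0x1A8E, 0x1A8F}         # currently unassigned code points
VARIATION_SELECTORS = range(0xFE00, 0xFE10)  # VS1..VS16, Script=Inherited


def toy_script(cp: int) -> str:
    """Return a coarse script label for one code point."""
    if cp in PROPOSED_CONTROLS:
        return "Unknown"
    if cp in VARIATION_SELECTORS:
        return "Inherited"
    if cp in TAI_THAM_BLOCK:
        return "Tai_Tham"
    return "Unknown"


def script_runs(text: str) -> list[tuple[str, str]]:
    """Split text into (script, substring) runs.

    Inherited characters join the run of the preceding character;
    any other change of script starts a new run.
    """
    runs: list[list[str]] = []
    for ch in text:
        script = toy_script(ord(ch))
        if runs and (script == "Inherited" or script == runs[-1][0]):
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]


# <LOW KA, SAKOT, WA>, then a marker, then SIGN AA
stem = "\u1A23\u1A60\u1A45"
with_vs16 = stem + "\uFE0F" + "\u1A63"
with_1a8f = stem + "\u1A8F" + "\u1A63"

print([s for s, _ in script_runs(with_vs16)])  # ['Tai_Tham']
print([s for s, _ in script_runs(with_1a8f)])  # ['Tai_Tham', 'Unknown', 'Tai_Tham']
```

With the variation selector the whole syllable reaches the shaper as
one run; with the unassigned code point it arrives in three pieces.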

How inappropriate would it be to usurp a pair of variation selectors
for this purpose?  For mnemonic purposes, I would suggest usurping

FE0E VARIATION SELECTOR-15 for *1A8E TAI THAM SIGN INITIAL
FE0F VARIATION SELECTOR-16 for *1A8F TAI THAM SIGN FINAL
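
As a minimal sketch of what such a usurpation would mean for test
data, the following Python converts between text written with the
proposed (unassigned) code points and text using the stand-in
variation selectors.  The helper names are hypothetical, and nothing
here is part of L2-19/365.

```python
import unicodedata

# Proposed code point -> stand-in variation selector (the pairing suggested above)
PROPOSED_TO_VS = {
    0x1A8E: 0xFE0E,  # *TAI THAM SIGN INITIAL -> VARIATION SELECTOR-15
    0x1A8F: 0xFE0F,  # *TAI THAM SIGN FINAL   -> VARIATION SELECTOR-16
}
VS_TO_PROPOSED = {vs: cp for cp, vs in PROPOSED_TO_VS.items()}


def to_experimental(text: str) -> str:
    """Replace the proposed controls by their variation-selector stand-ins."""
    return text.translate(PROPOSED_TO_VS)


def to_proposed(text: str) -> str:
    """Inverse mapping.  Caution: this also rewrites any genuine VS15/VS16."""
    return text.translate(VS_TO_PROPOSED)


# The stand-ins are ordinary non-spacing marks as far as Python's data goes.
for vs in PROPOSED_TO_VS.values():
    print(f"U+{vs:04X}", unicodedata.name(chr(vs)), unicodedata.category(chr(vs)))
```

The inverse mapping is only safe on text known to be pure Tai Tham, as
VS15 and VS16 have established uses elsewhere.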

I can think of the following relevant factors:

(a) It is a maxim of English law that a person intends the reasonably
foreseeable consequences of his actions.  By allowing grapheme cluster
boundaries between script changes, the UTC can hardly complain
loudly about inherited characters being usurped.

(b) Most subscript consonants are defined by SAKOT plus a base
consonant, and therefore the suggested control characters have the
nature of variation sequences.  The effect of these characters is,
though, mostly on how other characters are positioned relative to them,
rather than directly on the subscript characters themselves.

(c) There are 7 subscript consonants that are represented by single
characters:

U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA
This seems not to need marking for position relative to the nucleus.
If it did, the marking up of logical order ᩉᩕ᩠ᩅ᩠᩶ᨿ  /huai/ <HIGH HA,
MEDIAL RA, SAKOT, WA, SAKOT, TONE-2, LOW YA> 'brook' as semi-visual
order <HIGH HA, SAKOT, WA, SAKOT, LOW YA, SIGN FINAL, MEDIAL RA, TONE-2>
would not be so simple, as SIGN FINAL should not apply to the leftmost
character, MEDIAL RA.

U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA
This will have to be excluded from the experiment.  It is very rare as
a final consonant, and I suspect its exclusion will have no effect on
the experiment.

U+1A57 TAI THAM CONSONANT SIGN LA TANG LAI
This appears to be restricted to a single word, so its exclusion should
not matter at all.

U+1A5B TAI THAM CONSONANT SIGN HIGH RATHA OR LOW PA
Bizarrely, L2-19/365 treats this as a consonant modifier!  As the USE
does not require consonant modifiers to be applied to the base
consonant, this ought to have no adverse effects.  The combination
<RATA, U+1A5B 'HANG'> frequently acts as a single consonant trespassing
on the territory of HIGH RATHA, but my suggestion that the sequence be
encoded as a precomposed character was rejected.

As far as I can tell, U+1A5B is always part of the phonetic onset.  The
only case where one might need these control characters would be an
implausible contraction *ᩁᩢ᩠ᨭᩛᩣ /rat tʰaː/ logical order <RA,
MAI SAT, SAKOT, RATA, HANG, SIGN AA> parallel to Lao contraction
ᨣᩢ᩠ᩅᩣ /kʰan waː/ 'if' logical order <LOW KA, MAI SAT, SAKOT, WA, SIGN
AA> undisambiguated semi-visual order <LOW KA, SAKOT, WA, MAI SAT, SIGN
AA>, which for Lao is rendered differently to ᨣ᩠ᩅᩢᩣ /kʰwaːk/ logical
order <LOW KA, SAKOT, WA, MAI SAT, SIGN AA>.  Now, the
disambiguated semi-visual order encoding for *ᩁᩢ᩠ᨭᩛᩣ is <RA, SAKOT,
RATA, SIGN FINAL, HANG>.  This is consistent with the USE if SIGN FINAL
is a variation selector, but is a seemingly needless flaw in L2-19/365
Section 5.1.1.

U+1A5C TAI THAM CONSONANT SIGN MA
This character seems only to occur immediately following
akshara-initial MA, so I think there are no issues.

U+1A5D TAI THAM CONSONANT SIGN BA
This sign is of very limited occurrence in Northern Thai.  In Lao, it
can occur as the subscript of a base consonant acting as a mater
lectionis, but I cannot see any scope for needing to mark the role of
the mark for proper rendering. 

U+1A5E TAI THAM CONSONANT SIGN SA
As this is a non-spacing mark principally used as a coda consonant, it
seems unlikely that we would need to mark the role at the experimental
stage.

(d) This scheme does not address the representation of the sequences
<MAI SAM, SIGN INITIAL> and <MAI SAM, SIGN FINAL>.  The best idea I
have is the totally hacky pair of sequences <MAI YAMOK, SIGN INITIAL>
and <MAI YAMOK, SIGN FINAL>.
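
The ordering ambiguity discussed under U+1A5B in (c) can be made
concrete with a short Python sketch.  The code points are those of the
characters named there, and VS16 stands in for SIGN FINAL.  The marked
encoding of the LOW KA word is an assumption made by analogy with
<RA, SAKOT, RATA, SIGN FINAL, HANG>; it is not given in the text above.

```python
# Characters named in factor (c)
LOW_KA, RA, RATA, WA = "\u1A23", "\u1A41", "\u1A2D", "\u1A45"
HANG = "\u1A5B"        # CONSONANT SIGN HIGH RATHA OR LOW PA
SAKOT, MAI_SAT, SIGN_AA = "\u1A60", "\u1A62", "\u1A63"
SIGN_FINAL = "\uFE0F"  # assumption: VS16 standing in for *1A8F

# Logical order of the two Lao spellings
khan_wa_logical = LOW_KA + MAI_SAT + SAKOT + WA + SIGN_AA
other_logical = LOW_KA + SAKOT + WA + MAI_SAT + SIGN_AA

# Undisambiguated semi-visual order of the first word
khan_wa_semivisual = LOW_KA + SAKOT + WA + MAI_SAT + SIGN_AA

# Assumed disambiguation: the marker directly follows the subscript
khan_wa_marked = LOW_KA + SAKOT + WA + SIGN_FINAL + MAI_SAT + SIGN_AA

# Partial sequence as quoted in (c): marker between RATA and HANG
rat_tha_marked_prefix = RA + SAKOT + RATA + SIGN_FINAL + HANG


def strip_markers(text: str) -> str:
    """Drop the stand-in marker to recover the unmarked semi-visual order."""
    return text.replace(SIGN_FINAL, "")


print(khan_wa_semivisual == other_logical)                 # True: the clash
print(khan_wa_marked == other_logical)                     # False: resolved
print(strip_markers(khan_wa_marked) == khan_wa_semivisual) # True
```

The first comparison shows that, without a marker, the semi-visual
encoding of the contraction is the same string as the logical order of
the differently rendered word.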

Richard.