What is the time frame for USE shapers to provide support for CV+C ?
Richard Wordingham via Unicode
unicode at unicode.org
Mon May 13 21:08:04 CDT 2019
On Tue, 14 May 2019 00:58:07 +0000
Andrew Glass via Unicode <unicode at unicode.org> wrote:
> Here is the essence of the initial changes needed to support CV+C.
> Open to feedback.
> * Create new SAKOT class
> SAKOT (Sk) based on UISC = Invisible_Stacker
> * Reduced HALANT class
> Now only HALANT (H) based on UISC = Virama
> * Updated Standard cluster mode
> [< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB >
> [VS] (CMAbv)* (CMBlw)*)* [MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)*
> (VBlw)* (VPst)* (VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B)* (FAbv)*
> (FBlw)* (FPst)* [FM]
This comes a lot closer to supporting Tai Tham monosyllabic clusters.
Although this shouldn't affect Tai Tham, some of those medials need to
be made repeatable; I believe this has already been done in HarfBuzz.
I trust you'll be reclassifying U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA
and U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA into the category SUB so
that we can write about bananas forever (ᨠᩖ᩠ᩅ᩠᩶ᨿᨲᩕ᩠ᩃᩬᨯ):
<HIGH KA, MEDIAL LA, SAKOT, WA, TONE-2, SAKOT, LOW YA> /kluai/ 'banana'
<HIGH TA, MEDIAL RA, SAKOT, LA, SIGN OA BELOW, DA> /tʰalɔːt/ 'for ever'
The issues here are that WA in a medial rôle is indistinguishable from
a coda ('sakot') consonant and that MEDIAL RA can act as a consonant.
Unfortunately, we didn't define a consonant HIGH RATTHA with a
canonical decomposition to <U+1A2D RATA, U+1A5B SIGN HIGH RATHA OR LOW
PA>. The problem is that 'HIGH RATTHA', widely seen as an alternative
form of HIGH RATHA, can act as a subscript coda consonant. There are
also a couple of words in the Northern Thai Dictionary of Palm-Leaf
Manuscripts where MEDIAL LA acts as a coda consonant. Together,
these call for (Sk B)* to be replaced by (<Sk B | SUB>).
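
A minimal sketch of the difference, assuming the replacement stays repeatable and working on sequences of USE class names rather than on real text. The helper names (cluster_pattern, segment) are hypothetical, and so are the class assignments: the MEDIAL signs are taken as SUB and TONE-2 as VMAbv. This is not how any shaper implements the pattern; it only transcribes the quoted expression into a Python regex.

```python
import re

def cluster_pattern(coda):
    """Transcribe the quoted standard cluster pattern; `coda` is the (Sk B)* slot."""
    cm = "(?:CMAbv )*(?:CMBlw )*"
    return re.compile(
        "(?:R |CS )?"                                   # [< R | CS >]
        "(?:B |GB )"                                    # < B | GB >
        "(?:VS )?" + cm +
        "(?:(?:(?:H |Sk )B |SUB )(?:VS )?" + cm + ")*"  # (< <H|Sk> B | SUB > ...)*
        "(?:MPre )?(?:MAbv )?(?:MBlw )?(?:MPst )?"
        "(?:VPre )*(?:VAbv )*(?:VBlw )*(?:VPst )*"
        "(?:VMPre )*(?:VMAbv )*(?:VMBlw )*(?:VMPst )*"
        + coda +
        "(?:FAbv )*(?:FBlw )*(?:FPst )*(?:FM )?"
    )

QUOTED = cluster_pattern("(?:Sk B )*")         # as in the quoted proposal
AMENDED = cluster_pattern("(?:Sk B |SUB )*")   # (<Sk B | SUB>), assumed repeatable

def segment(classes, pattern):
    """Split a list of class names into clusters; unmatched classes get a '!' prefix."""
    text = "".join(c + " " for c in classes)
    out, pos = [], 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m:
            # The pattern requires a base, so a match always consumes something.
            out.append(text[pos:m.end()].strip())
            pos = m.end()
        else:
            end = text.index(" ", pos) + 1
            out.append("!" + text[pos:end].strip())
            pos = end
    return out

# <HIGH KA, MEDIAL LA, SAKOT, WA, TONE-2, SAKOT, LOW YA> under the assumed classes
BANANA = ["B", "SUB", "Sk", "B", "VMAbv", "Sk", "B"]
# Hypothetical shape of a syllable whose coda is a SUB-class sign after a vowel
CODA_SUB = ["B", "VAbv", "SUB"]
```

With these assumptions, segment(BANANA, QUOTED) gives the single cluster "B SUB Sk B VMAbv Sk B", while segment(CODA_SUB, QUOTED) leaves the trailing SUB outside any cluster and segment(CODA_SUB, AMENDED) keeps it inside.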
This next question does not, I believe, affect HarfBuzz. Will NFC
code render as well as unnormalised code? In the first example above,
<TONE-2, SAKOT, LOW YA> normalises to <SAKOT, TONE-2, LOW YA>, which
does not match any portion of the regular expression.
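
The reordering can be checked with Python's standard unicodedata module. This is only a sketch of the normalisation step, not of any shaper's behaviour; the variable names are illustrative.

```python
import unicodedata

HIGH_KA = "\N{TAI THAM LETTER HIGH KA}"
MEDIAL_LA = "\N{TAI THAM CONSONANT SIGN MEDIAL LA}"
SAKOT = "\N{TAI THAM SIGN SAKOT}"
WA = "\N{TAI THAM LETTER WA}"
TONE_2 = "\N{TAI THAM SIGN TONE-2}"
LOW_YA = "\N{TAI THAM LETTER LOW YA}"

# The first example as typed: <HIGH KA, MEDIAL LA, SAKOT, WA, TONE-2, SAKOT, LOW YA>
banana = HIGH_KA + MEDIAL_LA + SAKOT + WA + TONE_2 + SAKOT + LOW_YA
nfc = unicodedata.normalize("NFC", banana)

def short_names(text):
    """List character names with the common 'TAI THAM ' prefix dropped."""
    return [unicodedata.name(ch).replace("TAI THAM ", "") for ch in text]

# Canonical combining classes drive the reordering: the mark with the
# lower non-zero class is moved in front of the one with the higher class.
ccc_sakot = unicodedata.combining(SAKOT)
ccc_tone_2 = unicodedata.combining(TONE_2)

print(short_names(banana))
print(short_names(nfc))
print(ccc_sakot, ccc_tone_2)
```

SAKOT has a lower canonical combining class than TONE-2, so canonical ordering moves the second SAKOT in front of the tone mark and the NFC string ends in <SAKOT, TONE-2, LOW YA>.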