What is the time frame for USE shapers to provide support for CV+C ?

Richard Wordingham via Unicode unicode at unicode.org
Mon May 13 21:08:04 CDT 2019

On Tue, 14 May 2019 00:58:07 +0000
Andrew Glass via Unicode <unicode at unicode.org> wrote:

> Here is the essence of the initial changes needed to support CV+C.
> Open to feedback.
>   *   Create new SAKOT class
> SAKOT (Sk) based on UISC = Invisible_Stacker
>   *   Reduced HALANT class
> Now only HALANT (H) based on UISC = Virama
>   *   Updated Standard cluster mode
> [< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)*
>   (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)*
>   [MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)*
>   (VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B)*
>   (FAbv)* (FBlw)* (FPst)* [FM]

This comes a lot closer to supporting Tai Tham monosyllabic clusters.

Although this shouldn't affect Tai Tham, some of those medials need to
be made repeatable; I believe this has already been done in HarfBuzz.

I trust you'll be reclassifying U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA
and U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA into the category SUB so
that we can write about bananas forever (ᨠᩖ᩠ᩅ᩠᩶ᨿᨲᩕ᩠ᩃᩬᨯ):

<HIGH KA, MEDIAL LA, SAKOT, WA, TONE-2, SAKOT, LOW YA> /kluai/ 'banana'

<HIGH TA, MEDIAL RA, SAKOT, LA, SIGN OA BELOW, DA> /tʰalɔːt/ 'for ever'

The issues here are that WA in a medial rôle is indistinguishable from
a coda ('sakot') consonant and that MEDIAL RA can act as a consonant.

Unfortunately, we didn't define a consonant HIGH RATTHA with a
canonical decomposition to <U+1A2D RATA, U+1A5B SIGN HIGH RATHA OR LOW
PA>.  The problem is that 'HIGH RATTHA', widely seen as an alternative
form of HIGH RATHA, can act as a subscript coda consonant.  There are
also a couple of words in the Northern Thai Dictionary of Palm-Leaf
Manuscripts where MEDIAL LA acts as a coda consonant.  Together,
these call for (Sk B)* to be replaced by (<Sk B | SUB>)*.
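
To make the effect of that substitution concrete, here is a minimal Python
sketch that transcribes the quoted cluster pattern into a regular expression
over class abbreviations and compares the two coda variants.  It is only a
toy model, not how HarfBuzz or any USE implementation works.  The helper
names are invented for illustration, the replacement is taken to be
repeatable like the group it replaces, and the class assignments in the two
test sequences are assumptions: MEDIAL LA is treated as SUB, TONE-2 as VMAbv,
and the second sequence is a made-up shape rather than a real word.

```python
import re


def tok(*names):
    """Regex fragment matching exactly one class token (each token ends in a space)."""
    return "(?:" + "|".join(re.escape(n) + " " for n in names) + ")"


def opt(fragment):
    """Optional item, written [X] in the cluster pattern."""
    return "(?:" + fragment + ")?"


def star(fragment):
    """Repeatable item, written (X)* in the cluster pattern."""
    return "(?:" + fragment + ")*"


def standard_cluster(coda):
    """Build the quoted standard-cluster pattern around the given coda fragment."""
    marks = opt(tok("VS")) + star(tok("CMAbv")) + star(tok("CMBlw"))
    subjoined = "(?:" + tok("H", "Sk") + tok("B") + "|" + tok("SUB") + ")"
    parts = [opt(tok("R", "CS")), tok("B", "GB"), marks, star(subjoined + marks)]
    parts += [opt(tok(m)) for m in ("MPre", "MAbv", "MBlw", "MPst")]
    parts += [star(tok(v)) for v in ("VPre", "VAbv", "VBlw", "VPst")]
    parts += [star(tok(v)) for v in ("VMPre", "VMAbv", "VMBlw", "VMPst")]
    parts.append(coda)
    parts += [star(tok(f)) for f in ("FAbv", "FBlw", "FPst")]
    parts.append(opt(tok("FM")))
    return re.compile("".join(parts))


def matches(regex, classes):
    """True if the whole class sequence forms exactly one cluster."""
    return regex.fullmatch("".join(c + " " for c in classes)) is not None


# Coda as quoted: (Sk B)*
CURRENT = standard_cluster(star(tok("Sk") + tok("B")))
# Coda as suggested, assumed repeatable: (<Sk B | SUB>)*
PROPOSED = standard_cluster(star(tok("Sk") + tok("B") + "|" + tok("SUB")))

# <HIGH KA, MEDIAL LA, SAKOT, WA, TONE-2, SAKOT, LOW YA>, classes assumed.
BANANA = ["B", "SUB", "Sk", "B", "VMAbv", "Sk", "B"]
# Hypothetical shape: base, vowel above, then a SUB character used as a coda.
CODA_SUB = ["B", "VAbv", "SUB"]

if __name__ == "__main__":
    for name, seq in (("banana", BANANA), ("coda SUB", CODA_SUB)):
        print(name, matches(CURRENT, seq), matches(PROPOSED, seq))
```

The only difference between the two compiled patterns is the coda group, so
any sequence that one accepts and the other rejects isolates the effect of
allowing SUB after the vowels and vowel modifiers.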

This next question does not, I believe, affect HarfBuzz.  Will NFC
code render as well as unnormalised code?  In the first example above,
<TONE-2, SAKOT, LOW YA> normalises to <SAKOT, TONE-2, LOW YA>, which
does not match any portion of the regular expression.
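
The reordering can be checked with Python's standard unicodedata module.
This is just a small sketch: the code points are the ones for TONE-2, SAKOT
and LOW YA, and the variable names are mine.  Canonical ordering sorts
adjacent combining marks by canonical combining class, and SAKOT has a lower
class than TONE-2, so it moves in front.

```python
import unicodedata

TONE_2 = "\u1A76"  # TAI THAM SIGN TONE-2
SAKOT = "\u1A60"   # TAI THAM SIGN SAKOT
LOW_YA = "\u1A3F"  # TAI THAM LETTER LOW YA

# The tail of the 'banana' example as typed, then its NFC form.
typed = TONE_2 + SAKOT + LOW_YA
nfc = unicodedata.normalize("NFC", typed)


def describe(text):
    """List each character as (code point, canonical combining class)."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


if __name__ == "__main__":
    print("typed:", describe(typed))
    print("NFC:  ", describe(nfc))
```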

