What is the time frame for USE shapers to provide support for CV+C ?

梁海 Liang Hai via Unicode unicode at unicode.org
Sun Jun 23 10:33:59 CDT 2019


> (1) When can we anticipate that the USE spec will be updated to provide support for subjoined consonants below vowels (as required for TAI THAM) ?

• The exact scope is actually about allowing conjoined consonant forms (either encoded with a stacker, or encoded atomically?) after vowel signs in an encoded cluster.

> ** A good use case is the Tai Tham word U+1A27 U+1A6A U+1A60 U+1A37 , transcribed to Central Thai script as จูบ, (to kiss). Currently, people are writing this as U+1A27 U+1A60 U+1A37 U+1A6A ("จบู") which violates the "phonetic ordering" but is the current workaround because USE is still broken for TAI THAM.

• I agree with Richard that this is really not a good use case. This word (as long as it is written with the vowel sign Uu either under or after the conjoined consonant sign B) should really be encoded as <High Ca, stacker, Ba, sign Uu>, according to our best understanding today.

• The “phonetic ordering” principle of Unicode is a frequently misinterpreted one. Note that when there are multiple ways of interpreting the phonetic order of a written structure, we try to stick to the more graphically apparent order, in order to have a stable encoding order.
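
To make the two encoding orders discussed above concrete, here is a minimal Python sketch that lists the code points of each sequence by name. The constant names GRAPHIC_ORDER and VOWEL_FIRST_ORDER and the helper describe are only illustrative labels introduced here; the code points themselves are the ones quoted above.

```python
import unicodedata

# <High Ca, stacker (Sakot), Ba, sign Uu>: the graphic order recommended above.
GRAPHIC_ORDER = "\u1A27\u1A60\u1A37\u1A6A"

# <High Ca, sign Uu, Sakot, Ba>: the vowel-first order from the quoted use case.
VOWEL_FIRST_ORDER = "\u1A27\u1A6A\u1A60\u1A37"


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, seq in (("graphic order", GRAPHIC_ORDER),
                       ("vowel-first order", VOWEL_FIRST_ORDER)):
        print(label)
        for line in describe(seq):
            print("  " + line)
```

Both strings contain the same four characters; only the position of the vowel sign relative to <Sakot, Ba> differs, which is exactly what the cluster model has to accept or reject.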

> An example of the contrast is shown in the attached files luynam.png, with first orthographic syllable <LA, SIGN U, SAKOT, LOW YA>, and yukya.png, with the first orthographic syllable <HIGH HA, SAKOT, LOW YA, SIGN U>.

• Right. I was always wondering to what extent this distinction happens as an orthographically conscious choice.

• Generally I feel that when at least one of the interacting signs (usually a consonant one and a vowel one) has inline advance, it should be safe to take a graphic order approach. The “6th preliminary recommendation” doesn’t take the luynam vs yukya case into consideration mostly because we weren’t sure what good attestations exist.

> * Create new SAKOT class SAKOT (Sk) based on UISC = Invisible_Stacker
> * Reduced HALANT class Now only HALANT (H) based on UISC = Virama

• This feels like an undesirable Tham-specific relaxation. Note the artificial distinction between UISC Invisible_Stacker and Virama has nothing to do with whether graphically writing a consonant sign after a vowel sign is attested for a script. (কা)

• At least we need to look into USE-applicable (existing and future) scripts encoded with a Virama and see whether any of them needs the relaxation.

> * Updated Standard cluster mode [< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)) [MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)* (VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B)* (FAbv)* (FBlw)* (FPst)* [FM]


• I’m still trying to think about the possibility of only relaxing the cluster when either/both of <vowel sign, consonant sign> has post-base advance…

• The artificial distinction made between < H | Sk > B, SUB, and CM really needs to be resolved together with the relaxation.
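
As a rough illustration of what the relaxation changes, the following Python sketch models a heavily simplified cluster over class names only. It is an assumption-laden toy, not the USE grammar: VS, CM*, M*, VM*, F*, FM, R, CS, GB and SUB are all left out, and the function names cluster_regex and is_cluster are made up for this example.

```python
import re


def cluster_regex(allow_stack_after_vowel):
    """Build a toy cluster pattern over space-terminated class tokens."""
    # Conjoined consonants before the vowels: (< H | Sk > B)*
    stack = r"(?:(?:H|Sk) B )*"
    # Vowel slots in USE order: (VPre)* (VAbv)* (VBlw)* (VPst)*
    vowels = r"(?:VPre )*(?:VAbv )*(?:VBlw )*(?:VPst )*"
    # The proposed relaxation: (Sk B)* after the vowel slots.
    tail = r"(?:Sk B )*" if allow_stack_after_vowel else ""
    return re.compile(r"B " + stack + vowels + tail)


def is_cluster(classes, allow_stack_after_vowel):
    """Return True if the whole class sequence forms one toy cluster."""
    text = "".join(c + " " for c in classes)
    return cluster_regex(allow_stack_after_vowel).fullmatch(text) is not None


if __name__ == "__main__":
    yukya = ["B", "Sk", "B", "VBlw"]   # <HIGH HA, SAKOT, LOW YA, SIGN U>
    luynam = ["B", "VBlw", "Sk", "B"]  # <LA, SIGN U, SAKOT, LOW YA>
    for name, seq in (("yukya", yukya), ("luynam", luynam)):
        print(name,
              is_cluster(seq, allow_stack_after_vowel=False),
              is_cluster(seq, allow_stack_after_vowel=True))
```

In this toy model the consonant-then-vowel order is accepted either way, while the vowel-then-conjoined-consonant order is accepted only when the trailing (Sk B)* slot is present.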

> * Updated Halant-terminated cluster [< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)) < H | Sk >


• So, the intention of allowing Sk at the end is only about allowing the glyph of Sk to be positioned on the preceding character(s), right?

> * New Sakot-terminated cluster [< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)) [MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)* (VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B [VS] (CMAbv)* (CMBlw)) Sk


• The “(Sk B [VS] (CMAbv)* (CMBlw)) Sk” part doesn’t seem to align with the updated Standard cluster’s “(Sk B)*”?
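
The mismatch can be sketched with two small patterns over class names. This is only one possible reading: the quoted Sakot-terminated tail is interpreted here, as an assumption, as a repeatable group followed by a final Sk, with a repeatable CMBlw. The names STANDARD_TAIL, SAKOT_TAIL and matches are illustrative.

```python
import re

# Tail of the updated Standard cluster as quoted: (Sk B)*
STANDARD_TAIL = re.compile(r"(?:Sk B )*")

# Tail of the Sakot-terminated cluster, read here (an assumption) as
# (Sk B [VS] (CMAbv)* (CMBlw)*)* Sk
SAKOT_TAIL = re.compile(r"(?:Sk B (?:VS )?(?:CMAbv )*(?:CMBlw )*)*Sk ")


def matches(pattern, classes):
    """Return True if the pattern covers the whole class sequence."""
    text = "".join(c + " " for c in classes)
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    conjoined_with_vs = ["Sk", "B", "VS"]
    # Accepted before a final Sk in the Sakot-terminated tail...
    print(matches(SAKOT_TAIL, conjoined_with_vs + ["Sk"]))
    # ...but the same material is not accepted by the Standard tail.
    print(matches(STANDARD_TAIL, conjoined_with_vs))
```

Under this reading, a post-vowel conjoined consonant may carry VS and CM marks only when the cluster ends in Sk, which is the inconsistency in question.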

> I trust you'll be reclassifying U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA and U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA into the category SUB so that we can write about bananas forever (ᨠᩖ᩠ᩅ᩠᩶ᨿᨲᩕ᩠ᩃᩬᨯ): <HIGH KA, MEDIAL LA, SAKOT, WA, TONE-2, SAKOT, LOW YA> /kluai/ 'banana' <HIGH TA, MEDIAL RA, SAKOT, LA, SIGN OA BELOW, DA> /tʰalɔːt/ 'for ever' The issues here are that WA in a medial rôle is indistinguishable from a coda ('sakot') consonant and that MEDIAL RA can act as a consonant aspirator.

• The issues here are:

	• Medial consonant sign characters of Tham are not encoded based on a clear phono-orthographical distinction.

	• Tham allows syllable chaining that does not rely on a preceding inline coda letter.

• Consonant sign Medial Ra being a consonant aspirator is not relevant to its appearance before a non-medial consonant sign here.

Best,
梁海 Liang Hai
https://lianghai.github.io
