What is the time frame for USE shapers to provide support for CV+C ?
Andrew Glass via Unicode
unicode at unicode.org
Mon May 13 19:58:07 CDT 2019
Here is the essence of the initial changes needed to support CV+C. Open to feedback.
* Create new SAKOT class
SAKOT (Sk) based on UISC = Invisible_Stacker
* Reduced HALANT class
Now only HALANT (H) based on UISC = Virama
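The class split above can be sketched as a small lookup. This is only an illustration: the function name `stacker_class` is hypothetical, and the sketch covers just the two UISC values named here, not the rest of the USE classification.

```python
def stacker_class(uisc):
    """Map an Indic_Syllabic_Category (UISC) value to the proposed USE class.

    Only the two values discussed in this message are handled; anything
    else returns None because it is classified elsewhere in USE.
    """
    if uisc == "Virama":
        return "H"    # HALANT: now based on UISC = Virama only
    if uisc == "Invisible_Stacker":
        return "Sk"   # SAKOT: new class for invisible stackers
    return None


# U+1A60 TAI THAM SIGN SAKOT has UISC = Invisible_Stacker.
print(stacker_class("Invisible_Stacker"))
```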
* Updated Standard cluster mode
[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)* [MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)* (VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B)* (FAbv)* (FBlw)* (FPst)* [FM]
The only required component of a standard cluster is a BASE or BASE_OTHER. A cluster may optionally begin with a REPH or CONS_WITH_STACKER. A BASE or BASE_OTHER may be followed immediately by a VARIATION_SELECTOR and/or multiple CONS_MOD characters in the order CONS_MOD_ABOVE CONS_MOD_BELOW. Multiple sequences of a HALANT BASE or SAKOT BASE with optional VARIATION_SELECTOR or optional CONS_MOD can occur. The sequence can continue with zero or one CONS_MED for each cardinal position (Pre, Above, Below, Post); zero to many VOWEL characters in each cardinal position; zero to many VOWEL_MODs in each cardinal position; zero to many sequences of SAKOT BASE; zero to many CONS_FINALs in each of Above, Below, and Post; and lastly, an optional FINAL_MOD.
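For anyone who wants to experiment with the pattern, here is a rough Python sketch that translates the notation used above into a regular expression over class names. It is not how any shaper implements USE. The helper names (`compile_pattern`, `matches`) are made up for this example, and it assumes that `< < H | Sk > B | SUB >` groups as "(H or Sk) followed by B, or SUB".

```python
import re

# The proposed Standard cluster, copied from the pattern above.
STANDARD = (
    "[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* "
    "(< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)* "
    "[MPre] [MAbv] [MBlw] [MPst] "
    "(VPre)* (VAbv)* (VBlw)* (VPst)* "
    "(VMPre)* (VMAbv)* (VMBlw)* (VMPst)* "
    "(Sk B)* (FAbv)* (FBlw)* (FPst)* [FM]"
)

# [x] is optional, (x)* is zero or more, < a | b > is alternation.
_SYMBOLS = {"[": "(?:", "]": ")?", "<": "(?:", ">": ")",
            "(": "(?:", ")*": ")*", "|": "|"}


def compile_pattern(notation):
    """Translate the cluster notation into a compiled regex.

    Each class name becomes a group matching that name plus one space,
    so a list of classes can be matched as a space-terminated string.
    """
    parts = []
    for token in re.findall(r"\)\*|[\[\]<>()|]|[A-Za-z]+", notation):
        parts.append(_SYMBOLS.get(token, "(?:" + token + " )"))
    return re.compile("".join(parts))


def matches(pattern, classes):
    """Return True if the whole list of class names is one cluster."""
    return pattern.fullmatch("".join(c + " " for c in classes)) is not None


standard = compile_pattern(STANDARD)
print(matches(standard, ["B", "VBlw", "Sk", "B"]))  # consonant after a vowel, joined by Sk
print(matches(standard, ["B", "VBlw", "H", "B"]))   # H after a vowel is not covered
```

Under these assumptions the first call returns True and the second returns False, since only Sk B is allowed after the vowel positions.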
* Updated Halant-terminated cluster
[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)* < H | Sk >
This is similar to the Standard cluster but terminates in a final HALANT or SAKOT after a BASE, BASE_OTHER, or CONS_MOD. When such a HALANT or SAKOT follows a BASE, BASE_OTHER, or CONS_MOD it will form a cluster. When any character other than a BASE or BASE_OTHER follows the HALANT or SAKOT there will be a cluster break between the HALANT or SAKOT and the following character. Multiple sequences of a HALANT BASE or SAKOT BASE with optional VARIATION_SELECTOR or optional CONS_MOD can occur. A CONS_SUBJ is equivalent to the sequence HALANT BASE.
* New Sakot-terminated cluster
[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)*
[MPre] [MAbv] [MBlw] [MPst]
(VPre)* (VAbv)* (VBlw)* (VPst)*
(VMPre)* (VMAbv)* (VMBlw)* (VMPst)*
(Sk B [VS] (CMAbv)* (CMBlw)*)* Sk
This is similar to the Standard cluster but terminates in a final SAKOT after a VOWEL or VOWEL_MOD. When such a SAKOT follows a VOWEL or VOWEL_MOD it will form a cluster. When any character other than a BASE or BASE_OTHER follows this SAKOT there will be a cluster break between the SAKOT and the following character. Multiple sequences of a SAKOT BASE with optional VARIATION_SELECTOR or optional CONS_MOD can occur. A CONS_SUBJ is equivalent to the sequence HALANT BASE.
This would allow a consonant to follow a vowel when joined with a Sakot. It would support multiple final consonants. It would not support polysyllabic chaining of CV+CV+CV etc.
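These three consequences can be checked with a deliberately reduced version of the Standard cluster that keeps only BASE, VOWEL, and SAKOT. This is a simplification for illustration, not the full pattern: each class is encoded as one letter (B, V, S), all vowel positions are collapsed into V, and `first_cluster` is a hypothetical helper name.

```python
import re

# Reduced Standard cluster: B (Sk B)* V* (Sk B)*
# B = BASE, V = any VOWEL, S = SAKOT. Everything else is left out.
SIMPLIFIED_STANDARD = re.compile(r"B(?:SB)*V*(?:SB)*")


def first_cluster(encoded):
    """Return the longest prefix of `encoded` that forms one cluster."""
    m = SIMPLIFIED_STANDARD.match(encoded)
    return m.group(0) if m else ""


print(first_cluster("BVSB"))    # CV+C: one cluster
print(first_cluster("BVSBSB"))  # multiple final consonants: one cluster
print(first_cluster("BVSBV"))   # CV+CV: the second vowel falls outside
```

In the last case the match stops at "BVSB", so the trailing vowel is not part of the cluster, which is the sense in which chaining is unsupported. Assuming a base, below-vowel, SAKOT, base classification, a phonetic-order sequence such as U+1A27 U+1A6A U+1A60 U+1A37 would have the shape "BVSB".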
Cheers,
Andrew
From: Behdad Esfahbod <behdad at behdad.org>
Sent: 10 May 2019 11:32
To: Ed Trager <ed.trager at gmail.com>
Cc: Andrew Glass <Andrew.Glass at microsoft.com>; Unicode Mailing List <unicode at unicode.org>
Subject: Re: What is the time frame for USE shapers to provide support for CV+C ?
I'm open to doing that if there's consensus on how it should be done.
On Thu, May 9, 2019 at 8:55 AM Ed Trager <ed.trager at gmail.com> wrote:
Hi, Andrew and Behdad,
Prompted by a conversation I had with Liang Hai yesterday, I am just curious to get some idea about the following:
(1) When can we anticipate that the USE spec will be updated to provide support for subjoined consonants below vowels (as required for TAI THAM) ?
(2) Once the USE spec is updated, how much lag time can we expect until Microsoft actually releases an implementation with said support for CV+C ?
(3a) And the related question —for Behdad and the HarfBuzz development group— is when can we expect to see CV+C support (at least for TAI THAM) in HarfBuzz ?
(3b) Would the HarfBuzz team consider providing CV+C support for TAI THAM even before the USE spec gets updated, so that we could test things out ? * **
---------------------------------------
* PLEASE AND THANK YOU?
** A good use case is the Tai Tham word U+1A27 U+1A6A U+1A60 U+1A37, transcribed into Central Thai script as จูบ ("to kiss"). Currently, people write this as U+1A27 U+1A60 U+1A37 U+1A6A ("จบู"), which violates the "phonetic ordering" but is the current workaround because USE is still broken for TAI THAM.
REFERENCE DOCUMENT:
http://www.unicode.org/L2/L2018/18332-tai-tham-ad-hoc-report.pdf
--
behdad
http://behdad.org/