Questions about Indic Conjunct Clusters
Don Hosek
don.hosek at gmail.com
Wed Apr 17 13:46:47 CDT 2024
It’s not immediately clear from Unicode 15.1.0 what the correct
implementation would be for a few pathological cases of the Indic Conjunct
Cluster rule (GB9c).
For convenience’s sake, let’s use the following shorthand:
C = \p{InCB=Consonant}
E = \p{InCB=Extend}
L = \p{InCB=Linker}
M = \p{M}
1. It appears that both E and L are subsets of M, and I think E∪L = M.
Is this correct? If so, is GB9c equivalent to saying that CM+C should be
considered a single cluster iff the sequence M+ contains at least one
character from L? (Having written this question and looked at the
statement of the rule at
https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakTest.html,
my restatement seems to correspond to 9.3 in that list.)
2. Should a sequence like, e.g., CLCLC be considered a single cluster or
would it be two clusters, CLCL ÷ C?
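To make both questions concrete, here is a minimal Python sketch of a literal reading of GB9c over the shorthand alphabet above. It is only an illustration under stated assumptions: the input is a string of the letters C, E and L rather than real code points, the function name `clusters` is hypothetical, every E or L is assumed to attach to what precedes it (as GB9 would do for characters whose Grapheme_Cluster_Break is Extend or ZWJ), and all other segmentation rules are ignored.

```python
import re

# Left context that GB9c requires before a consonant, in shorthand:
# a consonant, then any mix of E and L that contains at least one L.
GB9C_LEFT = re.compile(r"C[EL]*L[EL]*$")

def clusters(classes):
    """Split a string over the alphabet {C, E, L} into clusters."""
    out = []
    start = 0
    for i in range(1, len(classes)):
        if classes[i] in "EL":
            # Assumption: E and L never start a cluster (GB9-like behavior).
            continue
        # classes[i] is "C": join it only if the GB9c left context matches.
        if GB9C_LEFT.search(classes[:i]):
            continue
        out.append(classes[start:i])
        start = i
    if classes:
        out.append(classes[start:])
    return out
```

Under this reading, the rule is tested independently at each boundary, so `clusters("CLCLC")` yields one cluster, while `clusters("CEC")` splits before the second consonant because no linker intervenes. Whether that is the intended behavior is exactly what question 2 asks.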
I would also note that the chart at
https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakTest.html
does not seem to be quite correct.
-dh