Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

Richard Wordingham via Unicode unicode at
Sat Dec 9 14:30:17 CST 2017

On Sat, 9 Dec 2017 16:16:44 +0100
Mark Davis ☕️ via Unicode <unicode at> wrote:

> 1. You make a good point about the GB9c. It should probably instead be
> something like:
> GB9c: (Virama | ZWJ )   × Extend* LinkingConsonant
> Extend is broader than necessary, and there are a few items that
> have ccc!=0 but not gcb=extend. But all of those look to be
> degenerate cases.

Something *like*.

Gcb=Extend includes ZWNJ and U+0D02 MALAYALAM SIGN ANUSVARA.  I believe
these both prevent a preceding candrakkala from extending an akshara -
see TUS Section 12.9 about Table 12-33.  I think Extend will have to be
split between starters and non-starters.
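
To make the starter/non-starter split concrete, here is a minimal sketch of one possible reading of it: the rule skips only those Extend characters whose canonical combining class is non-zero. The property sets below are hypothetical, hand-picked toy values and not taken from the UCD; the function names are likewise mine. Only unicodedata.combining() is a real library call.

```python
import unicodedata

ZWJ = "\u200d"
ZWNJ = "\u200c"

# Hypothetical toy property sets, hand-picked for illustration only.
# Real values would be derived from the Unicode Character Database.
VIRAMA = {"\u0d4d"}                                  # MALAYALAM SIGN VIRAMA
LINKING_CONSONANT = {"\u0d15", "\u0d28", "\u0d32"}   # KA, NA, LA
EXTEND = {ZWNJ, "\u0323"}                            # ZWNJ, COMBINING DOT BELOW


def is_starter(ch: str) -> bool:
    """A starter is a character with canonical combining class 0."""
    return unicodedata.combining(ch) == 0


def links_across(text: str, i: int) -> bool:
    """Sketch of: (Virama | ZWJ) x NonStarterExtend* LinkingConsonant.

    Returns True if text[i] is a virama or ZWJ and is followed by zero
    or more non-starter Extend characters and then a linking consonant.
    """
    if text[i] not in VIRAMA and text[i] != ZWJ:
        return False
    j = i + 1
    # Skip only Extend characters that are non-starters (ccc != 0).
    while j < len(text) and text[j] in EXTEND and not is_starter(text[j]):
        j += 1
    return j < len(text) and text[j] in LINKING_CONSONANT


if __name__ == "__main__":
    print(links_across("\u0d32\u0d4d\u0d15", 1))         # LA, virama, KA
    print(links_across("\u0d32\u0d4d\u200c\u0d15", 1))   # ZWNJ intervenes
    print(links_across("\u0d32\u0d4d\u0323\u0d15", 1))   # non-starter intervenes
```

Under these assumptions ZWNJ, being a starter, stops the virama from linking to the following consonant, while a non-starter mark does not.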

I believe there is a problem with the first two examples in Table
12-33.  If one suffixed <U+0D15 MALAYALAM LETTER KA, U+0D3E MALAYALAM
VOWEL SIGN AA> to the first two examples, yielding *പാലു്കാ and
*എ്ന്നാകാ, one would have three Malayalam aksharas, not two extended
grapheme clusters as the proposed rules would say. This is different to
Tai Tham, where there would indeed just be two aksharas in each word,
albeit odd-looking - ᨷᩤᩃᩩ᩠ᨠᩣ and ᩑ᩠ᨶ᩠ᨶᩣᨠᩣ.  Who's checking the impact of
these changes on Malayalam?

