Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

Richard Wordingham via Unicode unicode at
Mon Jan 22 20:34:29 CST 2018

On Sun, 21 Jan 2018 22:34:12 -0800
Mark Davis ☕️ via Unicode <unicode at> wrote:
> FYI, I'm thinking now that the change should be:
> GB9c: (Virama | ZWJ )   × LinkingConsonant
> =>  
> GB9c: (Virama ViramaExtend* | ZWJ ) × LinkingConsonant
> where ViramaExtend = [Extend - Virama - \p{ccc=0}]
> (This is pre-partitioning.)
> That is close to your formulation, but for canonical equivalence
> there should be no need to allow ViramaExtend after ZWJ, because
> ZWJ has ccc=0, and thus nothing reorders around it.

These look fine.
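
For anyone checking the revised GB9c against test strings, here is a
minimal Python sketch of the rule as quoted above. It is not a UAX #29
implementation. The property tests are assumptions made for
illustration: Virama is approximated by ccc=9, ViramaExtend by any
other non-zero ccc, and is_linking_consonant is a hypothetical stand-in
covering only Devanagari KA..HA.

```python
import unicodedata

ZWJ = "\u200D"

def is_virama(ch):
    # Assumption: ccc=9 stands in for the Virama class of the proposal.
    return unicodedata.combining(ch) == 9

def is_virama_extend(ch):
    # [Extend - Virama - \p{ccc=0}], approximated as "ccc is neither 0 nor 9".
    return unicodedata.combining(ch) not in (0, 9)

def is_linking_consonant(ch):
    # Hypothetical stand-in: Devanagari KA..HA only.
    return "\u0915" <= ch <= "\u0939"

def gb9c_no_break(text, i):
    """True if (Virama ViramaExtend* | ZWJ) x LinkingConsonant applies before text[i]."""
    if i <= 0 or i >= len(text) or not is_linking_consonant(text[i]):
        return False
    j = i - 1
    if text[j] == ZWJ:
        return True
    # Skip back over any ViramaExtend run, then require a Virama.
    while j >= 0 and is_virama_extend(text[j]):
        j -= 1
    return j >= 0 and is_virama(text[j])
```

With these assumptions, <KA, VIRAMA, ANUDATTA, SSA> is not broken
before SSA, because ANUDATTA (ccc=220) is skipped as ViramaExtend.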

> Cibu also pointed out on a different thread that for Malayalam we
> need to consider a couple of other forms:
> ... Following contexts should be allowed for requesting reformed or
> traditional conjuncts as per Unicode10.0.0/ch12 page 505.  ...
> /$L ZWNJ $V $L/
> /$L ZWJ $V $L/
> The ZWJ Virama sequence is already provided for by the combination of
> GB9 & GB9c. But not the ZWNJ. If we want to handle that, it would
> mean the addition of something like:
> GB9d: × (ZWNJ ViramaExtend* Virama)

This is OK by me for aksharas.  It might make sense for Tai Tham as
well, where various degrees of binding are attested in what you can
think of as D.DH (as in 'buddha').  If the font formally ligates them
but does not always ligate subscript 'DHA' (i.e. U+1A35 TAI THAM LETTER
LOW THA), <LOW TA, ZWNJ, SAKOT, LOW THA> would provide the unligated
form.  Note that in Tai Tham, SAKOT primarily affects the C2 consonant.
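
The suggested GB9d can be tried out in the same way. The sketch below
is illustrative only and again assumes that ccc=9 identifies Virama
(which covers both Malayalam VIRAMA and Tai Tham SAKOT) and that any
other non-zero ccc counts as ViramaExtend. The function name is made up
for this example.

```python
import unicodedata

ZWNJ = "\u200C"

def gb9d_no_break(text, i):
    """True if 'x (ZWNJ ViramaExtend* Virama)' forbids a break before text[i]."""
    if i <= 0 or i >= len(text) or text[i] != ZWNJ:
        return False
    j = i + 1
    # Assumption: ViramaExtend is approximated as "ccc is neither 0 nor 9".
    while j < len(text) and unicodedata.combining(text[j]) not in (0, 9):
        j += 1
    # Assumption: ccc=9 stands in for Virama.
    return j < len(text) and unicodedata.combining(text[j]) == 9

# Malayalam: KA, ZWNJ, VIRAMA, KA
malayalam = "\u0D15\u200C\u0D4D\u0D15"
# Tai Tham: LOW TA, ZWNJ, SAKOT, LOW THA
tai_tham = "\u1A34\u200C\u1A60\u1A35"
```

In both sample strings the ZWNJ at index 1 stays attached to the
preceding consonant; a ZWNJ not followed by a virama is left to the
other rules.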

> Cibu also wrote:
> Also, when we disallow /$L $V ZWJ $D/, it is disallowing the sequences
> involving legacy chillus. That is, for example, <CHILLU N, VOWEL SIGN
> E> is a valid sequence (Examples in Unicode10.0.0/ch12 Table 12.36).
> Its legacy equivalent would be <NA, VIRAMA, ZWJ, VOWEL SIGN E>. It
> might be OK to disallow this; but, we should be mindful of this side
> effect.

I see no problem here.  By GB9, we get 


By GB9a, we then get


Have I missed something?
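
The argument can be checked mechanically with a rough Python sketch.
It assumes that general categories approximate the UAX #29 classes
(Mn/Me for Extend, Mc for SpacingMark); the authoritative data is in
GraphemeBreakProperty.txt, so treat this as a sanity check only. The
helper names are invented for this example.

```python
import unicodedata

ZWJ = "\u200D"

def no_break_rule(ch):
    """Name the rule that forbids a break before ch, or None if neither applies."""
    if ch == ZWJ:
        return "GB9"            # GB9: x (Extend | ZWJ)
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):     # assumption: approximates Extend
        return "GB9"
    if cat == "Mc":             # assumption: approximates SpacingMark
        return "GB9a"           # GB9a: x SpacingMark
    return None

def joins(text):
    # One entry per boundary between adjacent characters.
    return [no_break_rule(ch) for ch in text[1:]]

# Malayalam NA, VIRAMA, ZWJ, VOWEL SIGN E (the legacy chillu sequence)
legacy_chillu = "\u0D28\u0D4D\u200D\u0D46"
```

Under these assumptions joins(legacy_chillu) gives GB9, GB9, GB9a, so
no boundary falls inside the sequence.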

Do you want me to try to formally submit my comments from this post?  I
will be going to bed as soon as I've finished extracting comments from
this thread.

