Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

Martin J. Dürst via Unicode unicode at
Thu Dec 21 02:55:33 CST 2017

On 2017/12/15 07:40, Richard Wordingham via Unicode wrote:
> On Mon, 11 Dec 2017 21:45:23 +0000
> Cibu Johny (സിബു) <cibu at> wrote:

>> Malayalam could be a similar story. In case of Malayalam, it can be
>> font specific because of the existence of traditional and reformed
>> writing styles. A conjunct might be a ligature in traditional; and it
>> might get displayed with explicit virama in the reformed style. For
>> example see the poster with word ഉസ്താദ് broken as [u, sa-virama,
>> ta-aa, da-virama] - as it is written in the reformed style. As per
>> the proposed algorithm, it would be [u, sa-virama-ta-aa, da-virama].
>> These breaks would be used by the traditional style of writing.
> Working round that seems to be tricky.  The best I can think of is to
> have two different locales, traditional and reformed, and hope that the
> right font is selected.  It doesn't seem at all straightforward to
> work out what the font is doing even from a character to glyph map
> without knowing what the glyphs are.  I'm not sure how one should have
> the difference designated - language variants, or two scripts?

I'm not at all familiar with Malayalam, but my experience with typing
Japanese (where the average kana character requires two keystrokes for
input, but only one for deleting) would lead to different advice. When
typing, it is very helpful to know how many times one has to hit
backspace after making an error. This kind of knowledge is usually
assimilated into what one calls muscle memory, i.e. it is applied
without thinking about it. I would guess that it would be very difficult
to maintain two different kinds of muscle memory for typing Malayalam.
(My assumption is that the populations typing traditional and reformed
writing styles are not disjoint.)
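
To make the difference concrete, here is a rough Python sketch of the two
segmentations of the example word quoted above. It is not the proposed
algorithm itself: the function clusters() and its join_conjuncts flag are
hypothetical names, and the rule used (a letter directly after a virama
stays in the same cluster) is only a simplified stand-in for the "whole
akshara" behaviour. It also assumes that one backspace deletes one cluster.

```python
import unicodedata

VIRAMA_CCC = 9  # canonical combining class shared by virama characters


def clusters(text, join_conjuncts):
    """Split text into clusters; combining marks attach to the preceding base.

    If join_conjuncts is true, a letter that directly follows a virama is
    kept in the same cluster (a simplified "whole akshara" rule, assumed
    here for illustration only).
    """
    result = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        is_letter = category.startswith("L")
        # True when the last character of the current cluster is a virama.
        after_virama = (
            bool(result)
            and unicodedata.combining(result[-1][-1]) == VIRAMA_CCC
        )
        if result and (is_mark or (join_conjuncts and after_virama and is_letter)):
            result[-1] += ch
        else:
            result.append(ch)
    return result


# u, sa, virama, ta, aa, da, virama
word = "\u0d09\u0d38\u0d4d\u0d24\u0d3e\u0d26\u0d4d"

reformed = clusters(word, join_conjuncts=False)
traditional = clusters(word, join_conjuncts=True)

# Under the one-backspace-per-cluster assumption, the cluster count is
# the number of backspaces needed to erase the whole word.
print(len(reformed), reformed)
print(len(traditional), traditional)
```

With these assumptions the reformed segmentation has four clusters and the
traditional one has three, so the same word would need a different number
of backspaces depending on which style the system assumes.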

Regards,   Martin.
