Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

Richard Wordingham via Unicode unicode at unicode.org
Thu Dec 21 15:44:49 CST 2017


On Thu, 21 Dec 2017 17:55:33 +0900
"Martin J. Dürst via Unicode" <unicode at unicode.org> wrote:

> On 2017/12/15 07:40, Richard Wordingham via Unicode wrote:
> > On Mon, 11 Dec 2017 21:45:23 +0000
> > Cibu Johny (സിബു) <cibu at google.com> wrote:  

> >> For example see the poster with word ഉസ്താദ് broken as [u,
> >> sa-virama, ta-aa, da-virama] - as it is written in the reformed
> >> style. As per the proposed algorithm, it would be [u,
> >> sa-virama-ta-aa, da-virama]. These breaks would be used by the
> >> traditional style of writing.  

> I'm not at all familiar with Malayalam, but my experience with
> typing Japanese (where the average kana character requires two
> keystrokes for input, but only one for deleting) would lead to
> different advice. When typing, it is very helpful to know how many
> times one has to hit backspace when making an error. This kind of
> knowledge is usually assimilated into what one calls muscle memory,
> i.e. it is done without thinking about it. I would guess that it would
> be very difficult to maintain two different kinds of muscle memory
> for typing Malayalam. (My assumption is that the populations typing
> traditional and reformed writing styles are not disjoint.)

When deleting by backspace, the usual practice is to delete one Unicode
character for each key press.  The proposed change to the definition of
grapheme clusters will not affect this.
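
As a rough illustration of that point, here is a minimal Python sketch that
models the text buffer as a Python string, so that each element is one
Unicode code point. The backspace() helper is hypothetical and is not taken
from any particular editor; it only shows that code-point deletion never
consults grapheme cluster boundaries.

```python
def backspace(buffer: str) -> str:
    """Delete the last Unicode code point, ignoring cluster boundaries."""
    return buffer[:-1]

# The example word: u, sa, virama, ta, aa, da, virama (7 code points).
word = "\u0D09\u0D38\u0D4D\u0D24\u0D3E\u0D26\u0D4D"

# Record the buffer after each key press until it is empty.
steps = [word]
while steps[-1]:
    steps.append(backspace(steps[-1]))

key_presses = len(steps) - 1  # one key press per code point
```

Because the helper never looks at cluster boundaries, redefining grapheme
clusters leaves the number of key presses unchanged.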

What will change, for some systems, is stepping through Indic text in
most scripts. (The visual order scripts will be unaffected.)  In Linux
applications, one can often step to the start of each grapheme cluster,
i.e. to the breaks in |u|sa-virama|ta-aa|da-virama|. If the proposal to
expand extended grapheme clusters to whole aksharas goes through, a
likely effect for traditional Malayalam is that one will only be able to
step to the positions marked as breaks in
|u|sa-virama-ta-aa|da-virama|.  Every major system will then be in the
same position as Windows, where already only the reduced set of cursor
positions is allowed.  Thus if the 'sa' were mistyped, one would have
to retype the entire 4-character akshara.  I find this an unpleasant
prospect, and some Indians already find it extremely annoying not to be
able to edit the join between consonants, e.g. to replace <virama> by
<virama, ZWJ>.
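
The two sets of cursor positions can be sketched in Python as below. This is
a deliberately simplified segmenter, not the UAX #29 algorithm and not the
proposal's actual rules: it assumes a cluster is a base character plus any
following marks (general category Mn or Mc), and that the akshara variant
merely joins a consonant in the range U+0D15..U+0D39 to a preceding cluster
that ends in the Malayalam virama. The names clusters() and cursor_stops()
are made up for the illustration.

```python
import unicodedata

VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA

def is_mark(ch: str) -> bool:
    # Assumption: any nonspacing or spacing mark extends the cluster.
    return unicodedata.category(ch) in ("Mn", "Mc")

def is_consonant(ch: str) -> bool:
    # Assumption: Malayalam consonants KA..HA only.
    return "\u0D15" <= ch <= "\u0D39"

def clusters(text: str, join_aksharas: bool = False) -> list:
    out = []
    for ch in text:
        if out and is_mark(ch):
            out[-1] += ch  # a mark attaches to the preceding cluster
        elif (out and join_aksharas
              and out[-1].endswith(VIRAMA) and is_consonant(ch)):
            out[-1] += ch  # virama + consonant: stay in the same akshara
        else:
            out.append(ch)  # start a new cluster
    return out

def cursor_stops(parts: list) -> list:
    # Code point offsets at which the cursor may rest.
    stops = [0]
    for part in parts:
        stops.append(stops[-1] + len(part))
    return stops

# u, sa, virama, ta, aa, da, virama
word = "\u0D09\u0D38\u0D4D\u0D24\u0D3E\u0D26\u0D4D"

current = cursor_stops(clusters(word))                       # [0, 1, 3, 5, 7]
proposed = cursor_stops(clusters(word, join_aksharas=True))  # [0, 1, 5, 7]
```

The stop at offset 3, between sa-virama and ta-aa, is the one that
disappears under the akshara rule. It is the position needed to correct the
'sa' or to insert ZWJ after the virama.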

Richard.


