Editing Sinhala and Similar Scripts

Richard Wordingham richard.wordingham at ntlworld.com
Wed Mar 19 15:29:09 CDT 2014

On Wed, 19 Mar 2014 14:17:00 +0100
Daniel Bünzli <daniel.buenzli at erratique.ch> wrote:

> On Wednesday, 19 March 2014 at 02:33, Doug Ewell wrote:
> > There are two types of people:
> >  
> > 1. those who fully expect Backspace to erase a single keystroke,
> > and feel it is a fatal flaw if it erases an entire combination, and
> >  
> > 2. those who fully expect Backspace to erase an entire combination,
> > and feel it is a fatal flaw if it erases just a single keystroke.
> >  
> > Unfortunately, both types exist in significant numbers.

And I belong to a third group - I expect it to delete a
Unicode character.

> Isn't it possible to classify appartenance to 1 or 2 according to
> script ? E.g. I suspect most french speaking person when backspacing
> an É would like to erase the whole combination; for é it seems even
> more obvious since usually it's introduced with a single keystroke.

It's not as simple as script.  An English speaker who enters é on
a keyboard normally does so with multiple keystrokes, most
typically via a dead key.  Now, if I type it in using an out-of-order
sequence such as 'e, it is quite reasonable for it to be stored as a
single composed character and deleted by backspace.  On the other
hand, if I type it in using an XSAMPA-based keyboard sequence such as
e_H, I expect the backspace to delete just the accent, just as I am
used to for the sequence O_H which yields 2 characters, open o with
acute (ɔ́).  The diacritic here would not be arbitrary - I would
be using it to indicate a specific tone.  (It came as a nasty shock to
find my e-mail client, Claws on Ubuntu, takes out the entire cluster.
For Thai legacy grapheme clusters, it just takes out the last character
entered.)  At the moment I have made life more difficult for myself
by devising a keyboard that generates NFC if the keystrokes are in the
right order.

As a reasonable guide, backspace should not take out more than one NFC
character, and I would defend this even for Cyrillic-script tone marking
in Serbian.
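
To make that guide concrete, here is a minimal Python sketch.  The
function name backspace_nfc is hypothetical, and normalising the whole
buffer is a simplification for illustration; a real editor would only
look at the text just before the cursor.  It is not how any particular
application behaves.

```python
import unicodedata


def backspace_nfc(text: str) -> str:
    """Delete at most one NFC character from the end of text.

    Illustrative assumption: the whole buffer is normalised to NFC
    first, then the last code point is dropped.
    """
    nfc = unicodedata.normalize("NFC", text)
    return nfc[:-1]


# e + COMBINING ACUTE composes to a single NFC character (U+00E9),
# so one backspace removes the whole letter.
print(repr(backspace_nfc("e\u0301")))

# OPEN O (U+0254) + COMBINING ACUTE has no precomposed form, so NFC
# keeps two code points and backspace removes only the accent.
print(repr(backspace_nfc("\u0254\u0301")))

# The same holds for Cyrillic a (U+0430) with an acute tone mark.
print(repr(backspace_nfc("\u0430\u0301")))
```

Under this rule the e_H case loses the whole letter, which is why the
keyboard's choice of NFC versus decomposed output matters to the user.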

Now, there's supposed to be an interface definition for using
incremental keyboard typing as in Keyman, where keyboards can be
arranged so that one sees what's been typed in already.  Where is it?
It is rather important for an application to know when it can normalise
input characters.  For example, LibreOffice helpfully swaps round a
Thai tone mark with a following vowel mark below, with the slightly
bizarre consequence that the sequence ko kai, mai ek, sara u, backspace
yields <KO KAI, SARA U>.  Traditionally, the sequence yields a beep and
just <KO KAI> - the input handler rejects the SARA U because it does
not accord with the character order prescribed by WTT (wing thuk thi).
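
The two behaviours can be contrasted with a small Python simulation.
Everything here is a sketch under stated assumptions: the handler names
are hypothetical, only the single rule "vowel below typed directly
after a tone mark" is modelled (not the full WTT input table), and
backspace is taken to delete one code point.

```python
KO_KAI = "\u0e01"   # THAI CHARACTER KO KAI
MAI_EK = "\u0e48"   # THAI CHARACTER MAI EK (tone mark)
SARA_U = "\u0e38"   # THAI CHARACTER SARA U (vowel below)
BACKSPACE = "\b"

TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}
VOWELS_BELOW = {"\u0e38", "\u0e39"}


def type_reordering(keys):
    """Swap a vowel below in front of a preceding tone mark."""
    buf = []
    for key in keys:
        if key == BACKSPACE:
            if buf:
                buf.pop()
        elif key in VOWELS_BELOW and buf and buf[-1] in TONE_MARKS:
            buf.insert(len(buf) - 1, key)  # reorder silently
        else:
            buf.append(key)
    return "".join(buf)


def type_strict(keys):
    """Reject a vowel below that follows a tone mark."""
    buf = []
    for key in keys:
        if key == BACKSPACE:
            if buf:
                buf.pop()
        elif key in VOWELS_BELOW and buf and buf[-1] in TONE_MARKS:
            continue  # rejected; a real handler would beep here
        else:
            buf.append(key)
    return "".join(buf)


keys = [KO_KAI, MAI_EK, SARA_U, BACKSPACE]
print([hex(ord(c)) for c in type_reordering(keys)])  # KO KAI, SARA U
print([hex(ord(c)) for c in type_strict(keys)])      # KO KAI only
```

In the reordering handler the backspace removes the tone mark rather
than the vowel the user typed last, which is the surprise described
above.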

