Editing Sinhala and Similar Scripts

Richard Wordingham richard.wordingham at ntlworld.com
Tue Mar 18 20:00:57 CDT 2014

On Mon, 17 Mar 2014 18:18:50 -0500
Naena Guru <naenaguru at gmail.com> wrote:
(in topic 'Romanized Singhala got great reception in Sri Lanka')

> Typing is a nightmare.

> When you backspace it destroys multiple keystrokes.

I suspect this is a widespread and unsolved problem.  If one positions
the cursor after a character that was entered with multiple keystrokes
in a previous program, there doesn't seem to be a way of undoing the
previous typing one keystroke at a time.  A Latin-1 analogy is entering
e-acute by typing 'e and then backspacing later.  That will usually
simply delete the e-acute rather than leaving the dead key apostrophe.
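
To make the analogy concrete, here is a minimal Python sketch of a toy
dead-key handler.  The function name and the rule "apostrophe plus
letter gives the acute-accented letter" are assumptions made for
illustration, not the behaviour of any particular keyboard driver.  The
point is only that the buffer stores the result, not the keystrokes.

```python
import unicodedata

def type_with_dead_key(keys):
    """Toy input method (hypothetical): "'" is a dead key for the acute accent."""
    out = []
    pending = False                      # True after the dead key was pressed
    for k in keys:
        if pending:
            composed = unicodedata.normalize("NFC", k + "\u0301")
            # Keep the composed letter if one exists, else emit both keys.
            out.append(composed if len(composed) == 1 else "'" + k)
            pending = False
        elif k == "'":
            pending = True
        else:
            out.append(k)
    if pending:
        out.append("'")
    return "".join(out)

buffer = type_with_dead_key("caf'e")     # five keystrokes, four code points
after_backspace = buffer[:-1]            # an editor deleting one code point
```

After the backspace nothing is left of the two keystrokes 'e: the text
never recorded that U+00E9 came from a dead key.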

In some ways it may be an insoluble problem rather than merely
difficult. For example, when using KMFL to type the Tai Tham script, I
have two ways of typing the combination <U+1A49 TAI THAM LETTER HIGH
HA, U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA> (corresponding to the
single character U+199C NEW TAI LUE LETTER HIGH LA and sometimes listed
as a letter in its own right):

1) !}
2) s!]

I use '!' because I can't get AltGr to work in KMFL; '!' works as
a dead key.  Key sequence (1) views the Tai Tham sequence as a single
character; key sequence (2) views it as a sequence of Unicode
characters.  The keystrokes are based on the Thai Kedmanee keyboard.
The mnemonic for the sequences with '!' is that the single keystroke
']' results in U+1A43 TAI THAM LETTER LA.  The single, shifted
keystroke '}' results in a comma, as on the Thai keyboard.

If I position the cursor after the character sequence,
what should I get after typing <backspace> and then the character '}'?
Should I get:

(a) <U+1A49, U+1A56> (by assuming input sequence 1);
(b) <U+1A49, U+1A49, U+1A56> (by assuming input sequence 2); or
(c) <U+1A49, U+002C COMMA> (what I actually get)?
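
The three outcomes can be reproduced with a small Python model.  This is
a sketch under stated assumptions: the key tables contain only the keys
mentioned in this message, the mapping of 's' to U+1A49 is inferred from
sequence (2), and the function names are made up.  It is not KMFL code.

```python
HIGH_HA = "\u1A49"    # TAI THAM LETTER HIGH HA
MEDIAL_LA = "\u1A56"  # TAI THAM CONSONANT SIGN MEDIAL LA
LA = "\u1A43"         # TAI THAM LETTER LA

# Assumed key tables: plain keys, and keys typed after the dead key '!'.
PLAIN = {"s": HIGH_HA, "]": LA, "}": ","}
AFTER_DEAD = {"]": MEDIAL_LA, "}": HIGH_HA + MEDIAL_LA}

def type_keys(keys):
    """Turn a string of keystrokes into text, treating '!' as a dead key."""
    out, pending = "", False
    for k in keys:
        if k == "!" and not pending:
            pending = True
        elif pending:
            out += AFTER_DEAD.get(k, k)
            pending = False
        else:
            out += PLAIN.get(k, k)
    return out

def undo_last_key_then(keys, new_key):
    """Backspace undoes the last keystroke of an assumed key history."""
    return type_keys(keys[:-1] + new_key)

def delete_last_code_point_then(text, new_key):
    """Backspace deletes one code point; the key history is unknown."""
    return text[:-1] + type_keys(new_key)

text = type_keys("!}")                                 # same as type_keys("s!]")
outcome_a = undo_last_key_then("!}", "}")              # history was sequence 1
outcome_b = undo_last_key_then("s!]", "}")             # history was sequence 2
outcome_c = delete_last_code_point_then(text, "}")     # no history at all
```

Both key histories yield the same stored text, so an editor that sees
only the text cannot choose between (a) and (b); deleting one code point,
as in (c), is the only behaviour that needs no history.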

> Search and
> replace is not possible, at least the way do it with English.

I suspect the problem you have is that editing tools expect the
user to think of a combination of base character and combining mark as a
single character.  I don't know how to counter this expectation.  For
LibreOffice, I do search and replace by choosing the 'regular
expression' option, as this does allow the user to work with characters
rather than legacy grapheme clusters (UAX #29: Unicode Text
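
As an illustration of working at the character level, here is a small
sketch using Python's re module, which matches code points rather than
grapheme clusters.  It is an analogy only: LibreOffice uses its own
regular expression engine, and the sample word is made up for the
example.

```python
import re

KA = "\u0D9A"        # SINHALA LETTER ALPAPRAANA KAYANNA
MA = "\u0DB8"        # SINHALA LETTER MAYANNA
SIGN_I = "\u0DD2"    # SINHALA VOWEL SIGN KETTI IS-PILLA
SIGN_II = "\u0DD3"   # SINHALA VOWEL SIGN DIGA IS-PILLA

text = KA + SIGN_I + MA + SIGN_I

# Replace the vowel sign alone, and only where it follows KA.  The base
# letter is matched by a lookbehind, so it is left untouched.
pattern = "(?<=" + KA + ")" + SIGN_I
result = re.sub(pattern, SIGN_II, text)
```

Here the combining mark is searched for and replaced on its own, which a
search that insists on whole clusters would not allow.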

