Romanized Singhala got great reception in Sri Lanka

Whistler, Ken ken.whistler at sap.com
Mon Mar 17 20:36:59 CDT 2014


Well, I actually don’t see. I took a look at the Sinhala you inserted in this
email. I cannot tell what you did at your input end (about “inserted all joiners”),
but there are no actual joiners in the text itself. It displayed just fine
in my email (including the correct conditional formatting of the -u vowel
applied to the ra in purukee), without me doing anything special (or installing
any hacked font). Why? Because it was transmitted in plain Unicode.

I cut and pasted that Unicode Sinhala string into a Word document, and
it worked just fine. The boundaries for all the syllables were correctly
detected.

I saved it as a plain text UTF-8 file, and it worked just
fine. I even then read the plain text UTF-8 file into a UTF-8 aware
programming editor, and it worked just fine. (In a programming editor,
which doesn’t attempt complex script rendering,
the vowels don’t apply to the consonants and no reordering is done, so
the display isn’t correct, but each character is correctly preserved, and
if I write it back out to a document and read it in Word or some other
tool that has access to proper rendering, it is still fine.) And all that
interoperability works, why? Because this is plain Unicode.
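
The round trip described above can be sketched in a few lines. This is an illustration only, assuming Python 3; the variable names are hypothetical, and the string is the Sinhala text quoted in this message. The `utf-8-sig` codec is used because the saved file in the dump starts with a byte order mark (EF BB BF).

```python
# Sketch: plain Unicode text survives a UTF-8 save/load cycle unchanged.
# The variable names are illustrative, not taken from any tool in the thread.
sinhala = "බලු වලිගේ උණ පුරුකේ දාලා හැදුවත් නෑ ඇදේ ඇරෙන්නේ"

# Encode to UTF-8 with a leading byte order mark (EF BB BF).
saved = sinhala.encode("utf-8-sig")

# Decode again; the utf-8-sig codec strips the byte order mark.
loaded = saved.decode("utf-8-sig")

has_bom = saved.startswith(b"\xef\xbb\xbf")
round_trip_ok = loaded == sinhala

print("BOM present:", has_bom)
print("round trip preserved every character:", round_trip_ok)
```

Nothing in this cycle depends on rendering: an editor may display the vowel signs unattached, but the stored code points stay the same.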

So while I don’t doubt that people may be having serious issues with
input methods for Sinhala, I tend to agree with Marc Durdin that you are confusing
encoding with input methods. Yes, I know you know the difference,
but it appears to me that the inescapable conclusion from your
argumentation is that the highest priority for the design of an
encoding system should be to make the design of input methods
as simple as possible. And in my estimation, that is confusing encoding
with input methods.

The art of input methods is to hide encoding details from users, and
instead to provide them with an abstraction that they find easy to
use and which accords with their general understanding of the writing
system they are using. If done correctly, then the details of the input
method *also* recede into the background, and users then simply
do what they want: write and edit text easily on their devices.

--Ken

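P.S. Here is a dump of that text (octal offsets, hexadecimal byte values), taken
after I inserted a closing parenthesis in the editor. The Sinhala sequence is the
run of E0 B6 xx / E0 B7 xx bytes between the parentheses (28 and 29). Plain Unicode
in UTF-8, no fancy stuff, and works just fine.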

0000000000    EF  BB  BF  62  61  6C  75  20  76  61  6C  69  67  65  65  C2
0000000020    A0  75  C2  B5  61  20  70  75  72  75  6B  65  65  C2  A0  C3
0000000040    B0  61  61  6C  61  61  20  68  C3  A6  C3  B0  75  76  61  C3
0000000060    BE  20  6E  C3  A6  C3  A6  20  C3  A6  C3  B0  65  65  20  C3
0000000100    A6  72  65  6E  6E  65  65  0D  0A  28  E0  B6  B6  E0  B6  BD
0000000120    E0  B7  94  20  E0  B7  80  E0  B6  BD  E0  B7  92  E0  B6  9C
0000000140    E0  B7  9A  20  E0  B6  8B  E0  B6  AB  20  E0  B6  B4  E0  B7
0000000160    94  E0  B6  BB  E0  B7  94  E0  B6  9A  E0  B7  9A  20  E0  B6
0000000200    AF  E0  B7  8F  E0  B6  BD  E0  B7  8F  20  E0  B7  84  E0  B7
0000000220    90  E0  B6  AF  E0  B7  94  E0  B7  80  E0  B6  AD  E0  B7  8A
0000000240    20  E0  B6  B1  E0  B7  91  20  E0  B6  87  E0  B6  AF  E0  B7
0000000260    9A  20  E0  B6  87  E0  B6  BB  E0  B7  99  E0  B6  B1  E0  B7
0000000300    8A  E0  B6  B1  E0  B7  9A  29  0D  0A  0D  0A
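
The claim that the text contains no joiners can be checked mechanically from the dump above. The following is a minimal sketch, assuming Python 3; the names `DUMP_HEX` and `JOINERS` are hypothetical, and the hex string is the Sinhala span transcribed by hand from the dump (the bytes between 28 and 29).

```python
import unicodedata

# Sinhala bytes transcribed from the dump (between the 28 and the 29),
# written one code point per token and one word per line for readability.
DUMP_HEX = """
E0B6B6 E0B6BD E0B794 20
E0B780 E0B6BD E0B792 E0B69C E0B79A 20
E0B68B E0B6AB 20
E0B6B4 E0B794 E0B6BB E0B794 E0B69A E0B79A 20
E0B6AF E0B78F E0B6BD E0B78F 20
E0B784 E0B790 E0B6AF E0B794 E0B780 E0B6AD E0B78A 20
E0B6B1 E0B791 20
E0B687 E0B6AF E0B79A 20
E0B687 E0B6BB E0B799 E0B6B1 E0B78A E0B6B1 E0B79A
"""

# ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER
JOINERS = {"\u200c", "\u200d"}

# Drop all whitespace before converting, then decode the bytes as UTF-8.
text = bytes.fromhex("".join(DUMP_HEX.split())).decode("utf-8")
joiner_count = sum(ch in JOINERS for ch in text)

# List each non-space code point with its Unicode character name.
for ch in text:
    if ch != " ":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print("joiners found:", joiner_count)
```

Every code point in the decoded string is either a space or falls in the Sinhala block (U+0D80 to U+0DFF); a joiner would appear in the dump as E2 80 8C or E2 80 8D.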

As you see, this is a terrible mess and cannot be straightened, granted few people use it, and there'll be more. What other choice do they have except Anglicizing? In Singhala, they say, "balu valigee uµa purukee ðaalaa hæðuvaþ nææ æðee ærennee" (බලු වලිගේ උණ පුරුකේ දාලා හැදුවත් නෑ ඇදේ ඇරෙන්නේ <- I inserted all joiners, but can't guarantee if vowel signs would pop out). It means you cannot straighten a dog's tail even if you put it in a piece of bamboo. You cannot fix Unicode Singhala and sadly, it is bringing down the language with it.
