Romanized Singhala got great reception in Sri Lanka
ken.whistler at sap.com
Mon Mar 17 20:36:59 CDT 2014
Well, I actually don’t see. I took a look at the Sinhala you inserted in this
email. I cannot tell what you did at your input end (about “inserted all joiners”),
but there are no actual joiners in the text itself. It displayed just fine
in my email (including the correct conditional formatting of the –u vowel
applied to the ra in purukee), without me doing anything special (or installing
any hacked font). Why? Because it was transmitted in plain Unicode.
I cut and pasted that Unicode Sinhala string into a Word document, and
it worked just fine. The boundaries for all the syllables were correctly
I saved it as a plain text UTF-8 file, and it worked just
fine. I even then read the plain text UTF-8 file into a UTF-8 aware
programming editor, and it worked just fine. (In a programming editor,
which doesn’t attempt complex script rendering,
the vowels don’t apply to the consonants and no reordering is done, so
the display isn’t correct, but each character is correctly preserved, and
if I write it back out to a document and read it in Word or some other
tool that has access to proper rendering, it is still fine.) And all that
interoperability works, why? Because this is plain Unicode.
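To make the round trip concrete, here is a minimal sketch in Python. It is an illustration only, not what I actually ran: the file name is made up, and the sample is just the one word "purukee" written out as code points. It writes the word to a UTF-8 file, reads it back, and checks that every character survives.

```python
import os
import tempfile

# "purukee" as plain Unicode Sinhala code points (pa, u-sign, ra, u-sign, ka, ee-sign).
word = "\u0db4\u0dd4\u0dbb\u0dd4\u0d9a\u0dda"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this illustration.
    path = os.path.join(tmp, "sample.txt")

    # Save as plain-text UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(word)

    # Read it back the way a UTF-8 aware editor would.
    with open(path, "r", encoding="utf-8") as f:
        read_back = f.read()

# The characters are preserved even where no complex-script rendering is available.
round_trip_ok = read_back == word
print("round trip ok:", round_trip_ok)
print([f"U+{ord(c):04X}" for c in read_back])
```

Nothing here depends on a font or a rendering engine, which is the point: the encoding carries the text, and display is a separate layer.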
So while I don’t doubt that people may be having serious issues with
input methods for Sinhala, I tend to agree with Marc Durdin that you are confusing
encoding with input methods. Yes, I know you know the difference,
but it appears to me that the inescapable conclusion from your
argumentation is that the highest priority for the design of an
encoding system should be to make the design of input methods
as simple as possible. And in my estimation, that is confusing encoding
with input methods.
The art of input methods is to hide encoding details from users, and
instead to provide them with an abstraction that they find easy to
use and which accords with their general understanding of the writing
system they are using. If done correctly, then the details of the input
method *also* recede into the background, and users then simply
do what they want: write and edit text easily on their devices.
P.S. Here is a dump of that text, with octal offsets and hex byte values (after I inserted a closing parenthesis in
the editor). Sinhala sequence highlighted. Plain Unicode in UTF-8,
no fancy stuff, and works just fine.
0000000000 EF BB BF 62 61 6C 75 20 76 61 6C 69 67 65 65 C2
0000000020 A0 75 C2 B5 61 20 70 75 72 75 6B 65 65 C2 A0 C3
0000000040 B0 61 61 6C 61 61 20 68 C3 A6 C3 B0 75 76 61 C3
0000000060 BE 20 6E C3 A6 C3 A6 20 C3 A6 C3 B0 65 65 20 C3
0000000100 A6 72 65 6E 6E 65 65 0D 0A 28 E0 B6 B6 E0 B6 BD
0000000120 E0 B7 94 20 E0 B7 80 E0 B6 BD E0 B7 92 E0 B6 9C
0000000140 E0 B7 9A 20 E0 B6 8B E0 B6 AB 20 E0 B6 B4 E0 B7
0000000160 94 E0 B6 BB E0 B7 94 E0 B6 9A E0 B7 9A 20 E0 B6
0000000200 AF E0 B7 8F E0 B6 BD E0 B7 8F 20 E0 B7 84 E0 B7
0000000220 90 E0 B6 AF E0 B7 94 E0 B7 80 E0 B6 AD E0 B7 8A
0000000240 20 E0 B6 B1 E0 B7 91 20 E0 B6 87 E0 B6 AF E0 B7
0000000260 9A 20 E0 B6 87 E0 B6 BB E0 B7 99 E0 B6 B1 E0 B7
0000000300 8A E0 B6 B1 E0 B7 9A 29 0D 0A 0D 0A
As you see, this is a terrible mess and cannot be straightened, granted few people use it, and there'll be more. What other choice do they have except Anglicizing? In Singhala, they say, "balu valigee uµa purukee ðaalaa hæðuvaþ nææ æðee ærennee" (බලු වලිගේ උණ පුරුකේ දාලා හැදුවත් නෑ ඇදේ ඇරෙන්නේ <- I inserted all joiners, but can't guarantee if vowel signs would pop out). It means you cannot straighten a dog's tail even if you put it in a piece of bamboo. You cannot fix Unicode Singhala and sadly, it is bringing down the language with it.