Unencoded Lao Characters

Theppitak Karoonboonyanan theppitak at gmail.com
Wed Apr 2 07:16:32 CDT 2014

On Sun, Mar 30, 2014 at 5:35 AM, Richard Wordingham
<richard.wordingham at ntlworld.com> wrote:
> On Sat, 29 Mar 2014 11:10:52 +0700
> Theppitak Karoonboonyanan <theppitak at gmail.com> wrote,
> under topic 'Pali in Thai Script':
>> On Fri, Mar 28, 2014 at 4:15 PM, Richard Wordingham
>> <richard.wordingham at ntlworld.com> wrote:
>> > An older form of the Lao script is called the Thai Noi script.  That
>> > script has many of the characters needed.  It has the characters, to
>> > give them their 'standard' Unicode Indic names, GHA, NYA, TTHA, NNA,
>> > DHA, BHA, and even has the Sanskrit-supporting characters SHA, SSA
>> > and Vocalic R.  The lack of CHA, JHA, TTA, DDA, DDHA and LLA may be
>> > due to their rarity, as with the lack of Vocalic L.
>> I don't think so. From my studies so far, Tai Noi script (aka. Lao
>> Buhan) writing system was not so different from that of contemporary
>> Lao script. Some characters are just obsolete.
>> In fact, I have been drafting a summarized proposal to encode Tai Noi
>> script here:
>>   http://linux.thai.net/~thep/esaan-scripts/tn-issues/tn-encoding.html
> That seems to be based on the analysis that the Tai Noi script is a
> form of the Lao script.  In that case, it ought to address GHA, NYA,
> TTHA, NNA, DHA and BHA as seen in inscriptions, recorded for example in
> the 1979 MA thesis of Thawaj Poonotoke (ธวัช ปุณโณทก) at
> http://www.khamkoo.com/uploads/9/0/0/4/9004485/thai_noi_palaeography.pdf .

I see. As the thesis says, these Thai-borrowed characters were mostly
used by elites who were influenced by foreign states. That's why I
don't find them in palm leaf documents, which were inscribed by ordinary
people; there, such characters, when needed, were simply borrowed from
the Tham script rather than from (archaic) Thai.

And, as the thesis also says, the official letters (Bai Jum) are not
as abundant as palm leaves, and the author himself suggested that
studying the writing system used in palm leaves was more useful.
That's why most next-generation scholars, including those I consulted,
do not mention the one used by the elites in their books at all.
At least, they don't suggest it for contemporary use when the script
is revived.

Anyway, I think we should take the elite's writing system into account
when we encode it.

> The Buddhist Institute 'additions' should also be handled.  There are
> several fonts around that make presumptions about their encoding in
> Unicode.  I'm not convinced that the old Tai Noi and Buddhist Institute
> forms of each of NYA and NNA are the same character - I suspect we may
> have four characters here.  The two versions of NYA are particularly
> difficult to reconcile.

Don't you think it's a matter of style, in the same manner that Lao Tham
shares the same block with Lanna and Khun?

> My thoughts on the subscript consonants are:
> 1) The Lao block already has two subscript consonants, U+0EBC LAO
> the various forms of the latter need to be disunified.  How does the
> latter's J-shaped glyph kern?

I'd rather leave the kerning to fonts (i.e. fonts for contemporary Lao and
those for Tai Noi would kern differently). As for the variations, I'm afraid
it's a matter of style again. If one insists on using different forms in the
same document, I'm not sure whether Variation Selectors would fit.

> 2) If we allow the Lao script to be split between planes, subscript
> forms could be accommodated in an 'Archaic Lao' block in the SMP.  This
> would have the advantages that:
> (a) In UTF-8, a subscript consonant would only take 4 bytes, whereas
> using a coeng in the BMP would require 6 bytes, 3 for the coeng and
> 3 for the consonant identity.  The memory requirement is 4 bytes for
> both schemes in UTF-16.
> (b) Distinct subscripts for the same letter can easily be encoded
> distinctly.  For example, the Lao letters LO, DO and NO can easily be
> taken to have two distinct subscript forms, and in the related Thai
> Nithet script (อักษรไทยนิเทศ), formerly used in Northern Thailand, one
> can argue for four forms of the cluster HO MO - the ligature HO MO (as
> LAO HO MO), and HO plus (i) a purely subscript MO (gc=Mn), (ii)
> subscript MO with an ascender (gc=Mc), and (iii) a borrowing of Tai
> Tham <SAKOT, MA> (gc=Mn if treated as a single character).

What's the difference between HO plus (i) and HO plus (ii)?
I don't think I have seen the former case yet.

Yes, the supplement block can be a good alternative, as it can
address different forms of subscripts more flexibly.
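
As a quick check of the byte counts in point (a) above, here is a minimal
Python sketch. The code points are stand-ins chosen only because they fall in
the right ranges, not proposed assignments: KHMER SIGN COENG (U+17D2) plays
the role of a hypothetical BMP Lao coeng, and U+10000 is an arbitrary SMP
code point standing in for a hypothetical 'Archaic Lao' subscript letter.

```python
# Stand-in code points (assumptions, not proposed assignments).
COENG = "\u17D2"              # KHMER SIGN COENG, standing in for a BMP Lao coeng
CONSONANT = "\u0EA1"          # LAO LETTER MO
SMP_SUBSCRIPT = "\U00010000"  # arbitrary SMP code point used as a placeholder


def encoded_sizes(text):
    """Return the encoded length of text in bytes for UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # utf-16-le is used so that no byte order mark is counted
        "utf-16": len(text.encode("utf-16-le")),
    }


# Scheme 1: subscript written as <coeng, consonant>, both in the BMP.
coeng_scheme = encoded_sizes(COENG + CONSONANT)
# Scheme 2: subscript written as a single SMP character.
smp_scheme = encoded_sizes(SMP_SUBSCRIPT)

print("BMP coeng + consonant:", coeng_scheme)
print("SMP subscript letter: ", smp_scheme)
```

With these stand-ins the coeng sequence comes to 6 bytes in UTF-8 and 4 in
UTF-16, while the single SMP character takes 4 bytes in both, which agrees
with the figures quoted above.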

Theppitak Karoonboonyanan
