Unencoded Lao Characters

Sat Mar 29 17:35:59 CDT 2014

On Sat, 29 Mar 2014 11:10:52 +0700
Theppitak Karoonboonyanan <theppitak at gmail.com> wrote,
under topic 'Pali in Thai Script':

> On Fri, Mar 28, 2014 at 4:15 PM, Richard Wordingham
> <richard.wordingham at ntlworld.com> wrote:

> > An older form of the Lao script is called the Thai Noi script.  That
> > script has many of the characters needed.  It has the characters, to
> > give them their 'standard' Unicode Indic names, GHA, NYA, TTHA, NNA,
> > DHA, BHA, and even has the Sanskrit-supporting characters SHA, SSA
> > and Vocalic R.  The lack of CHA, JHA, TTA, DDA, DDHA and LLA may be
> > due to their rarity, as with the lack of Vocalic L.
> 
> I don't think so. From my studies so far, Tai Noi script (aka. Lao
> Buhan) writing system was not so different from that of contemporary
> Lao script. Some characters are just obsolete.
> 
> In fact, I have been drafting a summarized proposal to encode Tai Noi
> script here:
> 
>   http://linux.thai.net/~thep/esaan-scripts/tn-issues/tn-encoding.html

That seems to be based on the analysis that the Tai Noi script is a
form of the Lao script.  In that case, it ought to address GHA, NYA,
TTHA, NNA, DHA and BHA as seen in inscriptions, recorded for example in
the 1979 MA thesis of Thawaj Poonotoke (ธวัช ปุณโณทก) at
http://www.khamkoo.com/uploads/9/0/0/4/9004485/thai_noi_palaeography.pdf .
The Buddhist Institute 'additions' should also be handled.  There are
several fonts around that make presumptions about their encoding in
Unicode.  I'm not convinced that the old Tai Noi and Buddhist Institute
forms of each of NYA and NNA are the same character - I suspect we may
have four characters here.  The two versions of NYA are particularly
difficult to reconcile.

My thoughts on the subscript consonants are:

1) The Lao block already has two subscript consonants, U+0EBC LAO
SEMIVOWEL SIGN LO and U+0EBD LAO SEMIVOWEL SIGN NYO, though perhaps
the various forms of the latter need to be disunified.  How does the
latter's J-shaped glyph kern?

2) If we allow the Lao script to be split between planes, subscript
forms could be accommodated in an 'Archaic Lao' block in the SMP.  This
would have the advantages that:

(a) In UTF-8, a subscript consonant would only take 4 bytes, whereas
using a coeng in the BMP would require 6 bytes, 3 for the coeng and
3 for the consonant identity.  The memory requirement is 4 bytes for
both schemes in UTF-16.

(b) Distinct subscripts for the same letter can easily be encoded
distinctly.  For example, the Lao letters LO, DO and NO can easily be
taken to have two distinct subscript forms, and in the related Thai
Nithet script (อักษรไทยนิเทศ), formerly used in Northern Thailand, one
can argue for four forms of the cluster HO MO - the ligature HO MO (as
LAO HO MO), and HO plus (i) a purely subscript MO (gc=Mn), (ii)
subscript MO with an ascender (gc=Mc), and (iii) a borrowing of Tai
Tham <SAKOT, MA> (gc=Mn if treated as a single character).
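The two existing Lao subscript consonants from point 1) can be inspected with Python's standard `unicodedata` module. This only reports the names and General_Category values in the Unicode Character Database bundled with the Python build in use. The helper name `describe` is made up for this sketch.

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Return (code point, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# U+0EBC and U+0EBD: the Lao block's existing subscript consonants.
for ch in ("\u0EBC", "\u0EBD"):
    print(*describe(ch))
# U+0EBC LAO SEMIVOWEL SIGN LO Mn
# U+0EBD LAO SEMIVOWEL SIGN NYO Lo
```

The two characters differ in category: LO is a nonspacing mark (Mn), while NYO is encoded as a letter (Lo).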
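The byte counts in point 2(a) can be checked with a short sketch. It assumes a hypothetical BMP coeng and a hypothetical atomic SMP subscript. The code points used for both are arbitrary stand-ins chosen only for their range and are not real or proposed assignments. Encoded length depends only on whether a code point lies in the BMP or the SMP.

```python
# ASSUMPTION: the two stand-in code points below are illustrative only.
HYPOTHETICAL_COENG = "\u0EFF"              # stand-in BMP code point in the Lao block
LAO_LETTER_MO = "\u0EA1"                   # U+0EA1 LAO LETTER MO (real BMP character)
HYPOTHETICAL_SMP_SUBSCRIPT = "\U00011F90"  # stand-in SMP code point (U+10000..U+1FFFF)

def encoded_sizes(s: str) -> dict:
    """Bytes needed for s in UTF-8 and UTF-16 (little-endian, so no BOM is counted)."""
    return {
        "utf-8": len(s.encode("utf-8")),
        "utf-16": len(s.encode("utf-16-le")),
    }

coeng_scheme = HYPOTHETICAL_COENG + LAO_LETTER_MO  # coeng followed by the consonant
smp_scheme = HYPOTHETICAL_SMP_SUBSCRIPT            # one atomic subscript character

print("coeng + consonant:", encoded_sizes(coeng_scheme))
# coeng + consonant: {'utf-8': 6, 'utf-16': 4}
print("SMP subscript:", encoded_sizes(smp_scheme))
# SMP subscript: {'utf-8': 4, 'utf-16': 4}
```

Each BMP character in the Lao block takes 3 bytes in UTF-8 and 2 in UTF-16. A single SMP character takes 4 bytes in both, because UTF-16 stores it as a surrogate pair.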
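Point 2(b) can be illustrated by comparing the four HO MO forms under two models. One model uses atomic subscript characters. The other uses a single coeng and no other disambiguating characters. This is a sketch under stated assumptions. Lao code points stand in for the Thai Nithet letters, and the three atomic subscript code points and the coeng are arbitrary hypothetical stand-ins, not real or proposed assignments. The intended General_Category of each hypothetical subscript appears only in a comment, because Python reports unassigned code points as Cn.

```python
HO = "\u0EAB"     # LAO LETTER HO SUNG, standing in for HO
MO = "\u0EA1"     # LAO LETTER MO, standing in for MO
HO_MO = "\u0EDD"  # LAO HO MO ligature

# HYPOTHETICAL atomic subscripts (arbitrary SMP stand-ins, not real assignments).
SUB_MO_MN = "\U00011F90"       # purely subscript MO, intended gc=Mn
SUB_MO_MC = "\U00011F91"       # subscript MO with ascender, intended gc=Mc
SUB_SAKOT_MA = "\U00011F92"    # borrowed Tai Tham <SAKOT, MA> as one char, gc=Mn
HYPOTHETICAL_COENG = "\u0EFF"  # single BMP coeng stand-in

atomic_model = {
    "ligature": HO_MO,
    "subscript MO (Mn)": HO + SUB_MO_MN,
    "subscript MO with ascender (Mc)": HO + SUB_MO_MC,
    "Tai Tham-style subscript": HO + SUB_SAKOT_MA,
}

# With one coeng and nothing else, every subscript form becomes the same sequence.
coeng_model = {
    "ligature": HO_MO,
    "subscript MO (Mn)": HO + HYPOTHETICAL_COENG + MO,
    "subscript MO with ascender (Mc)": HO + HYPOTHETICAL_COENG + MO,
    "Tai Tham-style subscript": HO + HYPOTHETICAL_COENG + MO,
}

def distinct_forms(model: dict) -> int:
    """Count how many of the forms remain distinguishable as code point sequences."""
    return len(set(model.values()))

print(distinct_forms(atomic_model))  # 4
print(distinct_forms(coeng_model))   # 2
```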

Richard.