Standardised Encoding of Text

Mon Aug 10 11:51:01 CDT 2015

Richard, you can always submit a document to UTC with proposed text to add to the Tai Tham block description in a future version.

Peter

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Richard Wordingham
Sent: Sunday, August 9, 2015 11:39 AM
To: Unicode Public <unicode at unicode.org>
Subject: Re: Standardised Encoding of Text

On Sun, 9 Aug 2015 17:10:01 +0200
Mark Davis <mark at macchiato.com> wrote:

> While it would be good to document more scripts, and more language 
> options per script, that is always subject to getting experts signed 
> up to develop them.
> 
> What I'd really like to see instead of documentation is a data-based 
> approach.
> 
> For example, perhaps the addition of real data to CLDR for a 
> "basic-validity-check" on a language-by-language basis.

One thing this would not help with is letter forms that do not resemble their forms in the code charts.  The code charts usually broadly answer the question "What does this code represent?".  They don't answer the question "What code points represent this glyph?".

Problems I've seen in Tai Tham are the use of U+1A57 TAI THAM CONSONANT SIGN LA TANG LAI for the sequence <U+1A60 TAI THAM SIGN SAKOT, U+1A43 TAI THAM LETTER LA> and of U+1A6D TAI THAM VOWEL SIGN OY for <U+1A60, U+1A3F TAI THAM LETTER LOW YA>.  The problem is that the subscript forms for U+1A43 and U+1A3F are only documented in the proposals.  The subscript consonant signs probably add to the confusion of anyone working from the code chart.  The people making the errors were far from ignorant of the script.
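
A data-based check of the kind suggested above could at least flag these two cases for review.  The following is a minimal Python sketch, not anything taken from CLDR: the table and the function names are hypothetical, and since U+1A57 and U+1A6D both have legitimate uses, it only reports candidates rather than rewriting the text.

```python
# Hypothetical lookup table (an assumption for illustration, not CLDR data):
# a code point that may have been chosen by visual resemblance, mapped to
# the sequence the writer may have intended instead.
SUSPECT_SUBSTITUTIONS = {
    "\u1A57": "\u1A60\u1A43",  # CONSONANT SIGN LA TANG LAI vs <SAKOT, LETTER LA>
    "\u1A6D": "\u1A60\u1A3F",  # VOWEL SIGN OY vs <SAKOT, LETTER LOW YA>
}


def find_suspects(text):
    """Return (index, found, possibly_intended) for each code point to review.

    Nothing is replaced: both listed code points are valid in some words,
    so a reader who knows the language has to make the final call.
    """
    return [
        (i, ch, SUSPECT_SUBSTITUTIONS[ch])
        for i, ch in enumerate(text)
        if ch in SUSPECT_SUBSTITUTIONS
    ]


def describe(s):
    """Format a string as a code point sequence, e.g. <U+1A60, U+1A43>."""
    return "<" + ", ".join(f"U+{ord(c):04X}" for c in s) + ">"


if __name__ == "__main__":
    sample = "\u1A20\u1A57"  # LETTER HIGH KA followed by U+1A57
    for index, found, intended in find_suspects(sample):
        print(f"index {index}: {describe(found)} might be meant as {describe(intended)}")
```

A per-language word list would be needed to turn such flags into real errors, which is where the language-by-language data would come in.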

Richard.