Standardised Encoding of Text

Sun Aug 9 13:38:45 CDT 2015

On Sun, 9 Aug 2015 17:10:01 +0200
Mark Davis <mark at macchiato.com> wrote:

> While it would be good to document more scripts, and more language
> options per script, that is always subject to getting experts signed
> up to develop them.
> 
> What I'd really like to see instead of documentation is a data-based
> approach.
> 
> For example, perhaps the addition of real data to CLDR for a
> "basic-validity-check" on a language-by-language basis.

One thing this would not help with is letter forms that do not
resemble their forms in the code charts.  The code charts usually
broadly answer the question "What does this code represent?".  They
don't answer the question "What code points represent this glyph?".

Problems I've seen in Tai Tham are the use of U+1A57 TAI THAM
CONSONANT SIGN LA TANG LAI for the sequence <U+1A60 TAI THAM SIGN SAKOT,
U+1A43 TAI THAM LETTER LA> and of U+1A6D TAI THAM VOWEL SIGN OY for
<U+1A60, U+1A3F TAI THAM LETTER LOW YA>.  The problem is that the
subscript forms of U+1A43 and U+1A3F are documented only in the
proposals.  The separately encoded subscript consonant signs probably
add to the confusion of anyone working from the code chart.  The people
making the errors were far from ignorant of the script.
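
As a rough illustration of what a data-driven check for these two
confusions might look like, here is a minimal Python sketch.  The table
SUSPECT and the functions flag_suspects and describe are hypothetical
names made up for this example; nothing like them exists in CLDR.  Both
code points are perfectly valid in their proper use, so the check can
only point a human reviewer at places worth a second look; it cannot
decide which encoding was intended.

```python
import unicodedata

# Hypothetical review table (an assumption for illustration, not CLDR data):
# a code point sometimes typed by mistake -> the sequence possibly intended.
SUSPECT = {
    "\u1A57": "\u1A60\u1A43",  # CONSONANT SIGN LA TANG LAI vs <SAKOT, LETTER LA>
    "\u1A6D": "\u1A60\u1A3F",  # VOWEL SIGN OY vs <SAKOT, LETTER LOW YA>
}


def flag_suspects(text):
    """Return (index, found, alternative) for each occurrence to review."""
    return [(i, ch, SUSPECT[ch]) for i, ch in enumerate(text) if ch in SUSPECT]


def describe(chars):
    """Render a string as <U+XXXX NAME, ...> for a review report."""
    parts = (f"U+{ord(c):04X} {unicodedata.name(c)}" for c in chars)
    return "<" + ", ".join(parts) + ">"


if __name__ == "__main__":
    sample = "\u1A20\u1A57"  # HIGH KA followed by LA TANG LAI
    for index, found, alternative in flag_suspects(sample):
        print(f"{index}: {describe(found)}; was {describe(alternative)} meant?")
```

A check like this finds candidates only.  Telling a genuine la tang lai
from a mis-typed subscript LA still needs someone who can read the word.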

Richard.