Richard Wordingham via Unicode
unicode at unicode.org
Sun Jun 3 05:50:54 CDT 2018
On Sun, 3 Jun 2018 04:31:32 +0100
Richard Wordingham via Unicode <unicode at unicode.org> wrote:
> However, the text is actually in the Tham script, and without any
> line-breaking controls, the first and third examples read, marking the
> grapheme cluster boundaries with '|', as ᨾ᩠ᨿᩮ <U+1A3E TAI THAM LETTER
> MA, U+1A60 TAI THAM SIGN SAKOT | U+1A3F TAI THAM LETTER LOW YA, U+1A6E
> TAI THAM VOWEL SIGN E> and ᩉ᩠ᩅᩱ <U+1A49 TAI THAM LETTER HIGH HA, U+1A60
> TAI THAM SIGN SAKOT | U+1A45 TAI THAM LETTER WA, U+1A71 TAI THAM VOWEL
> SIGN AI>.
What I have marked are the *extended* grapheme cluster boundaries.
There is a *legacy* grapheme cluster break before the vowel sign. This
may make line-breaking after Indic re-ordering a bit easier. However,
in the Lao language, we have sequences in Tham such as <consonant | left
matra, top matra, ...> ('|' = legacy grapheme break), and I now fully
expect there to be renderings such as:
<left matra>, break, <consonant, top matra, ...>
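The difference between the two kinds of boundary can be illustrated with a rough Python sketch. It is not the UAX #29 algorithm: it assumes that general category Mn/Me stands in for Grapheme_Cluster_Break=Extend and Mc for SpacingMark, which holds for the Tai Tham characters discussed here but not for every script. The names clusters and show are made up for this illustration.

```python
import unicodedata

def clusters(text, extended=True):
    """Split text into approximate grapheme clusters.

    Assumption: Mn/Me marks behave as Extend and Mc marks as SpacingMark.
    Extended clusters attach both kinds of mark to the preceding base;
    legacy clusters attach only the Extend-like marks.
    """
    glue = {"Mn", "Me"} | ({"Mc"} if extended else set())
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in glue:
            out[-1] += ch      # no break before this mark
        else:
            out.append(ch)     # break before this character
    return out

def show(text, extended=True):
    """Render the clusters as code points, with '|' at each break."""
    return " | ".join(
        " ".join(f"U+{ord(c):04X}" for c in cluster)
        for cluster in clusters(text, extended)
    )

# <MA, SAKOT, LOW YA, VOWEL SIGN E>, the first example quoted above
word = "\u1A3E\u1A60\u1A3F\u1A6E"
print(show(word, extended=True))   # U+1A3E U+1A60 | U+1A3F U+1A6E
print(show(word, extended=False))  # U+1A3E U+1A60 | U+1A3F | U+1A6E
```

The only difference between the two modes is whether a spacing vowel sign such as U+1A6E is allowed to start a cluster of its own, which is exactly the extra legacy break before the vowel sign.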
There seems to be an example around the string hole in the middle line
of BAD-13-1-0100 in Figure 5.4 on p. 222 of Bounleuth's dissertation,
but I'm not confident of my reading of the split word as <U+1A2F TAI
THAM LETTER DA | U+1A6E TAI THAM VOWEL SIGN E, U+1A65 TAI THAM VOWEL
SIGN I, U+1A60 TAI THAM SIGN SAKOT | U+1A36 TAI THAM LETTER NA>.
Theppitak would be able to confirm or refute, but he doesn't often
participate in this forum.