<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Jul 23, 2022 at 9:17 AM Richard Wordingham via Unicode <<a href="mailto:unicode@corp.unicode.org" target="_blank">unicode@corp.unicode.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Most characters for writing words in the Tai Tham script in normal<br>

texts have been encoded, though there are a few exceptions, of which<br>

TAI THAM LETTER LAO LOW HA is the most prominent.  (This is<br>

mostly handled by repurposing TAI THAM LETTER LOW HA, which is not used<br>

in Lao.  Their relationship is like U+11034 BRAHMI LETTER LLA and<br>

U+11075 BRAHMI LETTER OLD TAMIL LLA.)  On close reading of the TUS,<br>

perhaps we also need to disunify U+1A58 TAI THAM SIGN MAI KANG LAI<br>

depending on how it may be positioned relative to a following syllable<br>

with a preposed vowel.  (It was originally proposed as two separate<br>

characters, distinguished by shape rather than positioning.)  We may<br>

need some monstrosities such as 'INVISIBLE MAI SAM' (though I'd rather<br>

use CGJ). <br>

<br>

However, I am having a hard time persuading people that there is a<br>

defined encoding for combinations of characters that rendering engines<br>

should respect.  What I regard as the basic definition of the encoding<br>

of text is contained in the approved proposals, rather than in TUS or<br>

any emanation thereof.<br>

<br>

What should I call the specification of the encoding of text, as<br>

opposed to the encoding of characters?  Would it be suitable to refer<br>

to it as 'text encoding'?<br></blockquote><div><br></div><div>How about "text representation"? See Table 12-3 and the text around it (TUS Chapter 12, p. 464).</div><div>Or would 'rendering rules' work better?</div><div><br></div><div>Jungshik</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

I am trying to work out what in the way of Tai Tham text encoding is<br>

laid down by the TUS and its emanations, such as the Unicode Character<br>

Database. It is significant that the Indic syllabic category is<br>

informative and by policy does not reflect sequencing requirements.<br>

What I am left with is the general properties of marks, the principle<br>

of canonical equivalence (which is still widely flouted) and the<br>

specific text in the Tai Tham section.<br>

<br>
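The "general properties of marks" mentioned above can be read straight from the<br>
Unicode Character Database. The following is only a minimal sketch, assuming<br>
Python's standard unicodedata module; the helper name mark_properties is<br>
hypothetical, and the values printed depend on the Unicode version bundled with<br>
the interpreter.<br>

```python
import unicodedata

# Characters named in this thread, looked up by name so that no code point
# has to be typed by hand.
NAMES = [
    "TAI THAM SIGN MAI KANG LAI",
    "TAI THAM VOWEL SIGN E",
    "TAI THAM VOWEL SIGN OA BELOW",
    "TAI THAM VOWEL SIGN I",
    "TAI THAM SIGN TONE-1",
    "TAI THAM SIGN TONE-2",
    "TAI THAM SIGN MAI SAM",
]


def mark_properties(names):
    """Return (code point, name, general category, combining class) rows."""
    rows = []
    for name in names:
        ch = unicodedata.lookup(name)
        rows.append((
            f"U+{ord(ch):04X}",
            name,
            unicodedata.category(ch),   # e.g. Mn or Mc
            unicodedata.combining(ch),  # canonical combining class
        ))
    return rows


for row in mark_properties(NAMES):
    print(*row, sep="\t")
```

Only the canonical combining class takes part in canonical equivalence; the<br>
general category says nothing about the order of marks within a syllable.<br>
<br>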

Now, extracting specifications is a bit tricky.  For example, consider<br>

"*Tone Marks*. Tai Tham has two combining tone marks, U+1A75 tai tham<br>

sign tone-1 and U+1A76 tai tham sign tone-2, which are used in Tai Lue<br>

and in Northern Thai. These are rendered above the vowel over the base<br>

consonant."  In modern Tai Khuen, what I take to be TONE-1 is rendered<br>

to the right of the larger vowels over the base consonant, such as<br>

VOWEL SIGN I.  Should I therefore conclude that what I have taken to be<br>

TONE-1 is something else?  That would be ridiculous.  We also have the<br>

statement in TUS Section 2.11 that "all sequences of character codes<br>

are permitted".<br>

<br>

I think I can extract some meaning from the text in the same section:<br>

<br>

"Tone marks are represented in logical order following<br>

the vowel over the base consonant or consonant stack. If there<br>

is no vowel over a base consonant, then the tone is rendered directly<br>

over the consonant; this is the same way tones are treated in the Thai<br>

script."<br>

<br>

Consider the word ᨠᩮᩬᩥ᩵ᩁ &lt;HIGH KA, SIGN E, SIGN OA BELOW, SIGN I,<br>

TONE-1&gt; in a typical Northern Thai style.  The central stack, from top<br>

to bottom, is TONE-1, SIGN I, HIGH KA, SIGN OA BELOW.  If there were 'no<br>

vowel over the base consonant', then TONE-1 would be rendered directly<br>

over the base consonant, which is not how it is written.  Therefore the<br>

term 'vowel' refers to a vowel character rather than a complete<br>

phonetic vowel.  Therefore the logical order of the marks above and<br>

below is either &lt;SIGN OA BELOW, SIGN I, TONE-1&gt;, as in the<br>

proposals, or &lt;SIGN I, TONE-1, SIGN OA BELOW&gt;.  The USE insists on &lt;SIGN I,<br>

SIGN OA BELOW, TONE-1&gt;!  (The USE order could be corrected by its override<br>

method.)<br>

<br>
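Whether the three candidate orders above are distinct encodings, or merely<br>
canonically equivalent spellings, can be checked mechanically. This is only a<br>
sketch, assuming Python's unicodedata module; the helper names seq and<br>
canonical_classes and the labels in ORDERS are hypothetical. It groups the<br>
sequences by their NFD form, so any two labels that land in the same group are<br>
canonically equivalent.<br>

```python
import unicodedata


def seq(*names):
    """Build a string from Tai Tham character names (prefix omitted)."""
    return "".join(unicodedata.lookup("TAI THAM " + n) for n in names)


# Base consonant plus preposed vowel, shared by all three candidates.
BASE = seq("LETTER HIGH KA", "VOWEL SIGN E")

ORDERS = {
    "proposals": BASE + seq("VOWEL SIGN OA BELOW", "VOWEL SIGN I", "SIGN TONE-1"),
    "alternative": BASE + seq("VOWEL SIGN I", "SIGN TONE-1", "VOWEL SIGN OA BELOW"),
    "USE": BASE + seq("VOWEL SIGN I", "VOWEL SIGN OA BELOW", "SIGN TONE-1"),
}


def canonical_classes(orders):
    """Group labels whose sequences share the same NFD normalisation."""
    groups = {}
    for label, text in orders.items():
        key = unicodedata.normalize("NFD", text)
        groups.setdefault(key, []).append(label)
    return list(groups.values())


print(canonical_classes(ORDERS))
```

If each label ends up in a group of its own, normalisation cannot convert one<br>
order into another, and the choice between them has to be settled by the<br>
specification of the text encoding itself.<br>
<br>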

By contrast, there is some useful text on the position of U+1A7B TAI<br>

THAM SIGN MAI SAM in character code sequences.<br>

<br>

In summary, my two main questions are:<br>

<br>

Is 'encoding of text' the correct phrase for the definition of the<br>

correct arrangement?  Is it appropriate to submit a proposal for the<br>

standardisation of Tai Tham text encoding?<br>

<br>

Richard.<br>

<br>

<br>

<br>

</blockquote></div></div>