Tai Tham Text Encoding

Fri Aug 26 03:28:55 CDT 2022

On Sat, Jul 23, 2022 at 9:17 AM Richard Wordingham via Unicode <
unicode at corp.unicode.org> wrote:

> Most characters for writing words in the Tai Tham script in normal
> texts have been encoded, though there are a few exceptions, of which
> TAI THAM LETTER LAO LOW HA is the most prominent.  (This is mostly
> handled by repurposing TAI THAM LETTER LOW HA, which is not used in
> Lao.  Their relationship is like that of U+11034 BRAHMI LETTER LLA and
> U+11075 BRAHMI LETTER OLD TAMIL LLA.)  On close reading of the TUS,
> perhaps we also need to disunify U+1A58 TAI THAM SIGN MAI KANG LAI
> depending on how it may be positioned relative to a following syllable
> with a preposed vowel.  (It was originally proposed as two separate
> characters, distinguished by shape rather than positioning.)  We may
> need some monstrosities such as 'INVISIBLE MAI SAM' (though I'd rather
> use CGJ).
>
> However, I am having a hard time persuading people that there is a
> defined encoding for combinations of characters that rendering engines
> should respect.  What I regard as the basic definition of the encoding
> of text is contained in the approved proposals, rather than in TUS or
> any emanation thereof.
>
> What should I call the specification of the encoding of text, as
> opposed to the encoding of characters?  Would it be suitable to refer
> to it as 'text encoding'?
>

How about "text representation"? See Table 12-3 and the text around it
(TUS, chap. 12, p. 464).
Or would 'rendering rules' work better?

Jungshik

> I am trying to work out what in the way of Tai Tham text encoding is
> laid down by the TUS and its emanations, such as the Unicode Character
> Database. It is significant that the Indic syllabic category is
> informative and by policy does not reflect sequencing requirements.
> What I am left with is the general properties of marks, the principle
> of canonical equivalence (which is still widely flouted) and the
> specific text in the Tai Tham section.
>
> Now, extracting specifications is a bit tricky.  For example, consider
> "*Tone Marks*. Tai Tham has two combining tone marks, U+1A75 tai tham
> sign tone-1 and U+1A76 tai tham sign tone-2, which are used in Tai Lue
> and in Northern Thai. These are rendered above the vowel over the base
> consonant."  In modern Tai Khuen, what I take to be TONE-1 is rendered
> to the right of the larger vowels over the base consonant, such as
> VOWEL SIGN I.  Should I therefore conclude that what I have taken to be
> TONE-1 is something else?  That would be ridiculous.  We also have the
> statement in TUS Section 2.11 that "all sequences of character codes
> are permitted".
>
> I think I can extract some meaning from the text in the same section:
>
> "Tone marks are represented in logical order following the vowel over
> the base consonant or consonant stack. If there is no vowel over a
> base consonant, then the tone is rendered directly over the consonant;
> this is the same way tones are treated in the Thai script."
>
> Consider the word ᨠᩮᩬᩥ᩵ᩁ <HIGH KA, SIGN E, SIGN OA BELOW, SIGN I,
> TONE-1> in a typical Northern Thai style.  The central stack, from top
> to bottom, is TONE-1, SIGN I, HIGH KA, SIGN OA BELOW.  If there were 'no
> vowel over the base consonant', then TONE-1 would be rendered directly
> over the base consonant, which is not how it is written.  Therefore the
> term 'vowel' refers to a vowel character rather than a complete
> phonetic vowel.  It follows that the logical order of the marks above
> and below is either <SIGN OA BELOW, SIGN I, TONE-1>, as in the
> proposals, or <SIGN I, TONE-1, SIGN OA BELOW>.  The USE insists on
> <SIGN I, SIGN OA BELOW, TONE-1>!  (The USE order could be corrected by
> its override method.)
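
The three candidate mark orders discussed above can be inspected with a
small Python sketch. This is only an illustration, not anything from the
proposals or the TUS: the helper names (describe, canonically_equivalent)
and the constant names are made up for this example, and the property
values come from whichever version of the Unicode Character Database the
Python build ships. The point is to check whether normalisation could
ever turn one order into another. If the vowel signs have canonical
combining class 0, as I believe they do, it cannot, so the choice between
the orders has to be settled by the text encoding itself.

```python
import unicodedata

# The word from the message:
# <HIGH KA, SIGN E, SIGN OA BELOW, SIGN I, TONE-1, RA>
WORD = "\u1A20\u1A6E\u1A6C\u1A65\u1A75\u1A41"

# Candidate orders for the marks above and below the base consonant.
PROPOSAL_ORDER = "\u1A6C\u1A65\u1A75"     # <OA BELOW, I, TONE-1>
ALTERNATIVE_ORDER = "\u1A65\u1A75\u1A6C"  # <I, TONE-1, OA BELOW>
USE_ORDER = "\u1A65\u1A6C\u1A75"          # <I, OA BELOW, TONE-1>


def describe(text):
    """Return (code point, character name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


def canonically_equivalent(a, b):
    """True if the two strings have identical NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    for code_point, name, ccc in describe(WORD):
        print(code_point, name, f"ccc={ccc}")

    base = "\u1A20"  # HIGH KA, so each mark sequence has a base to attach to
    print(canonically_equivalent(base + PROPOSAL_ORDER, base + USE_ORDER))
    print(canonically_equivalent(base + PROPOSAL_ORDER, base + ALTERNATIVE_ORDER))
    print(canonically_equivalent(base + ALTERNATIVE_ORDER, base + USE_ORDER))
```

A comparison that prints False means the two orders are distinct
encodings as far as canonical equivalence is concerned, so a renderer
that accepts only one of them is rejecting text that normalisation will
not repair.
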
>
> By contrast, there is some useful text on the position of U+1A7B TAI
> THAM SIGN MAI SAM in character code sequences.
>
> In summary, my two main questions are:
>
> Is 'encoding of text' the correct phrase for the definition of the
> correct arrangement?  Is it appropriate to submit a proposal for the
> standardisation of Tai Tham text encoding?
>
> Richard.