Combining Class of Thai Nonspacing_Marks
richard.wordingham at ntlworld.com
Mon Apr 3 14:19:43 CDT 2017
On Mon, 3 Apr 2017 14:12:51 +0700
"Gerriet M. Denkmann" <gerrietm at icloud.com> wrote:
> The Combining Class is used for normalisation of strings.
> Normalisation of strings is important for filenames in filesystems.
> As far as I know, a Thai consonant (Lo, Other_Letter) can have
> several Nonspacing_Marks. This cluster of nonspacing marks can
> contain at most one top/bottom vowel and at most one tone/other mark.
> There is no syntactic meaning in the order of these nonspacing
You're confusing the modern Thai language with the Thai script. It
seems that the Lao-style usage of NIKHAHIT as a vowel is known from
older Thai writing, and when used this way it could of course take a
tone mark. It also seems that the pressure to have both MAITAIKHU and
a tone mark on a consonant has been accepted for at least one minority
> So: All top/bottom vowels should have Combining Class 103, all
> tone/other marks have Combining Class 107.
> Is there a reason for having top vowels or other-marks with Combining
> Class 0, Not_Reordered?
It does make one wonder if someone hated Thais. It would have been
a lot simpler, and have worked better, if the combining classes
for Latin diacritics had been used. As it is, one common combination
of vowel below and mark above was catered for - SARA U/UU with tone
mark. The system doesn't even cater for SARA U + THANTHAKHAT, as in
พันธุ์ทิพย์ 'Phanthip'. The use of values peculiar to Thai (103 and
107) does not help when minority languages use Latin diacritics, such
as U+0331 COMBINING MACRON BELOW and U+0303 COMBINING TILDE for Pattani
Malay. The viramas that were recognised were given combining class 9;
YAMAKKAN and THANTHAKHAT were overlooked.
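
For anyone who wants to check the values discussed above, here is a minimal sketch using Python's standard unicodedata module. The dictionary and variable names are my own, chosen only for illustration. It prints the canonical combining class of the marks named in this thread, then compares the two cases: SARA U with a tone mark, where canonical reordering makes both input orders equal, and SARA U with THANTHAKHAT, where it does not, because THANTHAKHAT has class 0.

```python
import unicodedata

# Thai nonspacing marks mentioned in this thread (names abbreviated).
MARKS = {
    "MAI HAN-AKAT": "\u0E31",
    "SARA I": "\u0E34",
    "SARA U": "\u0E38",
    "SARA UU": "\u0E39",
    "PHINTHU": "\u0E3A",
    "MAITAIKHU": "\u0E47",
    "MAI EK": "\u0E48",
    "THANTHAKHAT": "\u0E4C",
    "NIKHAHIT": "\u0E4D",
    "YAMAKKAN": "\u0E4E",
}

# unicodedata.combining() returns the canonical combining class as an int.
CLASSES = {name: unicodedata.combining(ch) for name, ch in MARKS.items()}

for name, ch in MARKS.items():
    print(f"U+{ord(ch):04X} {name:<13} ccc={CLASSES[name]}")


def nfc(s: str) -> str:
    """Normalise to NFC; for Thai this only reorders marks."""
    return unicodedata.normalize("NFC", s)


KO_KAI = "\u0E01"     # consonant used only as a base for the test
THO_THONG = "\u0E18"  # the consonant carrying the marks in the name above

# Case 1: SARA U (103) and MAI EK (107) are both non-zero, so they reorder.
tone_first = KO_KAI + MARKS["MAI EK"] + MARKS["SARA U"]
vowel_first = KO_KAI + MARKS["SARA U"] + MARKS["MAI EK"]
tone_case_equal = nfc(tone_first) == nfc(vowel_first)

# Case 2: THANTHAKHAT has class 0, so it blocks reordering in either order.
u_then_th = THO_THONG + MARKS["SARA U"] + MARKS["THANTHAKHAT"]
th_then_u = THO_THONG + MARKS["THANTHAKHAT"] + MARKS["SARA U"]
thanthakhat_case_equal = nfc(u_then_th) == nfc(th_then_u)

print("SARA U + tone mark equal after NFC:  ", tone_case_equal)
print("SARA U + THANTHAKHAT equal after NFC:", thanthakhat_case_equal)
```

The second comparison is the practical consequence: two visually identical spellings survive normalisation as different strings, so they can also end up as different filenames.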
One of the looming problems is that several languages use a combination
of PHINTHU and SARA I - both orders are used, though they are not
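
A small sketch of the PHINTHU and SARA I case, again with Python's standard unicodedata module. The base consonant and the variable names are arbitrary choices of mine. PHINTHU has combining class 9 and SARA I has class 0, so neither input order is changed by normalisation and the two orders remain distinct strings.

```python
import unicodedata

KO_KAI = "\u0E01"   # arbitrary base consonant, chosen only for the test
PHINTHU = "\u0E3A"  # canonical combining class 9
SARA_I = "\u0E34"   # canonical combining class 0

phinthu_first = KO_KAI + PHINTHU + SARA_I
sara_i_first = KO_KAI + SARA_I + PHINTHU

# A class-0 mark never moves, and nothing is reordered across it,
# so both NFC and NFD leave each sequence exactly as typed.
forms = ("NFC", "NFD")
unchanged = all(
    unicodedata.normalize(form, s) == s
    for form in forms
    for s in (phinthu_first, sara_i_first)
)
orders_equal_after_nfc = (
    unicodedata.normalize("NFC", phinthu_first)
    == unicodedata.normalize("NFC", sara_i_first)
)

print("PHINTHU ccc:", unicodedata.combining(PHINTHU))
print("SARA I ccc: ", unicodedata.combining(SARA_I))
print("unchanged by normalisation:", unchanged)
print("orders equal after NFC:   ", orders_equal_after_nfc)
```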