Combining Class of Thai Nonspacing_Marks

Gerriet M. Denkmann gerrietm at icloud.com
Mon Apr 3 02:12:51 CDT 2017


The Combining Class determines how combining marks are reordered when strings are normalised.
Normalisation of strings is important for filenames in filesystems.

As far as I know, a Thai consonant (Lo, Other_Letter) can have several Nonspacing_Marks.
This cluster of nonspacing marks can contain at most one top/bottom vowel and at most one tone/other mark.
The order of these nonspacing marks carries no syntactic meaning.

So: all top/bottom vowels should have Combining Class 103, and all tone/other marks should have Combining Class 107.
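To illustrate the proposal, here is a minimal Python sketch of how reordering would behave if the top vowels had class 103 and the other-marks had class 107. The override table PROPOSED and the helpers proposed_ccc and reorder_with_proposed_classes are hypothetical names made up for this illustration; no Unicode library behaves this way, and the sketch covers only the reordering step, not full normalisation.

```python
import unicodedata

# Hypothetical overrides: the values proposed above, NOT the real Unicode data.
PROPOSED = {cp: 103 for cp in (0x0E31, 0x0E34, 0x0E35, 0x0E36, 0x0E37)}   # top vowels
PROPOSED.update({cp: 107 for cp in (0x0E47, 0x0E4C, 0x0E4D, 0x0E4E)})    # other-marks


def proposed_ccc(ch):
    """Combining class of ch, using the proposed value where one is given."""
    return PROPOSED.get(ord(ch), unicodedata.combining(ch))


def reorder_with_proposed_classes(text):
    """Stable-sort each run of marks (class != 0) by the proposed class."""
    out, run = [], []
    for ch in text:
        if proposed_ccc(ch) == 0:
            out.extend(sorted(run, key=proposed_ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=proposed_ccc))
    return "".join(out)


# KO KAI + MAI EK + SARA I  versus  KO KAI + SARA I + MAI EK
a = "\u0e01\u0e48\u0e34"
b = "\u0e01\u0e34\u0e48"
print(reorder_with_proposed_classes(a) == reorder_with_proposed_classes(b))
```

Under these assumed values both orders come out as consonant + vowel + tone mark, so the two strings would compare equal.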

Is there a reason for having top vowels or other-marks with Combining Class 0, Not_Reordered?

With the current Combining Class values, both consonant + mark + top vowel and consonant + top vowel + mark are already in normalised form, so normalisation does not unify them. One can therefore have two files whose names look identical but are different, which is rather confusing.
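The effect can be checked with Python's standard unicodedata module. This is a small sketch; the variable names and the choice of KO KAI, SARA I, SARA U and MAI EK as sample characters are mine, picked only for illustration.

```python
import unicodedata

KO_KAI = "\u0e01"  # consonant (Lo)
SARA_I = "\u0e34"  # top vowel, Combining Class 0
SARA_U = "\u0e38"  # bottom vowel, Combining Class 103
MAI_EK = "\u0e48"  # tone mark, Combining Class 107


def nfc(s):
    """Return the NFC normalised form of s."""
    return unicodedata.normalize("NFC", s)


# Top vowel: the two orders stay different after normalisation.
top_a = KO_KAI + MAI_EK + SARA_I
top_b = KO_KAI + SARA_I + MAI_EK
print(nfc(top_a) == nfc(top_b))        # False

# Bottom vowel: the two orders are unified by normalisation.
bottom_a = KO_KAI + MAI_EK + SARA_U
bottom_b = KO_KAI + SARA_U + MAI_EK
print(nfc(bottom_a) == nfc(bottom_b))  # True
```

The bottom-vowel pair is unified because both marks have non-zero classes (103 sorts before 107), while a top vowel with class 0 blocks any reordering.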

Here is a list of all nonspacing marks in the Thai script:

top vowels (Combining Class 0, Not_Reordered):  ← this seems to be wrong; should be 103
THAI CHARACTER MAI HAN-AKAT	ั
THAI CHARACTER SARA I	ิ
THAI CHARACTER SARA II	ี
THAI CHARACTER SARA UE	ึ
THAI CHARACTER SARA UEE	ื

bottom vowels (Combining Class 103):
THAI CHARACTER SARA U	ุ
THAI CHARACTER SARA UU	ู

tone-marks (Combining Class 107):
THAI CHARACTER MAI EK	่
THAI CHARACTER MAI THO	้
THAI CHARACTER MAI TRI	๊
THAI CHARACTER MAI CHATTAWA	๋

other-marks (Combining Class 0, Not_Reordered):  ← this seems to be wrong; should be 107
THAI CHARACTER MAITAIKHU	็
THAI CHARACTER THANTHAKHAT	์
THAI CHARACTER NIKHAHIT	ํ
THAI CHARACTER YAMAKKAN	๎

other-marks (Combining Class 9, Virama):
THAI CHARACTER PHINTHU	ฺ
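
The list above can be reproduced with Python's unicodedata module. This is a small sketch that assumes the Thai block is U+0E00..U+0E7F and selects every character there with General_Category Mn; the names THAI_MARKS and combining_classes are mine.

```python
import unicodedata

# All nonspacing marks (General_Category Mn) in the Thai block U+0E00..U+0E7F.
THAI_MARKS = [
    chr(cp)
    for cp in range(0x0E00, 0x0E80)
    if unicodedata.category(chr(cp)) == "Mn"
]

# Map each character name to its Combining Class.
combining_classes = {
    unicodedata.name(ch): unicodedata.combining(ch) for ch in THAI_MARKS
}

for name, ccc in combining_classes.items():
    print(f"{ccc:3d}  {name}")
```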

Gerriet.



