Combining Class of Thai Nonspacing_Marks
Gerriet M. Denkmann
gerrietm at icloud.com
Mon Apr 3 02:12:51 CDT 2017
The Combining Class is used for normalisation of strings.
Normalisation of strings is important for filenames in filesystems.
As far as I know, a Thai consonant (Lo, Other_Letter) can have several Nonspacing_Marks.
This cluster of nonspacing marks can contain at most one top/bottom vowel and at most one tone/other mark.
There is no syntactic meaning in the order of these nonspacing marks.
So all top/bottom vowels should have Combining Class 103, and all tone/other marks should have Combining Class 107.
Is there a reason for having top vowels or other-marks with Combining Class 0, Not_Reordered?
With the current choice of Combining Class, both consonant + mark + top vowel and consonant + top vowel + mark are already in normalised form, so normalisation does not reorder one into the other. One can therefore have two files with these (identical-looking, but different) names, which is rather confusing.
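A minimal sketch of this effect in Python, using the standard unicodedata module. The choice of KO KAI as the consonant and of SARA I, SARA U and MAI EK as sample marks is only for illustration; the outcome depends on the Unicode data shipped with the Python version in use.

```python
import unicodedata

KO_KAI = "\u0e01"  # THAI CHARACTER KO KAI, a consonant (illustrative choice)
SARA_I = "\u0e34"  # top vowel, Combining Class 0
SARA_U = "\u0e38"  # bottom vowel, Combining Class 103
MAI_EK = "\u0e48"  # tone mark, Combining Class 107


def nfd(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# Top vowel and tone mark in both orders.
vowel_first = KO_KAI + SARA_I + MAI_EK
mark_first = KO_KAI + MAI_EK + SARA_I

# The top vowel has class 0, so it blocks reordering:
# both strings survive normalisation unchanged and stay different.
top_vowel_orders_stay_distinct = nfd(vowel_first) != nfd(mark_first)

# Bottom vowel (103) and tone mark (107) are sorted by class,
# so both orders normalise to the same string.
bottom_vowel_orders_merge = nfd(KO_KAI + MAI_EK + SARA_U) == nfd(KO_KAI + SARA_U + MAI_EK)

print("top vowel orders stay distinct:", top_vowel_orders_stay_distinct)
print("bottom vowel orders merge:", bottom_vowel_orders_merge)
```

If the top vowels had Combining Class 103 as proposed, the first comparison would be expected to behave like the second.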
Here is a list of all nonspacing marks in the Thai script:
top vowels (Combining Class 0, Not_Reordered): ← this seems to be wrong; should be 103
THAI CHARACTER MAI HAN-AKAT ั
THAI CHARACTER SARA I ิ
THAI CHARACTER SARA II ี
THAI CHARACTER SARA UE ึ
THAI CHARACTER SARA UEE ื
bottom vowels (Combining Class 103):
THAI CHARACTER SARA U ุ
THAI CHARACTER SARA UU ู
tone-marks (Combining Class 107):
THAI CHARACTER MAI EK ่
THAI CHARACTER MAI THO ้
THAI CHARACTER MAI TRI ๊
THAI CHARACTER MAI CHATTAWA ๋
other-marks (Combining Class 0, Not_Reordered): ← this seems to be wrong, should be 107
THAI CHARACTER MAITAIKHU ็
THAI CHARACTER THANTHAKHAT ์
THAI CHARACTER NIKHAHIT ํ
THAI CHARACTER YAMAKKAN ๎
other-marks (Combining Class 9, Virama):
THAI CHARACTER PHINTHU ฺ
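The list above can be cross-checked with a short Python sketch. It assumes that the Thai nonspacing marks are exactly the code points U+0E31, U+0E34..U+0E3A and U+0E47..U+0E4E (the 16 characters listed here); the variable names are only illustrative.

```python
import unicodedata

# Code points of the Thai nonspacing marks listed above (assumed ranges).
THAI_MARK_CODE_POINTS = [
    0x0E31,                  # MAI HAN-AKAT
    *range(0x0E34, 0x0E3B),  # SARA I .. PHINTHU
    *range(0x0E47, 0x0E4F),  # MAITAIKHU .. YAMAKKAN
]

# Map each character name to its canonical Combining Class.
combining_classes = {
    unicodedata.name(chr(cp)): unicodedata.combining(chr(cp))
    for cp in THAI_MARK_CODE_POINTS
}

for name, ccc in combining_classes.items():
    print(f"{ccc:3d}  {name}")
```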