Combining Class of Thai Nonspacing_Marks
asmusf at ix.netcom.com
Mon Apr 3 11:40:05 CDT 2017
On 4/3/2017 12:12 AM, Gerriet M. Denkmann wrote:
> The Combining Class is used for normalisation of strings.
> Normalisation of strings is important for filenames in filesystems.
The same issues apply to network identifiers.
> As far as I know, a Thai consonant (Lo, Other_Letter) can have several Nonspacing_Marks.
> This cluster of nonspacing marks can contain at most one top/bottom vowel and at most one tone/other mark.
> There is no syntactic meaning in the order of these nonspacing marks.
> So: all top/bottom vowels should have Combining Class 103, and all tone/other marks should have Combining Class 107.
> Is there a reason for having top vowels or other-marks with Combining Class 0, Not_Reordered?
> With the current choice of Combining Class, both consonant + mark + top vowel and consonant + top vowel + mark are already in normalised form, so normalisation does not unify them. One can therefore have two files with these names (identical in appearance, but encoded differently), which is rather confusing.
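The behaviour described in the quoted paragraph can be reproduced with a minimal sketch, assuming Python's standard unicodedata module and using KO KAI (U+0E01) as an arbitrary example consonant. The variable names are illustrative only. The result depends on the Unicode data shipped with the interpreter, although these particular values are not expected to change.

```python
import unicodedata

KO_KAI = "\u0e01"  # THAI CHARACTER KO KAI, an arbitrary consonant for the demo
SARA_I = "\u0e34"  # top vowel, Combining Class 0
SARA_U = "\u0e38"  # bottom vowel, Combining Class 103
MAI_EK = "\u0e48"  # tone mark, Combining Class 107


def nfc(s: str) -> str:
    """Return the NFC form of s."""
    return unicodedata.normalize("NFC", s)


# Top vowel and tone mark in both orders. SARA I has class 0, so it acts
# as a starter and blocks canonical reordering.
top_a = KO_KAI + SARA_I + MAI_EK
top_b = KO_KAI + MAI_EK + SARA_I

# Bottom vowel and tone mark in both orders. Both classes are non-zero,
# so canonical ordering sorts them by class (103 before 107).
bottom_a = KO_KAI + SARA_U + MAI_EK
bottom_b = KO_KAI + MAI_EK + SARA_U

print("top vowel orders unified:   ", nfc(top_a) == nfc(top_b))
print("bottom vowel orders unified:", nfc(bottom_a) == nfc(bottom_b))
```

Only the bottom-vowel pair collapses to a single normalised string; the top-vowel pair stays as two distinct strings that render alike.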
It is not possible to construct a set of secure network identifiers
based on simply
a) ensuring the string is in NFC
b) otherwise allowing all of the Thai characters (insofar as they
are PVALID in IDNA 2008 [RFC5892]).
Considerable attention to allowable contexts is required. There is a
group in Thailand working on this, but their results have not yet been published.
Similar work for Khmer and Lao can be found here:
> Here is a list of all nonspacing marks in the Thai script:
> top vowels (Combining Class 0, Not_Reordered): ← this seems to be wrong; should be 103
> THAI CHARACTER MAI HAN-AKAT ั
> THAI CHARACTER SARA I ิ
> THAI CHARACTER SARA II ี
> THAI CHARACTER SARA UE ึ
> THAI CHARACTER SARA UEE ื
> bottom vowels (Combining Class 103):
> THAI CHARACTER SARA U ุ
> THAI CHARACTER SARA UU ู
> tone-marks (Combining Class 107):
> THAI CHARACTER MAI EK ่
> THAI CHARACTER MAI THO ้
> THAI CHARACTER MAI TRI ๊
> THAI CHARACTER MAI CHATTAWA ๋
> other-marks (Combining Class 0, Not_Reordered): ← this seems to be wrong, should be 107
> THAI CHARACTER MAITAIKHU ็
> THAI CHARACTER THANTHAKHAT ์
> THAI CHARACTER NIKHAHIT ํ
> THAI CHARACTER YAMAKKAN ๎
> other-marks (Combining Class 9, Virama)
> THAI CHARACTER PHINTHU ฺ
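The list above can be checked against a local copy of the Unicode Character Database. The following is a rough sketch, assuming Python's standard unicodedata module; the helper name thai_nonspacing_marks is hypothetical, and the values reported are those of whichever Unicode version the interpreter ships with.

```python
import unicodedata


def thai_nonspacing_marks():
    """Return (code point, name, combining class) for every character of
    general category Mn in the Thai block, U+0E00..U+0E7F."""
    marks = []
    for cp in range(0x0E00, 0x0E80):
        ch = chr(cp)
        # Unassigned code points have category "Cn" and are skipped here,
        # so unicodedata.name() is only called on assigned characters.
        if unicodedata.category(ch) == "Mn":
            marks.append((cp, unicodedata.name(ch), unicodedata.combining(ch)))
    return marks


for cp, name, ccc in thai_nonspacing_marks():
    print(f"U+{cp:04X}  {name:<32}  ccc={ccc}")
```

Each printed line gives the code point, the character name, and its Canonical_Combining_Class, so the classes quoted in the list (0, 9, 103, 107) can be compared directly.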