Combining Class of Thai Nonspacing_Marks
Gerriet M. Denkmann
gerrietm at icloud.com
Tue Apr 4 22:00:25 CDT 2017
> On 4 Apr 2017, at 00:00, Asmus Freytag <asmusf at ix.netcom.com> wrote:
> It is not possible to construct a set of secure network identifiers based on simply
> a) ensuring the string is in NFC
> b) otherwise allowing all of the Thai characters (insofar as they are PVALID in IDNA 2008 [RFC5892]).
> Considerable attention to allowable contexts is required. There is a group in Thailand working on this, but their results have not yet been made public.
Maybe this: Proposal for the Thai Script Root Zone Label Generation Rulesets <https://www.icann.org/en/system/files/files/proposal-thai-lgr-15dec16-en.pdf>
But the rules for Root Zone Labels are (rightly) much more restrictive than what I want:
Any two strings which look (almost?) identical should be normalised into the same canonical form.
Reason: to avoid identical-looking filenames in a filesystem.
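With the current rules of normalisation there could be 8 different filenames all looking identical to “กินครึ่งทิ้งครึ่ง”: the word contains three syllables that carry both a top vowel and a tone mark, and each pair can be stored in either order (2 × 2 × 2 = 8).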
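A minimal sketch of how the count of 8 can be checked, using Python's standard unicodedata module. The sets TOP_VOWELS and TONE_MARKS and the helper swap_variants are my own illustrative choices, not anything defined by Unicode; the word is spelled with escapes so the stored order is unambiguous.

```python
import itertools
import unicodedata

# Assumption: these sets are chosen for illustration only.
TOP_VOWELS = set("\u0e31\u0e34\u0e35\u0e36\u0e37")   # MAI HAN-AKAT, SARA I, II, UE, UEE
TONE_MARKS = set("\u0e48\u0e49\u0e4a\u0e4b")         # MAI EK, THO, TRI, CHATTAWA

# The word from the message, each top vowel stored before its tone mark.
WORD = ("\u0e01\u0e34\u0e19"
        "\u0e04\u0e23\u0e36\u0e48\u0e07"
        "\u0e17\u0e34\u0e49\u0e07"
        "\u0e04\u0e23\u0e36\u0e48\u0e07")


def swap_variants(text):
    """Return every string obtained by optionally swapping each
    (top vowel, tone mark) pair found in text."""
    chars = list(text)
    sites = [i for i in range(len(chars) - 1)
             if chars[i] in TOP_VOWELS and chars[i + 1] in TONE_MARKS]
    variants = []
    for mask in itertools.product([False, True], repeat=len(sites)):
        candidate = chars[:]
        for i, swap in zip(sites, mask):
            if swap:
                candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
        variants.append("".join(candidate))
    return variants


variants = swap_variants(WORD)
# NFC leaves each variant distinct, so all of them survive normalisation.
nfc_forms = {unicodedata.normalize("NFC", v) for v in variants}
print(len(variants), len(nfc_forms))
```

Whether all of these actually render identically depends on the font and rendering engine; the sketch only shows that NFC does not merge them.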
- both NIKHAHIT + SARA AA and SARA AM should be normalised into the same string (whichever that is)
- both top-vowel + tone-mark and tone-mark + top-vowel should be normalised into the same string (whichever that is).
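The two rules above could be sketched as an extra folding step applied after NFC. The function name fold_thai is hypothetical, and the target forms (SARA AM as the single character, top vowel before tone mark) are arbitrary picks of mine, since which form is chosen is left open here.

```python
import re
import unicodedata

# Assumption: illustrative character sets, not an official Unicode property.
TOP_VOWELS = "\u0e31\u0e34\u0e35\u0e36\u0e37"   # MAI HAN-AKAT, SARA I, II, UE, UEE
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"         # MAI EK, THO, TRI, CHATTAWA

_TONE_THEN_VOWEL = re.compile(f"([{TONE_MARKS}])([{TOP_VOWELS}])")


def fold_thai(text: str) -> str:
    """Hypothetical folding: NFC, then the two Thai-specific rules."""
    text = unicodedata.normalize("NFC", text)
    # Rule 2: put the top vowel before the tone mark.
    text = _TONE_THEN_VOWEL.sub(r"\2\1", text)
    # Rule 1: replace NIKHAHIT + SARA AA with SARA AM.
    return text.replace("\u0e4d\u0e32", "\u0e33")
```

For example, fold_thai maps both "\u0e17\u0e49\u0e34\u0e07" and "\u0e17\u0e34\u0e49\u0e07" to the same string. The sketch does not handle a tone mark sitting between NIKHAHIT and SARA AA, nor any other confusable sequences.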
If, as Richard Wordingham wrote, "Unicode combining classes cannot be changed. All that can be done is to enforce the order of characters in normalised text.", then the Unicode normalisation algorithms should be updated.
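For reference, the combining classes in question can be inspected with Python's unicodedata.combining. This is only an illustration; the selection of characters in MARKS is mine. Canonical reordering only swaps adjacent marks that both have a non-zero class, so a top vowel with class 0 next to a tone mark is never reordered.

```python
import unicodedata

# Assumption: a small hand-picked sample of Thai marks.
MARKS = {
    "SARA I": "\u0e34",
    "SARA UE": "\u0e36",
    "MAI EK": "\u0e48",
    "MAI THO": "\u0e49",
    "NIKHAHIT": "\u0e4d",
}

# Canonical combining class of each sampled character.
classes = {name: unicodedata.combining(ch) for name, ch in MARKS.items()}
for name, ccc in classes.items():
    print(f"{name}: ccc={ccc}")

# SARA AM has only a compatibility decomposition, so NFC keeps it
# distinct from NIKHAHIT + SARA AA, while NFKC decomposes it.
sara_am_nfc = unicodedata.normalize("NFC", "\u0e33")
sara_am_nfkc = unicodedata.normalize("NFKC", "\u0e33")
```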