Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
Erik Carvalhal Miller
ecm.unicode at gmail.com
Fri Feb 21 13:37:47 CST 2025
On Tue, Feb 18, 2025 at 1:47 PM Asmus Freytag via Unicode
<unicode at corp.unicode.org> wrote:
> The only rule that matters is that any of the values in
> PropertyValueAliases.txt, when matched without regard to case, hyphens,
> or underscore, matches all the other ones for the same property value.
>
> For character names, spaces also don't count (but there are 2-3 odd
> exceptional names that need to be handled specially).
One such exception is HANGUL JUNGSEONG O-E (U+1180), in which the
hyphen‐minus is considered significant, lest that character name
collide with HANGUL JUNGSEONG OE (U+116C). Hyphen‐minus is also
significant in character names when it precedes or follows a space, as
in TIBETAN LETTER -A (U+0F60) [cf. TIBETAN LETTER A (U+0F68)].
Additionally, there is a rule that the strings “CHARACTER”, “LETTER”,
and “DIGIT” are to be ignored in character‐name matching for
determining uniqueness, with a legacy exception for CANCEL (U+0018)
and CANCEL CHARACTER (U+0094), both of which are character aliases
rather than character names per se but inhabit that same
character‐name namespace. (However, as I pointed out in L2/24-073
[https://www.unicode.org/L2/L2024/24073-char-namespace.pdf], the
“CHARACTER”/“LETTER”/“DIGIT” rule and its exception are given
inconsistent treatment in the current text of the Standard.)
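
For anyone who wants to experiment with the matching rules discussed above, here is a minimal Python sketch. It is an illustration under stated assumptions, not the normative algorithm: the function names loose_value and loose_name are hypothetical, "medial hyphen" is approximated as a hyphen-minus sitting directly between two ASCII letters or digits, and the “CHARACTER”/“LETTER”/“DIGIT” uniqueness rule is deliberately left out.

```python
import re

# Assumption: a "medial" hyphen is a hyphen-minus directly between two
# ASCII letters or digits. A hyphen next to a space is never medial.
_MEDIAL_HYPHEN = re.compile(r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])")


def loose_value(alias: str) -> str:
    """Key for comparing property value aliases: ignore case, hyphens
    and underscores, as described in the quoted message."""
    return re.sub(r"[-_]", "", alias).casefold()


def loose_name(name: str) -> str:
    """Key for comparing character names: ignore case, whitespace,
    underscores and medial hyphens, keeping the hyphen of U+1180."""
    upper = name.upper()
    key = _MEDIAL_HYPHEN.sub("", upper)   # drop medial hyphens first
    key = re.sub(r"[\s_]", "", key)       # then whitespace and underscores
    # Special case: HANGUL JUNGSEONG O-E (U+1180) must stay distinct
    # from HANGUL JUNGSEONG OE (U+116C), so its hyphen is restored.
    if key == "HANGULJUNGSEONGOE" and re.search(r"O-E\s*$", upper):
        key = "HANGULJUNGSEONGO-E"
    return key


# Spellings that differ only in case, hyphens or underscores compare equal.
print(loose_value("Nobreak") == loose_value("noBreak"))
print(loose_value("Non_Canonical") == loose_value("non-canonical"))

# Medial hyphens are ignored, but the exceptions above stay distinct.
print(loose_name("ZERO WIDTH SPACE") == loose_name("zero-width space"))
print(loose_name("HANGUL JUNGSEONG O-E") != loose_name("HANGUL JUNGSEONG OE"))
print(loose_name("TIBETAN LETTER -A") != loose_name("TIBETAN LETTER A"))
```

Medial hyphens are removed before whitespace on purpose: doing it the other way round would turn the hyphen in TIBETAN LETTER -A into a medial one and wrongly discard it.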