Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?

Peter Constable pgcon6 at msn.com
Thu Feb 20 10:57:08 CST 2025


PropertyValueAliases.txt has this entry:

dt ; Nb                               ; Nobreak                          ; nb

What doesn't seem clear from UAX #44 is whether an alias could be added that is equivalent under name matching rules to an existing alias.

-----Original Message-----
From: Unicode <unicode-bounces at corp.unicode.org> On Behalf Of Asmus Freytag via Unicode
Sent: Tuesday, February 18, 2025 11:44 AM
To: unicode at corp.unicode.org
Subject: Re: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?

The spellings are equivalent under the naming rules. That's all that formally matters. Fixing this now would break any literal-minded parsers for whichever file is changed, while not making a formal difference.

There are enough other idiosyncrasies in the way these files are organized that this one is far from the worst.

The only rule that matters is that any of the values in PropertyValueAliases.txt, when matched without regard to case, hyphens, or underscores, matches all the other ones for the same property value.

For character names, spaces also don't count (but there are 2-3 odd exceptional names that need to be handled specially).
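
To make the matching rule for property value aliases concrete, here is a minimal Python sketch. It only does what is described above (ignore case, hyphens, and underscores); the full loose-matching rule in UAX #44 (UAX44-LM3) ignores a little more, such as whitespace, so treat this as an illustration rather than a complete implementation. The function names loose_key and same_value are made up for this example.

```python
def loose_key(alias: str) -> str:
    """Reduce an alias to a comparison key: drop hyphens and
    underscores, then fold case. (Simplified; not the full UAX #44 rule.)"""
    return alias.replace("-", "").replace("_", "").casefold()


def same_value(a: str, b: str) -> bool:
    """True if two spellings are equivalent under the simplified rule."""
    return loose_key(a) == loose_key(b)


# The two spellings from the original question compare equal.
print(same_value("Nobreak", "noBreak"))  # True
```

A parser that compares loose_key() results instead of raw strings would not care which of the two files' spellings it is reading.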

A./

On 2/18/2025 8:04 AM, Phil Smith III via Unicode wrote:
> This sounds interesting, but with no links or other references is a bit opaque. Can you add more information?
>
> -----Original Message-----
> From: Unicode <unicode-bounces at corp.unicode.org> On Behalf Of prospero via Unicode
> Sent: Monday, February 17, 2025 3:11 PM
> To: unicode at corp.unicode.org
> Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
>
> For example, "Nobreak" in DerivedDecompositionType.txt vs "noBreak" in UnicodeData.txt. If the former is derived from the latter, shouldn't the spelling be identical?
>
>
