Canonical block names: spaces vs. underscores
doug at ewellic.org
Sat May 28 10:51:55 CDT 2016
Philippe Verdy wrote:
> However, it must be clear that these aliases are case-sensitive by
> default ("Arabic_Presentation_Forms_A" is not the same as
> "Arabic_presentation_forms_A" but is the same as "Arabic
> Presentation_Forms A"), unless the block name property is normatively
> said to be case-insensitive (in which case the following are also
> aliases: "arabic_pf_a", "arabic pf a"). But adding case insensitivity
> has a cost, which is much higher than *only* allowing basic
> replacements of spaces and underscores [...]
UAX #44 says:
> 5.9.2 Matching Character Names
> UAX44-LM2. Ignore case, whitespace, underscore ('_'), and all medial
> hyphens except the hyphen in U+1180 HANGUL JUNGSEONG O-E.
> 5.9.3 Matching Symbolic Values
> UAX44-LM3. Ignore case, whitespace, underscore ('_'), hyphens, and any
> initial prefix string "is".
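For illustration, here is a minimal Python sketch of the UAX44-LM3 rule quoted above, applied to block names, next to the narrower "spaces vs. underscores" substitution from the quoted message. The function names (lm3_key, lm3_match, space_underscore_key) are hypothetical. Two details are assumptions of this sketch: "hyphen" is taken to mean U+002D HYPHEN-MINUS only, and the "is" prefix is checked after case and separators have been dropped.

```python
import re

# Separators UAX44-LM3 says to ignore: whitespace, underscore, hyphen.
# Assumption: "hyphen" means U+002D HYPHEN-MINUS only.
_IGNORABLE = re.compile(r"[\s_\-]+")


def lm3_key(value: str) -> str:
    """Loose-matching key for a symbolic property value (UAX44-LM3 sketch)."""
    # Drop whitespace, underscores and hyphens, then ignore case.
    key = _IGNORABLE.sub("", value).casefold()
    # Ignore an initial "is" prefix (e.g. "isArabic" vs. "Arabic").
    # Assumption: the prefix is checked after the steps above.
    if key.startswith("is"):
        key = key[2:]
    return key


def lm3_match(a: str, b: str) -> bool:
    """True if two symbolic values match under the LM3 sketch."""
    return lm3_key(a) == lm3_key(b)


def space_underscore_key(value: str) -> str:
    """Narrower rule: treat space and underscore as equivalent, nothing else.

    Case and hyphens remain significant.
    """
    return value.replace(" ", "_")


if __name__ == "__main__":
    print(lm3_match("Arabic_Presentation_Forms_A", "arabic presentation forms-a"))
    print(space_underscore_key("Arabic_Presentation_Forms_A")
          == space_underscore_key("Arabic_presentation_forms_A"))
```

Under LM3, "Arabic_Presentation_Forms_A", "arabic presentation forms-a" and "isArabic Presentation Forms A" all reduce to the same key. The space/underscore-only rule still treats "Arabic_Presentation_Forms_A" and "Arabic_presentation_forms_A" as different names.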
I read the words "ignore case" in these two rules to mean that case
should be ignored.
Doug Ewell | http://ewellic.org | Thornton, CO