Canonical block names: spaces vs. underscores

Doug Ewell doug at
Sat May 28 10:51:55 CDT 2016

Philippe Verdy wrote:

> However it must be clear that these aliases are case-sensitive by
> default ("Arabic_Presentation_Forms_A" is not the same as
> "Arabic_presentation_forms_A" but is the same as "Arabic
> Presentation_Forms A"), unless the block names property is normatively
> said to be case-insensitive (in that case the following are also
> aliases: "arabic_pf_a", "arabic pf a"). But adding case insensitivity
> has a cost, which is much higher than *only* allowing basic
> replacements of spaces and underscores [...]

UAX #44 says:

> 5.9.2 Matching Character Names
> UAX44-LM2. Ignore case, whitespace, underscore ('_'), and all medial
> hyphens except the hyphen in U+1180 HANGUL JUNGSEONG O-E.
> 5.9.3 Matching Symbolic Values
> UAX44-LM3. Ignore case, whitespace, underscore ('_'), hyphens, and any
> initial prefix string "is".

I read the words "ignore case" in these two rules to mean that case 
should be ignored.
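
To make that concrete, here is a minimal Python sketch of loose matching
along the lines of UAX44-LM3, which as far as I can tell is the rule that
covers block names, since they are symbolic property values. The function
names loose_key and loose_match are made up for illustration and are not
part of any Unicode library. The guard that keeps a bare "is" from being
stripped to an empty string is my own assumption, not wording taken from
the rule.

```python
import re

# Characters that UAX44-LM3 says to ignore: whitespace, underscore, hyphen.
_IGNORED = re.compile(r"[\s_\-]+")


def loose_key(value: str) -> str:
    """Reduce a symbolic value to a comparison key (sketch of UAX44-LM3)."""
    # Drop whitespace, underscores and hyphens, then ignore case.
    key = _IGNORED.sub("", value).lower()
    # Ignore an initial "is" prefix. Assumption: leave a bare "is" alone
    # so that the key never becomes empty.
    if key.startswith("is") and len(key) > 2:
        key = key[2:]
    return key


def loose_match(a: str, b: str) -> bool:
    """Return True if two symbolic values are equal under loose matching."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    print(loose_match("Arabic_Presentation_Forms_A",
                      "arabic presentation forms-a"))
    print(loose_match("Arabic_PF_A", "arabic pf a"))
```

Under this sketch, "Arabic_Presentation_Forms_A" and
"Arabic_presentation_forms_A" compare equal, because case is ignored along
with the spaces and underscores. The long name and the short alias still
produce different keys, so each one only matches its own spelling variants.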

Doug Ewell | | Thornton, CO
