Canonical block names: spaces vs. underscores

Mathias Bynens mathias at qiwi.be
Thu May 26 12:05:05 CDT 2016


> On 26 May 2016, at 17:47, Mark Davis ☕️ <mark at macchiato.com> wrote:
> 
> The canonical property and property value formats are in the *Alias* files.

Thanks for confirming!

Any chance the canonical names can be used in `Blocks.txt` as well, for consistency? This would simplify scripts that parse the Unicode database text files.
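
To make the mismatch concrete, here is a minimal Python sketch. The two sample lines are my assumption of what a `Blocks.txt` entry and the matching `PropertyValueAliases.txt` entry look like, and the helper names `parse_blocks_line` and `parse_alias_line` are hypothetical, not part of any existing tool.

```python
# Sample lines, assumed to follow the layout of the respective UCD files.
BLOCKS_LINE = "0000..007F; Basic Latin"
ALIASES_LINE = "blk; ASCII                            ; Basic_Latin"


def parse_blocks_line(line):
    """Return (start, end, name) for a Blocks.txt-style line."""
    code_range, name = (field.strip() for field in line.split(";"))
    start, end = (int(part, 16) for part in code_range.split(".."))
    return start, end, name


def parse_alias_line(line):
    """Return (property, aliases) for a PropertyValueAliases.txt-style line."""
    line = line.split("#", 1)[0]  # drop a trailing comment, if any
    prop, *aliases = (field.strip() for field in line.split(";"))
    return prop, aliases


start, end, block_name = parse_blocks_line(BLOCKS_LINE)
prop, aliases = parse_alias_line(ALIASES_LINE)
long_alias = aliases[1]  # assumption: short alias first, long alias second

# An exact string comparison fails, so a script joining the two files
# needs an extra normalization step.
print(block_name == long_alias)
```

With these sample lines, `block_name` is "Basic Latin" and `long_alias` is "Basic_Latin", so the two files cannot be joined on the name without first normalizing it.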

> On 26 May 2016, at 18:03, Ken Whistler <kenwhistler at att.net> wrote:
> 
> […] "canonical block name" is not a defined term in the standard.

I didn’t mean to imply it was — it’s just an English word. I meant “canonical” as in “without loose matching applied”.

> See the matching rules in UAX #44:
> 
> http://www.unicode.org/reports/tr44/#Matching_Rules
> 
> and in particular, the matching rule for symbolic values, which applies in this case:
> 
> http://www.unicode.org/reports/tr44/#UAX44-LM3

I know about loose matching, having recently implemented it (https://github.com/mathiasbynens/unicode-loose-match).
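
For readers who haven’t seen it, a rough Python sketch of UAX44-LM3 as I read it (ignore case, whitespace, underscores, hyphens, and an initial "is" prefix) could look like the following. This is not the code from the repository linked above; `loose_key` and `loose_equal` are hypothetical names, and the sketch does not attempt to handle any special cases the specification may call out.

```python
import re


def loose_key(name):
    """Reduce a symbolic value to a comparison key, roughly per UAX44-LM3."""
    # Drop whitespace, underscores, and hyphens, then ignore case.
    key = re.sub(r"[\s_-]+", "", name).lower()
    # Ignore an initial "is" prefix (a simplification; see lead-in).
    if key.startswith("is"):
        key = key[2:]
    return key


def loose_equal(a, b):
    """True if two symbolic values match under the sketch above."""
    return loose_key(a) == loose_key(b)


print(loose_equal("Basic Latin", "Basic_Latin"))
```

Both sides of the comparison go through the same function, so the key itself never needs to be a valid name; it only has to be equal for values that should match.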

> For enumerated properties, and especially for catalog properties such as Block and Script,
> the value of the property may be multi-word, and the best form to use in one context might
> not be exactly (as in binary string equality exact) the same as in another.

That makes sense, but shouldn’t it be consistent throughout the Unicode database text files?

