Canonical block names: spaces vs. underscores
mathias at qiwi.be
Thu May 26 13:48:48 CDT 2016
> On 26 May 2016, at 20:07, Ken Whistler <kenwhistler at att.net> wrote:
> Well, let's take an example. The entry in Blocks.txt for the Arabic Presentation Forms-A block is:
> FB50..FDFF; Arabic Presentation Forms-A
> The entry for that block in PropertyValueAliases.txt is:
> blk; Arabic_PF_A ; Arabic_Presentation_Forms_A ; Arabic_Presentation_Forms-A
> So then which would it be? Should Blocks.txt be changed to the long preferred alias:
> FB50..FDFF; Arabic_Presentation_Forms_A
> or to the abbreviated preferred alias:
> FB50..FDFF; Arabic_PF_A
> which would be more consistent with the XML attribute and with most regex usage?
This sounds like a strawman argument (?). The long preferred alias definitely seems more suitable for a ‘canonical’ name.
> I suppose a proposal to the UTC to further modify the UCD handling of block names
> could change this situation. But I'm not convinced that we shouldn't just leave
> things as they stand -- for stability. And then live with the complications required
> for scripts or other parsing algorithms that actually need to deal with Blocks.txt to
> either parse out block ranges (its main function) or to get usable block names
> (its subsidiary function).
Perhaps the “Note:” in the commented header in `Blocks.txt` could be extended to point out that the ~~canonical block names~~, nay, ++preferred block aliases++ are listed in `PropertyValueAliases.txt`? That would’ve been enough to avoid the question that spawned this thread.
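For anyone who needs usable block names in the meantime, here is a minimal sketch of how a script might join the two files. It assumes only the line formats quoted above. The function names (`loose_key`, `parse_blocks`, `parse_block_aliases`) are made up for this illustration, and the matching is a simplified loose comparison (ignoring case, whitespace, underscores, and hyphens), not a full implementation of the matching rules in UAX #44.

```python
import re

# Sample lines copied from the message above; a real script would read
# Blocks.txt and PropertyValueAliases.txt from the UCD instead.
BLOCKS_SAMPLE = "FB50..FDFF; Arabic Presentation Forms-A\n"
ALIASES_SAMPLE = (
    "blk; Arabic_PF_A ; Arabic_Presentation_Forms_A ; Arabic_Presentation_Forms-A\n"
)


def loose_key(name):
    """Simplified loose match key: drop whitespace, '_' and '-', lowercase."""
    return re.sub(r"[\s_\-]", "", name).lower()


def parse_blocks(text):
    """Return (start, end, name) tuples from Blocks.txt-style lines."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and blank lines
        if not line:
            continue
        range_part, name = (field.strip() for field in line.split(";", 1))
        start, end = (int(cp, 16) for cp in range_part.split(".."))
        blocks.append((start, end, name))
    return blocks


def parse_block_aliases(text):
    """Map loose keys to (short alias, long alias) for 'blk' entries."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if fields[0] != "blk" or len(fields) < 3:
            continue
        short_alias, long_alias = fields[1], fields[2]
        # Index every listed alias so any spelling resolves to the same pair.
        for alias in fields[1:]:
            aliases[loose_key(alias)] = (short_alias, long_alias)
    return aliases


if __name__ == "__main__":
    aliases = parse_block_aliases(ALIASES_SAMPLE)
    for start, end, name in parse_blocks(BLOCKS_SAMPLE):
        short_alias, long_alias = aliases[loose_key(name)]
        print(f"{start:04X}..{end:04X}; {name} -> {long_alias} ({short_alias})")
```

Keying on the loose form means the name in `Blocks.txt` (with spaces and a hyphen) and the aliases in `PropertyValueAliases.txt` (with underscores) land on the same entry without either file having to change.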