"(in 6429)" in allkeys.txt

Whistler, Ken ken.whistler at sap.com
Tue Mar 11 17:34:20 CDT 2014


> I agree that a clarification in the text would be better than
> a comment in allkeys.txt. But I also think just changing "(in 6429)"
> to "(in ISO 6429)" would be enough.
> 
> (Strange as it might seem to list regulars, not everyone immediately
> makes the right association from this four-digit number. :-)

Ah, I see what the interpretation problem was. Yes, that is
a straightforward kind of improvement -- easily enough done.
Look for a change the next time the file is updated. (It will not
be immediately changed, pending other review comments.)
 
> This is somewhat beside the point, but since you say the file is
> machine-generated, I wonder about something I found in the draft version
> http://www.unicode.org/Public/UCA/7.0.0/allkeys-7.0.0d5.txt
> where a comment says
> 
> # Tertiary weight range:  0002..001F (30)
> 
> even though the highest tertiary weight actually used is 001E.
> Isn't this comment generated automatically?

The ranges for primary and secondary weights change with every
new repertoire addition to the input, so they are always
calculated dynamically. By contrast, the tertiary weight range
is hard-coded in the generator and never changes. If you look at:

http://www.unicode.org/reports/tr10/#Tertiary_Weight_Table

you can see all those pre-defined, fixed values. It is true that
0x001F is not actually assigned as a tertiary weight for any
particular character, but it is internally set aside as a MAX_TERTIARY
sentinel value, before the first secondary weight of 0x0020.
Note that the tertiary weight 0x0007 is not actually used in
the weighting, either (for historical reasons). At any rate,
the entire range 0x0002..0x001F is considered fixed and "used"
for tertiaries, so that is what is always displayed in the summary
printed at the top of allkeys.txt.
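
For anyone who wants to compare the fixed range with the weights that
actually occur in a given copy of the file, here is a minimal Python
sketch. It is not the tool that generates allkeys.txt. The function name
used_tertiary_weights, the constant names, and the sample lines (whose
weights are made up for illustration) are all assumptions; only the
0002..001F range comes from the header comment quoted above.

```python
import re

# Fixed tertiary range quoted in the allkeys.txt header comment.
MIN_TERTIARY = 0x0002
MAX_TERTIARY = 0x001F  # described above as a reserved sentinel value

# One collation element: [.pppp.ssss.tttt] or [*pppp.ssss.tttt].
# An optional fourth weight (seen in older file versions) is tolerated.
ELEMENT = re.compile(
    r"\[[.*]([0-9A-Fa-f]{4,})\.([0-9A-Fa-f]{4,})\.([0-9A-Fa-f]{4,})"
    r"(?:\.[0-9A-Fa-f]{4,})?\]"
)

def used_tertiary_weights(lines):
    """Collect the nonzero tertiary weights found in allkeys-style lines."""
    used = set()
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        for match in ELEMENT.finditer(data):
            weight = int(match.group(3), 16)
            if weight:  # 0000 marks an ignorable level, not a real weight
                used.add(weight)
    return used

# Illustrative lines only; the weights are not taken from a real file.
sample = [
    "# Tertiary weight range:  0002..001F (30)",
    "0000  ; [.0000.0000.0000] # NULL (in ISO 6429)",
    "0061  ; [.15EF.0020.0002] # LATIN SMALL LETTER A",
    "0041  ; [.15EF.0020.0008] # LATIN CAPITAL LETTER A",
]

used = used_tertiary_weights(sample)
fixed = set(range(MIN_TERTIARY, MAX_TERTIARY + 1))
print(len(fixed))  # 30, matching the "(30)" in the header comment
print(sorted(f"{w:04X}" for w in fixed - used))  # fixed values not seen
```

Run against a full file (for example, by passing an open file object
instead of the sample list), the second line of output would list the
values in the fixed range that no entry uses as a tertiary weight.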

--Ken



