Comparing Raw Values of the Age Property

Markus Scherer via Unicode unicode at unicode.org
Mon May 22 17:10:02 CDT 2017


On Mon, May 22, 2017 at 2:44 PM, Richard Wordingham via Unicode <
unicode at unicode.org> wrote:

> Given two raw values of the Age property, defined in UCD file
> DerivedAge.txt, how is a computer program supposed to compare them?
> Apart from special handling for the value "Unassigned" and its short
> alias "NA", one used to be able to compare short values against short
> values and long values against long values by simple string
> comparison.  However, now we are coming to Version 10.0 of Unicode,
> this no longer works - "1.1" < "10.0" < "2.0".
>

This is normal when numbers, including multi-field version numbers, are
compared as strings. If you want numeric sorting, then you need to either
use a collator with a numeric-ordering option, or parse the versions into
tuples of integers and sort those.
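
A minimal sketch of the tuple approach, assuming the short Age values are
dot-separated decimal integers such as "1.1" or "10.0". The helper name
parse_age is hypothetical, and the sketch does not handle "Unassigned"/"NA"
or the long value aliases, which would need their own treatment.

```python
def parse_age(value):
    """Turn a short Age value such as "10.0" into a tuple of ints, e.g. (10, 0).

    Assumption: fields are separated by FULL STOP and are plain decimal integers.
    """
    return tuple(int(field) for field in value.split("."))


# Plain string comparison puts "10.0" between "1.1" and "2.0";
# comparing the parsed tuples gives the numeric order instead.
ages = ["2.0", "10.0", "1.1", "6.1"]
sorted_ages = sorted(ages, key=parse_age)
print(sorted_ages)  # ['1.1', '2.0', '6.1', '10.0']
```

Tuples compare field by field, so (2, 0) < (10, 0) holds even though
"10.0" < "2.0" as strings.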

> There are some possibilities - the values appear in order in
> PropertyValueAliases.txt and in DerivedAge.txt.


You should not rely on the order of values in data files, unless the file
explicitly states that order matters.

> Can one rely on the FULL STOP being the field
> divider,


I think so. Dots are extremely common for version numbers. I see no reason
for Unicode to use something else.

> and can one rely on there never being any grouping characters
> in the short values?


I don't know what "grouping characters" you have in mind.

I think the format is pretty self-evident.

markus