Comparing Raw Values of the Age Property

Richard Wordingham via Unicode unicode at unicode.org
Tue May 23 15:48:40 CDT 2017


On Tue, 23 May 2017 05:29:33 -0700
Asmus Freytag via Unicode <unicode at unicode.org> wrote:

> On 5/23/2017 4:04 AM, Janusz S. Bien via Unicode wrote:
> > Quote/Cytat - Manuel Strehl via Unicode <unicode at unicode.org> (Tue
> > 23 May 2017 11:33:24 AM CEST):
> >  
> >> The rising standard in the world of web development (and others)
> >> is called
> >> »Semantic Versioning« [1], that many projects adhere to or
> >> sometimes must
> >> actively explain, why they don't.
> >>
> >> The structure of a »semantic version« string is a set of three
> >> integers, MAJOR.MINOR.PATCH, where the »semantics« part lies in a
> >> kind of contract between author and user, when to increment which
> >> part. 
> >
> > Perhaps I am missing something, but I don't understand this thread.
> > Cf.  
> 
> You are not missing anything, the OP is being obtuse. We just didn't 
> want to run the search for him. :)

The object is to generate code *now* that, up to say Unicode Version 23.0,
can work out, from the UCD files DerivedAge.txt and
PropertyValueAliases.txt, whether an arbitrary code point was included
by some Unicode version identified by a
value of the property Age.  One needs this capability to implement
regular expressions of the form \p{Age=xxx}.  This requires a scheme
for determining which of two values of the property identifies the
earlier version of Unicode.
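
To make the requirement concrete, here is a minimal sketch in Python of
the membership test.  It assumes the data lines of DerivedAge.txt have
the form "XXXX..YYYY ; m.n # comment" (or a single code point in place
of the range), and the names load_ages and in_age are merely
illustrative.  It is not meant as a definitive implementation.

```python
import re

# Matches data lines such as "0000..001F    ; 1.1 #  [32] ..." or "20AC ; 2.1 # ...".
LINE_RE = re.compile(
    r"^\s*([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*([^\s#;]+)"
)


def parse_age(short_value):
    """Turn a short Age value like '5.2' into the tuple (5, 2)."""
    major, minor = short_value.split(".")
    return int(major), int(minor)


def load_ages(lines):
    """Return a list of (first, last, (major, minor)) from DerivedAge.txt lines."""
    ranges = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # blank line or comment line
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        ranges.append((first, last, parse_age(m.group(3))))
    return ranges


def in_age(ranges, code_point, short_value):
    """True if code_point was assigned in the given version or an earlier one."""
    limit = parse_age(short_value)
    for first, last, age in ranges:
        if first <= code_point <= last:
            return age <= limit  # tuple comparison: major first, then minor
    return False  # not listed in the file, so unassigned (Age=NA)
```

With the line "20AC          ; 2.1 #       EURO SIGN" loaded,
in_age(ranges, 0x20AC, "9.0") is true and in_age(ranges, 0x20AC, "2.0")
is false.  A linear scan is used only for brevity.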

What TUS 9.0, with its appendices and annexes, lacks is a clear
statement such as, "The short values for the Age property are of the
form 'm.n', with the first field corresponding to the major version,
and the second field corresponding to the minor version. There is no
need for a third version field, because new characters are never
assigned in update versions of the standard."  Conveniently, this
almost true statement is included in Section 5.14 of the proposed
update to UAX#44 (in Draft 12, to be precise).  It's not quite true, for
there is also the short value NA for Unassigned.  Is there any way of
formally recording this oversight?

With this proposed change, to compare two values, all one has to do
is compare the short names of the values, field by field as integers,
for one knows what form they will be in.
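
As a sketch of that comparison in Python: the fields have to be compared
as integers, since as plain strings "10.0" sorts before "9.0".  The
function names are hypothetical, and placing NA after every real version
is my own assumption, not something the UCD states.

```python
def age_key(short_value):
    """Sort key for a short Age value of the form 'm.n'.

    Assumption: 'NA' (Unassigned) is ordered after every real version.
    """
    if short_value == "NA":
        return (float("inf"), 0)
    major, minor = short_value.split(".")
    return (int(major), int(minor))


def earlier(a, b):
    """Return whichever of two short Age values names the earlier version."""
    return a if age_key(a) <= age_key(b) else b
```

For example, earlier("9.0", "10.0") returns "9.0", whereas the string
comparison "9.0" < "10.0" is false.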

> > Version numbers for the Unicode Standard consist of three fields, 
> > denoting the major version, the minor version, and the update
> > version, respectively.

Yes, but 4.0.1 is not a value of the property Age; the last field is
redundant.  Oddly enough, ICU understands the regular expression
\p{age=4.0.1}, but not \p{age=V2_1}
(http://demo.icu-project.org/icu-bin/redemo).  Ah well, it's only a
recommendation that regular expression engines understand both short
names and long names of values of properties.
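
For what it is worth, accepting both kinds of name looks cheap.  The
sketch below, in Python, assumes the age lines of
PropertyValueAliases.txt have the form "age; 2.1 ; V2_1", with short
name first and long name second.  The function names and the
case-insensitive lookup are my own choices, not anything the standard
prescribes.

```python
def load_age_aliases(lines):
    """Map every alias of an Age value (lower-cased) to its short name."""
    aliases = {}
    for line in lines:
        data = line.split("#", 1)[0]  # drop any trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 3 or fields[0].lower() != "age":
            continue  # blank line, comment, or another property
        short = fields[1]
        for name in fields[1:]:
            aliases[name.lower()] = short
    return aliases


def to_short(aliases, value):
    """Return the short name for a short or long Age value.

    Raises KeyError for anything that is not a listed value, e.g. '4.0.1'.
    """
    return aliases[value.strip().lower()]
```

Given the lines for 2.1 and NA, to_short(aliases, "V2_1") returns "2.1"
and to_short(aliases, "Unassigned") returns "NA", while "4.0.1" is
rejected.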

Richard.


