Apparent discrepancy between FAQ and Age.txt

Whistler, Ken ken.whistler at sap.com
Tue Jun 10 14:04:57 CDT 2014


Karl Williamson noted:

> The FAQ http://www.unicode.org/faq/private_use.html#sentinels
> says that the last 2 code points on the planes except BMP were made
> noncharacters in TUS 3.1.  DerivedAge.txt gives 2.0 for these.
> 

The *concept* of noncharacter was not invented until Unicode 3.1,
so it could not have formally been applied to anything before
then. Before Unicode 3.1, some code points had been referred to as
"not a character", but it took a while for the UTC to rationalize the
details systematically. Unicode 3.1 was the first version to
formally introduce Noncharacter_Code_Point as a property and
apply it to FFFE/FFFF (as well as the other noncharacters).

Unicode 2.0 introduced the concept of Unicode scalar value and
established the framework of definitions and conformance clauses
now familiar in Chapter 3 (although it was pretty rough around
the edges back then). It also documented UTF-8 (although at that
point it was in an annex still), and that *required* a mapping between
the UTF-16 and UTF-8 form of 0xnFFFE and 0xnFFFF on each
plane. The Age value derives from that. U+FFFE and U+FFFF themselves were
given Age=1.1 because they were part of Unicode 1.1 before
Unicode 2.0 formally documented the addition of the rest of
the planes. Earlier still, when Unicode was still
trying to be a pure 16-bit encoding, FFFE and FFFF were simply outside
the codespace.
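
To make the mapping mentioned above concrete, here is a minimal sketch in
Python. It lists the last two code points of each supplementary plane and
shows their UTF-8 and UTF-16 forms. The function names are hypothetical and
only for illustration, and the sketch relies on Python's present-day codecs
rather than on anything in the Unicode 2.0 text itself.

```python
def plane_end_noncharacters():
    """Yield U+nFFFE and U+nFFFF for planes 1 through 16 (hypothetical helper)."""
    for plane in range(1, 17):
        base = plane << 16
        yield base | 0xFFFE
        yield base | 0xFFFF


def encoded_forms(cp):
    """Return the (UTF-8, UTF-16BE) byte sequences for one code point."""
    ch = chr(cp)
    return ch.encode("utf-8"), ch.encode("utf-16-be")


if __name__ == "__main__":
    for cp in plane_end_noncharacters():
        utf8, utf16 = encoded_forms(cp)
        # Both encoding forms must decode back to the same code point.
        assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == chr(cp)
        print(f"U+{cp:04X}  UTF-8: {utf8.hex().upper()}  UTF-16: {utf16.hex().upper()}")
```

Each of these code points takes four bytes in UTF-8 and a surrogate pair in
UTF-16, so a converter between the two forms has to handle them like any
other supplementary code point.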

Incidentally, the property Age wasn't introduced until Unicode 3.2,
so technically speaking it didn't exist before then, either. However,
Age values were assigned retroactively, back to Version 1.1, when the
initial assignments were parceled out as of Unicode 3.2.
Note also that although the majority of the repertoire in Unicode 1.1
actually was already assigned as of Unicode 1.0, no attempt was made
to assign Age=1.0 to any characters, because of the churn and
renaming that occurred as a result of the Unicode 1.0 and ISO 10646-1
merger effort back in the early 1990s.

--Ken





