The Unicode Standard and ISO

Richard Wordingham via Unicode unicode at
Fri Jun 8 03:06:46 CDT 2018

On Fri, 8 Jun 2018 05:32:51 +0200 (CEST)
Marcel Schneider via Unicode <unicode at> wrote:

> Thank you for confirming. All witnesses concur to invalidate the
> statement about uniqueness of ISO/IEC 10646 ‐ Unicode synchrony. —
> After being invented in its actual form, sorting was standardized
> simultaneously in ISO/IEC 14651 and in Unicode Collation Algorithm,
> the latter including practice‐oriented extra features. 

The UCA contains features essential for respecting canonical
equivalence.  ICU works hard to avoid the extra effort involved,
apparently even going to the extreme of implicitly declaring that
Vietnamese is not a human language. (Some contractions are not
supported by ICU!)  The synchronisation is manifest in the DUCET
collation, which appears to be constructed so that at least one member
of each canonical equivalence class sorts the same way under
ISO/IEC 14651.
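
To make the canonical equivalence point concrete, here is a minimal
Python sketch.  It is not an implementation of the UCA or of ISO/IEC
14651; it only shows the normalisation step that a comparison must
perform, explicitly or implicitly, before canonically equivalent
spellings can sort alike.  The helper name canonical_key and the choice
of the Vietnamese letter U+1EC7 are my own, purely for illustration.

```python
import unicodedata


def canonical_key(s: str) -> str:
    """Hypothetical helper: map a string to its NFD form, so that all
    canonically equivalent spellings yield the same key."""
    return unicodedata.normalize("NFD", s)


# Three canonically equivalent spellings of Vietnamese 'e' with
# circumflex and dot below.
precomposed = "\u1ec7"            # single precomposed code point
circumflex_first = "e\u0302\u0323"  # e + circumflex + dot below
dot_first = "e\u0323\u0302"         # e + dot below + circumflex

forms = [precomposed, circumflex_first, dot_first]

# Compared as raw code point sequences, the spellings are all distinct.
raw_distinct = len(set(forms))

# After canonical decomposition and reordering they coincide.
normalized_distinct = len({canonical_key(f) for f in forms})

print(raw_distinct, normalized_distinct)
```

A collation that looks up weights from the raw code point sequence
would have to handle each of these spellings separately, which is the
extra effort referred to above.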

> Since then,
> these two standards are kept in synchrony uninterruptedly.

But the consortium has formally dropped the commitment to DUCET in
CLDR.  Even when restricted to strings of assigned characters, CLDR
and ICU no longer make the effort to support the DUCET collation.
Indeed, I'm not even sure that the DUCET is a tailoring of the root CLDR
collation, even when restricted to assigned characters.  Tailorings
tend to have odd side effects; fortunately, those side effects rarely
if ever matter.  CLDR root is a modified rewrite of the DUCET; it has
changes that are prohibited as 'tailorings'!

