Minimal Implementation of Unicode Collation Algorithm

Richard Wordingham via Unicode unicode at unicode.org
Mon Dec 4 07:30:22 CST 2017


May a collation algorithm that always compares all strings as equal be a
compliant implementation of the Unicode Collation Algorithm (UTS #10)?
If not, by which clause is it not compliant?  Formally, this algorithm
would require that all weights be zero.
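
As a minimal sketch of that degenerate case: the code below assumes a
hypothetical table that maps every code point to one collation element
with all three weights zero, and forms sort keys roughly in the manner of
UTS #10 (non-zero weights per level, with a zero level separator).  The
function names are illustrative only and are not taken from any real
library.

```python
def collation_elements(text):
    # Hypothetical degenerate table: every code point maps to a single
    # collation element whose primary, secondary and tertiary weights are 0.
    return [(0, 0, 0) for _ in text]


def sort_key(text, levels=3):
    # For each level, append only the non-zero weights; separate the
    # levels with a zero weight.
    elements = collation_elements(text)
    key = []
    for level in range(levels):
        if level > 0:
            key.append(0)  # level separator
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)


def compare(a, b):
    # Returns -1, 0 or 1 from comparing the two sort keys.
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)
```

With every weight zero, each sort key consists of the level separators
alone, so any two strings, including the empty string, compare as equal.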

Would an implementation that supported no characters be compliant?

It used to be that for an implementation to be claimed as compliant, it
also had to pass a specific conformance test.  This requirement has now
been abandoned, perhaps because the Default Unicode Collation Element
Table (DUCET) is incompatible with the CLDR Collation Algorithm.

There are two compatibility issues.  First, the DUCET weighting of U+FFFE
is incompatible with the CLDR Collation Algorithm.  Second, it seems that
the ICU implementation will not work if well-formedness condition WF5 is
not met.  Meeting WF5 without changing the collation would require about a
thousand extra entries in the table.  The CLDR root collation instead adds
just the six changes (plus a consequent four entries for FCD closure)
desirable for natural language, and accepts the consequent changes for
unlikely strings.
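
To illustrate the kind of gap WF5 concerns, here is a rough sketch.  It
assumes my paraphrase of the condition is accurate: if the table has a
contraction of N > 2 code points ending in a non-starter, it must also
have the contraction formed by the first N-1 code points.  Please check
that against the current text of UTS #10.  The function name and the
two-entry toy table are hypothetical, not actual DUCET data.

```python
import unicodedata


def wf5_missing_prefixes(contractions):
    # Return the (N-1)-code-point prefixes that the paraphrased WF5 would
    # require but that are absent from the given set of contractions.
    table = set(contractions)
    missing = set()
    for seq in table:
        ends_in_non_starter = unicodedata.combining(seq[-1]) != 0
        if len(seq) > 2 and ends_in_non_starter:
            prefix = seq[:-1]
            if prefix not in table:
                missing.add(prefix)
    return missing


# Toy table: a three-code-point Tibetan contraction whose two-code-point
# prefix <U+0FB2, U+0F71> is not itself listed.
toy_table = {"\u0fb2\u0f80", "\u0fb2\u0f71\u0f80"}
gaps = wf5_missing_prefixes(toy_table)
```

Here `gaps` holds the single missing prefix; adding it to the table makes
the check come back empty.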

Richard.

