Minimal Implementation of Unicode Collation Algorithm

Markus Scherer via Unicode unicode at unicode.org
Mon Dec 4 14:48:11 CST 2017


On Mon, Dec 4, 2017 at 5:30 AM, Richard Wordingham via Unicode <
unicode at unicode.org> wrote:

> May a collation algorithm that always compares all strings as equal be a
> compliant implementation of the Unicode Collation Algorithm (UTS #10)?
> If not, by which clause is it not compliant?  Formally, this algorithm
> would require that all weights be zero.
>

I think so. The algorithm would be equivalent to an implementation of the
UCA with a degenerate CET that maps every character to a Completely
Ignorable Collation Element.
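
As a rough illustration of that equivalence, here is a minimal sketch in Python. It is not taken from UTS #10 or from any real library: the names degenerate_cet and sort_key are hypothetical, and the sort-key layout (non-zero weights per level, with a zero separator between levels) is a simplified reading of the UCA's sort-key formation step.

```python
import unicodedata
from typing import Callable, List, Tuple

# (primary, secondary, tertiary) weights of one collation element.
CollationElement = Tuple[int, int, int]


def degenerate_cet(ch: str) -> List[CollationElement]:
    """Hypothetical CET: every character is completely ignorable."""
    return [(0, 0, 0)]


def sort_key(
    s: str,
    lookup: Callable[[str], List[CollationElement]] = degenerate_cet,
) -> Tuple[int, ...]:
    """Simplified three-level sort key (assumption: no contractions,
    no variable weighting, no backward secondary level)."""
    nfd = unicodedata.normalize("NFD", s)
    elements = [ce for ch in nfd for ce in lookup(ch)]
    key: List[int] = []
    for level in range(3):
        if level > 0:
            key.append(0)  # level separator
        # Zero weights are ignorable at that level and are skipped.
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)


words = ["zebra", "Apple", "", "中文"]
# Every key is identical, so every pair of strings compares as equal
# and a stable sort leaves the input order unchanged.
keys = {sort_key(w) for w in words}
unchanged = sorted(words, key=sort_key) == words
```

With this table every string produces the same key, which is the "all strings equal" behavior asked about.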

> Would an implementation that supported no characters be compliant?
>

I guess so. I assume that would mean that the CET maps nothing, and that
the implementation does implement the implicit weighting of Han characters
and unassigned (here: unmapped) code points. It would also have to do NFD
first.
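
A minimal sketch of that case, assuming an empty CET so that every code point falls through to implicit weights after NFD. The weight formulas follow the implicit-weight derivation in UTS #10 as I understand it; the function names are hypothetical, and the hard-coded Han ranges are only a rough stand-in for the Unified_Ideograph property and block data a real implementation would consult.

```python
import unicodedata
from typing import List, Tuple

CollationElement = Tuple[int, int, int]


def implicit_base(cp: int) -> int:
    """Pick the implicit-weight base for a code point.

    Assumption: these ranges approximate the real property lookup.
    """
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs block
        return 0xFB40
    if 0x3400 <= cp <= 0x4DBF or 0x20000 <= cp <= 0x2A6DF:  # Ext. A, B
        return 0xFB80
    return 0xFBC0  # everything else: treated as unassigned/unmapped


def implicit_elements(cp: int) -> List[CollationElement]:
    """Two collation elements: [.AAAA.0020.0002][.BBBB.0000.0000]."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0, 0)]


def empty_cet_sort_key(s: str) -> Tuple[int, ...]:
    """Sort key when the CET maps nothing: NFD, then implicit weights."""
    nfd = unicodedata.normalize("NFD", s)
    elements = [ce for ch in nfd for ce in implicit_elements(ord(ch))]
    key: List[int] = []
    for level in range(3):
        if level > 0:
            key.append(0)  # level separator
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)


# Canonically equivalent strings get the same key because of NFD.
same = empty_cet_sort_key("\u00e9") == empty_cet_sort_key("e\u0301")
```

Under these assumptions, Han characters sort ahead of other unmapped code points, and everything else falls back to code point order within its group.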

> It used to be that for an implementation to be claimed as compliant, it
> also had to pass a specific conformance test.  This requirement has now
> been abandoned, perhaps because the Default Unicode Collation Element
> Table (DUCET) is incompatible with the CLDR Collation Algorithm.
>

The DUCET is missing some things that are needed by the CLDR Collation
Algorithm, but that has nothing to do with UCA compliance.

The simple fact is that tailorings are common, and it has to be possible to
conform to the algorithm without forbidding tailorings.

markus