ISO 14651/14652 vs Unicode sorting

Ilya Zakharevich nospam-abuse at ilyaz.org
Thu May 28 03:42:25 CDT 2020


I have been informed that according to the tables distributed with ISO
14651/14652, the following strings should be sorted in this order:

>   foobar
>   foo baz

Moreover, this is how glibc (and, as a corollary, every utility that
relies on it) sorts in European locales on contemporary Linux systems.

I checked COBUILD, American Heritage, and Le Petit Robert II, and it
seems that they do indeed use this (brain damaged?) order.  (Although
not, apparently, Le Petit Robert I, which SEEMS TO HAVE compound
words tacked on at the end of the main record.)

However, this definitely contradicts what
  https://icu4c-demos-7hxm2n5zgq-uc.a.run.app/icu-bin/collation.html
does with the default locale, and with `en´.
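
To make the two orders concrete, here is a toy sketch in Python.  It
is NOT what glibc or ICU actually run: the two key functions and their
names are made up for illustration, and the guess that the difference
comes down to whether the space is ignored on the first comparison
pass is only my assumption.

```python
def key_ignoring_spaces(s):
    # Hypothetical model of the first order: spaces are skipped on the
    # first pass, and the original string only breaks ties.
    return (s.replace(" ", ""), s)


def key_spaces_significant(s):
    # Hypothetical model of the second order: the space is an ordinary
    # character that happens to sort before every letter.
    return s


words = ["foo baz", "foobar"]

# Compares "foobar" with "foobaz", so "foobar" comes first.
ignoring = sorted(words, key=key_ignoring_spaces)

# Compares " " with "b" at the fourth character, so "foo baz" comes first.
significant = sorted(words, key=key_spaces_significant)

print(ignoring)
print(significant)
```

The first list matches the order quoted above; the second is the
opposite order, which is what the demo page gives me.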

So which is the intended behavior: that of ICU, or that of ISO?!

Thanks,
Ilya

