Sorting notation

Philippe Verdy verdy_p at wanadoo.fr
Sun Feb 23 13:49:24 CST 2014


OK, I ignored these resets only for simplicity: the question was not about
a full set of rules to build a collation, but about a small subset of rules
that could be used.

It seems surprising that Michael Everson asks the question, when he already
knows so much about Unicode algorithms (but maybe less about the notations
used in CLDR data).

CLDR also has several competing notations for specifying collations, so
that may be the purpose of his question. I don't think that all notations
need an explicit reset at the start (it can be implicit for the first
element in a chain of relations).
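
To make the distinction concrete, here is a minimal sketch in Python of what
a reset does to an ordering. It is a toy model only: the function names
(tailor, sort_key) and the five-letter base order are hypothetical, it handles
a single strength level, and it is not CLDR rule syntax, the UCA, or the ICU
API. When no reset is given, the first element of the chain is assumed to act
as the implicit reset point.

```python
def tailor(base_order, chain, reset=None):
    """Return a new order with the chain items placed right after the reset point.

    base_order: list of items in their default order (toy stand-in for the root order).
    chain:      items to be ordered one after another, following the reset point.
    reset:      explicit anchor; if None, the first chain item is the implicit anchor.
    """
    if reset is None:
        # Implicit reset: the first element of the chain is the anchor.
        reset, chain = chain[0], chain[1:]
    if reset not in base_order:
        raise ValueError(f"reset point {reset!r} is not in the base order")
    if reset in chain:
        raise ValueError("the reset point cannot also be tailored in the same chain")

    moved = set(chain)
    result = []
    for item in base_order:
        if item in moved:
            continue  # dropped here; re-inserted after the reset point
        result.append(item)
        if item == reset:
            result.extend(chain)
    return result


def sort_key(order):
    """Build a key function that compares words item by item using the given order."""
    rank = {item: i for i, item in enumerate(order)}
    return lambda word: [rank[ch] for ch in word]


base = list("abcde")

# Explicit reset at "b", then "e" is placed after it.
explicit = tailor(base, ["e"], reset="b")

# Same chain written without a separate reset: "b" is the implicit anchor.
implicit = tailor(base, ["b", "e"])

print(explicit)
print(implicit)
print(sorted(["ad", "ae", "ac"], key=sort_key(explicit)))
```

In this model both spellings give the same order, and only the tailored item
moves; every other item keeps its default position, which matches the advice
quoted below about reordering only what the default order does not already
provide.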



2014-02-14 17:26 GMT+01:00 Markus Scherer <markus.icu at gmail.com>:

> You need a reset point to say where in the UCA/CLDR universe this rule
> chain goes.
> http://www.unicode.org/reports/tr35/tr35-collation.html#Orderings
>
> The default collation puts lowercase first. Normally you reset to a
> lowercase character and tailor variations to that, otherwise the few
> characters you tailor are inconsistent with the rest of Unicode.
> Implementations like ICU provide parametric settings (no need for rules) to
> specify uppercase first.
> http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options
>
> You should only reorder characters that the default order does not already
> have where you need them. For example, reset at each base letter, unless
> you want to reorder them relative to each other's default order.
> http://www.unicode.org/charts/collation/
>
> See also http://cldr.unicode.org/index/cldr-spec/collation-guidelines
> especially about "Minimal Rules".
>
> You can try out collation rules and settings at
> http://demo.icu-project.org/icu-bin/locexp?_=root&d_=en&x=col
>
> Best regards,
> markus
> --
> Google Internationalization Engineering