Calculating Sorting Weights in a Keyboard Definition

Richard Wordingham via CLDR-Users cldr-users at unicode.org
Mon Aug 19 18:50:53 CDT 2019


Is there a tie-break rule for competing reorder elements?

For example, suppose the applicable reorders element is:

<reorders>
<reorder from="ab" order="10 10"/>
<reorder from="bc" order="20 20"/>
</reorders>

Are the primary weights for "abc" <10, 10, 20> or <10, 20, 20>, or
something else, and why?
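
To make the two candidate answers concrete, here is a minimal Python sketch. It is not taken from the specification. It assumes that every rule is matched at every position with overlaps allowed, that unmatched characters get a default weight of 0, and that the only open question is whether an earlier or a later rule keeps a contested character. The function name assign_weights and the flag later_rule_wins are hypothetical.

```python
def assign_weights(text, rules, later_rule_wins):
    """Assign a primary weight to each character of text.

    rules is a list of (from_string, weights) pairs in document order.
    Every occurrence of every rule is applied, overlaps included;
    later_rule_wins decides which rule a contested character keeps.
    """
    weights = [0] * len(text)        # assumed default weight of 0
    claimed = [False] * len(text)    # has some rule already set this position?
    for pattern, orders in rules:
        start = text.find(pattern)
        while start != -1:
            for offset, order in enumerate(orders):
                i = start + offset
                if later_rule_wins or not claimed[i]:
                    weights[i] = order
                    claimed[i] = True
            start = text.find(pattern, start + 1)
    return weights


RULES = [("ab", [10, 10]), ("bc", [20, 20])]

print(assign_weights("abc", RULES, later_rule_wins=False))  # [10, 10, 20]
print(assign_weights("abc", RULES, later_rule_wins=True))   # [10, 20, 20]
```

A third reading, in which matches may not overlap, would leave "c" unmatched and give <10, 10, 0> under the same default-weight assumption.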

Am I correct to assume that in general the sorting is not a
Unicode-compliant process?  In particular, is string matching a
comparison of mere codepoint strings or of equivalence classes under
canonical equivalence?

Are the literal values of the 'from' attribute to be immune to
replacement by canonically equivalent strings?
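
The difference between the two kinds of matching can be illustrated with a short Python sketch. It assumes a hypothetical 'from' value of precomposed U+00E9 and input typed as "e" followed by U+0301. Comparing NFD forms is used here only as a rough stand-in for matching under canonical equivalence; it does not handle every case, such as matches interrupted by other combining marks.

```python
import unicodedata


def matches_codepoints(text, pattern):
    """Match on the raw code point sequence only."""
    return pattern in text


def matches_canonical(text, pattern):
    """Approximate canonical-equivalence matching by comparing NFD forms."""
    def nfd(s):
        return unicodedata.normalize("NFD", s)
    return nfd(pattern) in nfd(text)


PRECOMPOSED = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "e\u0301"    # "e" + COMBINING ACUTE ACCENT

print(matches_codepoints(DECOMPOSED, PRECOMPOSED))  # False
print(matches_canonical(DECOMPOSED, PRECOMPOSED))   # True
```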

Is sorting applied at every input operation, or can the system delay
until a suitable point?

Having sorted the characters, is the sorting algorithm reapplied based
on the new orders?  Compulsorily?  Optionally?

I can imagine a set of rules that would cause some strings to be
rearranged whenever the sorting process is applied.
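
As an illustration of such a rule set, the following Python sketch uses two hypothetical rules, from="ab" and from="ba", each with order "2 1". It assumes a simplified model: weights are assigned by plain substring matching, unmatched characters get weight 0, and the whole string is then stably sorted by weight. Under those assumptions the string never settles.

```python
def reorder_once(text, rules):
    """Apply one pass of weight assignment followed by a stable sort."""
    weights = [0] * len(text)    # assumed default weight of 0
    for pattern, orders in rules:
        start = text.find(pattern)
        while start != -1:
            for offset, order in enumerate(orders):
                weights[start + offset] = order
            start = text.find(pattern, start + 1)
    # sorted() is stable, so characters of equal weight keep their order.
    indices = sorted(range(len(text)), key=lambda i: weights[i])
    return "".join(text[i] for i in indices)


# Hypothetical rules: each one swaps the pair it matches.
RULES = [("ab", [2, 1]), ("ba", [2, 1])]

s = "ab"
for _ in range(4):
    s = reorder_once(s, RULES)
    print(s)    # alternates: ba, ab, ba, ab
```

Whether such behaviour can actually arise depends on the answers to the questions above, in particular on whether the process is reapplied to text that has already been sorted.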

I've seen a suggestion that the 'before' and 'after' values can be
regular expressions.  Is this merely an unofficial shorthand
that defines a finite 'language', i.e. a finite set of strings?

Richard.

