question about identifying CLDR coverage % for Amharic

Richard Wordingham richard.wordingham at ntlworld.com
Wed Mar 1 17:24:58 CST 2017


On Fri, 24 Feb 2017 21:42:54 +0000
Richard Wordingham <richard.wordingham at ntlworld.com> wrote:


> I notice a very similar file lo.xml.  When did Laos haul up the white
> flag and more or less adopt the modern Thai collation order for Lao?

As there has been no answer to this question, I presume the surrender
has not happened.  As my ticket submission was rejected as spam, would
someone kindly file a ticket along these lines:

==Lao collation is not linguistically correct==

The file collation/lo.xml contains the reckless falsehood "The root
collation order is valid for this language".

If phonetic Lao syllables were represented by single characters, Lao
collation would be a simple lexicographic order. Lao collation can
therefore make use of nothing but primary weights.

A Lao syllable may be considered to be composed of onset + vowel + coda
+ tone; the onset and vowel may be interleaved (as in Thai), and the
tone is represented by a mark following the onset and no later than
immediately after the vowel. There are two basic ordering schemes for
single syllables:

1) <onset-weight><coda-weight><vowel-weight><tone-weight>
2) <onset-weight><vowel-weight><coda-weight><tone-weight>

The first is the one most commonly used; the second is closer to the
CLDR default.
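
To make the difference between the two schemes concrete, here is a
minimal Python sketch. It assumes each syllable has already been parsed
into four integer weights; the Syllable type, the key functions and the
weight values are all made up for illustration and are not taken from
CLDR or from any Lao dictionary.

```python
from collections import namedtuple

# Hypothetical pre-parsed syllable: each field is an integer weight.
Syllable = namedtuple("Syllable", "onset vowel coda tone")

def key_scheme1(s):
    """Scheme 1: onset, then coda, then vowel, then tone."""
    return (s.onset, s.coda, s.vowel, s.tone)

def key_scheme2(s):
    """Scheme 2: onset, then vowel, then coda, then tone."""
    return (s.onset, s.vowel, s.coda, s.tone)

# Two made-up syllables with the same onset and tone.
a = Syllable(onset=1, vowel=2, coda=1, tone=0)
b = Syllable(onset=1, vowel=1, coda=2, tone=0)

order1 = sorted([a, b], key=key_scheme1)  # a first: its coda weight is lower
order2 = sorted([a, b], key=key_scheme2)  # b first: its vowel weight is lower
```

The two schemes differ only in whether the coda or the vowel is compared
first, so they disagree exactly when one syllable has the lower coda
weight and the other the lower vowel weight.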

Unlike Thai, the vowel weighting for compound vowel symbols is not
composed from the individual vowels. For example, part of the ordering
is:

ເກະ < ເກ < ໂກະ < ໂກ < ເກາະ

However, the current collation yields:

ເກ < ເກະ < ເກາະ < ໂກ < ໂກະ

This ordering is manifestly wrong.
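
The contrast can be reproduced with a small Python sketch. It makes two
assumptions. First, the VOWEL_WEIGHT table is hypothetical: only the
relative order of its five entries comes from the example above, and a
real table would be far larger. Second, naive_key is only a rough
stand-in for the root collation (move the prevowel after the consonant,
then compare code points); it is not the CLDR algorithm, although it
gives the same order for these five strings.

```python
# Lao code points used below (written as escapes to avoid ambiguity).
KO = "\u0e81"        # consonant KO
A = "\u0eb0"         # vowel sign A
AA = "\u0eb2"        # vowel sign AA
E = "\u0ec0"         # vowel sign E (written before the consonant)
O = "\u0ec2"         # vowel sign O (written before the consonant)
PREVOWELS = {E, O}   # only the prevowels needed for this example

# Hypothetical table: one weight per compound vowel symbol, keyed by
# (prevowel, marks after the onset).
VOWEL_WEIGHT = {
    (E, A): 0,
    (E, ""): 1,
    (O, A): 2,
    (O, ""): 3,
    (E, AA + A): 4,
}

def split(syllable):
    """Split a simple syllable into (prevowel, onset, following marks)."""
    pre = syllable[0] if syllable[0] in PREVOWELS else ""
    rest = syllable[len(pre):]
    return pre, rest[0], rest[1:]

def table_key(syllable):
    """Onset first, then one weight for the whole vowel symbol."""
    pre, onset, post = split(syllable)
    return (onset, VOWEL_WEIGHT[(pre, post)])

def naive_key(syllable):
    """Prevowel moved after the onset, then plain code point order."""
    pre, onset, post = split(syllable)
    return onset + pre + post

words = [E + KO, E + KO + A, E + KO + AA + A, O + KO, O + KO + A]
by_table = sorted(words, key=table_key)
by_naive = sorted(words, key=naive_key)
```

Here by_table follows the dictionary order given above, while by_naive
follows the order the current collation yields. The point of the table
is that each compound vowel symbol gets a single weight of its own
rather than one composed from its parts.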

I suggest that the reckless comment be amended to something like, "The
root collation is of some utility in sorting this language; accurate
collation appears to require large tables". 

Yours faithfully,

Richard Wordingham.


