collation tailoring using before
Richard Wordingham via CLDR-Users
cldr-users at unicode.org
Thu Aug 10 10:00:14 CDT 2017
On Wed, 9 Aug 2017 16:23:44 +0700
Martin Hosken via CLDR-Users <cldr-users at unicode.org> wrote:
> I am trying to tailor (for the sake of argument) \u0300 to be primary
> ignorable and have a secondary collation key less than that of a
> primary character (a).
>
> I tried:
>
> &[before 2][first primary ignorable] << \u0300
>
> But then I get CEs of this form:
>
> a [2900.0500.0500]
> \u0300 [0000.8000.0500]
>
> I'm wondering how I can get \u0300 [0000.0400.0500].
What your declared goal would result in is
a << á < áb << ab
The assumption is that no-one would want this, which is why such a
collation is treated as ill-formed. (Now DUCET itself is ill-formed,
though that's not why ICU doesn't support it.)
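The ordering above can be checked with a toy Python model of UCA-style comparison. Each string gets one tuple of non-zero weights per level, and the levels are compared in turn. This is a sketch, not ICU. The weights for a and U+0300 come from the question, with U+0300 given the requested secondary 0400. The weight for b and the helper names (sort_key, first_diff_level, describe) are hypothetical. The accented letter is written as a + U+0300 to match the question.

```python
# Toy model of UCA-style multi-level comparison (not ICU's implementation).
# Each character maps to collation elements (primary, secondary, tertiary).
# a and U+0300 follow the question, with U+0300 given the requested secondary
# 0x0400; the weight for b is hypothetical, invented for illustration.
CES = {
    "a": [(0x2900, 0x0500, 0x0500)],
    "b": [(0x2A00, 0x0500, 0x0500)],       # hypothetical
    "\u0300": [(0x0000, 0x0400, 0x0500)],  # primary ignorable, secondary < a's
}


def sort_key(s):
    """One tuple of non-zero weights per level (primary, secondary, tertiary).

    Comparing these tuples level by level, with a shorter prefix sorting
    first, matches separating levels with a weight lower than any real one.
    """
    ces = [ce for ch in s for ce in CES[ch]]
    return tuple(tuple(ce[lvl] for ce in ces if ce[lvl]) for lvl in range(3))


def first_diff_level(x, y):
    """Return 1, 2 or 3 for the first level where x and y differ; 0 if equal."""
    for lvl, (wx, wy) in enumerate(zip(sort_key(x), sort_key(y)), start=1):
        if wx != wy:
            return lvl
    return 0


def label(w):
    """Show combining marks as \\uXXXX escapes so the output is readable."""
    return w.encode("unicode_escape").decode("ascii")


def describe(words):
    """Render an already-sorted list in CLDR rule notation (<, <<, <<<)."""
    ops = {0: "=", 1: "<", 2: "<<", 3: "<<<"}
    parts = [label(words[0])]
    for prev, cur in zip(words, words[1:]):
        parts += [ops[first_diff_level(prev, cur)], label(cur)]
    return " ".join(parts)


WORDS = ["ab", "a\u0300b", "a", "a\u0300"]
print(describe(sorted(WORDS, key=sort_key)))
# prints: a << a\u0300 < a\u0300b << ab
```

Because U+0300's secondary is below a's, a + U+0300 sorts after a when the string ends there (the shorter key wins). It sorts before ab once another letter follows. The well-formedness rules exclude exactly this inconsistency.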
If what you want is
á << a < áb << ab
then the Pinyin collation provides an example:
<cr><![CDATA[
&[before 2]a<<ā<<<Ā<<á<<<Á<<ǎ<<<Ǎ<<à<<<À
&[before 2]e<<ē<<<Ē<<é<<<É<<ě<<<Ě<<è<<<È
&e<<ê̄<<<Ê̄<<ế<<<Ế<<ê̌<<<Ê̌<<ề<<<Ề
&[before 2]i<<ī<<<Ī<<í<<<Í<<ǐ<<<Ǐ<<ì<<<Ì
&[before 2]m<<m̄<<<M̄<<ḿ<<<Ḿ<<m̌<<<M̌<<m̀<<<M̀
&[before 2]n<<n̄<<<N̄<<ń<<<Ń<<ň<<<Ň<<ǹ<<<Ǹ
&[before 2]o<<ō<<<Ō<<ó<<<Ó<<ǒ<<<Ǒ<<ò<<<Ò
&[before 2]u<<ū<<<Ū<<ú<<<Ú<<ǔ<<<Ǔ<<ù<<<Ù
&U<<ǖ<<<Ǖ<<ǘ<<<Ǘ<<ǚ<<<Ǚ<<ǜ<<<Ǜ<<ü<<<Ü
]]></cr>
This gives us
ā << a < āp << ap
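The same toy model can be applied to the Pinyin-style tailoring. Here ā is a precomposed letter with a single collation element that has a's primary and a secondary just below a's. That is roughly the effect of &[before 2]a<<ā. The weights ICU actually assigns differ. The values below for ā and p, and the helper names, are hypothetical.

```python
# Toy model of the Pinyin-style tailoring (not ICU's real weights).
# ā (U+0101) gets a's primary and a lower secondary, roughly what
# &[before 2]a<<ā does; the weights for ā and p are hypothetical.
CES = {
    "a": [(0x2900, 0x0500, 0x0500)],
    "\u0101": [(0x2900, 0x0400, 0x0500)],  # ā: hypothetical weight
    "p": [(0x3A00, 0x0500, 0x0500)],       # hypothetical weight
}


def sort_key(s):
    """One tuple of non-zero weights per level (primary, secondary, tertiary)."""
    ces = [ce for ch in s for ce in CES[ch]]
    return tuple(tuple(ce[lvl] for ce in ces if ce[lvl]) for lvl in range(3))


def first_diff_level(x, y):
    """Return 1, 2 or 3 for the first level where x and y differ; 0 if equal."""
    for lvl, (wx, wy) in enumerate(zip(sort_key(x), sort_key(y)), start=1):
        if wx != wy:
            return lvl
    return 0


def describe(words):
    """Render an already-sorted list in CLDR rule notation (<, <<, <<<)."""
    ops = {0: "=", 1: "<", 2: "<<", 3: "<<<"}
    parts = [words[0]]
    for prev, cur in zip(words, words[1:]):
        parts += [ops[first_diff_level(prev, cur)], cur]
    return " ".join(parts)


WORDS = ["ap", "\u0101p", "a", "\u0101"]
print(describe(sorted(WORDS, key=sort_key)))
# prints: ā << a < āp << ap
```

The lower secondary sits on a collation element that also has a primary weight, not on a primary-ignorable one. So ā sorts before a in every context, and the collation stays well-formed.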
Richard.