collation tailoring using before

Richard Wordingham via CLDR-Users cldr-users at unicode.org
Thu Aug 10 10:00:14 CDT 2017


On Wed, 9 Aug 2017 16:23:44 +0700
Martin Hosken via CLDR-Users <cldr-users at unicode.org> wrote:

> I am trying to tailor (for the sake of argument) \u0300 to be primary
> ignorable and have a secondary collation key less than that of a
> primary character (a).
> 
> I tried:
> 
> &[before 2][first primary ignorable] << \u0300
> 
> But then I get CEs of this form:
> 
> a	[2900.0500.0500]
> \u0300	[0000.8000.0500]
> 
> I'm wondering how I can get \u0300 [0000.0400.0500].

Your declared goal would result in the ordering

a << á < áb << ab

The assumption is that no-one would want this, which is why the
collation is denigrated as ill-formed.  (Now DUCET is ill-formed,
though that's not why ICU doesn't support it.)
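
The ordering above can be checked by building sort keys by hand. The
following is only a rough sketch: the weights for "a" and the accent are
the ones quoted in this thread, the primary weight for "b" is made up for
illustration, and sort_key() is a simplified stand-in for how UCA-style
sort keys are formed, not ICU's actual implementation.

```python
# Collation elements (CEs) as (primary, secondary, tertiary) weights.
A = (0x2900, 0x0500, 0x0500)            # "a", as quoted in the thread
B = (0x2B00, 0x0500, 0x0500)            # "b": hypothetical primary, illustration only
ACCENT_LOW = (0x0000, 0x0400, 0x0500)   # the CE asked for in the question
ACCENT_HIGH = (0x0000, 0x8000, 0x0500)  # the CE the tailoring actually produced


def sort_key(ces):
    """Concatenate non-zero weights level by level, with 0 as the level separator."""
    key = []
    for level in range(3):
        key.extend(ce[level] for ce in ces if ce[level] != 0)
        key.append(0)
    return tuple(key)


def ordering(accent):
    """Return the four test strings sorted by their simplified sort keys."""
    strings = {
        "a": [A],
        "á": [A, accent],
        "ab": [A, B],
        "áb": [A, accent, B],
    }
    return sorted(strings, key=lambda s: sort_key(strings[s]))


print(ordering(ACCENT_LOW))   # ['a', 'á', 'áb', 'ab']
print(ordering(ACCENT_HIGH))  # ['a', 'á', 'ab', 'áb']
```

With the low secondary weight the accented string sorts after the bare
letter when it stands alone but before it once another letter follows,
which is the inconsistency described above.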

If what you want is

á << a < áb << ab

then the Pinyin collation provides an example:

<cr><![CDATA[
                &[before 2]a<<ā<<<Ā<<á<<<Á<<ǎ<<<Ǎ<<à<<<À
                &[before 2]e<<ē<<<Ē<<é<<<É<<ě<<<Ě<<è<<<È
                &e<<ê̄<<<Ê̄<<ế<<<Ế<<ê̌<<<Ê̌<<ề<<<Ề
                &[before 2]i<<ī<<<Ī<<í<<<Í<<ǐ<<<Ǐ<<ì<<<Ì
                &[before 2]m<<m̄<<<M̄<<ḿ<<<Ḿ<<m̌<<<M̌<<m̀<<<M̀
                &[before 2]n<<n̄<<<N̄<<ń<<<Ń<<ň<<<Ň<<ǹ<<<Ǹ
                &[before 2]o<<ō<<<Ō<<ó<<<Ó<<ǒ<<<Ǒ<<ò<<<Ò
                &[before 2]u<<ū<<<Ū<<ú<<<Ú<<ǔ<<<Ǔ<<ù<<<Ù
                &U<<ǖ<<<Ǖ<<ǘ<<<Ǘ<<ǚ<<<Ǚ<<ǜ<<<Ǜ<<ü<<<Ü
]]></cr>


This gives us

ā << a < āp << ap
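
A rough way to see why this ordering is consistent: with "&[before 2]a<<ā"
the accented letter can be treated as a single collation element that
shares a's primary weight and has a smaller secondary weight, rather than
as "a" followed by a separately weighted accent. The sketch below assumes
exactly that. The weights for "ā" and "p" are hypothetical, and
sort_key() is a simplified model, not ICU's implementation.

```python
# Collation elements (CEs) as (primary, secondary, tertiary) weights.
A = (0x2900, 0x0500, 0x0500)         # "a", as quoted in the thread
P = (0x3A00, 0x0500, 0x0500)         # "p": hypothetical weights
A_MACRON = (0x2900, 0x0400, 0x0500)  # "ā": hypothetical single CE, secondary below a's


def sort_key(ces):
    """Concatenate non-zero weights level by level, with 0 as the level separator."""
    key = []
    for level in range(3):
        key.extend(ce[level] for ce in ces if ce[level] != 0)
        key.append(0)
    return tuple(key)


strings = {
    "a": [A],
    "ā": [A_MACRON],
    "ap": [A, P],
    "āp": [A_MACRON, P],
}
result = sorted(strings, key=lambda s: sort_key(strings[s]))
print(result)  # ['ā', 'a', 'āp', 'ap']
```

Because the lower secondary weight sits in the same position as the
letter's own secondary weight, "ā" sorts before "a" both alone and when
followed by "p".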

Richard.