Compatibility decomposition for Hebrew and Greek final letters

Eli Zaretskii eliz at gnu.org
Fri Feb 20 09:28:36 CST 2015


> Date: Fri, 20 Feb 2015 15:01:34 +0000
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> 
> > Sorry, I don't think I follow: what is "processing for search orders"
> > to which you allude here?
> 
> The examples in the CLDR root locale and in DUCET are the massive sets
> of 'contractions' of consonants with vowels written before the
> associated consonant in the scripts where spacing characters are stored
> in the order written, namely Thai, Lao, Tai Viet and, soon, New Tai
> Lue.  When customised collations are applied, there are enormous sets
> for Burmese (in CLDR) and New Tai Lue (not published in CLDR).  The
> latter two have 'logical order exception' final consonants.  (The
> exception here is that the logical order of characters in a word is not
> the order one wants for sorting.)

OK, thanks for explaining that.  Still, the DUCET data is not
insignificant.
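
For readers following along, here is a rough sketch of the effect those
contractions achieve for Thai: a vowel stored before its consonant is
treated as if it followed the consonant when sorting.  This is only an
illustration, not how DUCET or CLDR implement it.  The names PREVOWELS,
is_thai_consonant and sort_key are made up for this example, and plain
code point comparison stands in for real collation weights.

```python
# Thai vowels that are written, and stored, before their consonant.
PREVOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def is_thai_consonant(ch):
    """True for the Thai consonants U+0E01..U+0E2E."""
    return "\u0e01" <= ch <= "\u0e2e"


def sort_key(word):
    """Swap each leading vowel with the consonant that follows it.

    Assumption: comparing the rearranged strings by code point is a
    stand-in for a real collation algorithm, used here only to show
    the reordering step.
    """
    chars = list(word)
    i = 0
    while i + 1 < len(chars):
        if chars[i] in PREVOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return "".join(chars)


words = ["\u0e40\u0e01", "\u0e02\u0e32", "\u0e01\u0e32"]
print(sorted(words))                # stored order: the leading vowel sorts last
print(sorted(words, key=sort_key))  # the leading vowel sorts under its consonant
```

A contraction table gets the same result without rewriting the string,
at the cost of one entry per vowel and consonant pair, which is where
the size comes from.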

> > I'm not talking about localized features, like for "å" to match "aa"
> > in Danish locales.  I'm talking about matching strings that are
> > equivalent under canonical and compatibility decompositions.
> 
> Nor was I.  I was talking about the user interface - commands, menus
> and messages.

Ah, that's easy (for now): Emacs doesn't have a localized UI.
Everything in the UI is in US English.  So this would be Someone
Else's Problem.

> > As for user sophistication, AFAIR, Microsoft Word finds "²" when you
> > search for "2" by default, so it sounds like Word considers all users
> > sophisticated enough for that.  I think that's a solid enough
> > precedent to follow.
> 
> But what switches the match off?

I'm not sure there _is_ a switch in Word.  But my point is different:
the above example means an editor should have the capability of
matching such strings; whether it can or cannot be switched off is a
separate issue (in Emacs, I don't imagine users will settle for not
being able to switch it off and on as they see fit).
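
To make the idea concrete, here is a minimal sketch of such matching
with a switch, using Python's standard unicodedata module.  It is not
what Emacs or Word actually do.  The function names fold and matches
and the compat flag are invented for this example.

```python
import unicodedata


def fold(text, compat=True):
    """Decompose text: NFKD when compat is on, NFD (canonical only) when off."""
    form = "NFKD" if compat else "NFD"
    return unicodedata.normalize(form, text)


def matches(needle, haystack, compat=True):
    """Substring search on the decomposed forms of both strings."""
    return fold(needle, compat) in fold(haystack, compat)


# SUPERSCRIPT TWO has a compatibility decomposition to DIGIT TWO.
print(matches("2", "x\u00b2"))                # True
print(matches("2", "x\u00b2", compat=False))  # False

# Precomposed a-ring and "a" + COMBINING RING ABOVE are canonically
# equivalent, so they match even with the compatibility switch off.
print(matches("\u00e5", "a\u030a", compat=False))  # True
```

Note that a plain substring test on decomposed text also lets "a" match
inside a decomposed "å"; whether that is wanted is one more thing a
user would expect to control.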

