Compatibility decomposition for Hebrew and Greek final letters

Eli Zaretskii eliz at gnu.org
Fri Feb 20 02:04:32 CST 2015


> Date: Thu, 19 Feb 2015 22:02:57 +0000
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> 
> > First, collation data is overkill for search,
> > since the order information is not required, so the weights are simply
> > wasting storage.
> 
> The big waste is not in text-dependent storage, but in the
> processing for search orders that bear little relationship to
> alphabetical order.

Sorry, I don't think I follow: what is the "processing for search
orders" to which you allude here?

> > Second, people do want to find, e.g., "²" when they
> > search for "2" etc.  I'm not saying that they _always_ want that, but
> > sometimes they do.  There's no reason a sophisticated text editor
> > shouldn't support such a feature, under user control.
> 
> I think one problem is disbelief in the existence of enough
> sophisticated users to matter.  I gather it can be quite hard to obtain
> a Swedish interface for editing Thai.

I'm not talking about localized features, such as "å" matching "aa"
in Danish locales.  I'm talking about matching strings that are
equivalent under canonical and compatibility decompositions.

As for user sophistication, AFAIR, Microsoft Word finds "²" when you
search for "2" by default, so it sounds like Word considers all users
sophisticated enough for that.  I think that's a solid enough
precedent to follow.
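
To illustrate the kind of matching I mean, here is a minimal sketch
in Python.  It assumes the standard unicodedata module; the function
names nfkd and compat_contains are made up for this example, and this
is not how any particular editor implements its search.  Both strings
are brought to NFKD (compatibility decomposition) before comparing:

```python
import unicodedata

def nfkd(text):
    # Apply compatibility decomposition (NFKD), which also performs
    # canonical decomposition and canonical reordering.
    return unicodedata.normalize("NFKD", text)

def compat_contains(haystack, needle):
    # Hypothetical helper: True if needle occurs in haystack once both
    # are in NFKD.  It only reports a match; it does not map the match
    # position back to the original, unnormalized text.
    return nfkd(needle) in nfkd(haystack)

# Compatibility equivalence: SUPERSCRIPT TWO decomposes to DIGIT TWO.
print(compat_contains("x\u00b2 + 1", "2"))

# Canonical equivalence: precomposed and decomposed spellings match.
print(compat_contains("\u00c5ngstr\u00f6m", "A\u030angstro\u0308m"))
```

Note that a plain substring test on the decomposed text also lets a
bare "a" match inside "å", because the base letter becomes a separate
character.  Whether that is wanted is exactly the sort of thing that
should be under user control.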
