Compatibility decomposition for Hebrew and Greek final letters
eliz at gnu.org
Fri Feb 20 02:13:41 CST 2015
> From: Philippe Verdy <verdy_p at wanadoo.fr>
> Date: Fri, 20 Feb 2015 04:47:52 +0100
> Cc: jcb+unicode at inf.ed.ac.uk, unicode Unicode Discussion <unicode at unicode.org>
>> Sorry, I disagree. First, collation data is overkill for search,
>> since the order information is not required, so the weights are simply
>> wasting storage. Second, people do want to find, e.g., "²" when they
>> search for "2" etc. I'm not saying that they _always_ want that, but
>> sometimes they do. There's no reason a sophisticated text editor
>> shouldn't support such a feature, under user control.
> The weights or the collation strings do not need to be stored. Even database
> engines and plain-text search engines on the web now provide collation
> algorithms for searching or sorting data, so you don't need to store it in
> your tables... It is not overkill, as good implementations of collation are
> actually used in high-performance database servers (and many users of these
> databases do not realize that collation is being used).
I'm talking specifically about Emacs. Emacs provides locale-dependent
collation, but it relies on the underlying platform libraries to do
the work, it doesn't itself load the DUCET database, or anything
similar to it. By contrast, Emacs does have an efficient-storage
implementation of the UCD, and by virtue of that, accessing
decomposition data and performing normalization are at my fingertips.
So I'd like to avoid loading DUCET, and doing so just for the sake of
a few characters mentioned in this thread doesn't sound justified;
it's much easier to have a small database of additional equivalences.
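To make the "decomposition plus a small table" idea concrete, here is a minimal sketch in Python rather than Emacs Lisp. It assumes the standard `unicodedata` module as a stand-in for Emacs's UCD access. The names `EXTRA_EQUIVALENCES`, `fold` and `loose_contains` are hypothetical, and the table is only illustrative: Greek final sigma and the Hebrew final letters have no compatibility decomposition in the UCD, so NFKD alone cannot match them, while "²" (U+00B2) does fold to "2" under NFKD.

```python
import unicodedata

# Hypothetical supplementary table (illustrative only): letters that have
# no compatibility decomposition in the UCD but that a user may still want
# to match loosely against their non-final forms.
EXTRA_EQUIVALENCES = {
    "\u03c2": "\u03c3",  # GREEK SMALL LETTER FINAL SIGMA -> SIGMA
    "\u05da": "\u05db",  # HEBREW LETTER FINAL KAF -> KAF
    "\u05dd": "\u05de",  # HEBREW LETTER FINAL MEM -> MEM
    "\u05df": "\u05e0",  # HEBREW LETTER FINAL NUN -> NUN
    "\u05e3": "\u05e4",  # HEBREW LETTER FINAL PE -> PE
    "\u05e5": "\u05e6",  # HEBREW LETTER FINAL TSADI -> TSADI
}


def fold(text: str) -> str:
    """Build a loose search key: apply NFKD, then the extra table."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(EXTRA_EQUIVALENCES.get(ch, ch) for ch in decomposed)


def loose_contains(haystack: str, needle: str) -> bool:
    """True if the folded needle occurs in the folded haystack."""
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    print(loose_contains("x² + 1", "2"))              # superscript via NFKD
    print(loose_contains("\u05e9\u05dc\u05d5\u05dd", "\u05de"))  # final mem
```

Because NFKD leaves combining marks in place, this sketch lets a plain "e" in the search string match "é" in the text, but not the other way around. Whether that asymmetry is acceptable is exactly the kind of choice that belongs "under user control."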
> There are also good text editors implementing collation searches.
Could you mention their names, please?