Compatibility decomposition for Hebrew and Greek final letters

Thu Feb 19 21:47:52 CST 2015

2015-02-19 21:17 GMT+01:00 Eli Zaretskii <eliz at gnu.org>:

> > From: Philippe Verdy <verdy_p at wanadoo.fr>
> > Date: Thu, 19 Feb 2015 20:31:07 +0100
> > Cc: Julian Bradfield <jcb+unicode at inf.ed.ac.uk>,
> >       unicode Unicode Discussion <unicode at unicode.org>
> >
> > The decompositions are not needed for plain-text searches, which can use
> > the collation data (with the collation data, you can unify at the primary
> > level differences such as capitalisation, ignore diacritics, transform
> > some base groups of letters into a single entry, or make some diacritic
> > differences significant at the primary level, for example in German,
> > equating 'ae' and 'ä' at the primary level).
>
> Sorry, I disagree.  First, collation data is overkill for search,
> since the order information is not required, so the weights are simply
> wasting storage.  Second, people do want to find, e.g., "²" when they
> search for "2" etc.  I'm not saying that they _always_ want that, but
> sometimes they do.  There's no reason a sophisticated text editor
> shouldn't support such a feature, under user control.
>
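Eli's point about finding "²" when searching for "2" can be illustrated with compatibility normalization. The following is a minimal Python sketch using only the standard `unicodedata` module; the helper names `nfkd_fold` and `nfkd_contains` are hypothetical. It also shows why the thread's subject arises: the Greek final sigma and the Hebrew final kaf have no compatibility decomposition, so NFKD alone does not unify them with their non-final forms.

```python
import unicodedata


def nfkd_fold(text: str) -> str:
    """Apply only the NFKD compatibility decomposition (hypothetical helper)."""
    return unicodedata.normalize("NFKD", text)


def nfkd_contains(haystack: str, needle: str) -> bool:
    """Substring search after NFKD-folding both sides (hypothetical helper)."""
    return nfkd_fold(needle) in nfkd_fold(haystack)


# SUPERSCRIPT TWO (U+00B2) has the compatibility decomposition <super> 0032,
# so a search for "2" finds it.
print(nfkd_contains("x\u00b2", "2"))  # True

# Final forms have no decomposition at all, so NFKD leaves them unchanged.
for final, nonfinal in [("\u03c2", "\u03c3"),   # Greek final sigma / sigma
                        ("\u05da", "\u05db")]:  # Hebrew final kaf / kaf
    print(unicodedata.name(final), nfkd_fold(final) == nonfinal)  # prints False
```

Whether such folding is applied should stay a user option, as Eli says, since it also makes every plain "2" match a search for "²".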

The weights or the collation strings do not need to be stored. Even
database engines or plain-text search engines on the web now provide
collation algorithms for searching or sorting data, so you don't need
to store the keys in your tables. It is not overkill: good implementations
of collation are effectively used in high-performance database servers (and
many users of these databases do not realize that collation is being
used).
There are also good text editors implementing collation-based searches.
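The claim that no weights need to be stored can be sketched by folding both strings on the fly at search time. This is a rough approximation only, not the Unicode Collation Algorithm: the hypothetical `primary_fold` applies NFKD, drops combining marks and case-folds, and the `german_ae` flag is a hypothetical stand-in for a German tailoring. A real implementation would use an actual collator, such as ICU's, at primary strength.

```python
import unicodedata

# Hypothetical German tailoring table: umlauts expand to base letter + "e".
_GERMAN_AE = {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
              "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue"}


def primary_fold(text: str, german_ae: bool = False) -> str:
    """Approximate a primary-strength key (NOT the UCA).

    Steps: optional umlaut expansion, NFKD compatibility decomposition,
    removal of combining marks (diacritics), then full case folding.
    """
    text = unicodedata.normalize("NFC", text)  # umlauts become single code points
    if german_ae:
        text = "".join(_GERMAN_AE.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold()


def fold_search(haystack: str, needle: str, german_ae: bool = False) -> bool:
    """Fold both strings at search time; no keys or weights are stored."""
    return primary_fold(needle, german_ae) in primary_fold(haystack, german_ae)


print(fold_search("M\u00fcller", "muller"))                   # True
print(fold_search("M\u00fcller", "mueller", german_ae=True))  # True
# Case folding maps final sigma to sigma, so these match.
print(fold_search("\u039f\u0394\u039f\u03a3", "\u03bf\u03b4\u03bf\u03c2"))  # True
# Nothing here folds Hebrew final kaf to kaf.
print(fold_search("\u05de\u05dc\u05da", "\u05de\u05dc\u05db"))  # False
```

Because folding happens per query, only the folded needle and the text being scanned exist in memory; the trade-off is that match offsets in the folded string no longer line up with the original text, which a real collation-based search has to track.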