Compatibility decomposition for Hebrew and Greek final letters

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Thu Feb 19 20:50:17 CST 2015


On 2015/02/20 05:17, Eli Zaretskii wrote:
>> From: Philippe Verdy <verdy_p at wanadoo.fr>
>> Date: Thu, 19 Feb 2015 20:31:07 +0100
>> Cc: Julian Bradfield <jcb+unicode at inf.ed.ac.uk>,
>> 	unicode Unicode Discussion <unicode at unicode.org>
>>
>> The decompositions are not needed for plain text searches, which can use the
>> collation data. With the collation data, you can unify at the primary level
>> differences such as capitalisation, ignore diacritics, transform some
>> base groups of letters into a single entry, or make some significant primary
>> difference when there are diacritics (for example, in German, equating 'ae' and
>> 'ä' at the primary level).
>
> Sorry, I disagree.  First, collation data is overkill for search,
> since the order information is not required, so the weights are simply
> wasting storage.  Second, people do want to find, e.g., "²" when they
> search for "2" etc.  I'm not saying that they _always_ want that, but
> sometimes they do.  There's no reason a sophisticated text editor
> shouldn't support such a feature, under user control.

Well, for cased scripts, search is usually case-insensitive, but case 
conversions aren't given by compatibility decompositions.
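
To make the distinction concrete, here is a minimal Python sketch, assuming
the standard unicodedata module; the helper name search_key is made up for
illustration, and NFKD followed by casefold() is only an approximation of
a real search key, not a recommended algorithm.

```python
import unicodedata

def search_key(text: str) -> str:
    """Hypothetical helper: apply compatibility decomposition, then case folding."""
    return unicodedata.normalize("NFKD", text).casefold()

# Compatibility decomposition maps SUPERSCRIPT TWO (U+00B2) to the digit "2" ...
superscript_folded = unicodedata.normalize("NFKD", "\u00b2")

# ... but it does nothing about case: "A" stays "A".
case_untouched = unicodedata.normalize("NFKD", "A")

# Case folding is a separate step, driven by separate data.
same_key = search_key("X\u00b2") == search_key("x2")
```

So a search that wants both behaviours has to combine two sources of
data; neither one alone is enough.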

If the question isn't "Why are there equivalences useful for search that 
are not covered by compatibility decompositions?", but "Why doesn't 
Unicode provide some data for final/non-final Hebrew letter 
correspondence?", maybe the answer is that it hasn't been seen as a need 
up to now because it's so easy to figure out.
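
As a rough illustration of how little is needed, here is a sketch in
Python. The table name FINAL_TO_BASE and the function fold_final_letters
are made up for this example; the five pairs are the Hebrew final forms
and their regular counterparts from the Hebrew block. Whether a search
should fold them at all is a choice for the application.

```python
import unicodedata

# Hand-written table (an assumption of this sketch, not Unicode-provided data):
# each Hebrew final form mapped to its non-final counterpart.
FINAL_TO_BASE = {
    "\u05da": "\u05db",  # FINAL KAF   -> KAF
    "\u05dd": "\u05de",  # FINAL MEM   -> MEM
    "\u05df": "\u05e0",  # FINAL NUN   -> NUN
    "\u05e3": "\u05e4",  # FINAL PE    -> PE
    "\u05e5": "\u05e6",  # FINAL TSADI -> TSADI
}

_TABLE = str.maketrans(FINAL_TO_BASE)

def fold_final_letters(text: str) -> str:
    """Replace Hebrew final letters with their regular forms for matching."""
    return text.translate(_TABLE)

# Compatibility decomposition leaves FINAL MEM unchanged, so the folding
# above has to come from a table like this one.
nfkd_final_mem = unicodedata.normalize("NFKD", "\u05dd")

# "shalom" spelled shin, lamed, vav, final mem.
folded = fold_final_letters("\u05e9\u05dc\u05d5\u05dd")
```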

Regards,   Martin.


