Compatibility decomposition for Hebrew and Greek final letters

Eli Zaretskii eliz at gnu.org
Thu Feb 19 05:30:22 CST 2015


> From: Michael Everson <everson at evertype.com>
> Date: Thu, 19 Feb 2015 11:21:19 +0000
> 
> On 19 Feb 2015, at 10:55, Eli Zaretskii <eliz at gnu.org> wrote:
> 
> > Does anyone know why the UCD defines compatibility decompositions
> > for Arabic initial, medial, and final forms, but doesn't do the same
> > for Hebrew final letters, like U+05DD HEBREW LETTER FINAL MEM?  Or for
> > that matter, for U+03C2 GREEK SMALL LETTER FINAL SIGMA?
> > 
> > The relevant application where this would matter is text search, where
> > these letters might be folded to the same code point for the purposes
> > of comparison.
> 
> Such comparisons happen at a different level, I think. 

Sorry, I'm not sure I follow: different from what?

In any case, regardless of the level, if there's no data to support
such "folding", how can applications implement it (except by
inventing their own data)?
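
To make the gap concrete, here is a minimal Python sketch, assuming
the standard unicodedata module.  It queries the decomposition field
for an Arabic final form and for the two final letters mentioned
above, then folds the latter with a hand-made table.  The names
FINAL_TO_REGULAR and fold_for_search are hypothetical, made up for
this illustration; the table is exactly the kind of
application-invented data in question, not something the UCD provides.

```python
import unicodedata

# Hypothetical, application-invented table: final form -> regular form.
# Not derived from any UCD decomposition; maintained by hand.
FINAL_TO_REGULAR = {
    0x05DA: 0x05DB,  # HEBREW LETTER FINAL KAF   -> HEBREW LETTER KAF
    0x05DD: 0x05DE,  # HEBREW LETTER FINAL MEM   -> HEBREW LETTER MEM
    0x05DF: 0x05E0,  # HEBREW LETTER FINAL NUN   -> HEBREW LETTER NUN
    0x05E3: 0x05E4,  # HEBREW LETTER FINAL PE    -> HEBREW LETTER PE
    0x05E5: 0x05E6,  # HEBREW LETTER FINAL TSADI -> HEBREW LETTER TSADI
    0x03C2: 0x03C3,  # GREEK SMALL LETTER FINAL SIGMA -> SIGMA
}


def fold_for_search(text):
    """Apply NFKC (UCD-driven), then the hand-made final-letter folding."""
    normalized = unicodedata.normalize("NFKC", text)
    return normalized.translate(FINAL_TO_REGULAR)


# Decomposition field as recorded in the UCD:
arabic_final_meem = unicodedata.decomposition("\uFEE2")  # "<final> 0645"
hebrew_final_mem = unicodedata.decomposition("\u05DD")   # "" (none defined)
greek_final_sigma = unicodedata.decomposition("\u03C2")  # "" (none defined)
```

With this sketch, NFKC alone maps U+FEE2 to U+0645, while U+05DD and
U+03C2 only change because of the extra table.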

Also, perhaps there are some deep linguistic reasons why such folding
might be inappropriate, and that's why the UCD doesn't define such
decompositions?

Thanks.

