Character folding in text editors

Doug Ewell doug at ewellic.org
Sat Feb 20 15:43:15 CST 2016


Eli Zaretskii wrote:

> What about language-independent character-folding: where in the
> Unicode database is the data for that?

The OP kind of alluded to that: there is really no such thing as 
language-independent character folding.

About the closest approximation you can get using Unicode data alone 
(not CLDR) is to normalize to NFD, then ignore the combining diacritics. 
But that still doesn't work for a character like ø, which doesn't 
decompose to o + anything, and more importantly, it still won't meet 
expectations because of the n/ñ and o/ö/ø language-dependency problems.
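A minimal Python sketch of the NFD-then-ignore-diacritics approximation, using only the standard `unicodedata` module. It assumes "combining diacritics" can be approximated as characters of general category Mn (Mark, nonspacing); the function name `naive_fold` is hypothetical. It shows both failure modes: ñ is silently folded to n, and ø is left alone.

```python
import unicodedata


def naive_fold(text: str) -> str:
    """Fold text by decomposing to NFD and dropping nonspacing marks.

    Assumption: "combining diacritics" is approximated as general
    category Mn. No language-specific (CLDR) data is consulted.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(naive_fold("café"))    # cafe
    print(naive_fold("año"))     # ano   (n/ñ folded regardless of language)
    print(naive_fold("Köln"))    # Koln
    print(naive_fold("Ørsted"))  # Ørsted (U+00D8 has no decomposition)
```

The fold is applied identically for every user, which is exactly the problem: a Spanish or Swedish user may not want ñ or ö treated as n or o, while a user who does want ø to match o gets no help from Unicode decomposition data at all.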

As Mark and Philippe said, the real solution is to use CLDR, because 
that is where language-dependent information like this lives.

--
Doug Ewell | http://ewellic.org | Thornton, CO