Character folding in text editors

Eli Zaretskii eliz at gnu.org
Sun Feb 21 10:28:56 CST 2016


> From: Mark Davis ☕️ <mark at macchiato.com>
> Date: Sun, 21 Feb 2016 11:47:28 +0100
> Cc: Unicode Public <unicode at unicode.org>
> 
> If you don't use ICU, you can also use the CLDR data directly, but you'll
> have to parse it yourself. You'd start with the root locale, then add in
> the mappings from the children (e.g. de.xml). The parsing is not trivial, but
> since you are only looking for equivalences (not ordering), it is somewhat
> simpler.

What about using allkeys.txt from the UCA database?  Is that
equivalent to the root locale in CLDR, as far as equivalence for
searching is concerned?  If not, how do the two differ?  (I've read
http://www.unicode.org/reports/tr35/tr35-collation.html#Root_Collation,
but it left me unsure whether what it describes affects search
matches when secondary weights are ignored.)
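To make the allkeys.txt question concrete, here is a minimal sketch of
deriving search equivalences from it: parse each entry's collation
elements, keep only the non-zero primary weights (i.e. ignore secondary
and tertiary weights), and group characters whose primary keys are equal.
It assumes the DUCET line layout "CODEPOINTS ; [.pppp.ssss.tttt]... # NAME",
with "*" marking variable elements. The names primary_key and
primary_classes are hypothetical, and the weights in SAMPLE are made-up
placeholders, not real DUCET values.

```python
import re
from collections import defaultdict

# One collation element, e.g. "[.2075.0020.0002]" or the variable form
# "[*0209.0020.0002]"; captures primary, secondary and tertiary weights.
CE_RE = re.compile(r"\[[.*]([0-9A-F]+)\.([0-9A-F]+)\.([0-9A-F]+)\]")

def primary_key(ce_field):
    """Return the non-zero primary weights of a collation element list."""
    return tuple(int(p, 16) for p, _s, _t in CE_RE.findall(ce_field)
                 if int(p, 16) != 0)

def primary_classes(lines):
    """Group entries whose primary weights are identical (primary strength)."""
    classes = defaultdict(list)
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line or line.startswith("@"):  # skip blanks and @version etc.
            continue
        chars, ces = line.split(";", 1)
        text = "".join(chr(int(cp, 16)) for cp in chars.split())
        key = primary_key(ces)
        if key:  # fully ignorable entries have no primary weight
            classes[key].append(text)
    return classes

# Placeholder weights for illustration only (not taken from allkeys.txt).
SAMPLE = """
0061 ; [.2075.0020.0002] # LATIN SMALL LETTER A
0041 ; [.2075.0020.0008] # LATIN CAPITAL LETTER A
00E4 ; [.2075.0020.0002][.0000.002B.0002] # LATIN SMALL LETTER A WITH DIAERESIS
0062 ; [.208F.0020.0002] # LATIN SMALL LETTER B
""".splitlines()

if __name__ == "__main__":
    for key, members in primary_classes(SAMPLE).items():
        print([hex(w) for w in key], members)
```

With these sample lines, "a", "A" and "ä" land in one class and "b" in
another. The sketch ignores implicit weights for code points not listed in
the file (e.g. unassigned or Han characters) and does not try to match
contractions against running text.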

Also, what is the consensus here about using UCA's decomps.txt for
folding characters when ignoring secondary and tertiary weights?
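For comparison, decomposition-based folding can be approximated with
Python's standard unicodedata module. This is a swapped-in approximation,
not a reader for decomps.txt: NFKD decomposition plus removal of
nonspacing marks (category Mn) roughly stands in for ignoring secondary
(accent) differences, and casefold() roughly stands in for ignoring case,
one part of the tertiary differences. The function name fold is
hypothetical.

```python
import unicodedata

def fold(s):
    """Approximate accent- and case-insensitive folding via decompositions."""
    # NFKD splits precomposed letters into base + combining marks and maps
    # compatibility variants (ligatures, fullwidth forms) to plain forms.
    decomposed = unicodedata.normalize("NFKD", s)
    # Dropping nonspacing marks roughly ignores accent differences.
    stripped = "".join(c for c in decomposed
                       if unicodedata.category(c) != "Mn")
    # casefold() ignores case and also expands e.g. "ß" to "ss".
    return stripped.casefold()

if __name__ == "__main__":
    for word in ["Résumé", "\ufb01le", "\uff21", "Straße", "ø"]:
        print(word, "->", fold(word))
```

Note that characters with no decomposition in the Unicode Character
Database, such as U+00F8 ø, pass through unchanged, so this approach can
fold fewer characters than collation data does.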

