Character folding in text editors

Janusz S. Bien jsbien at mimuw.edu.pl
Sat Feb 20 11:11:03 CST 2016


Quote - Elias Mårtenson <lokedhs at gmail.com> (Sat 20 Feb 2016 11:23:13 AM CET):

> Hello Unicode,
>
> I have been involved in a rather long discussion on the Emacs-devel mailing
> list[1] concerning the right way to do character folding and we've reached
> a point where input from Unicode experts would be welcome.
>
> The problem is how to implement equivalence when searching for
> characters. For example, suppose I have a buffer containing the following
> characters (in both precomposed and decomposed forms):
>
>     o ö ø ó n ñ
>
> The character folding feature in Emacs allows a search for "o" to match some
> or even all of these characters. The discussion on the mailing list has
> revolved around two points: that the correct behaviour here is
> locale-dependent, and how to implement this matching correctly in the
> absence of any locale-specific exceptions.
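
A minimal sketch of locale-independent folding, assuming that two strings are "equivalent" when they are equal after canonical decomposition (NFD) with all nonspacing combining marks (general category Mn) removed. This is one possible rule, not necessarily what Emacs does; the names `fold` and `folded_matches` are hypothetical. Because NFD maps a precomposed letter and its base-plus-combining-mark spelling to the same sequence, both forms are treated alike.

```python
import unicodedata


def fold(text: str) -> str:
    """Fold text by canonical decomposition (NFD), then drop nonspacing
    combining marks (general category Mn).

    Hypothetical rule for illustration only; it ignores locale entirely.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def folded_matches(needle: str, tokens: list[str]) -> list[str]:
    """Return the tokens that equal `needle` once both have been folded."""
    target = fold(needle)
    return [tok for tok in tokens if fold(tok) == target]


# The example buffer from the message: each accented letter appears once
# precomposed and once as base letter + combining mark (ø has no
# decomposed spelling, so it appears only once).
buffer = "o \u00f6 o\u0308 \u00f8 \u00f3 o\u0301 n \u00f1 n\u0303"
matches = folded_matches("o", buffer.split())
```

Here `matches` holds o, ö and ó in both encodings but not ø: U+00F8 LATIN SMALL LETTER O WITH STROKE has no canonical decomposition, so a decomposition-only rule cannot relate it to "o". That is exactly the kind of case where locale rules or an extra equivalence table would be needed.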

What about just using POSIX equivalence classes in regular expressions?

From http://www.regular-expressions.info/posixbrackets.html:

A POSIX locale can define character equivalents that indicate that
certain characters should be considered as identical for sorting. In
French, for example, accents are ignored when ordering words. élève
comes before être which comes before événement. é, è and ê are all the
same as e, but l comes before t which comes before v. With the locale
set to French, a POSIX-compliant regular expression engine matches e,
é, è and ê when you use the equivalence class [=e=] in the bracket
expression [[=e=]].
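
Python's `re` module does not support `[[=e=]]`, so the sketch below only emulates it, under a loudly stated assumption: instead of the locale's collation tables, it collects every code point in a small range (Basic Latin through Latin Extended-B, an arbitrary illustrative choice) whose NFD form minus combining marks equals the given letter, and builds an ordinary bracket expression from them. The helpers `fold`, `equivalence_class` and `compile_equivalence` are hypothetical; a real POSIX engine derives `[=e=]` from `LC_COLLATE` and may include different characters.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Hypothetical folding rule: NFD, then drop nonspacing marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def equivalence_class(letter: str, first: int = 0x0000, last: int = 0x024F) -> str:
    """Collect every code point in [first, last] that folds to `letter`.

    The default range is an illustrative assumption, not what a POSIX
    locale uses. The result is case-sensitive: "E" does not fold to "e".
    """
    return "".join(chr(cp) for cp in range(first, last + 1)
                   if fold(chr(cp)) == letter)


def compile_equivalence(letter: str) -> re.Pattern:
    """Build a bracket expression that plays the role of [[=letter=]]."""
    return re.compile("[" + re.escape(equivalence_class(letter)) + "]")


# The class contains single precomposed code points only, so normalize the
# subject to NFC first; decomposed text would otherwise slip through.
text = unicodedata.normalize("NFC", "élève être événement")
e_matches = compile_equivalence("e").findall(text)
```

The design choice worth noting is that a bracket expression matches one code point at a time; that is why the subject is normalized to NFC rather than relying on the class to match a base letter followed by a combining mark.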

Regards

Janusz
(an Emacs user)


-- 
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/



More information about the Unicode mailing list