"textels"

Janusz S. Bien jsbien at mimuw.edu.pl
Fri Sep 16 10:57:44 CDT 2016


Quote - Eric Muller <eric.muller at efele.net> (Fri, 16 Sep 2016,
17:47:27):

> On 9/16/2016 8:30 AM, Janusz S. Bien wrote:
>> Quote - Eric Muller <eric.muller at efele.net> (Fri, 16 Sep
>> 2016, 17:03:54):
>>
>>> On 9/16/2016 6:52 AM, Janusz S. Bień wrote:
>>>> (when working on a corpus of historical Polish we
>>>> noticed some cases where standard Unicode equivalence was not
>>>> convenient).
>>>
>>> I'm very interested to know more about those cases.
>>
>> For our search engine we were unable to use compatibility  
>> equivalence "out of the box" for splitting the ligature because it  
>> also converted long s to short s while we wanted to preserve the  
>> distinction.
>
> I am interested in the problems with *canonical* equivalence. I  
> thought that you were talking about those before.

I apologize for the confusion; that was my fault. I tend to answer too
quickly and not precisely enough :-(

On the other hand, I'm not sure canonical equivalence is always what I
want and expect, but I don't have specific examples at hand.
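
To make the compatibility-equivalence issue quoted above concrete, here
is a minimal Python sketch. It is not the code used in our search
engine; the function name split_ligatures_keep_long_s and the choice to
handle only the Latin ligatures U+FB00..U+FB06 are assumptions made for
illustration. It relies only on the standard unicodedata module.

```python
import unicodedata

LONG_S = "\u017F"  # LATIN SMALL LETTER LONG S

# Assumption for this sketch: only the Latin ligatures in the
# Alphabetic Presentation Forms block (U+FB00..U+FB06) are split.
LATIN_LIGATURES = range(0xFB00, 0xFB07)


def split_ligatures_keep_long_s(text: str) -> str:
    """Split Latin ligatures into their letters, leaving long s intact."""
    out = []
    for ch in text:
        if ord(ch) in LATIN_LIGATURES:
            # decomposition() returns e.g. "<compat> 017F 0074" for U+FB05.
            # It is a single-level mapping, so a long s inside the
            # ligature is not folded further to a short s.
            fields = unicodedata.decomposition(ch).split()
            out.extend(chr(int(cp, 16)) for cp in fields[1:])
        else:
            out.append(ch)
    return "".join(out)


# Full compatibility normalization loses the distinction:
#   unicodedata.normalize("NFKC", "\uFB05") gives "st"
# The selective split keeps it:
#   split_ligatures_keep_long_s("\uFB05") gives "\u017Ft"
```

The design choice is to apply the compatibility mapping one level deep
and only to an explicit list of characters, instead of running NFKC or
NFKD over the whole string. Everything else, including precomposed
letters such as ń, passes through unchanged.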

Regards

Janusz

-- 
Prof. dr hab. Janusz S. Bień -  Uniwersytet Warszawski (Katedra  
Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


