Aw: Re: Normalization Generics (NFx, NFKx, NFxy)

Harriet Riddle harjitmoe at outlook.com
Mon Dec 14 08:22:59 CST 2020


Marius Spix via Unicode wrote:
> I understand that:
> [:toCaseFold=s:] = [sSſ]
> [:toCaseFold=ς:] = [σςΣ]
> But can someone explain me the following?
> [:toCaseFold=ı:] = [ı]
> [:toCaseFold=i:] = [iI]
> [:toCaseFold=ß:] = []
> Why is it not:
> [:toCaseFold=ı:] = [iIı]
> [:toCaseFold=i:] = [iIı]
> [:toCaseFold=ß:] = [ßẞ]
> ?
>


ß is usually changed to SS in uppercase; ẞ (capital sharp s) is a 
relatively recent addition as an encoded character and is not used 
consistently.  So PREUSSEN and Preußen are casings of the same word, for 
example.  I think ẞ may have been added after ß's case folding was 
already defined, but I'm not sure, so don't quote me on that.
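A rough way to reproduce the [:toCaseFold=ß:] = [] result: collect every code point whose full case folding equals a given string. This is a sketch. It assumes ICU's toCaseFold property uses the same full case folding that Python's str.casefold() implements. The helper name fold_class is hypothetical. Under full folding, both ß and ẞ fold to the two-letter string "ss", so no character folds to "ß" itself.

```python
import sys


def fold_class(target: str) -> list[str]:
    """Hypothetical helper: every single code point whose full case folding
    (str.casefold) equals `target`; a rough analogue of [:toCaseFold=target:].
    Scans all code points, so each call takes a moment."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if chr(cp).casefold() == target
    ]


# ß and ẞ both fold to "ss", so the class for "ß" comes back empty...
print(fold_class("ß"))
# ...while the class for "ss" contains both of them.
print(fold_class("ss"))

# Caseless comparison treats the two spellings of the word as equal.
print("PREUSSEN".casefold() == "Preußen".casefold())
```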

"I" cannot casefold to *both* "i" and "ı"; it has to casefold to one of 
them.  I'm not sure why "ı" doesn't casefold the same as "I", but I don't 
suppose there really exists any "good" locale-independent solution for 
case insensitivity of "I".
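To see why "I" has to pick one partner, compare the default folding with the Turkic variant. CaseFolding.txt marks the Turkic mappings (I to ı, İ to i) with status T and leaves them out of the default folding, which is what str.casefold() applies. The sketch below assumes that applying those two T mappings first and then default folding is enough to model Turkic folding; turkic_casefold is a hypothetical helper, not a library function.

```python
# Status-T (Turkic) entries from CaseFolding.txt, which default folding omits.
TURKIC_OVERRIDES = str.maketrans({"I": "ı", "İ": "i"})


def turkic_casefold(s: str) -> str:
    """Hypothetical helper: apply the Turkic T mappings, then default folding."""
    return s.translate(TURKIC_OVERRIDES).casefold()


# Default folding groups I with i (and folds İ to "i" + U+0307 COMBINING DOT
# ABOVE); Turkic folding groups I with ı instead. Neither groups I with both.
for ch in "Iıiİ":
    print(f"{ch!r}: default={ch.casefold()!r} turkic={turkic_casefold(ch)!r}")
```

Whichever table is chosen, "I" lands in exactly one of the two classes, which is why a single locale-independent folding cannot suit both Turkic and non-Turkic text.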

— Har.