Aw: Re: Normalization Generics (NFx, NFKx, NFxy)
Harriet Riddle
harjitmoe at outlook.com
Mon Dec 14 08:22:59 CST 2020
Marius Spix via Unicode wrote:
> I understand that:
> [:toCaseFold=s:] = [sSſ]
> [:toCaseFold=ς:] = [σςΣ]
> But can someone explain the following to me?
> [:toCaseFold=ı:] = [ı]
> [:toCaseFold=i:] = [iI]
> [:toCaseFold=ß:] = []
> Why is it not:
> [:toCaseFold=ı:] = [iIı]
> [:toCaseFold=i:] = [iIı]
> [:toCaseFold=ß:] = [ßẞ]
> ?
>
ß is conventionally uppercased to SS; ẞ is a relatively recent
addition as an encoded character and is not used consistently. For
example, PREUSSEN and Preußen are casings of the same word, so full case
folding maps both ß and ẞ to "ss" rather than to a single character,
which would explain why nothing folds to "ß" itself. I think ẞ may have
been added after ß's case folding was already defined, but I'm not sure,
so don't quote me on that.
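To check which characters fold to what, Python's str.casefold() is handy, since it applies the full default case folding from CaseFolding.txt. The `folds_to` helper here is hypothetical and only approximates a [:toCaseFold=...:] query: it collects every code point whose folded form equals a given string exactly, and makes no attempt to mimic how ICU treats the query value itself.

```python
import sys


def folds_to(target: str) -> set:
    """Return every single code point whose full default case folding
    (Python's str.casefold) equals `target` exactly.

    Hypothetical helper: this only approximates a [:toCaseFold=...:]
    query and does not fold `target` itself the way ICU might.
    """
    matches = set()
    # Scan the whole code space, including lone surrogates, which
    # casefold() simply returns unchanged.
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if ch.casefold() == target:
            matches.add(ch)
    return matches


print("\u00df".casefold())    # ß (U+00DF) -> ss
print("\u1e9e".casefold())    # ẞ (U+1E9E) -> ss
print(sorted(folds_to("s")))  # ['S', 's', 'ſ']
print(sorted(folds_to("ß")))  # []
```

With full folding, folds_to("ß") comes back empty while folds_to("ss") contains both ß and ẞ, which is consistent with the empty set in the quoted output.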
"I" cannot casefold to *both* "i" and "ı"; it has to casefold to one of
them (the default folding picks "i", the Turkic folding picks "ı"). I'm
not sure why "ı" doesn't casefold to the same thing as "I", but I don't
suppose there is any really "good" locale-independent solution for
case-insensitive matching of "I".
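The "one of them" point can be illustrated in Python, whose str.casefold() applies only the default (non-Turkic) mappings. `turkic_casefold` is a hypothetical helper that applies the two status-T entries from CaseFolding.txt (I to ı, İ to i) before the default folding; it is a sketch under that assumption, not a full locale-aware implementation.

```python
def turkic_casefold(text: str) -> str:
    """Hypothetical Turkic-aware case fold: apply the two status-T
    mappings from CaseFolding.txt, then Python's default folding."""
    # T entries: U+0049 'I' -> U+0131 'ı', U+0130 'İ' -> U+0069 'i'.
    text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.casefold()


# Default folding: dotless ı has no folding entry, so it stays ı.
print("I".casefold(), "\u0131".casefold())                    # i ı
print("ISTANBUL".casefold() == "\u0131stanbul".casefold())    # False
# Turkic folding pairs I with ı instead of with i.
print(turkic_casefold("ISTANBUL") == turkic_casefold("\u0131stanbul"))  # True
print(turkic_casefold("ISTANBUL") == turkic_casefold("istanbul"))       # False
```

Under the default folding "ISTANBUL" and "ıstanbul" don't match; under the Turkic variant "ISTANBUL" and "istanbul" don't. Neither choice is right for every locale.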
— Har.