Normalization Generics (NFx, NFKx, NFxy)
Zach Lym
indolering at gmail.com
Sat Dec 12 20:23:23 CST 2020
> The more general rule is that:
> NFC(X) = NFC(Y) if and only if NFD(X) = NFD(Y).
> I.e. you can always replace one canonical form with the other in
> equivalence comparisons. (As long as you apply the same one to both
> sides, of course, but which one is up to you.)
Yes, and a careful reading of the standard will show that this is the
case. But we don't live in a world where people have time to read the
standard. Oh dear, I included the wrong link in my citation! It
should have been:
https://lwn.net/ml/linux-fsdevel/20190206084752.nwjkeiixjks34vao@pali/
At any rate, someone suggested using NFC, but this objection came up:
>> Is there any case where
>> NFC(x) == NFC(y) && NFD(x) != NFD(y) , or
>> NFC(x) != NFC(y) && NFD(x) == NFD(y)
>
>This is good question. And I think we should get definite answer for it
>prior inclusion of normalization into kernel.
That question was simply never followed up on. This is a feature that was
included after years of debate and developed in an open process. If
even Linux can't get this one right, then we need to do a better job
at explaining Unicode.
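For what it's worth, the question quoted above can at least be spot-checked in a few lines. The following is only a minimal sketch, not a proof: it assumes Python's standard `unicodedata` module, and the helper name `same_under` and the sample pairs are mine, chosen for illustration. It compares each pair under NFC and under NFD and checks that the two verdicts never disagree.

```python
import unicodedata

def same_under(form, x, y):
    """Return True if x and y are identical after normalizing both with `form`."""
    return unicodedata.normalize(form, x) == unicodedata.normalize(form, y)

# Illustrative sample pairs (an assumption of this sketch, not an exhaustive test).
samples = [
    ("\u00e9", "e\u0301"),               # precomposed e-acute vs. e + combining acute
    ("\u212b", "\u00c5"),                # ANGSTROM SIGN vs. A WITH RING ABOVE
    ("q\u0307\u0323", "q\u0323\u0307"),  # same combining marks, different order
    ("\ufb01", "fi"),                    # fi ligature: compatibility-equivalent only
    ("a", "b"),                          # plainly different
]

# For every pair, record the verdict under NFC and the verdict under NFD.
results = [(same_under("NFC", x, y), same_under("NFD", x, y)) for x, y in samples]

# The rule says the two verdicts must always match.
agree = all(nfc == nfd for nfc, nfd in results)
print(results, agree)
```

A spot check like this cannot settle the question for all strings; only the standard's definitions can. It does show how cheaply the claim can be exercised against concrete input.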
> > I would instead like to propose normalization form generics for use in
> > pseudo code definitions:
> >
> > NFx = NFD|NFC
> > NFKx = NFKD|NFKC
> > NFxy = NFD|NFC|NFKD|NFKC
>
> I would prefer the last one to be:
> NF(K)x = NFD|NFC|NFKD|NFKC; or perhaps
> NF[K]x = NFD|NFC|NFKD|NFKC; to look a bit more like ABNF.
I don't care for NFxy either, but I strongly prefer sticking to C
programming conventions.
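To make the intent of the generics concrete, here is a rough sketch of how they might be read in code. Everything in it is an assumption on my part: the generics are modelled as plain tuples of form names, `equal_under_each` is a hypothetical helper, and Python's standard `unicodedata` module does the normalizing. The point is that a comparison gives the same answer for either member of NFx, and the same answer for either member of NFKx, while the two families can still disagree with each other.

```python
import unicodedata

# Hypothetical encoding of the proposed generics as tuples of form names.
NFx = ("NFD", "NFC")      # canonical forms
NFKx = ("NFKD", "NFKC")   # compatibility forms
NFxy = NFx + NFKx         # any of the four forms

def equal_under_each(forms, x, y):
    """Map each form name to whether x and y compare equal under that form."""
    return {
        form: unicodedata.normalize(form, x) == unicodedata.normalize(form, y)
        for form in forms
    }

# The fi ligature is only compatibility-equivalent to "fi".
ligature = equal_under_each(NFxy, "\ufb01", "fi")

# Precomposed e-acute is canonically equivalent to e + combining acute.
accent = equal_under_each(NFxy, "\u00e9", "e\u0301")

print(ligature)
print(accent)
```

So NFxy is best read as "some normalization form, pick one", whereas NFx and NFKx each name a family whose members are interchangeable for equivalence comparisons.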