Normalization Generics (NFx, NFKx, NFxy)

Sławomir Osipiuk sosipiuk at gmail.com
Fri Dec 11 23:58:41 CST 2020


On Fri, Dec 11, 2020 at 11:49 PM Zach Lym via Unicode
<unicode at unicode.org> wrote:
>
> > A string X is a canonical caseless match for a string Y if and only if:
> > NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))
>
> The W3C Canonical Case Fold Normalization algorithm claims to be
> compatible with [D145], but uses NFC in the last step
> [w3c-charmod-norm], leading to an apparent contradiction.  Even though
> Unicode explains that "case folding is closed under canonical
> normalization" it took me a long time to find that passage and
> convince myself that the W3C and Unicode matching algorithms are
> equivalent.

The more general rule is that:
NFC(X) = NFC(Y) if and only if NFD(X) = NFD(Y).
This holds because NFC(X) = NFC(NFD(X)) and NFD(X) = NFD(NFC(X)), so
each form is fully determined by the other. In other words, you can
always replace one canonical form with the other in equivalence
comparisons, as long as you apply the same one to both sides. Which
one you pick is up to you.
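To make that concrete, here is a rough Python sketch, not a normative
implementation. It assumes that Python's str.casefold() is an acceptable
stand-in for toCasefold and that unicodedata.normalize() follows the
Unicode normalization forms; the helper names (nfc, nfd,
canonical_caseless_key) are made up for illustration. It checks, over a
few sample strings, that NFC and NFD comparisons agree and that the last
step of the caseless match can use either form.

```python
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)


def canonical_caseless_key(s: str, final_form: str = "NFD") -> str:
    # NFx(toCasefold(NFD(s))), with str.casefold() assumed to stand in
    # for toCasefold. final_form selects NFD (as in D145) or NFC (as in
    # the W3C wording) for the last step.
    return unicodedata.normalize(final_form, nfd(s).casefold())


# A-ring precomposed, A + combining ring, ANGSTROM SIGN, lowercase
# variants, and plain A / a as non-matching controls.
samples = ["\u00c5", "A\u030a", "\u212b", "\u00e5", "a\u030a", "A", "a"]

for x in samples:
    for y in samples:
        # NFC(X) = NFC(Y) if and only if NFD(X) = NFD(Y)
        assert (nfc(x) == nfc(y)) == (nfd(x) == nfd(y))
        # The caseless comparison gives the same answer with either
        # final form, provided both sides use the same one.
        with_nfd = canonical_caseless_key(x, "NFD") == canonical_caseless_key(y, "NFD")
        with_nfc = canonical_caseless_key(x, "NFC") == canonical_caseless_key(y, "NFC")
        assert with_nfd == with_nfc
```

A handful of samples proves nothing in general, of course; it only
illustrates the rule on strings where the two forms actually differ.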

> I would instead like to propose normalization form generics for use in
> pseudo code definitions:
>
>     NFx = NFD|NFC
>     NFKx = NFKD|NFKC
>     NFxy = NFD|NFC|NFKD|NFKC

I would prefer the last one to be:
NF(K)x = NFD|NFC|NFKD|NFKC; or perhaps
NF[K]x = NFD|NFC|NFKD|NFKC; to look a bit more like ABNF, where
square brackets mark an optional element.
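For what it's worth, here is one way the generics could be read, as a
small Python sketch. The tuple names (NFX, NFKX, NF_K_X) and the helpers
(matches, family_agrees) are hypothetical, chosen only to mirror the
notation. The sketch treats each generic as a set of interchangeable form
names and checks on a few samples that comparisons agree within NFx and
within NFKx. It also shows that a canonical form and a compatibility form
can disagree, so NF[K]x would have to mean "pick one and use it on both
sides" rather than "any form gives the same answer".

```python
import unicodedata
from itertools import product

# Hypothetical names mirroring the proposed generics.
NFX = ("NFD", "NFC")
NFKX = ("NFKD", "NFKC")
NF_K_X = NFX + NFKX  # NF[K]x


def matches(form: str, x: str, y: str) -> bool:
    # Compare x and y after applying the same normalization form to both.
    return unicodedata.normalize(form, x) == unicodedata.normalize(form, y)


def family_agrees(family: tuple, x: str, y: str) -> bool:
    # True when every form in the family gives the same comparison result.
    return len({matches(form, x, y) for form in family}) == 1


# fi ligature, "fi", e-acute precomposed and decomposed, circled one, "1"
samples = ["\ufb01", "fi", "\u00e9", "e\u0301", "\u2460", "1"]

for x, y in product(samples, repeat=2):
    assert family_agrees(NFX, x, y)
    assert family_agrees(NFKX, x, y)

# Across families the results can differ: the ligature matches "fi"
# under NFKC/NFKD but not under NFC/NFD.
assert not family_agrees(NF_K_X, "\ufb01", "fi")
```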

Sławomir Osipiuk


