Compatibility normalization (was: RE: Unicode encoding philosophy)

Doug Ewell doug at ewellic.org
Wed Oct 11 17:37:16 CDT 2023


Kent Karlsson wrote:

>> Letʼs consider an equation that youʼll probably recognize, font
>> support willing: 𝐸 = 𝑚𝑐².  Thanks to the power of Unicode, we could
>> use it in the same plain‐text document as, say, ℰ = 𝐦𝕔² while
>> keeping both
>
> That's not really a proper way of representing math expressions.
> For one thing, compatibility normalisation would ruin them (true,
> one is not supposed to apply that, which I agree with, but it
> sometimes is anyway).

I see this claim from time to time, and not only from Kent: we must not use character (sequence) X, or must not use it in contrast with character (sequence) Y which is compatibility-equivalent to X, because some random, unknown process might surreptitiously apply NFKC or NFKD to the text, obliterating the distinction.
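For concreteness, here is a minimal sketch of the distinction at stake, using Python's standard `unicodedata` module. The variable names are illustrative only, and the strings are written as escapes so they survive fonts and mail gateways. It shows what an explicit compatibility normalization does to the two equations quoted above; it is not evidence that any particular program applies it.

```python
import unicodedata

# The two equations from the quoted text, spelled with escapes.
# italic:  MATHEMATICAL ITALIC CAPITAL E, ITALIC SMALL M, ITALIC SMALL C, SUPERSCRIPT TWO
# mixed:   SCRIPT CAPITAL E, MATHEMATICAL BOLD SMALL M, DOUBLE-STRUCK SMALL C, SUPERSCRIPT TWO
italic = "\U0001D438 = \U0001D45A\U0001D450\u00B2"
mixed = "\u2130 = \U0001D426\U0001D554\u00B2"

# Canonical normalization (NFC) leaves both strings untouched,
# because these characters have only compatibility decompositions.
nfc_italic = unicodedata.normalize("NFC", italic)
nfc_mixed = unicodedata.normalize("NFC", mixed)

# Compatibility normalization (NFKC) folds the styled letters and the
# superscript digit to their plain counterparts.
nfkc_italic = unicodedata.normalize("NFKC", italic)
nfkc_mixed = unicodedata.normalize("NFKC", mixed)

print(nfc_italic == italic, nfc_mixed == mixed)  # distinction preserved
print(nfkc_italic, "|", nfkc_mixed)              # distinction lost
```

After NFKC both strings become the same plain-ASCII text, with the superscript flattened to an ordinary digit, which is exactly the loss of distinction the claim worries about.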

Can Kent, or anyone else, please identify a *specific* program or process that does this?

If there are no attested, real-world examples of processes actually applying NFKC or NFKD behind the user’s back (which would indeed be evil), I’m likely to write this off as an urban myth.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org



