RE: “plain text styling”…

Doug Ewell doug at ewellic.org
Thu Jan 12 11:26:14 CST 2023


Kent Karlsson replied to Asmus Freytag:

>> The main exception to that was mathematical notation, and we opted to
>> make a principled exception, precisely because semantic mapping to
>> highly specific shapes for an individual symbol is or should not be
>> the task of "styling”.
>
> 1. That styling(!) is lost when doing normalizing to NFKD or NFKC.

To be fair, a lot of content may be lost when normalizing to NFKD and NFKC. For example, superscript and subscript digits are normalized to their Basic Latin counterparts, so 2³ becomes 23. I think it is safe to say that this is an important semantic change, and UAX #15 agrees:

“Normalization Forms KC and KD must not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets, and unless supplanted by formatting markup, they may remove distinctions that are important to the semantics of the text. It is best to think of these Normalization Forms as being like uppercase or lowercase mappings: useful in certain contexts for identifying core meanings, but also performing modifications to the text that may not always be appropriate. They can be applied more freely to domains with restricted character sets.”
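To make the loss concrete, here is a minimal sketch using Python 3's standard `unicodedata` module (an arbitrary choice of tool for illustration, not anything UAX #15 prescribes). It runs "2³", "H₂O", and MATHEMATICAL BOLD CAPITAL A (U+1D400, the kind of math styling Kent mentions) through all four normalization forms. NFC and NFD leave them alone, while NFKC and NFKD fold them to plain "23", "H2O", and "A".

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalize_all(text):
    """Return a dict mapping each normalization form name to the normalized text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

samples = (
    "2\u00b3",       # "2³": DIGIT TWO + SUPERSCRIPT THREE
    "H\u2082O",      # "H₂O": uses SUBSCRIPT TWO
    "\U0001d400",    # MATHEMATICAL BOLD CAPITAL A
)

for sample in samples:
    # The compatibility forms (NFKC/NFKD) drop the superscript, subscript,
    # and bold distinctions; the canonical forms (NFC/NFD) keep them.
    print(repr(sample), normalize_all(sample))
```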

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org