Combining characters

Asmus Freytag asmusf at ix.netcom.com
Sun Dec 14 16:02:41 CST 2025


On 12/14/2025 9:57 AM, Doug Ewell via Unicode wrote:
> Normalization (NFC or NFD, not NFK*) for characters like this comes into play only when the character exists as both a precomposed unitary character and a combining sequence. When there is only one or the other, normalization to NFC or NFD yields the same result, and is thus a no-op, and not particularly adventurous.

This is actually incorrect. (And Doug actually knows better :) ).

It would be correct for a sequence of a base character with a */single/*
combining mark, but as soon as you have two or more combining marks,
their order is defined by NFC. The idea is that if two combining
marks don't interact (such as by stacking), different orders could
result in the same display and normalization enforces a preferred ordering.
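
A small sketch of this, using Python's standard unicodedata module. The
choice of "q" with a dot above and a dot below is only an illustrative
assumption, picked because no precomposed character exists for either
combination, so any change comes from reordering alone:

```python
import unicodedata

# Illustrative strings (not from the original thread): "q" has no
# precomposed form with either dot, so composition plays no role here.
above_first = "q\u0307\u0323"  # q, COMBINING DOT ABOVE, COMBINING DOT BELOW
below_first = "q\u0323\u0307"  # q, COMBINING DOT BELOW, COMBINING DOT ABOVE


def code_points(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Canonical combining classes decide the order: marks are sorted by
# ascending class (220 for dot below, 230 for dot above).
print([unicodedata.combining(c) for c in below_first])

for form in ("NFC", "NFD"):
    for s in (above_first, below_first):
        result = unicodedata.normalize(form, s)
        print(form, code_points(s), "->", code_points(result))
```

Both input orders come out with the dot below first, under NFC as well
as NFD, so normalization is not a no-op here even though nothing is
composed or decomposed.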

To make matters more complex, some combining marks are defined to not 
reorder. Those can be in any order defined by the author and could lead 
to duplicate encoding for the same display. The reasons behind 
supporting that are a bit complex, but generally it's done for scripts 
other than Latin.
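
A hedged sketch of the non-reordering case, again with Python's
unicodedata module. It assumes the marks meant here include those with
canonical combining class 0; the Devanagari sequence is only an
illustrative pick, and whether the two orders really display the same
depends on the font and shaping engine:

```python
import unicodedata

# Illustrative assumption: two Devanagari marks that both have
# canonical combining class 0, applied to the base letter KA.
KA = "\u0915"            # DEVANAGARI LETTER KA
VOWEL_SIGN_U = "\u0941"  # DEVANAGARI VOWEL SIGN U
ANUSVARA = "\u0902"      # DEVANAGARI SIGN ANUSVARA

order_a = KA + VOWEL_SIGN_U + ANUSVARA
order_b = KA + ANUSVARA + VOWEL_SIGN_U

# Class 0 marks are never moved by canonical reordering.
print([unicodedata.combining(c) for c in order_a])

for form in ("NFC", "NFD"):
    a = unicodedata.normalize(form, order_a)
    b = unicodedata.normalize(form, order_b)
    # Each order survives normalization unchanged, so the two
    # spellings stay distinct code point sequences.
    print(form, a == order_a, b == order_b, a == b)
```

Each order is left exactly as the author typed it, so normalization
does not unify the two encodings.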

But in general, */canonical reordering/* is a thing and is part of 
normalization.

A./