Combining characters
Asmus Freytag
asmusf at ix.netcom.com
Sun Dec 14 16:02:41 CST 2025
On 12/14/2025 9:57 AM, Doug Ewell via Unicode wrote:
> Normalization (NFC or NFD, not NFK*) for characters like this comes into play only when the character exists as both a precomposed unitary character and a combining sequence. When there is only one or the other, normalization to NFC or NFD yields the same result, and is thus a no-op, and not particularly adventurous.
This is actually incorrect. (And Doug actually knows better :) ).
It would be correct for a sequence of a base character with a *single*
combining mark, but as soon as you have two or more combining marks,
their order is defined by normalization (NFC and NFD alike). The idea is
that if two combining marks don't interact (such as by stacking),
different orders could result in the same display, and normalization
enforces a preferred ordering.
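A quick way to see this is with a base letter that has no precomposed forms at all. The sketch below uses Python's standard unicodedata module; the choice of "q" with a dot above and a dot below is just my illustration, and the helper name describe is made up for the example.

```python
import unicodedata

def describe(s):
    # List each code point with its canonical combining class (ccc).
    return [(f"U+{ord(c):04X}", unicodedata.combining(c)) for c in s]

# "q" has no precomposed form with either mark, so nothing composes or
# decomposes here. The marks are entered as dot above (ccc 230) first,
# then dot below (ccc 220).
typed = "q\u0307\u0323"
reordered = unicodedata.normalize("NFC", typed)

print(describe(typed))       # dot above first, then dot below
print(describe(reordered))   # dot below (220) now precedes dot above (230)
print(reordered == typed)    # False: NFC was not a no-op
```

NFD gives the same reordered string, since canonical reordering is shared by both forms.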
To make matters more complex, some combining marks are defined to not
reorder. Those can be in any order defined by the author and could lead
to duplicate encoding for the same display. The reasons behind
supporting that are a bit complex, but generally it's done for scripts
other than Latin.
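As a rough illustration of the non-reordering case: marks with canonical combining class 0 are never moved by normalization. The sketch below picks two Devanagari signs that have class 0; the pairing is only my example for showing the mechanics, and I am not claiming that this particular pair is meaningful or renders identically in both orders, which depends on the script and the font.

```python
import unicodedata

KA = "\u0915"           # DEVANAGARI LETTER KA
CANDRABINDU = "\u0901"  # DEVANAGARI SIGN CANDRABINDU, ccc 0
ANUSVARA = "\u0902"     # DEVANAGARI SIGN ANUSVARA, ccc 0

order_a = KA + CANDRABINDU + ANUSVARA
order_b = KA + ANUSVARA + CANDRABINDU

# Both marks report combining class 0, so canonical reordering skips them.
classes = [unicodedata.combining(c) for c in (CANDRABINDU, ANUSVARA)]

# Each sequence is already stable under NFC and NFD, in the order given.
unchanged = all(
    unicodedata.normalize(form, s) == s
    for form in ("NFC", "NFD")
    for s in (order_a, order_b)
)

print(classes)
print(unchanged)
print(order_a == order_b)  # the two orders stay distinct after normalization
```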
But in general, *canonical reordering* is a thing and is part of
normalization.
A./