Combining Characters
Jacob Moody
moody at posixcafe.org
Fri Dec 19 10:32:57 CST 2025
On 12/14/25 17:54, Martin J. Dürst via Unicode wrote:
>
>> The only case I can see where things could get weird would be if there
>> suddenly became some weird case where, e.g., the Jovians insisted that the
>> combining backslash must appear before the letter and not after it (and
>> it’s been a few years since I had to really look at the rules and this
>> might be possible with the existing combining character classes anyway).
>
> Because of the way we have optimized normalization in Ruby (caching
> normalization results for runs of a base character followed by
> modifiers), that wasn't exactly true when we upgraded to Unicode 16.0.0.
> See the "Normalization Behavior" entry at
> https://www.unicode.org/versions/Unicode16.0.0/#Migration.
I also ran into exactly this issue with my implementation for 9front[0].
It took me a while to figure out what was going on. I had first written my
implementation against v15, so at the time I wasn't sure whether I had somehow
overfit my code to 15 or whether something had changed.
>
> New scripts introduced in 16.0.0 (Kirat Rai, Tulu-Tigalari, and Gurung
> Khema) contained combining marks that had combining class 0 and were
> also base characters combining with other combining marks (or even with
> themselves). That was something we hadn't taken account of in our
> implementation previously (because it was not needed).
>
I do wish the migration documents[1] had explicitly explained that these
new characters have ccc=0 conjoiners. They may imply it when discussing them,
and maybe I'm still too green on the details to put two and two together,
but stating it outright would have saved me some time.
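To make the ccc=0 point concrete, here is a minimal Python sketch using the standard unicodedata module. It deliberately uses an older Bengali pair rather than the 16.0 characters, because whether those are known depends on the Unicode version bundled with the interpreter. It therefore does not reproduce the 16.0 issue itself; it only shows that a combining mark with ccc=0 can still compose with a preceding mark. The helper name describe is made up for this example.

```python
import unicodedata


def describe(ch):
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


# Bengali vowel signs E (U+09C7) and AA (U+09BE): both are spacing
# combining marks (Mc) and both have canonical combining class 0.
first, second = "\u09c7", "\u09be"

# Despite ccc=0 on both sides, NFC composes the pair into
# BENGALI VOWEL SIGN O (U+09CB).
composed = unicodedata.normalize("NFC", first + second)

print(describe(first), describe(second), hex(ord(composed)))
```

A fast path that assumes a ccc=0 character never interacts with what came before it would mishandle a pair like this.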
On that topic, I did find the suggested resolution of using the quickcheck value a bit strange.
As far as I know, use of quickcheck was not strictly required for normalization prior
to this update; at least, my v15 implementation did not use it and passed all the normalization
tests. As an upside, I found that with these changes and the inclusion of quickcheck, Hangul
no longer needed to be special-cased.
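For readers unfamiliar with what special-casing Hangul usually means, here is a minimal Python sketch of the arithmetic syllable composition described in the Unicode Standard. It is not taken from the 9front code; the function name compose_hangul and the constant names are chosen for this example only.

```python
# Constants for arithmetic Hangul syllable composition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def compose_hangul(first, second):
    """Return the composed code point for a pair, or None if it does not compose."""
    # Case 1: leading consonant + vowel -> LV syllable.
    l_index = first - L_BASE
    v_index = second - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        return S_BASE + l_index * N_COUNT + v_index * T_COUNT

    # Case 2: LV syllable (no trailing consonant yet) + trailing consonant.
    s_index = first - S_BASE
    t_index = second - T_BASE
    if 0 <= s_index < S_COUNT and s_index % T_COUNT == 0 and 0 < t_index < T_COUNT:
        return first + t_index

    return None
```

A normalizer that special-cases Hangul would try something like compose_hangul(a, b) before (or instead of) consulting its composition table for the pair.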
Thanks,
Jacob Moody
[0] https://github.com/9front/9front/blob/front/sys/src/libc/ucd/runenorm.c
[1] https://www.unicode.org/reports/tr15/tr15-56.html#Contexts_Care