Combining Characters

Jacob Moody moody at posixcafe.org
Fri Dec 19 10:32:57 CST 2025


On 12/14/25 17:54, Martin J. Dürst via Unicode wrote:
> 
>> The only case I can see where things could get weird would be if there
>> suddenly became some weird case where, e.g., the Jovians insisted that the
>> combining backslash must appear before the letter and not after it (and
>> it’s been a few years since I had to really look at the rules and this
>> might be possible with the existing combining character classes anyway).
> 
> Because of the way we have optimized normalization in Ruby (caching 
> normalization results for runs of a base character followed by 
> modifiers), that wasn't exactly true when we upgraded to Unicode 16.0.0.
> See the "Normalization Behavior" entry at 
> https://www.unicode.org/versions/Unicode16.0.0/#Migration.

I also ran into issues with exactly this in my implementation for 9front[0].
It took me a while to figure out what was going on. I had first written my
implementation against v15, so at the time I wasn't sure whether I had somehow
overfit my code to 15 or whether something had changed.

> 
> New scripts introduced in 16.0.0 (Kirat Rai, Tulu-Tigalari, and Gurung 
> Khema) contained combining marks that had combining class 0 and were 
> also base characters combining with other combining marks (or even with 
> themselves). That was something we hadn't taken account of in our 
> implementation previously (because it was not needed).
> 

I do wish the documents on migration[1] had explicitly explained that these
new characters include ccc=0 conjoiners. The text may imply it when discussing
them, and maybe I'm still too green on the details to put two and two together,
but spelling it out would have saved me some time.
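
To make the pitfall concrete, here is a minimal Python sketch of the kind of
assumption that breaks. It is not the 9front or Ruby code; `split_at_starters`
and `nfc_per_run` are hypothetical helpers, and the sample uses a long-standing
Bengali pair rather than one of the new 16.0.0 characters so that it behaves the
same with older Unicode data. The idea is only that cutting the text before every
ccc=0 character and normalizing each piece on its own is not safe when a ccc=0
character can still compose with what precedes it.

```python
import unicodedata


def split_at_starters(text):
    """Naively cut the text before every character whose ccc is 0 (hypothetical helper)."""
    runs = []
    for ch in text:
        if unicodedata.combining(ch) == 0 or not runs:
            runs.append(ch)      # ccc=0: assume a new run starts here
        else:
            runs[-1] += ch       # ccc>0: attach to the current run
    return runs


def nfc_per_run(text):
    """Normalize each naive run independently and concatenate the results."""
    return "".join(unicodedata.normalize("NFC", run) for run in split_at_starters(text))


# BENGALI LETTER KA, VOWEL SIGN E, VOWEL SIGN AA: all three have ccc=0,
# yet U+09C7 U+09BE canonically composes to U+09CB.
sample = "\u0995\u09c7\u09be"
whole = unicodedata.normalize("NFC", sample)
piecewise = nfc_per_run(sample)
```

Here `whole` ends in the composed U+09CB, while `piecewise` leaves U+09C7 U+09BE
uncomposed, because the naive split put each of them in its own run.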

On that topic, I did find the suggested resolution of using the quickcheck value a
bit strange. As far as I know, use of quickcheck was not strictly required for
normalization prior to this update; at least, my v15 implementation did not use it
and still passed all the normalization tests. As an upside, I found that with these
changes and the inclusion of quickcheck, Hangul no longer needed to be special-cased.
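
For anyone unfamiliar with what the Hangul special case usually looks like, here
is a rough Python sketch of the arithmetic composition step, using the constants
from the Unicode Standard's Hangul syllable composition algorithm. The function
names `compose_hangul` and `compose_jamo` are mine, and this is not the 9front
code. The vowel and trailing jamo have ccc=0 but NFC_QC=Maybe, which I assume is
why a quickcheck-driven pass picks them up without a separate branch.

```python
import unicodedata

# Constants from the Hangul syllable composition algorithm in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588
S_COUNT = L_COUNT * N_COUNT      # 11172


def compose_hangul(first, second):
    """Return the composed code point for the pair, or None if it does not compose."""
    l_index = first - L_BASE
    v_index = second - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        # Leading consonant + vowel gives an LV syllable.
        return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
    s_index = first - S_BASE
    t_index = second - T_BASE
    if 0 <= s_index < S_COUNT and s_index % T_COUNT == 0 and 0 < t_index < T_COUNT:
        # LV syllable + trailing consonant gives an LVT syllable.
        return first + t_index
    return None


def compose_jamo(text):
    """Compose adjacent Hangul pairs left to right; other characters pass through."""
    out = []
    for ch in text:
        composed = compose_hangul(ord(out[-1]), ord(ch)) if out else None
        if composed is None:
            out.append(ch)
        else:
            out[-1] = chr(composed)
    return "".join(out)


jamo = "\u1100\u1161\u11a8"      # leading KIYEOK, vowel A, trailing KIYEOK
result = compose_jamo(jamo)
```

For this input, `result` matches what `unicodedata.normalize("NFC", jamo)` returns.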

Thanks,
Jacob Moody

[0] https://github.com/9front/9front/blob/front/sys/src/libc/ucd/runenorm.c
[1] https://www.unicode.org/reports/tr15/tr15-56.html#Contexts_Care
