Question about Normalization in Unicode 16.0.0

Martin J. Dürst duerst at it.aoyama.ac.jp
Tue Apr 22 03:09:40 CDT 2025


Hello Markus,

Many thanks for your quick answer. For whatever reason, my original mail 
was removed; I added it again below.

I have looked at all the links below, in particular
https://www.unicode.org/reports/tr15/#Contexts_Care, as well as at the
test cases.

What my mail was suggesting was that Unicode add a test case with a 
longer sequence in order to make sure that implementations treat such 
longer sequences correctly, too (even if they shouldn't appear in actual 
text). In the current tests, the sequence U+16121 U+16121 appears only 
twice, in a single line, as a result of normalization.
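
Such longer cases could be generated mechanically. Below is a minimal Ruby
sketch that emits lines in the NormalizationTest.txt field layout
(source;NFC;NFD;NFKC;NFKD; # comment) for U+1611E followed by k copies of
U+16121. The helper names are hypothetical and not part of any existing tool.
It assumes that U+16121 canonically decomposes to <U+1611E, U+1611E>, that
both are starters, that U+16121 + U+1611E does not compose, and that neither
character has a compatibility mapping (so NFKC = NFC and NFKD = NFD).

```ruby
# Hypothetical helper, not part of any Unicode or Ruby tooling.
# ASSUMPTIONS: U+16121 canonically decomposes to <U+1611E, U+1611E>; both are
# starters (ccc = 0); U+16121 followed by U+1611E does not compose; neither
# character has a compatibility mapping, so NFKC = NFC and NFKD = NFD.
BAR = 0x1611E
DOUBLE_BAR = 0x16121

# Formats code points the way NormalizationTest.txt does: hex, space-separated.
def hex_field(cps)
  cps.map { |cp| format("%04X", cp) }.join(" ")
end

# Builds one test line for U+1611E followed by k copies of U+16121,
# i.e. a run of 2k + 1 "bars".
def bar_test_line(k)
  raise ArgumentError, "k must be a non-negative Integer" unless k.is_a?(Integer) && k >= 0

  source = [BAR] + [DOUBLE_BAR] * k
  nfd = [BAR] * (2 * k + 1)
  # Pairs of bars compose left to right; one bar is left over at the end.
  nfc = [DOUBLE_BAR] * k + [BAR]
  fields = [source, nfc, nfd, nfc, nfd].map { |cps| hex_field(cps) }
  "#{fields.join(';')}; # #{2 * k + 1} bars"
end

[1, 5, 20].each { |k| puts bar_test_line(k) }
```

With k = 5 this reproduces the eleven-bar example from my original mail.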

Regards,    Martin.

On 2025-04-21 09:31, Markus Scherer wrote:
> Hi Martin,
> 
> In the 16.0 alpha
> <https://www.unicode.org/review/pri497/pri497-background.html> & beta
> <https://www.unicode.org/versions/beta-16.0.0.html>, we had prominent
> notices for characters with unusual combinations of normalization
> properties, with a permanent writeup here:
> https://www.unicode.org/reports/tr15/#Contexts_Care
> 
> We also did add several relevant test cases in NormalizationTest.txt.
> 
> Best regards,
> markus
> 

On 2025-04-20 15:58, Martin J. Dürst wrote:
 > Dear Unicoders,
 >
 > At the recent RubyKaigi (https://rubykaigi.org/2025/), I helped upgrade
 > Ruby from Unicode 15.1.0 to 16.0.0. The main issue there was that our
 > implementation of Normalization did not yet handle some new cases.
 >
 > I just want to check my understanding of these new cases. Although the
 > following (eleven horizontal bars on top of a character) is completely
 > hypothetical, it is my understanding that e.g. the sequence of
 > U+1611E U+16121 U+16121 U+16121 U+16121 U+16121 should be normalized to
 > U+16121 U+16121 U+16121 U+16121 U+16121 U+1611E. This would be expressed
 > in Ruby with a test such as the following:
 >
 > def test_gurung_khema
 >    assert_equal "\u{16121 16121 16121 16121 16121 1611E}",
 >         "\u{1611E 16121 16121 16121 16121 16121}".unicode_normalize(:nfc)
 > end
 >
 > It would be good if a few examples like this were added to the
 > NormalizationTest.txt file in the future. I can help with this if needed.
 >
 > Regards,   Martin.
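
The expected result in the test above can be reproduced with a toy model of
canonical composition that does not depend on Ruby's built-in tables. This is
a sketch under stated assumptions, not a real implementation: it assumes that
U+16121 canonically decomposes to <U+1611E, U+1611E> and is not a composition
exclusion, that both characters are starters, and that no other pair in the
model composes. The method names are hypothetical.

```ruby
# Toy model of NFC for runs of Gurung Khema vowel signs (illustration only).
# ASSUMPTIONS: U+16121 canonically decomposes to <U+1611E, U+1611E> and is
# not a composition exclusion; both characters have ccc = 0; no other pair
# here has a primary composite (U+16121 followed by U+1611E does not compose).
# A real implementation must take all of this from the UCD.
DECOMPOSITIONS = { 0x16121 => [0x1611E, 0x1611E] }.freeze
COMPOSITIONS = DECOMPOSITIONS.invert.freeze

# Full canonical decomposition (recursive, although one level suffices here).
def toy_decompose(cp)
  pair = DECOMPOSITIONS[cp]
  pair ? pair.flat_map { |c| toy_decompose(c) } : [cp]
end

# Canonical reordering is a no-op here because every character is a starter.
def toy_nfd(cps)
  cps.flat_map { |cp| toy_decompose(cp) }
end

# Canonical composition for a sequence made only of starters: any
# intervening starter blocks a character from earlier starters, so the only
# candidate is the immediately preceding (possibly already composed) one.
def toy_nfc(cps)
  toy_nfd(cps).each_with_object([]) do |cp, out|
    composite = out.empty? ? nil : COMPOSITIONS[[out.last, cp]]
    if composite
      out[-1] = composite # compose with the last starter
    else
      out << cp # cp becomes the new last starter
    end
  end
end

source = [0x1611E] + [0x16121] * 5 # eleven "bars"
puts toy_nfc(source).map { |cp| format("U+%04X", cp) }.join(" ")
# prints U+16121 U+16121 U+16121 U+16121 U+16121 U+1611E
```

On a Ruby whose Unicode tables are at 16.0.0 or later, the same input can be
cross-checked against String#unicode_normalize(:nfc), as in the test above.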
