Question Regarding UCD Draft Files and GraphemeBreakTest Discrepancy

Naoto Sato naoto.sato at oracle.com
Fri Mar 21 16:24:31 CDT 2025


Hello,

I have a question regarding the draft version of the UCD files 
(https://www.unicode.org/Public/draft/ucd/). I’m not sure if this is the 
appropriate place for such inquiries, so please forgive me if it is not.

While testing the draft "emoji-data.txt" 
(https://www.unicode.org/Public/draft/ucd/emoji/emoji-data.txt), I 
encountered a failing test case in GraphemeBreakTest:

÷ 2701 × 200D × 2701 ÷  #  ÷ [0.2] UPPER BLADE SCISSORS (ExtPict) × [9.0] ZERO WIDTH JOINER (ZWJ) × [11.0] UPPER BLADE SCISSORS (ExtPict) ÷ [0.3]

This test case assumes that U+2701 is classified as 
Extended_Pictographic. However, the draft emoji-data.txt no longer 
lists U+2701 with that property, whereas the version 16.0 file did. 
The web version of the test 
(https://www.unicode.org/Public/draft/ucd/auxiliary/GraphemeBreakTest.html#s23) 
also shows U+2701 as Extended_Pictographic, so the draft test data and 
the draft property data are inconsistent.
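
The mismatch can be checked with a small cross-check along the 
following lines. This is only an illustrative sketch: the helper names 
(parse_property_ranges, parse_break_test) and the two sample property 
snippets are hypothetical and are not copied from any real 
emoji-data.txt release. It assumes only the usual UCD line layout of 
"code point or range ; property # comment".

```python
def parse_property_ranges(text, prop):
    """Collect code points assigned `prop` in 'range ; property # comment' lines."""
    members = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        first, _, last = fields[0].partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo  # single code point or inclusive range
        members.update(range(lo, hi + 1))
    return members


def parse_break_test(line):
    """Return the code points listed before the '#' in a break-test line."""
    data = line.split("#", 1)[0]
    return [int(tok, 16) for tok in data.split() if tok not in ("÷", "×")]


# Hypothetical sample data for illustration only, not real file contents.
SAMPLE_WITH = "2700..2704 ; Extended_Pictographic # sample range\n"
SAMPLE_WITHOUT = "2702 ; Extended_Pictographic # sample single code point\n"

TEST_LINE = (
    "÷ 2701 × 200D × 2701 ÷  #  ÷ [0.2] UPPER BLADE SCISSORS (ExtPict) × "
    "[9.0] ZERO WIDTH JOINER (ZWJ) × [11.0] UPPER BLADE SCISSORS (ExtPict) ÷ [0.3]"
)

code_points = parse_break_test(TEST_LINE)
for label, sample in (("with", SAMPLE_WITH), ("without", SAMPLE_WITHOUT)):
    ext_pict = parse_property_ranges(sample, "Extended_Pictographic")
    print(label, "U+2701 is Extended_Pictographic:", 0x2701 in ext_pict)
```

Running the same membership check against the real draft and 16.0 
files, instead of the sample strings, would show whether the property 
assignment for U+2701 changed between the two.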

This discrepancy is causing our test to fail. Could you clarify whether 
this is an error in the draft files or an intended change?

Thanks,
Naoto

