Question Regarding UCD Draft Files and GraphemeBreakTest Discrepancy
Naoto Sato
naoto.sato at oracle.com
Fri Mar 21 16:24:31 CDT 2025
Hello,
I have a question regarding the draft version of the UCD files
(https://www.unicode.org/Public/draft/ucd/). I’m not sure if this is the
appropriate place for such inquiries, so please forgive me if it is not.
While testing the draft "emoji-data.txt"
(https://www.unicode.org/Public/draft/ucd/emoji/emoji-data.txt), I
encountered a failing test case in GraphemeBreakTest:
÷ 2701 × 200D × 2701 ÷ # ÷ [0.2] UPPER BLADE SCISSORS (ExtPict) × [9.0] ZERO WIDTH JOINER (ZWJ) × [11.0] UPPER BLADE SCISSORS (ExtPict) ÷ [0.3]
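For readers unfamiliar with the test file format, here is a minimal sketch of how such a line can be read: "÷" marks an expected break, "×" marks no break, and everything after "#" is a comment. The function names `parse_break_test_line` and `expected_clusters` are hypothetical helpers written for this illustration and are not part of any Unicode tool or of our test harness.

```python
def parse_break_test_line(line):
    """Split a GraphemeBreakTest line into code points and boundary marks."""
    data = line.split("#", 1)[0]      # drop the trailing comment
    tokens = data.split()
    marks = tokens[0::2]              # "÷" (break) or "×" (no break)
    code_points = [int(t, 16) for t in tokens[1::2]]
    return code_points, marks


def expected_clusters(code_points, marks):
    """Group code points into the grapheme clusters the test line expects."""
    clusters, current = [], []
    # marks[0] is the boundary before the first code point; marks[1:] are
    # the boundaries that follow each code point in turn.
    for cp, mark_after in zip(code_points, marks[1:]):
        current.append(cp)
        if mark_after == "÷":
            clusters.append(current)
            current = []
    return clusters


line = (
    "÷ 2701 × 200D × 2701 ÷ # ÷ [0.2] UPPER BLADE SCISSORS (ExtPict) × "
    "[9.0] ZERO WIDTH JOINER (ZWJ) × [11.0] UPPER BLADE SCISSORS (ExtPict) ÷ [0.3]"
)
cps, marks = parse_break_test_line(line)
clusters = expected_clusters(cps, marks)
```

Read this way, the line expects U+2701, U+200D, U+2701 to form a single grapheme cluster, which is the expectation our implementation fails to meet with the draft data.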
This test case assumes that U+2701 has the Extended_Pictographic
property. However, the draft emoji-data.txt no longer lists U+2701 as
Extended_Pictographic, whereas the version 16.0 file did. The HTML
version of the test
(https://www.unicode.org/Public/draft/ucd/auxiliary/GraphemeBreakTest.html#s23)
also shows U+2701 as Extended_Pictographic, so the draft test files and
the draft emoji-data.txt are inconsistent with each other.
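The check itself can be reproduced with a few lines of code. The following is only a sketch, assuming the usual UCD line layout of "code point or range ; property # comment". The helper names `property_ranges` and `has_property` are hypothetical, and the two sample strings are hand-written illustrations of the format, not excerpts from the 16.0 or draft files.

```python
import re

# Matches "XXXX ; Prop" or "XXXX..YYYY ; Prop" once the comment is removed.
_LINE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")


def property_ranges(text, prop):
    """Collect inclusive (start, end) ranges that carry the given property."""
    ranges = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # strip comments and blanks
        m = _LINE.match(line)
        if m and m.group(3) == prop:
            start = int(m.group(1), 16)
            end = int(m.group(2), 16) if m.group(2) else start
            ranges.append((start, end))
    return ranges


def has_property(cp, ranges):
    """Return True if the code point falls inside any collected range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


# Illustrative snippets only (assumption): they mimic the file format and
# the situation described above, not the real contents of either release.
OLD_SAMPLE = """\
2701          ; Extended_Pictographic# illustrative line
2702          ; Extended_Pictographic# illustrative line
"""
NEW_SAMPLE = """\
2702          ; Extended_Pictographic# illustrative line
"""

in_old = has_property(0x2701, property_ranges(OLD_SAMPLE, "Extended_Pictographic"))
in_new = has_property(0x2701, property_ranges(NEW_SAMPLE, "Extended_Pictographic"))
```

Running the same two functions over the real 16.0 and draft emoji-data.txt files, in place of the sample strings, would show whether U+2701 is listed in each.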
This discrepancy is causing our test to fail. Could you clarify whether
this is an error in the draft files or an intended change?
Thanks,
Naoto