Possibly incorrect line break tests?

thin.crew1671 at railgunlabs.com
Wed Sep 3 11:41:40 CDT 2025


In LineBreakTest.txt, there are test cases that indicate there should *not* be a break after U+0308. However, the LB rule cited does not appear to apply, and it would appear that there *should* be a break. For example:

× 000A ÷ 0308 × 23E9 ÷ #  × [0.3] <LINE FEED (LF)> (LF_NotEastAsian) ÷ [5.03] COMBINING DIAERESIS (CM1_NotEastAsian_CM) × [28.0] BLACK RIGHT-POINTING DOUBLE TRIANGLE (AL) ÷ [0.3]

LB28 states "Do not break between alphabetics (“at”)" with the following break rule:

(AL | HL) × (AL | HL)

However, in the aforementioned test case, neither U+000A nor U+0308 has break class AL or HL (they have break class LF and CM). Yet rule 28.0 is cited as the reason for not breaking between U+0308 and U+23E9. It would appear that there _should_ be a break here.
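
To make that reading concrete, here is a minimal Python sketch that applies the LB28 pattern literally to a pair of adjacent break classes. The function name and the class strings are illustrative assumptions; the sketch deliberately ignores every other rule in UAX #14, so it models only the literal reading described in this message, not a full line breaker.

```python
# Classes named on both sides of the LB28 pattern: (AL | HL) × (AL | HL)
ALPHABETIC = {"AL", "HL"}

def lb28_matches(left, right):
    """Return True if the LB28 pattern matches the class pair literally."""
    return left in ALPHABETIC and right in ALPHABETIC

# Classes as labelled in the test comment: U+0308 is CM, U+23E9 is AL.
print(lb28_matches("CM", "AL"))  # False
print(lb28_matches("AL", "AL"))  # True
```

Under this literal reading, the pair (CM, AL) is not covered by LB28, which is why the "no break" result in the test looks surprising.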

Likewise, for the test:

× 200B ÷ 0308 × 0024 ÷ #  × [0.3] ZERO WIDTH SPACE (ZW_NotEastAsian) ÷ [8.0] COMBINING DIAERESIS (CM1_NotEastAsian_CM) × [24.03] DOLLAR SIGN (PR_NotEastAsian) ÷ [0.3]

LB24 states "Do not break between numeric prefix/postfix and letters, or between letters and prefix/postfix" with the following break rules:

(PR | PO) × (AL | HL)
(AL | HL) × (PR | PO)

However, neither U+200B nor U+0308 has break class PR, PO, AL, or HL (they have break class ZW and CM). Yet rule 24.03 is cited as the reason for not breaking between U+0308 and U+0024. It would appear that there _should_ be a break here.

In total, I have collected ~80 test cases from LineBreakTest.txt that exhibit this same pattern.
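
For anyone who wants to look for similar lines, below is a rough Python sketch of one way to filter for this pattern. It is not necessarily how the count above was obtained. The helper names are hypothetical, it only looks for U+0308 (other combining marks would need to be added), and it assumes the data part of each line alternates between a break mark (× or ÷) and a hexadecimal code point, as in the examples quoted in this message.

```python
def parse_test_line(line):
    """Split a test line into its code points and its break marks."""
    # Everything after the first '#' is a comment.
    fields = line.split("#", 1)[0].split()
    marks = fields[0::2]  # marks[i] is the mark before code point i
    code_points = [int(tok, 16) for tok in fields[1::2]]
    return code_points, marks

def has_pattern(line, cm=0x0308):
    """True if `cm` has a break before it and no break after it."""
    cps, marks = parse_test_line(line)
    return any(
        cp == cm and marks[i] == "÷" and marks[i + 1] == "×"
        for i, cp in enumerate(cps)
    )

print(has_pattern("× 000A ÷ 0308 × 23E9 ÷ # example"))  # True
print(has_pattern("× 0061 × 0308 ÷ 0062 ÷ # example"))  # False
```

Running `has_pattern` over each line of a local copy of LineBreakTest.txt would give a candidate list to inspect by hand.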

I'm wondering if these test cases were meant to include a hyphen character, because then they would be covered by rule LB20a, which states "Do not break after a word-initial hyphen". This rule has the definition:

( sot | BK | CR | LF | NL | SP | ZW | CB | GL ) ( HY | [\u2010] ) × AL

So, for example, test case:

× 000A ÷ 0308 × 23E9 ÷ #  LF ÷ CM × AL  (incorrect?)

would become:

× 000A ÷ 0308 ÷ 002D × 23E9 ÷ #  LF ÷ CM ÷ HY × AL  (correct)