Break Rules

Sat Nov 30 01:04:41 CST 2019

Hey everyone,

I'm looking at ICU's text segmentation implementation and have noticed that
the break rules ICU uses internally are not the same as the rules included
in the CLDR. Some differ only slightly in syntax, while others are
substantively different or missing entirely. ICU also appears to contain
"title" break rules, which are not present in the CLDR data set. Some
questions:

   1. Why does ICU use a different set of break rules than what's specified
   in the CLDR?
   2. Can the title break rules be contributed back to CLDR?
   3. It doesn't look like ICU passes all the Unicode line break tests. Why?

Thanks!

-Cameron