Potential contradiction between the WordBreak test data and UAX #29

Tom Hacohen tom at osg.samsung.com
Wed Nov 23 06:04:30 CST 2016

On 23/11/16 11:45, Daniel Bünzli wrote:
> On Wednesday 23 November 2016 at 12:28, Tom Hacohen wrote:
>> I took a look at the ICU sources, and they explicitly mention this case,
>> so it seems I was mistaken with interpreting the intention of the UAX. I
>> still find it confusing, but based on this thread, it seems to just be me.
> It's not only you, I also sometimes get confused by it (see for example [1] and subsequent messages). Maybe the operational model could be clarified a bit.

The comment I quoted from the ICU sources clarifies the intention. Maybe
a comment similar to that one would be helpful?

Also, thinking about it a bit more, the operational order makes sense
when you consider the CR LF case and extended characters. However, it is
still not obvious from the wording.
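
To make the ordering point concrete, here is a rough Python sketch of how I
read it. It is only an illustration under assumptions: wb_class is a
hypothetical toy lookup (not the real Word_Break property data), U+0308
stands in for every Extend character, and only a simplified subset of the
rules (WB3, WB3a, WB3b, WB4, WB5, WB999) is modelled. It is not how ICU
implements it.

```python
def wb_class(ch):
    """Toy Word_Break lookup (hypothetical; real code would use the UCD data)."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch == "\u0308":  # COMBINING DIAERESIS, standing in for all Extend
        return "Extend"
    if ch.isalpha():
        return "ALetter"
    return "Other"


def is_break(text, i):
    """True if there is a boundary between text[i-1] and text[i] (0 < i < len)."""
    left, right = wb_class(text[i - 1]), wb_class(text[i])
    # The newline rules are tested first, on the raw neighbours.
    if left == "CR" and right == "LF":   # WB3: CR x LF
        return False
    if left in ("CR", "LF"):             # WB3a: break after CR / LF
        return True
    if right in ("CR", "LF"):            # WB3b: break before CR / LF
        return True
    # WB4 only applies once the newline rules have had their say.
    if right == "Extend":                # never break before Extend
        return False
    # Skip back over Extend characters to find the effective left character,
    # but do not attach an Extend to a preceding CR or LF.
    j = i - 1
    while (j > 0 and wb_class(text[j]) == "Extend"
           and wb_class(text[j - 1]) not in ("CR", "LF")):
        j -= 1
    left = wb_class(text[j])
    if left == "ALetter" and right == "ALetter":  # WB5
        return False
    return True                          # WB999: otherwise break


def word_breaks(text):
    """Indices i such that a boundary falls between text[i-1] and text[i]."""
    return [i for i in range(1, len(text)) if is_break(text, i)]
```

With this ordering, word_breaks("\r\u0308") reports a boundary between the
CR and the combining mark, while word_breaks("a\u0308b") reports none. If
the WB4 check were moved above the newline checks, the mark would be glued
to the CR instead, which is the part I found non-obvious from the wording.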

Thanks again.

