Potential contradiction between the WordBreak test data and UAX #29

Tom Hacohen tom at osg.samsung.com
Wed Nov 23 06:04:30 CST 2016


On 23/11/16 11:45, Daniel Bünzli wrote:
> On Wednesday 23 November 2016 at 12:28, Tom Hacohen wrote:
>> I took a look at the ICU sources, and they explicitly mention this case,
>> so it seems I was mistaken with interpreting the intention of the UAX. I
>> still find it confusing, but based on this thread, it seems to just be me.
>
> It's not only you, I also sometimes get confused by it (see for example [1] and subsequent messages). Maybe the operational model could be clarified a bit.

The comment I quoted from the ICU sources clarifies the intention. Maybe
a comment similar to that one would be helpful in the UAX?

Also, thinking about it a bit more, the operational order makes sense
when you consider the CR LF case and extended characters; however, it is
still not obvious from the wording.
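
As a rough illustration of that ordering (a toy sketch only, not the ICU
code and not the full UAX #29 rule set): the newline rules are checked
before the "ignore Extend" rule, so a combining mark after CR is not
absorbed into it. The names toy_class and toy_word_breaks are made up
for this example, and the property lookup is a crude approximation of
Word_Break.

```python
import unicodedata

def toy_class(ch):
    """Crude stand-in for the Word_Break property (assumption, not real data)."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if unicodedata.combining(ch) != 0:
        return "Extend"   # approximation: combining marks only
    if ch.isalpha():
        return "ALetter"
    return "Other"

def toy_word_breaks(text):
    """Return interior indices i where a break falls before text[i]."""
    classes = [toy_class(c) for c in text]
    breaks = []
    if not classes:
        return breaks
    base = classes[0]  # class to the left once Extend characters are skipped
    for i in range(1, len(classes)):
        left_raw, right = classes[i - 1], classes[i]
        if left_raw == "CR" and right == "LF":
            base = right               # WB3: never break inside CR LF
        elif left_raw in ("CR", "LF") or right in ("CR", "LF"):
            breaks.append(i)           # WB3a/WB3b: break around newlines
            base = right
        elif right == "Extend":
            pass                       # WB4-like: Extend is absorbed, base unchanged
        elif base == "ALetter" and right == "ALetter":
            base = right               # WB5-like: letters stay together
        else:
            breaks.append(i)           # WB999: otherwise break
            base = right
    return breaks

print(toy_word_breaks("a\u0301b"))   # []  : the mark is skipped, a and b stay joined
print(toy_word_breaks("\r\n"))       # []  : CR LF is kept together
print(toy_word_breaks("\r\u0301"))   # [1] : newline rule wins over the Extend rule
```

If the Extend check came first, the last case would glue the combining
mark onto the CR, which is what the rule order is there to prevent.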

Thanks again.

--
Tom.


