Potential contradiction between the WordBreak test data and UAX #29
Tom Hacohen
tom at osg.samsung.com
Wed Nov 23 06:04:30 CST 2016
On 23/11/16 11:45, Daniel Bünzli wrote:
> On Wednesday 23 November 2016 at 12:28, Tom Hacohen wrote:
>> I took a look at the ICU sources, and they explicitly mention this case,
>> so it seems I was mistaken with interpreting the intention of the UAX. I
>> still find it confusing, but based on this thread, it seems to just be me.
>
> It's not only you, I also sometimes get confused by it (see for example [1] and subsequent messages). Maybe the operational model could be clarified a bit.
The comment I quoted from the ICU sources clarifies the intention. Maybe
a comment similar to that one would be helpful?
Also, thinking about it a bit more, the operational order makes sense
when you consider the CR LF case and extended characters. However, it is
still not obvious from the wording.
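
To make the ordering point concrete, here is a minimal sketch in Python.
It is not the ICU code and not the full UAX #29 algorithm: the names
wb_class and word_breaks are hypothetical, the property lookup is a rough
approximation built on unicodedata, and Format/ZWJ handling is left out.
It only shows that the newline rules (WB3, WB3a, WB3b) are tried before
the "ignore Extend" rule (WB4), so a combining mark after CR does not
attach to it.

```python
import unicodedata

NEWLINES = {"CR", "LF", "Newline"}

def wb_class(ch):
    """Rough stand-in for the Word_Break property (illustration only)."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch in "\x0b\x0c\x85\u2028\u2029":
        return "Newline"
    # Assumption: general categories Mn/Me/Mc approximate Word_Break=Extend.
    if unicodedata.category(ch) in ("Mn", "Me", "Mc"):
        return "Extend"
    # Assumption: any alphabetic character is treated as ALetter.
    if ch.isalpha():
        return "ALetter"
    return "Other"

def word_breaks(text):
    """Return the interior offsets where this simplified rule set breaks."""
    breaks = []
    for i in range(1, len(text)):
        prev, nxt = wb_class(text[i - 1]), wb_class(text[i])
        # WB3: do not break inside CR LF.
        if prev == "CR" and nxt == "LF":
            continue
        # WB3a / WB3b: otherwise always break after and before newlines.
        if prev in NEWLINES or nxt in NEWLINES:
            breaks.append(i)
            continue
        # WB4: an Extend character sticks to whatever precedes it. This is
        # only reached when the preceding character is not a newline.
        if nxt == "Extend":
            continue
        # WB4, continued: skip back over Extend characters to find the
        # base character that the later rules should see on the left.
        j = i - 1
        while (j > 0 and wb_class(text[j]) == "Extend"
               and wb_class(text[j - 1]) not in NEWLINES):
            j -= 1
        # WB5 (simplified): do not break between letters.
        if wb_class(text[j]) == "ALetter" and nxt == "ALetter":
            continue
        # WB999: otherwise break.
        breaks.append(i)
    return breaks
```

With this sketch, word_breaks("a\u0308b") gives [] because the diaeresis
is ignored between the two letters, while word_breaks("\r\u0308") gives
[1] because WB3a fires before WB4 gets a chance to absorb the mark.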
Thanks again.
--
Tom.