Potential contradiction between the WordBreak test data and UAX #29

Daniel Bünzli daniel.buenzli at erratique.ch
Wed Nov 23 05:45:04 CST 2016

On Wednesday 23 November 2016 at 12:28, Tom Hacohen wrote:
> I took a look at the ICU sources, and they explicitly mention this case,
> so it seems I was mistaken in interpreting the intention of the UAX. I
> still find it confusing, but based on this thread, it seems to just be me.

It's not only you; I also sometimes get confused by it (see for example [1] and subsequent messages). Maybe the operational model could be clarified a bit.

I also think it would be better if UAX #29 didn't use ignore rules at all, so that going from rules to implementation would be more straightforward, though I understand that this may make the spec harder to maintain.
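
To make the point about ignore rules concrete, here is a minimal sketch of one reading of rule WB4, which treats X (Extend | Format | ZWJ)* as X for the rules that follow, except after sot, CR, LF and Newline. The property table is a toy stand-in (a real implementation would read WordBreakProperty.txt), and the names word_break and visible_to_later_rules are hypothetical, chosen only for illustration. It is not how ICU or any other library does it.

```python
# Toy Word_Break values for a few code points. Assumption: a real
# implementation would load these from WordBreakProperty.txt.
TOY_WORD_BREAK = {
    "\r": "CR",
    "\n": "LF",
    "\u0308": "Extend",  # COMBINING DIAERESIS
    "\u00AD": "Format",  # SOFT HYPHEN
    "\u200D": "ZWJ",     # ZERO WIDTH JOINER
}

IGNORED = {"Extend", "Format", "ZWJ"}       # classes skipped by WB4
NO_IGNORE_AFTER = {"CR", "LF", "Newline"}   # WB4 does not apply after these


def word_break(ch):
    """Hypothetical, heavily simplified property lookup."""
    if ch in TOY_WORD_BREAK:
        return TOY_WORD_BREAK[ch]
    return "ALetter" if ch.isalpha() else "Other"


def visible_to_later_rules(text):
    """Return (index, property) pairs for the characters that the rules
    after WB4 get to see, under this reading of the ignore rule."""
    kept = []
    for i, ch in enumerate(text):
        prop = word_break(ch)
        # An ignorable character is absorbed into the preceding X,
        # unless it is at sot or follows CR, LF or Newline.
        if prop in IGNORED and kept and kept[-1][1] not in NO_IGNORE_AFTER:
            continue
        kept.append((i, prop))
    return kept


print(visible_to_later_rules("a\u0308b"))
print(visible_to_later_rules("\r\u0308a"))
```

In the first call the combining diaeresis disappears from view, so a later rule such as WB5 sees ALetter next to ALetter. In the second call the same character follows CR and therefore stays visible. A spec without ignore rules would have to spell out the optional (Extend | Format | ZWJ)* run in every affected rule, which is the maintenance cost mentioned above.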



[1] http://www.unicode.org/mail-arch/unicode-ml/y2016-m06/0088.html

