Potential contradiction between the WordBreak test data and UAX #29

Philippe Verdy verdy_p at wanadoo.fr
Wed Nov 23 05:20:44 CST 2016


2016-11-23 12:00 GMT+01:00 Tom Hacohen <tom at osg.samsung.com>:

>
> Also take another look at
> http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules
> specifically the table that shows another way of writing the ignore
> rule. This again shows my understanding of rule 4 is correct.
>
> Look especially at the following equivalence:
> X Y × Z W  ⇒  X (Extend | Format)* Y (Extend | Format)* × Z (Extend | Format)* W
>

This expansion does not apply before rule WB4, so it cannot be used to
transform rules WB1 through WB3c; the algorithm states this explicitly.
Rule WB3c already handles your case, so you are misreading the spec by
applying the expansion there as well.
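
To illustrate the ordering point, here is a minimal sketch in Python. The
rule representation (lists of property-class names), the names IGNORED and
BEFORE_WB4, and the function expand() are all hypothetical and exist only
for this example; they are not part of UAX #29 or of any library. The
sketch uses the (Extend | Format)* form from the quoted table.

```python
# Hypothetical sketch: apply the quoted "ignore" expansion only to rules
# that come after WB4. Rules are given as lists of property-class names.

IGNORED = "(Extend | Format)*"

# Rules that precede WB4; per the reply above, they are left as written.
BEFORE_WB4 = {"WB1", "WB2", "WB3", "WB3a", "WB3b", "WB3c"}


def expand(name, left, right):
    """Return (left, right) contexts of a rule, expanded where allowed.

    Mirrors the quoted equivalence:
      X Y x Z W  =>  X IGNORED Y IGNORED x Z IGNORED W
    Every symbol on the left is followed by IGNORED; on the right, every
    symbol except the last one is.
    """
    if name in BEFORE_WB4:
        # The expansion is not in effect yet: keep the rule untouched.
        return list(left), list(right)
    new_left = [tok for sym in left for tok in (sym, IGNORED)]
    new_right = [tok for sym in right[:-1] for tok in (sym, IGNORED)]
    new_right += list(right[-1:])
    return new_left, new_right


if __name__ == "__main__":
    # A rule after WB4 is expanded ...
    print(expand("WB5", ["X", "Y"], ["Z", "W"]))
    # ... while WB3c keeps exactly the contexts it was written with.
    print(expand("WB3c", ["ZWJ"], ["Glue_After_Zwj | EBG"]))
```

With this reading, a rule such as WB3c is matched against the literal
characters on either side of the candidate boundary, and only the later
rules see text as if the ignored characters were absent.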

