Potential contradiction between the WordBreak test data and UAX #29
Philippe Verdy
verdy_p at wanadoo.fr
Wed Nov 23 05:20:44 CST 2016
2016-11-23 12:00 GMT+01:00 Tom Hacohen <tom at osg.samsung.com>:
>
> Also take another look at
> http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules
> specifically the table that
> shows another way of writing the ignore rule. This again shows my
> understanding of rule 4 is correct.
>
> Specially look at the following equivalence:
> X Y × Z W ⇒ X (Extend | Format)* Y (Extend | Format)* × Z (Extend | Format)* W
>
This expansion does not apply before rule WB4: it cannot be used to
rewrite rules WB1 to WB3c, and the algorithm states this explicitly.
Because rule WB3c is the one that handles your case, you are misreading
the specification as if the expansion applied there too...
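The rule ordering described above can be illustrated with a minimal Python sketch. It is not an official or complete implementation: it works on a hypothetical list of Word_Break property names instead of real code points, implements only WB1 to WB5 plus WB999, and assumes the Unicode 9.0-era value names (ZWJ, Glue_After_Zwj, E_Base_GAZ). WB3 to WB3c are tested on the raw adjacent characters; only from WB4 onward are Extend, Format and ZWJ skipped when finding the left context.

```python
from typing import List

# Hypothetical, simplified Word_Break property values (Unicode 9.0-era names).
# A real implementation would look these up in WordBreakProperty.txt.
NEWLINES = {"CR", "LF", "Newline"}
IGNORABLE = {"Extend", "Format", "ZWJ"}          # sequences collapsed by WB4
AHLETTER = {"ALetter", "Hebrew_Letter"}
GLUE_TARGETS = {"Glue_After_Zwj", "E_Base_GAZ"}  # right-hand side of WB3c


def effective_left(props: List[str], i: int) -> str:
    """Left context seen by rules after WB4: skip back over Extend/Format/ZWJ,
    but never absorb them into sot or into a preceding CR/LF/Newline."""
    j = i - 1
    while j > 0 and props[j] in IGNORABLE and props[j - 1] not in NEWLINES:
        j -= 1
    return props[j]


def is_break(props: List[str], i: int) -> bool:
    """Decide whether there is a word boundary between props[i-1] and props[i]."""
    prev, cur = props[i - 1], props[i]
    # WB3 to WB3c look only at the raw adjacent pair: no ignore expansion here.
    if prev == "CR" and cur == "LF":           # WB3
        return False
    if prev in NEWLINES:                       # WB3a
        return True
    if cur in NEWLINES:                        # WB3b
        return True
    if prev == "ZWJ" and cur in GLUE_TARGETS:  # WB3c
        return False
    # WB4: Extend/Format/ZWJ attach to whatever precedes them.
    if cur in IGNORABLE:
        return False
    # From here on, rules see the collapsed left context.
    left = effective_left(props, i)
    if left in AHLETTER and cur in AHLETTER:   # WB5
        return False
    return True                                # WB999 (WB6 and later omitted)


def word_boundaries(props: List[str]) -> List[int]:
    """Boundary offsets, including sot (WB1) and eot (WB2)."""
    if not props:
        return [0]
    inner = [i for i in range(1, len(props)) if is_break(props, i)]
    return [0] + inner + [len(props)]


if __name__ == "__main__":
    print(word_boundaries(["ALetter", "Format", "ALetter"]))
    print(word_boundaries(["ZWJ", "Glue_After_Zwj"]))
    print(word_boundaries(["ZWJ", "Extend", "Glue_After_Zwj"]))
```

In this sketch, ALetter Format ALetter stays one word because WB5 sees the collapsed left context, and ZWJ directly followed by Glue_After_Zwj is kept together by WB3c. ZWJ Extend Glue_After_Zwj, however, gets a break before Glue_After_Zwj, because WB3c is never rewritten with the (Extend | Format)* expansion.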
More information about the Unicode mailing list