UAX #29 6.2

Zack Newman via Unicode unicode at unicode.org
Fri Mar 6 21:36:31 CST 2020


According to 6.2, "thus ignoring Extend is sufficient to disallow breaking
within a grapheme cluster." However, the sequence of Unicode scalar values
(U+0600, U+0020) is a single extended grapheme cluster under rule GB9b
(no break after Prepend), yet 4.1.1 parses it into two words. Ideally, no
sequence of Unicode scalar values would parse into more words than
grapheme clusters, but since such sequences exist, I think it would be
clearer if section 6.2 did not explicitly state that this is impossible.
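
To make the mismatch concrete, here is a minimal sketch in Python. It is
not a conformant UAX #29 implementation: the property values for the two
code points are hard-coded assumptions (U+0600 as
Grapheme_Cluster_Break=Prepend and Word_Break=Format, U+0020 as
Word_Break=WSegSpace, as I understand them for Unicode 13.0) rather than
read from the UCD, only the handful of rules relevant to this pair is
encoded, and all names (GCB, WB, segment, and so on) are hypothetical.

```python
# Hard-coded property values for just the two code points in question.
# Assumption: these reflect Unicode 13.0; they are not read from the UCD.
GCB = {0x0600: "Prepend", 0x0020: "Other"}
WB = {0x0600: "Format", 0x0020: "WSegSpace"}


def grapheme_break_between(left, right):
    """Return True if a grapheme cluster boundary falls between the pair."""
    # GB9b: Prepend ×  (no break after Prepend)
    if GCB[left] == "Prepend":
        return False
    # GB9 / GB9a: × (Extend | ZWJ | SpacingMark)
    if GCB[right] in ("Extend", "ZWJ", "SpacingMark"):
        return False
    # GB999: Any ÷ Any
    return True


def word_break_between(left, right):
    """Return True if a word boundary falls between the pair."""
    # WB3d: WSegSpace × WSegSpace
    if WB[left] == "WSegSpace" and WB[right] == "WSegSpace":
        return False
    # WB4: Extend, Format and ZWJ attach to the character before them.
    if WB[right] in ("Extend", "Format", "ZWJ"):
        return False
    # WB999: Any ÷ Any
    return True


def segment(code_points, breaks_between):
    """Split a non-empty sequence using a pairwise boundary test.

    Pairwise tests are enough for the few rules above; the full
    algorithms need more context than one neighbour on each side.
    """
    segments = [[code_points[0]]]
    for left, right in zip(code_points, code_points[1:]):
        if breaks_between(left, right):
            segments.append([right])
        else:
            segments[-1].append(right)
    return segments


sequence = [0x0600, 0x0020]
clusters = segment(sequence, grapheme_break_between)
words = segment(sequence, word_break_between)
print(len(clusters), "grapheme cluster(s),", len(words), "word(s)")
```

Under those assumptions the leading U+0600 has nothing before it to
attach to under WB4, so the word rules fall through to WB999 and break
before the space, while GB9b keeps the pair together as one cluster.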