Clarification on Annex 29, GB12–13
Don Hosek
don.hosek at gmail.com
Wed Mar 30 22:16:23 CDT 2022
Annex 29 says:
> Do not break within emoji flag sequences. That is, do not break between regional indicator (RI) symbols if there is an odd number of RI characters before the break point.
> GB12 sot (RI RI)* RI × RI
> GB13 [^RI] (RI RI)* RI × RI
This would seem to indicate that any even number of RI characters should be treated as a single grapheme cluster, so that, e.g., 🇦🇹🇦🇺🇦🇶 would be a single grapheme cluster rather than the expected three. However, there is no test in https://www.unicode.org/Public/14.0.0/ucd/auxiliary/GraphemeBreakTest.txt that would enforce this. Or is this just a case of my misreading the spec, and there is an implicit ÷ after each pair of RI characters? (If the latter, it might be helpful for future implementors to have a note to that effect.)
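To make the second reading concrete, here is a minimal Python sketch of it. It assumes that whenever GB12/GB13 do not match, the default rule GB999 (Any ÷ Any) applies, and it deliberately ignores every other rule in Annex 29. The names is_ri and split_ri_run are made up for illustration and are not from any library.

```python
def is_ri(ch):
    # Regional Indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def split_ri_run(text):
    """Split text using only GB12/GB13 plus the default GB999 (Any ÷ Any).

    Assumption: all other grapheme cluster rules are ignored, so this is
    only meaningful for runs of RI characters and plain single characters.
    """
    clusters = []
    ri_before = 0  # consecutive RI characters immediately before the break point
    for ch in text:
        if is_ri(ch) and ri_before % 2 == 1:
            # GB12/GB13: odd number of RI before the break point, so no break.
            clusters[-1] += ch
        else:
            # No rule forbids the break, so GB999 breaks here.
            clusters.append(ch)
        ri_before = ri_before + 1 if is_ri(ch) else 0
    return clusters


# The six RI characters from the example: AT, AU, AQ.
flags = "\U0001F1E6\U0001F1F9\U0001F1E6\U0001F1FA\U0001F1E6\U0001F1F6"
print(len(split_ri_run(flags)))
```

Under this reading the break after each pair is not a separate rule; it falls out of GB12/GB13 simply not matching when an even number of RI characters precedes the break point.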
-dh