Clarification on Annex 29, GB12–13

Don Hosek don.hosek at gmail.com
Wed Mar 30 22:16:23 CDT 2022


Annex 29 says:
> Do not break within emoji flag sequences. That is, do not break between regional indicator (RI) symbols if there is an odd number of RI characters before the break point.
> GB12	sot (RI RI)* RI	×	RI
> GB13	[^RI] (RI RI)* RI	×	RI

This would seem to indicate that any even number of RI characters should be treated as a single grapheme cluster, so that, e.g., 🇦🇹🇦🇺🇦🇶 would be a single grapheme cluster rather than the expected three. However, there is no test in https://www.unicode.org/Public/14.0.0/ucd/auxiliary/GraphemeBreakTest.txt that would enforce this. Or is this just a case of my misreading the spec, and there is an implicit ÷ after each pair of RI characters? (If the latter, it might be helpful for future implementors to have a note to that effect.)
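
To make the two readings concrete, here is a minimal sketch in Python of GB12 and GB13 taken on their own. It assumes that the annex's default rule GB999 (Any ÷ Any) applies wherever GB12/GB13 do not match, and it ignores every other grapheme cluster rule (combining marks, ZWJ sequences, and so on). The names is_ri and split_ri_clusters are hypothetical and not taken from any library.

```python
# Regional Indicator Symbol Letters A..Z occupy U+1F1E6..U+1F1FF.
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF


def is_ri(ch):
    """Return True if ch is a Regional Indicator (RI) character."""
    return RI_FIRST <= ord(ch) <= RI_LAST


def split_ri_clusters(text):
    """Split text using only GB12/GB13, assuming a break everywhere else.

    Assumption: wherever GB12/GB13 do not match, the default rule
    GB999 (Any ÷ Any) applies. All other GB rules are ignored here.
    """
    clusters = []
    current = ""
    ri_run = 0  # consecutive RI characters immediately before the break point
    for ch in text:
        if current and is_ri(ch) and ri_run % 2 == 1:
            # GB12/GB13: an odd number of RIs precedes the break point,
            # so there is no break before this RI.
            current += ch
        else:
            # No rule in this sketch forbids the break, so start a new cluster.
            if current:
                clusters.append(current)
            current = ch
        ri_run = ri_run + 1 if is_ri(ch) else 0
    if current:
        clusters.append(current)
    return clusters


flags = "\U0001F1E6\U0001F1F9\U0001F1E6\U0001F1FA\U0001F1E6\U0001F1F6"  # AT, AU, AQ
print(len(split_ri_clusters(flags)))
```

Under that assumption the example sequence comes out as three clusters, because the break point after each complete pair has an even number of RI characters before it and so matches neither GB12 nor GB13.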

-dh