<div dir="ltr">This is a misunderstanding of the way the break rules are meant to be applied.<div><br></div><div>Each rule tests for the presence (÷) or absence (×) of a boundary at a single location in the subject text. When a rule has an extended context, as GB12 and GB13 do, it implies nothing about boundaries, or the lack thereof, within that context. It does happen to be the case for other rules with context, like WB6 and WB7 or the various sentence break rules, that there are no boundaries within the context.</div><div><br></div><div>This can all get pretty confusing.</div><div><br></div><div>  -- Andy</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 31, 2022 at 7:28 AM Don Hosek via Unicode <<a href="mailto:unicode@corp.unicode.org">unicode@corp.unicode.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Annex 29 says:<br>

> Do not break within emoji flag sequences. That is, do not break between regional indicator (RI) symbols if there is an odd number of RI characters before the break point.<br>

> GB12  sot (RI RI)* RI ×       RI<br>

> GB13  [^RI] (RI RI)* RI       ×       RI<br>

<br>

This would seem to indicate that any even number of RI tags should be treated as a single grapheme so given, e.g., 🇦🇹🇦🇺🇦🇶 this should be a single grapheme rather than the expected three. There is no test in <a href="https://www.unicode.org/Public/14.0.0/ucd/auxiliary/GraphemeBreakTest.txt" rel="noreferrer" target="_blank">https://www.unicode.org/Public/14.0.0/ucd/auxiliary/GraphemeBreakTest.txt</a> that would enforce this however. Or is this just a case of my misreading the spec and there is an implicit ÷ after each pair of RI characters? (if the latter, it might be helpful for future implementors to have a note to that effect).<br>

<br>

-dh<br>

</blockquote></div>
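<div><br></div><div>To make the "single location" reading concrete, here is a minimal Python sketch of how GB12 and GB13 might be applied to one candidate position at a time. It is an illustration only, not a conforming segmenter: the function names (is_ri, is_break, split_flags) are made up for this example, and every rule other than GB12/GB13 is assumed to fall through to a break, as in GB999.</div>
<pre>
```python
def is_ri(ch):
    # Regional indicator symbols are U+1F1E6 through U+1F1FF.
    return ord(ch) in range(0x1F1E6, 0x1F200)


def is_break(text, i):
    """Test the single location between text[i - 1] and text[i]."""
    if not (is_ri(text[i - 1]) and is_ri(text[i])):
        # Assumption: no other rule is modelled here, so break (GB999).
        return True
    # Count the unbroken run of RI characters just before position i.
    # The run stops at start of text (GB12) or at a non-RI (GB13).
    count = 0
    for ch in reversed(text[:i]):
        if not is_ri(ch):
            break
        count += 1
    # Odd count: GB12/GB13 match, no break. Even count: no match, break.
    return count % 2 == 0


def split_flags(text):
    """Split text at every location where is_break reports a boundary."""
    if not text:
        return []
    pieces = []
    start = 0
    for i in range(1, len(text)):
        if is_break(text, i):
            pieces.append(text[start:i])
            start = i
    pieces.append(text[start:])
    return pieces


# The six RI code points from the question: AT, AU, AQ.
flags = "\U0001F1E6\U0001F1F9\U0001F1E6\U0001F1FA\U0001F1E6\U0001F1F6"
print(split_flags(flags))
```
</pre>
<div>In this sketch the rule is consulted separately at each of the five interior positions. It forbids a break only at positions preceded by an odd number of RI characters, so the six code points come out as three pairs rather than one cluster.</div>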