Possible bug in formal grammar for extended grapheme cluster

Ah! That explains why

pcre2grep -u '^\X{1}$'

matches with



Thanks for the feedback. You're correct about this; that is a holdover from an earlier version of the document when there was a more basic treatment of RI sequences.

There is already an action to modify these. There is a placeholder review note about that just above


It’s possible I’m missing something, but the formal grammar/regular
expression given for extended grapheme clusters appears to have a bug
in it.

The bug is here:

    RI-Sequence := Regional_Indicator+

If the formal grammar is intended to exactly match the rules given in
the “Grapheme Cluster Boundary Rules” section below it as-is, then
this should be

    RI-Sequence := Regional_Indicator Regional_Indicator

since, as given, it would cause a run of any number of RI characters to
coalesce into a single grapheme cluster instead of splitting into pairs.
That is, the text U+1F1EC U+1F1E7 U+1F1EA U+1F1FA would represent one
grapheme cluster instead of the correct two.
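
To make the difference concrete, here is a minimal Python sketch that applies both forms of the RI-Sequence production to the four code points above. It is only an illustration: the names `ri_greedy` and `ri_pair` are mine, the Regional_Indicator class is assumed to be the range U+1F1E6..U+1F1FF, and plain `re` patterns stand in for the grammar rather than implementing full grapheme cluster segmentation.

```python
import re

# Assumption: Regional_Indicator is the range U+1F1E6..U+1F1FF.
RI = "[\U0001F1E6-\U0001F1FF]"

# Production as currently written: Regional_Indicator+
ri_greedy = re.compile(f"{RI}+")
# Proposed production: Regional_Indicator Regional_Indicator
ri_pair = re.compile(f"{RI}{RI}")

# U+1F1EC U+1F1E7 U+1F1EA U+1F1FA
text = "\U0001F1EC\U0001F1E7\U0001F1EA\U0001F1FA"

greedy_clusters = ri_greedy.findall(text)
pair_clusters = ri_pair.findall(text)

print(len(greedy_clusters))  # 1: all four RI characters coalesce
print(len(pair_clusters))    # 2: one cluster per RI pair
```

The sketch only covers runs with an even number of RI characters; how a trailing unpaired RI is handled is left to the rest of the grammar.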

