Pure Regular Expression Engines and Literal Clusters

Richard Wordingham via Unicode unicode at unicode.org
Thu Oct 10 16:54:35 CDT 2019


On Tue, 8 Oct 2019 15:25:34 +0100
Richard Wordingham via Unicode <unicode at unicode.org> wrote:

> An example UTS#18 gives for matching a literal cluster can be
> simplified to, in its notation:
> 
> [c \q{ch}]
> 
> This is interpreted as 'match against "ch" if possible, otherwise
> against "c"'.  Thus the strings "ca" and "cha" would both match the
> expression
> 
> [c \q{ch}]a
> 
> while "chh" but not "ch" would match against
> 
> [c \q{ch}]h
> 
> Or have I got this wrong?

After comparing this with the Perl behaviour of /(?:ch|c)/
and /(?:ch|c)h/, I've come to the conclusion that I've got the
interpretation wrong.  The former may match "ch" or "c", so the
latter matches "ch" as well as "chh".  I conclude that the only
special meaning of \q is to indicate a preference for the sequence
of two characters - if the engine yields all matches, it has no
meaning.

This greatly simplifies matters.
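
The behaviour described above can be checked with any backtracking
engine.  What follows is only a rough sketch in Python rather than
Perl, and it assumes that the non-capturing alternation (?:ch|c) is a
fair stand-in for [c \q{ch}]; the names CLUSTER, matches_whole and
preferred_prefix are made up for the illustration.

```python
import re

# Assumption: (?:ch|c) stands in for UTS #18's [c \q{ch}].
# The alternation lists "ch" first, so the engine tries it first.
CLUSTER = r"(?:ch|c)"


def matches_whole(suffix, text):
    """True if CLUSTER followed by `suffix` matches the whole of `text`."""
    return re.fullmatch(CLUSTER + suffix, text) is not None


def preferred_prefix(text):
    """Return what CLUSTER alone consumes at the start of `text`, or None."""
    m = re.match(CLUSTER, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # The four cases from the quoted message.
    for suffix, text in [("a", "ca"), ("a", "cha"), ("h", "chh"), ("h", "ch")]:
        print(f"{CLUSTER}{suffix} against {text!r}: {matches_whole(suffix, text)}")
    # The preference only shows in what the cluster itself consumes.
    print("preferred prefix of 'ch':", preferred_prefix("ch"))
```

All four cases match, including "ch" against the pattern ending in h,
because the engine backs off from "ch" to "c" when the trailing h would
otherwise fail.  The preference for the longer alternative shows up
only in what the cluster itself consumes when nothing follows it.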

Richard.

