Pure Regular Expression Engines and Literal Clusters
Richard Wordingham via Unicode
unicode at unicode.org
Thu Oct 10 16:54:35 CDT 2019
On Tue, 8 Oct 2019 15:25:34 +0100
Richard Wordingham via Unicode <unicode at unicode.org> wrote:
> An example UTS#18 gives for matching a literal cluster can be
> simplified to, in its notation:
>
> [c \q{ch}]
>
> This is interpreted as 'match against "ch" if possible, otherwise
> against "c"'. Thus the strings "ca" and "cha" would both match the
> expression
>
> [c \q{ch}]a
>
> while "chh" but not "ch" would match against
>
> [c \q{ch}]h
>
> Or have I got this wrong?
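The reading proposed in the quoted message amounts to committing to the longest element of the set and never backtracking into a shorter one. Below is a minimal Python sketch of that hypothetical reading, modelling [c \q{ch}] as a plain set of strings. The function name `committed_match` and its parameters are illustrative only; they do not come from UTS #18.

```python
def committed_match(alternatives, suffix, text):
    """Hypothetical 'commit to the longest alternative' reading of a
    literal-cluster set followed by a literal suffix: take the longest
    alternative that matches at the start of text, never retry with a
    shorter one, then require the remainder to equal suffix exactly."""
    for alt in sorted(alternatives, key=len, reverse=True):
        if text.startswith(alt):
            # Committed: a shorter alternative is never tried afterwards.
            return text[len(alt):] == suffix
    return False


cluster = {"c", "ch"}  # stands in for [c \q{ch}]
print(committed_match(cluster, "a", "ca"))   # True
print(committed_match(cluster, "a", "cha"))  # True
print(committed_match(cluster, "h", "chh"))  # True
print(committed_match(cluster, "h", "ch"))   # False: "ch" is consumed whole
```

Under this reading "ch" fails against [c \q{ch}]h, which is the behaviour the quoted message asks about.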
After comparing this with the Perl behaviour of /(?:ch|c)/
and /(?:ch|c)h/, I've come to the conclusion that I've got the
interpretation wrong. The former may match "ch" or "c", and I
conclude that the only special meaning of \q is to indicate a preference
for the two-character sequence; if the engine yields all matches,
it has no meaning.
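Python's `re` module, like Perl, tries alternatives in the order written and backtracks into later ones on failure, so it can stand in for the Perl comparison. Here is a short sketch, assuming Python behaves like Perl for these simple patterns:

```python
import re

# (?:ch|c) is a non-capturing group: try "ch" first, fall back to "c".
assert re.match(r"(?:ch|c)", "ch").group() == "ch"  # preferred alternative
assert re.match(r"(?:c|ch)", "ch").group() == "c"   # order sets the preference

# Backtracking keeps the shorter alternative available when the rest fails.
assert re.fullmatch(r"(?:ch|c)h", "ch")   # succeeds via "c" + "h"
assert re.fullmatch(r"(?:ch|c)h", "chh")  # succeeds via "ch" + "h"
assert re.fullmatch(r"(?:ch|c)a", "ca")
assert re.fullmatch(r"(?:ch|c)a", "cha")
print("all checks passed")
```

The preference only decides which match is reported first. It never removes a match the shorter alternative could supply.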
This greatly simplifies matters.
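An engine that reports every match, rather than the first preferred one, can treat [c \q{ch}] as an unordered set of alternatives. The following is a minimal sketch under that assumption. `all_match_ends` is a hypothetical helper that returns every end offset of a match anchored at the start of the text.

```python
def all_match_ends(alternatives, suffix, text):
    """Return every end offset at which (one alternative + suffix)
    matches a prefix of text. Every alternative is tried, so no
    alternative is preferred over another."""
    ends = set()
    for alt in alternatives:
        # str.startswith(prefix, start) checks suffix right after alt.
        if text.startswith(alt) and text.startswith(suffix, len(alt)):
            ends.add(len(alt) + len(suffix))
    return ends


cluster = {"c", "ch"}  # stands in for [c \q{ch}]
print(sorted(all_match_ends(cluster, "", "ch")))    # [1, 2]: "c" and "ch"
print(sorted(all_match_ends(cluster, "h", "ch")))   # [2]: via "c" + "h"
print(sorted(all_match_ends(cluster, "h", "chh")))  # [2, 3]
```

Because the result is a set, listing the alternatives in a different order cannot change it. This is why the \q preference disappears for such an engine.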
Richard.