Pure Regular Expression Engines and Literal Clusters

Markus Scherer via Unicode unicode at unicode.org
Fri Oct 11 16:35:33 CDT 2019


On Fri, Oct 11, 2019 at 12:05 PM Richard Wordingham via Unicode <
unicode at unicode.org> wrote:

> On Thu, 10 Oct 2019 15:23:00 -0700
> Markus Scherer via Unicode <unicode at unicode.org> wrote:
>
> > [c \q{ch}]h should work like (ch|c)h. Note that the order matters in
> > the alternation -- so this works equivalently if longer strings are
> > sorted first.
>
> Thanks for answering the question.
>
> Does conformance to UTS#18 at level 2 mandate the choice of matching
> substring? This would appear to prohibit compliance with POSIX rules,
> where the length of the overall match counts.
>
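
The point about alternation order in the quoted exchange can be seen with Python's re module, which (like Perl) tries alternatives left to right and keeps the first one that lets the whole pattern succeed, rather than following POSIX leftmost-longest rules. This is only an illustrative sketch. leftmost_longest is a hypothetical brute-force helper, not part of re, that emulates the POSIX rule for the overall match by trying every end position from the longest down.

```python
import re

# Python's re module is a backtracking, leftmost-first engine: it tries
# alternatives left to right and keeps the first one that lets the whole
# pattern succeed.
text = "chh"
shorter_first = re.match(r"(?:c|ch)h", text)
longer_first = re.match(r"(?:ch|c)h", text)
print(shorter_first.group())  # ch
print(longer_first.group())   # chh


def leftmost_longest(pattern, text):
    """Hypothetical helper: brute-force emulation of POSIX overall-match
    semantics (leftmost start, then the longest match at that start).
    Quadratic in len(text); for illustration only."""
    rx = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate end first.
        for end in range(len(text), start - 1, -1):
            if rx.fullmatch(text, start, end):
                return text[start:end]
    return None


# Under leftmost-longest rules the alternation order no longer matters.
print(leftmost_longest(r"(?:c|ch)h", text))  # chh
print(leftmost_longest(r"(?:ch|c)h", text))  # chh
```

This emulation only concerns the length of the overall match; POSIX also specifies which substrings subexpressions capture, which the sketch does not attempt.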

We just had a discussion this week. Mark will revise the proposed update.

The current idea is to specify that properties-of-strings (and, I think, a
range/class with "clusters") behave like an alternation in which the longest
strings come first, leaving it up to the regex engine to decide exactly what
that means.
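
As a rough sketch of that rule, assuming a Python-style backtracking engine, the hypothetical function class_with_strings_to_regex below (not taken from UTS #18 or any library) turns a set of single characters plus multi-character strings, such as the class [c \q{ch}], into an alternation with the longest strings first and the single characters last.

```python
import re


def class_with_strings_to_regex(chars, strings):
    """Hypothetical helper: translate a set of single characters plus
    multi-character strings (as in a UTS #18 class like [c \\q{ch}]) into
    a Python regex fragment that behaves like an alternation with the
    longest strings first."""
    # Longest strings first; ties are broken alphabetically so the
    # output is deterministic.
    ordered = sorted(set(strings), key=lambda s: (-len(s), s))
    alternatives = [re.escape(s) for s in ordered]
    if chars:
        # Single characters go last, gathered into one character class.
        members = "".join(re.escape(c) for c in sorted(set(chars)))
        alternatives.append("[" + members + "]")
    if not alternatives:
        # An empty set should match nothing, not the empty string.
        return "(?!)"
    return "(?:" + "|".join(alternatives) + ")"


fragment = class_with_strings_to_regex({"c"}, {"ch"})
print(fragment)                                 # (?:ch|[c])
print(re.match(fragment + "h", "chh").group())  # chh
print(re.match(fragment + "h", "ch").group())   # ch
```

Putting longer strings first makes a leftmost-first engine prefer the longest string at each position, and backtracking still lets "ch" match via the single c. It does not by itself give the POSIX longest overall match, which is consistent with leaving the exact semantics to the engine.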

In general, UTS #18 offers a lot of things that regex implementers may or
may not adopt.

If you have specific ideas, please send them as feedback on the PRI (Public
Review Issue).
(Discussion on the list is good and useful, but does not guarantee that it
gets looked at when it counts.)

Best regards,
markus