Pure Regular Expression Engines and Literal Clusters

Richard Wordingham via Unicode unicode at unicode.org
Sat Oct 12 17:37:05 CDT 2019


On Sat, 12 Oct 2019 21:36:45 +0200
Hans Åberg via Unicode <unicode at unicode.org> wrote:

> > On 12 Oct 2019, at 14:17, Richard Wordingham via Unicode
> > <unicode at unicode.org> wrote:
> > 
> > But remember that 'having longer first' is meaningless for a
> > non-deterministic finite automaton that does a single pass through
> > the string to be searched.  
> 
> It is possible to identify all submatches deterministically in linear
> time without backtracking; I made an algorithm for that.

That's impressive, as the number of possible submatches for a*(a*)a* is
quadratic in the string length.
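
To make the count concrete, here is a minimal Python sketch. It assumes
that "submatch" means the span captured by the group in a match of the
whole string, and that the string is n copies of 'a'. The helper name
group_spans is hypothetical. It pins the prefix and group lengths with
counted repetitions and lets Python's re module confirm that each
candidate span really is a valid match of a*(a*)a*.

```python
import re

def group_spans(n):
    """Collect every (start, end) span the group in a*(a*)a* can
    capture when the pattern matches all of 'a' * n."""
    text = "a" * n
    spans = set()
    for start in range(n + 1):
        for end in range(start, n + 1):
            # Pin the leading a* to `start` characters and the group to
            # `end - start` characters; the trailing a* takes the rest.
            pinned = "a{%d}(a{%d})a*" % (start, end - start)
            m = re.fullmatch(pinned, text)
            if m:
                spans.add(m.span(1))
    return spans

for n in (1, 2, 4, 10):
    print(n, len(group_spans(n)))
# prints:
# 1 3
# 2 6
# 4 15
# 10 66
```

The counts follow (n + 1)(n + 2) / 2, so merely listing every possible
submatch already takes quadratic output.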

> A selection among different submatches then requires additional rules.

Regards,

Richard.


