Regular Expressions and Canonical Equivalence

Philippe Verdy verdy_p at
Sun May 17 09:45:18 CDT 2015

2015-05-16 22:33 GMT+02:00 Richard Wordingham <
richard.wordingham at>:

> I'm not at all sure what your example string is.  I ran my program to
> watch its progression with input \u0323\u0323\u0302\u0302, which does
> not match the pattern, and attach the outputs for your scorn.  I have
> added comments started by #.

Sorry for not commenting on it: these are the internal tricks and outputs of
your program, and your added comments do not allow me to interpret what
all this means, i.e. the exact role of the notations with sequences of "L"
or "R" or "N", and what the "=>" notation means. I suppose it denotes
an advance rule, where the left-hand side is the state before and the
right-hand side is the state after, but I don't see where the condition is
(the character or character class to match, or an error condition). You've
only partly "explained" the NDE and ODE comments and the "!" when it is

Is that really what your regexp engine outputs as its internally generated
parser tables (only serialized in a "friendly", "readable" text form)?
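
As a side note on the test input quoted above, \u0323\u0323\u0302\u0302: the
following is a minimal sketch, using only Python's standard unicodedata
module and not Richard's program, of how one can inspect the canonical
combining classes of that string and test canonical equivalence by comparing
NFD forms. The helper names and the reordered string are assumptions made
for illustration; they do not come from the thread.

```python
import unicodedata

# Test input quoted above: two U+0323 followed by two U+0302.
quoted_input = "\u0323\u0323\u0302\u0302"

# Hypothetical reordering of the same four marks (illustration only).
reordered = "\u0323\u0302\u0323\u0302"


def combining_classes(text):
    """Return the canonical combining class of each code point in text."""
    return [unicodedata.combining(ch) for ch in text]


def canonically_equivalent(a, b):
    """Treat two strings as canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(combining_classes(quoted_input))
    print(combining_classes(reordered))
    print(canonically_equivalent(quoted_input, reordered))
```

Since U+0323 and U+0302 have different non-zero combining classes, canonical
reordering may swap them, so a matcher that honours canonical equivalence
has to treat both orderings of these marks alike.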