Regexes, Canonical Equivalence and Backtracking of Input

Richard Wordingham richard.wordingham at ntlworld.com
Mon May 18 16:14:11 CDT 2015


On Mon, 18 May 2015 22:56:47 +0200
Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> Isn't it possible for your basic substitution to transform \u0F73
> into a character class [\u0F71\u0F72\u0F73] that the regexp considers
> as a single entity to check?
> In that case, backtracking for matching \u0F73*\u0F72 is simpler:
>  [\u0F71\u0F72\u0F73]*\u0F72, as it requires backtracking over only
> one character class (instead of one character).
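
For reference, here is a minimal Python sketch of what the two patterns
above do on concrete input. It is not taken from either poster's
implementation; the helper names (nfd, canonically_equivalent,
equivalent_to_some_match) are made up for illustration, and it assumes
that comparing NFD forms is an acceptable test of canonical equivalence.
It relies on U+0F73 decomposing canonically to <U+0F71, U+0F72>, and on
U+0F71 and U+0F72 having different non-zero combining classes, so that
they may be reordered.

```python
import re
import unicodedata

AA, I, II = "\u0f71", "\u0f72", "\u0f73"  # Tibetan vowel signs AA, I, II


def nfd(s):
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def canonically_equivalent(a, b):
    """Assumed test: two strings are canonically equivalent iff their NFD forms agree."""
    return nfd(a) == nfd(b)


def equivalent_to_some_match(s):
    """Is s canonically equivalent to some string in the language of \\u0F73*\\u0F72?"""
    # Each U+0F73 contributes two characters after decomposition, so
    # len(s) is a safe upper bound on the repeat count to try.
    return any(canonically_equivalent(s, II * n + I) for n in range(len(s) + 1))


# The pattern as written, and the proposed character-class substitution.
literal = re.compile("\u0f73*\u0f72")
klass = re.compile("[\u0f71\u0f72\u0f73]*\u0f72")

# <0F72, 0F71, 0F72> is canonically equivalent to <0F73, 0F72>,
# but a purely code-point-based match of the literal pattern misses it.
reordered = I + AA + I
print(canonically_equivalent(reordered, II + I))
print(literal.fullmatch(reordered) is not None)
print(klass.fullmatch(reordered) is not None)

# <0F71, 0F71, 0F72> is accepted by the character-class pattern although it
# is not canonically equivalent to any string matching \u0F73*\u0F72.
unbalanced = AA + AA + I
print(klass.fullmatch(unbalanced) is not None)
print(equivalent_to_some_match(unbalanced))
```

On this sketch's assumptions, the character-class form accepts the
reordered input that the literal pattern misses, but it also accepts
inputs with the wrong number of U+0F71 and U+0F72, so some additional
counting or checking step would still be needed.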

I'm still waiting for your explanation of how your scheme for European
diacritics (as used in SE Asia) would work.  This thread is intended for
the idea of using the regex to decide which character to take as the
next character from the input trace.  In the other thread, I'm still not
sure whether you're working with traces or strings.

Richard.

