Regexes, Canonical Equivalence and Backtracking of Input

Richard Wordingham richard.wordingham at
Mon May 18 15:46:44 CDT 2015

On Mon, 18 May 2015 22:40:21 +0300
Eli Zaretskii <eliz at> wrote:

> > Date: Mon, 18 May 2015 19:35:45 +0100
> > From: Richard Wordingham <richard.wordingham at>
> > 
> > Mark Davis has published an algorithm to generate all strings
> > canonically equivalent to a Unicode string
> Where can I find the description of that algorithm?

Section 5 of .  There's a lot of detail
missing, and it's easy to overlook the Hangul syllables.  The complete
code is rather more complicated than it looks from the wording,
especially if you want successive candidates on successive calls.  You
also need to include the legal permutations of the non-starters - the
code as given only delivers the FCD canonical equivalents.
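
To illustrate just that last point, the legal permutations of the
non-starters, here is a minimal brute-force sketch in Python.  It is not
the published code: the function name nonstarter_permutations is
hypothetical, it assumes only the standard unicodedata module, and it
works purely on the NFD form, so it does not produce precomposed
characters or Hangul syllables.  A reordering of a run of combining
marks is kept only if normalising it gives back the same NFD string.

```python
import itertools
import unicodedata


def nonstarter_permutations(text):
    """Yield the strings obtained from NFD(text) by reordering combining
    marks in ways that keep the result canonically equivalent to text.

    Hypothetical helper for illustration only; it never recomposes.
    """
    nfd = unicodedata.normalize("NFD", text)

    # Split the NFD string into segments: every starter (combining class 0)
    # is a segment of its own, every maximal run of non-starters is one segment.
    segments = []
    run = []
    for ch in nfd:
        if unicodedata.combining(ch):
            run.append(ch)
        else:
            if run:
                segments.append(run)
                run = []
            segments.append([ch])
    if run:
        segments.append(run)

    def variants(segment):
        target = "".join(segment)
        # Starters and single marks have exactly one arrangement.
        if len(segment) < 2:
            return [target]
        found = set()
        # Brute force: factorial in the length of the run, fine for a sketch.
        for perm in itertools.permutations(segment):
            candidate = "".join(perm)
            # Legal means canonical reordering restores the original run.
            if unicodedata.normalize("NFD", candidate) == target:
                found.add(candidate)
        return sorted(found)

    for combo in itertools.product(*(variants(s) for s in segments)):
        yield "".join(combo)
```

For q followed by U+0307 and U+0323 (combining classes 230 and 220) this
yields both orders of the two marks; for a followed by U+0301 and U+0300
(both class 230) only the original order survives, because marks of the
same class may not be swapped.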

On further thought, I also think it's actually unnecessary for this