Regexes, Canonical Equivalence and Backtracking of Input

Richard Wordingham richard.wordingham at ntlworld.com
Mon May 18 15:46:44 CDT 2015


On Mon, 18 May 2015 22:40:21 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> > Date: Mon, 18 May 2015 19:35:45 +0100
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > 
> > Mark Davis has published an algorithm to generate all strings
> > canonically equivalent to a Unicode string
> 
> Where can I find the description of that algorithm?

Section 5 of http://unicode.org/notes/tn5/ .  There's a lot of detail
missing, and it's easy to overlook the Hangul syllables.  The complete
code is rather more complicated than the wording suggests,
especially if you want successive candidates on successive calls.  You
also need to include the legal permutations of the non-starters; the
code as given only delivers the FCD canonical equivalents.
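To illustrate the "legal permutations of the non-starters" part, here is a minimal sketch in Python using the standard unicodedata module. It is not the TN5 algorithm. It assumes the input is first put into NFD and covers only the reordering step. Non-starters (canonical combining class != 0) may be swapped when their classes differ, but marks of equal class must keep their relative order. It does not generate the partially composed (FCD) variants, and the function names are hypothetical.

```python
import itertools
import unicodedata


def _ordered_interleavings(run):
    """Yield every reordering of `run` (a list of non-starters) that keeps
    characters with the same canonical combining class in original order."""
    if not run:
        yield ""
        return
    seen_classes = set()
    for i, ch in enumerate(run):
        ccc = unicodedata.combining(ch)
        if ccc in seen_classes:
            continue  # an earlier mark of this class must be emitted first
        seen_classes.add(ccc)
        rest = run[:i] + run[i + 1:]
        for tail in _ordered_interleavings(rest):
            yield ch + tail


def nonstarter_reorderings(s):
    """Hypothetical helper: yield fully decomposed strings canonically
    equivalent to `s` that differ only in the order of non-starters."""
    d = unicodedata.normalize("NFD", s)
    # Each starter (ccc == 0, including decomposed Hangul jamo) is its own
    # piece; each maximal run of non-starters becomes a piece to permute.
    pieces = []
    run = []
    for ch in d:
        if unicodedata.combining(ch) == 0:
            if run:
                pieces.append(list(_ordered_interleavings(run)))
                run = []
            pieces.append([ch])
        else:
            run.append(ch)
    if run:
        pieces.append(list(_ordered_interleavings(run)))
    # Cartesian product of the choices for every piece.
    for combo in itertools.product(*pieces):
        yield "".join(combo)
```

For example, U+1EAD decomposes to "a" + U+0323 (class 220) + U+0302 (class 230), so both orders of the two marks are yielded. "a\u0301\u0300" yields only itself, because both marks have class 230.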

On further thought, I also think it's actually unnecessary for this
application.

Richard.

