Internationalised Computer Science Exercises

Richard Wordingham via Unicode unicode at
Mon Feb 5 15:37:30 CST 2018

On Thu, 1 Feb 2018 19:20:04 +0000
Richard Wordingham via Unicode <unicode at> wrote:

> A regular trace expression of the form
> [:ccc=1:][:ccc=2:]…[:ccc=n:]
> seems to require 2^n states in your scheme.  As I effectively only
> apply the regex to NFD input strings, I use fewer states.  However,
> the efficiency of my scheme depends on the order of the commuting
> factors - reverse order would require the 2^n states.

I've overstated the compactness of my scheme.  Firstly, I split the
state for an optionally final matched character into two states
according to whether it is to be the final character or not.
Secondly, the DFA for a Unicode character is quite large.  I've
kept it simple and identify most states by the matched Unicode
character, which means I have nearly a million states, whereas I could
probably whittle it down to more like a thousand or so, at a vast
increase in complexity.


More information about the Unicode mailing list