Internationalised Computer Science Exercises
Richard Wordingham via Unicode
unicode at unicode.org
Mon Feb 5 15:37:30 CST 2018
On Thu, 1 Feb 2018 19:20:04 +0000
Richard Wordingham via Unicode <unicode at unicode.org> wrote:
> A regular trace expression of the form
>
> [:ccc=1:][:ccc=2:]…[:ccc=n:]
>
> seems to require 2^n states in your scheme. As I effectively only
> apply the regex to NFD input strings, I use fewer states. However,
> the efficiency of my scheme depends on the order of the commuting
> factors - reverse order would require the 2^n states.
I've overstated the compactness of my scheme. Firstly, I split the
state for an optionally final matched character into two states
according to whether it is to be the final character or not.
Secondly, the DFA for a Unicode character is quite large. I've
kept it simple and identify most states by the matched Unicode
character, which means I have nearly a million states, whereas I could
probably whittle it down to more like a thousand or so, at a vast
increase in complexity.
Richard.
More information about the Unicode
mailing list