Why is pattern-matching of NULs slow?

Hans Åberg haberg-1 at telia.com
Fri Apr 8 08:21:43 CDT 2022


> On 8 Apr 2022, at 13:22, Roger L Costello via Unicode <unicode at corp.unicode.org> wrote:
> 
> "Flex" is a tool for tokenizing a string. The Flex manual says this:
> 
>    Pattern-matching of NULs is substantially slower
>    than matching other characters.
> 
> Is this peculiar to Flex or is pattern-matching NULs slow in all pattern-matching tools?

The underlying DFA algorithm treats all symbols equally, so the slowdown must have something to do with Flex's implementation.
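
To illustrate the point, here is a minimal sketch of a table-driven DFA over byte symbols. It is not Flex's code. The pattern (the bytes "a", NUL, "b"), the state numbering, and the names TABLE, ACCEPTING and matches are all made up for the example. In the textbook algorithm, NUL is just column 0 of the transition table and costs the same lookup as any other byte.

```python
# Hypothetical DFA for the byte pattern: 'a' NUL 'b'.
# States: 0 = start, 1 = saw 'a', 2 = saw 'a' NUL, 3 = accept.
DEAD = -1


def build_table():
    # One row per state, one column per byte value 0..255.
    table = [[DEAD] * 256 for _ in range(4)]
    table[0][ord("a")] = 1
    table[1][0x00] = 2  # the NUL transition is an ordinary table entry
    table[2][ord("b")] = 3
    return table


TABLE = build_table()
ACCEPTING = {3}


def matches(data: bytes) -> bool:
    """Return True if the whole input is accepted by the DFA."""
    state = 0
    for byte in data:
        state = TABLE[state][byte]  # same lookup for every symbol
        if state == DEAD:
            return False
    return state in ACCEPTING
```

With this table, matches(b"a\x00b") is accepted and matches(b"a\x01b") is rejected, and both take the same number of lookups.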

> Why would pattern-matching NULs be slower than pattern-matching other characters?

One chooses one's set of symbols; for example, for Unicode one can choose to convert to UTF-8 byte classes instead of using code points.
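
As one possible reading of "byte classes", the sketch below maps each UTF-8 byte to a small class according to its role in the encoding, so that an automaton could run over a handful of classes rather than over all code points. The class names and the functions utf8_byte_class and symbols are assumptions for illustration, not anything taken from Flex.

```python
def utf8_byte_class(byte: int) -> str:
    """Classify one byte (0..255) by its role in UTF-8."""
    if byte <= 0x7F:
        return "ascii"         # single-byte sequence, includes NUL (0x00)
    if byte <= 0xBF:
        return "continuation"  # 10xxxxxx
    if byte <= 0xC1:
        return "invalid"       # 0xC0, 0xC1 would start overlong forms
    if byte <= 0xDF:
        return "lead2"         # starts a 2-byte sequence
    if byte <= 0xEF:
        return "lead3"         # starts a 3-byte sequence
    if byte <= 0xF4:
        return "lead4"         # starts a 4-byte sequence
    return "invalid"           # 0xF5..0xFF never occur in UTF-8


def symbols(text: str) -> list:
    """Turn a string into the sequence of byte classes of its UTF-8 form."""
    return [utf8_byte_class(b) for b in text.encode("utf-8")]
```

Under this choice of symbol set, NUL falls into the same class as every other ASCII byte, so nothing in the symbol set itself singles it out.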

So perhaps in Flex, NUL is not part of that symbol set and is treated specially. Just a wild guess.




