Why is pattern-matching of NULs slow?
Hans Åberg
haberg-1 at telia.com
Fri Apr 8 08:21:43 CDT 2022
> On 8 Apr 2022, at 13:22, Roger L Costello via Unicode <unicode at corp.unicode.org> wrote:
>
> "Flex" is a tool for tokenizing a string. The Flex manual says this:
>
> Pattern-matching of NULs is substantially slower
> than matching other characters.
>
> Is this peculiar to Flex or is pattern-matching NULs slow in all pattern-matching tools?
The underlying DFA algorithm treats all symbols equally, so the slowdown must come from Flex's implementation rather than from pattern matching as such.
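To illustrate what "treats all symbols equally" means, here is a minimal sketch of a table-driven DFA over bytes. The pattern (one or more NULs followed by "a"), the state numbering and the names build_table and matches are all made up for this example; it is not what Flex generates.

```python
# Table-driven DFA over bytes: one table lookup per input byte,
# regardless of the byte's value.
# Hypothetical pattern: one or more NUL bytes followed by b"a".

DEAD = 3          # state with no way back to acceptance
ACCEPTING = {2}   # states in which the whole input has matched


def build_table():
    # States: 0 = start, 1 = seen at least one NUL, 2 = accept, 3 = dead.
    table = [[DEAD] * 256 for _ in range(4)]
    table[0][0x00] = 1
    table[1][0x00] = 1
    table[1][ord("a")] = 2
    return table


TABLE = build_table()


def matches(data: bytes) -> bool:
    state = 0
    for byte in data:
        # The lookup is the same for 0x00 as for any other byte value.
        state = TABLE[state][byte]
    return state in ACCEPTING
```

In a loop like this, NUL is just index 0 of the transition row, so nothing in the algorithm itself makes it more expensive than any other byte.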
> Why would pattern-matching NULs be slower than pattern-matching other characters?
One chooses one's set of symbols; for Unicode, for example, one can choose to convert to UTF-8 byte classes instead of using code points.
So perhaps in Flex, NUL is not part of that symbol set and is treated specially. Just a wild guess.
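One way such special treatment could arise, purely as an illustration of the guess above: if a scanner reserves the zero byte as an end-of-buffer sentinel, every zero byte in the input needs an extra check to tell real data from the end marker. The sketch below assumes that scheme; SENTINEL and scan are hypothetical names, and nothing here is taken from Flex's source.

```python
SENTINEL = 0  # assumed end-of-buffer marker for this sketch


def scan(data: bytes):
    """Walk the input once and return (bytes_seen, slow_path_checks)."""
    buf = data + bytes([SENTINEL])  # sentinel appended after the real input
    pos = 0
    seen = 0
    slow_checks = 0
    while True:
        byte = buf[pos]
        if byte == SENTINEL:
            # Slow path: a zero byte is either real data or the end marker,
            # so the position has to be compared with the input length.
            slow_checks += 1
            if pos == len(data):
                break
        # Fast path for ordinary bytes, and for NULs that proved to be data.
        seen += 1
        pos += 1
    return seen, slow_checks
```

With this scheme, scan(b"abc") takes the slow path once, at the end of the buffer, while scan(b"a\x00b\x00") takes it three times: once for each NUL in the data and once for the sentinel.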