Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
Alastair Houghton via Unicode
unicode at unicode.org
Tue May 23 04:17:06 CDT 2017
On 23 May 2017, at 07:10, Jonathan Coxhead via Unicode <unicode at unicode.org> wrote:
> On 18/05/2017 1:58 am, Alastair Houghton via Unicode wrote:
>> On 18 May 2017, at 07:18, Henri Sivonen via Unicode <unicode at unicode.org> wrote:
>>> the decision complicates U+FFFD generation when validating UTF-8 by state machine.
>> It *really* doesn’t. Even if you’re hell bent on using a pure state machine approach, you need to add maybe two additional error states (two-trailing-bytes-to-eat-then-fffd and one-trailing-byte-to-eat-then-fffd) on top of the states you already have. The implementation complexity argument is a *total* red herring.
> Heh. A state machine with N+2 states is, a fortiori, more complex than one with N states. So I think your argument is self-contradictory.
You’re being overly pedantic (and in this case, actually, the cyclomatic complexity of the state machine wouldn’t increase). In any case, Henri is complaining that it’s too difficult to implement; it isn’t. You need two extra states, both of which are trivial.
The point I was making was that this is not a strong argument against the proposed change, *even if* we were treating it as a requirement, which it isn’t.
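To make the "two extra states" concrete, here is a rough sketch in Python, not a reference implementation. It assumes the proposed behaviour is one U+FFFD per lead byte plus whatever trailing bytes that lead byte's bit pattern calls for. The state names and the function decode_utf8 are made up for illustration; EAT1 and EAT2 correspond to one-trailing-byte-to-eat-then-fffd and two-trailing-bytes-to-eat-then-fffd. The lo/hi pair stands in for the lead-specific states a table-driven machine would have, and F5..FF are simply treated as one-byte errors to keep the sketch small.

```python
# Hypothetical state names. EAT1 and EAT2 are the two extra error states.
START, NEED1, NEED2, NEED3, EAT1, EAT2 = range(6)
FFFD = "\ufffd"

def decode_utf8(data: bytes) -> str:
    out = []
    state, cp, i = START, 0, 0
    lo, hi = 0x80, 0xBF              # allowed range for the next trailing byte
    while i < len(data):
        b = data[i]
        if state == START:
            lo, hi = 0x80, 0xBF
            if b < 0x80:
                out.append(chr(b))
            elif b in (0xC0, 0xC1):  # always-overlong two-byte leads (assumption: eat one trail)
                state = EAT1
            elif 0xC2 <= b <= 0xDF:
                state, cp = NEED1, b & 0x1F
            elif 0xE0 <= b <= 0xEF:
                state, cp = NEED2, b & 0x0F
                if b == 0xE0:
                    lo = 0xA0        # below this would be overlong
                elif b == 0xED:
                    hi = 0x9F        # above this would be a surrogate
            elif 0xF0 <= b <= 0xF4:
                state, cp = NEED3, b & 0x07
                if b == 0xF0:
                    lo = 0x90        # below this would be overlong
                elif b == 0xF4:
                    hi = 0x8F        # above this would exceed U+10FFFF
            else:                    # stray trailing byte, or F5..FF (simplified here)
                out.append(FFFD)
            i += 1
        elif 0x80 <= b <= 0xBF:      # structurally a trailing byte
            if state == EAT2:
                state = EAT1
            elif state == EAT1:
                out.append(FFFD)
                state = START
            elif lo <= b <= hi:
                cp = (cp << 6) | (b & 0x3F)
                lo, hi = 0x80, 0xBF
                if state == NEED1:
                    out.append(chr(cp))
                    state = START
                else:
                    state = NEED1 if state == NEED2 else NEED2
            else:
                # Out-of-range first trailing byte: swallow the rest of the
                # sequence, then emit a single U+FFFD.
                state = EAT1 if state == NEED2 else EAT2
            i += 1
        else:
            # Sequence cut short by a non-trailing byte: emit U+FFFD and
            # reprocess this byte from START (i is not advanced).
            out.append(FFFD)
            state = START
    if state != START:               # input ended mid-sequence
        out.append(FFFD)
    return "".join(out)
```

With this sketch, decode_utf8(b"\xe0\x80\x80") and decode_utf8(b"\xf4\x90\x80\x80") each yield a single U+FFFD, and the only additions over a plain validating decoder are the two EAT branches and the transition into them.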