Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
Jonathan Coxhead via Unicode
unicode at unicode.org
Tue May 23 01:10:09 CDT 2017
On 18/05/2017 1:58 am, Alastair Houghton via Unicode wrote:
> On 18 May 2017, at 07:18, Henri Sivonen via Unicode <unicode at unicode.org> wrote:
>> the decision complicates U+FFFD generation when validating UTF-8 by state machine.
> It *really* doesn’t. Even if you’re hell bent on using a pure state machine approach, you need to add maybe two additional error states (two-trailing-bytes-to-eat-then-fffd and one-trailing-byte-to-eat-then-fffd) on top of the states you already have. The implementation complexity argument is a *total* red herring.
Heh. A state machine with N+2 states is, /a fortiori/, more complex
than one with N states. So I think your argument is self-contradictory.
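
For concreteness, here is a minimal Python sketch of the two behaviours being compared. It is only my reading of the thread, not the text of the proposal or anyone's actual implementation. The names decode, second_byte_range and eat_trailing are hypothetical. With eat_trailing=False, each maximal ill-formed subpart becomes one U+FFFD. With eat_trailing=True, a continuation byte that is out of range in the second position is swallowed together with the remaining trailing bytes, which is what the two extra "eat then fffd" states would do.

```python
REPLACEMENT = "\ufffd"


def second_byte_range(lead):
    """Return (trailing_count, lo, hi) for a valid lead byte, else None.

    lo..hi is the allowed range of the *second* byte, following the
    well-formed byte sequence table in the Unicode Standard (Table 3-7).
    """
    if 0xC2 <= lead <= 0xDF:
        return 1, 0x80, 0xBF
    if lead == 0xE0:
        return 2, 0xA0, 0xBF   # excludes overlong 3-byte forms
    if lead == 0xED:
        return 2, 0x80, 0x9F   # excludes surrogates
    if 0xE1 <= lead <= 0xEF:
        return 2, 0x80, 0xBF
    if lead == 0xF0:
        return 3, 0x90, 0xBF   # excludes overlong 4-byte forms
    if 0xF1 <= lead <= 0xF3:
        return 3, 0x80, 0xBF
    if lead == 0xF4:
        return 3, 0x80, 0x8F   # excludes values above U+10FFFF
    return None                # 80..C1 and F5..FF are never valid leads


def decode(data, eat_trailing=False):
    """Decode bytes as UTF-8, replacing ill-formed input with U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            out.append(chr(lead))
            i += 1
            continue
        info = second_byte_range(lead)
        if info is None:
            out.append(REPLACEMENT)
            i += 1
            continue
        need, lo, hi = info
        cp = lead & (0x7F >> (need + 1))   # payload bits of the lead byte
        j = i + 1
        ok = True
        for k in range(need):
            if j >= n:                      # input ended mid-sequence
                ok = False
                break
            t = data[j]
            t_lo, t_hi = (lo, hi) if k == 0 else (0x80, 0xBF)
            if not (t_lo <= t <= t_hi):
                ok = False
                # Assumed alternative policy: the second byte is a
                # continuation byte but out of range, so eat it and up to
                # (need - 1) further continuation bytes before emitting.
                if eat_trailing and k == 0 and 0x80 <= t <= 0xBF:
                    j += 1
                    left = need - 1
                    while left and j < n and 0x80 <= data[j] <= 0xBF:
                        j += 1
                        left -= 1
                break                       # otherwise reprocess byte t
            cp = (cp << 6) | (t & 0x3F)
            j += 1
        out.append(chr(cp) if ok else REPLACEMENT)
        i = j
    return "".join(out)
```

On the encoded surrogate ED A0 80, decode(b"\xed\xa0\x80") yields three U+FFFD, while decode(b"\xed\xa0\x80", eat_trailing=True) yields one. The extra branch is small, but it is extra, which is the point above.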
> Alastair.
~ʝ