Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Hans Åberg via Unicode unicode at unicode.org
Wed May 17 16:21:54 CDT 2017


> On 17 May 2017, at 23:18, Doug Ewell <doug at ewellic.org> wrote:
> 
> Hans Åberg wrote:
> 
>>> Far from solving the stated problem, it would introduce a new one:
>>> conversion from the "bad data" Unicode code points, currently
>>> well-defined, would become ambiguous.
>> 
>> Actually not: just translate the invalid UTF-8 sequences into invalid
>> UTF-32.
> 
> Far from solving the stated problem, it would introduce TWO new ones...

There is no good solution to the problem of ill-formed UTF-8 sequences, because the intent behind them is not known.
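
To make the two approaches in this thread concrete, here is a minimal Python sketch. It is an illustration only, not what the proposal or either poster specifies. It uses Python's built-in "replace" and "surrogateescape" error handlers; the latter is assumed here as a stand-in for "translate invalid UTF-8 into invalid UTF-32", since it maps each undecodable byte to a lone surrogate code point. The sample bytes and variable names are made up for the example.

```python
# Hypothetical input: 0xC0 0xAF is an overlong form and 0xFF never occurs
# in well-formed UTF-8, so three bytes here cannot be decoded.
raw = b"a\xc0\xafb\xffc"

# Approach 1: substitute U+FFFD for the ill-formed bytes. This is lossy:
# the original bytes cannot be recovered from the result.
replaced = raw.decode("utf-8", errors="replace")

# Approach 2 (stand-in for "invalid UTF-8 -> invalid UTF-32"): map each
# undecodable byte 0xNN to the lone surrogate U+DCNN. The resulting string
# is not valid Unicode text, but the original bytes can be restored.
escaped = raw.decode("utf-8", errors="surrogateescape")
round_trip = escaped.encode("utf-8", errors="surrogateescape")

print(ascii(replaced))
print(ascii(escaped))
print(round_trip == raw)
```

Neither result says what the producer of the bytes meant; the second merely preserves the evidence.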




