Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Asmus Freytag (c) via Unicode unicode at unicode.org
Tue May 23 13:20:23 CDT 2017


On 5/23/2017 10:45 AM, Markus Scherer wrote:
> On Tue, May 23, 2017 at 7:05 AM, Asmus Freytag via Unicode 
> <unicode at unicode.org> wrote:
>
>     So, if the proposal for Unicode really was more of a "feels right"
>     and not a "deviate at your peril" situation (or necessary escape
>     hatch), then we are better off not making a RECOMMENDATION that
>     goes against collective practice.
>
>
> I think the standard is quite clear about this:
>
>     Although a UTF-8 conversion process is required to never consume
>     well-formed subsequences as part of its error handling for
>     ill-formed subsequences, such a process is not otherwise
>     constrained in how it deals with any ill-formed subsequence
>     itself. An ill-formed subsequence consisting of more than one code
>     unit could be treated as a single error or as multiple errors.
>
>
So why add a recommendation that changes this from being completely up to 
the implementation (or groups of implementations) to something where one 
way of doing it now has to justify itself?

If the thread has made one thing clear, it is that there's no consensus in 
the wider community that one approach is obviously better. When it comes 
to ill-formed sequences, all bets are off. Simple as that.
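
To make the latitude in the quoted passage concrete, here is a minimal 
sketch (not taken from any particular implementation) of a decoder that 
can report a truncated multi-byte sequence either as a single error or as 
one error per byte. The names decode_lenient, _trail_ranges and the 
one_per_byte flag are hypothetical, chosen only for illustration; the 
byte ranges follow the well-formed UTF-8 table in the standard. Both 
policies leave the following well-formed bytes untouched, which is the 
only hard requirement.

```python
REPLACEMENT = "\ufffd"

def _trail_ranges(lead):
    """Allowed (lo, hi) range for each trailing byte, or None for a bad lead."""
    if 0xC2 <= lead <= 0xDF:
        return [(0x80, 0xBF)]
    if lead == 0xE0:
        return [(0xA0, 0xBF), (0x80, 0xBF)]
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return [(0x80, 0xBF)] * 2
    if lead == 0xED:
        return [(0x80, 0x9F), (0x80, 0xBF)]
    if lead == 0xF0:
        return [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]
    if 0xF1 <= lead <= 0xF3:
        return [(0x80, 0xBF)] * 3
    if lead == 0xF4:
        return [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]
    return None  # 80..C1 and F5..FF can never start a sequence

def decode_lenient(data, one_per_byte=False):
    """Decode bytes as UTF-8, replacing ill-formed input with U+FFFD.

    one_per_byte=False: one U+FFFD per ill-formed prefix (single error).
    one_per_byte=True:  one U+FFFD per byte of that prefix (multiple errors).
    """
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                      # ASCII
            out.append(chr(lead))
            i += 1
            continue
        ranges = _trail_ranges(lead)
        if ranges is None:                   # stray trail byte or invalid lead
            out.append(REPLACEMENT)
            i += 1
            continue
        j = i + 1
        for lo, hi in ranges:                # extend while trail bytes are valid
            if j < n and lo <= data[j] <= hi:
                j += 1
            else:
                break
        if j - i == len(ranges) + 1:         # complete, well-formed sequence
            out.append(data[i:j].decode("utf-8"))
        else:                                # truncated: bytes i..j-1 are the error
            out.append(REPLACEMENT * (j - i if one_per_byte else 1))
        i = j                                # never skips past a well-formed byte
    return "".join(out)

sample = b"A\xf0\x90\x80B"                   # F0 90 80 is a truncated 4-byte sequence
print(ascii(decode_lenient(sample)))
print(ascii(decode_lenient(sample, one_per_byte=True)))
```

With the sample above, the first call yields one U+FFFD between "A" and 
"B" and the second yields three. Either result is permitted by the text 
quoted from the standard.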

Adding a "recommendation" this late in the game is just bad standards 
policy.

A./

