Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Karl Williamson via Unicode
Tue May 23 15:57:24 CDT 2017

On 05/23/2017 12:20 PM, Asmus Freytag (c) via Unicode wrote:
> On 5/23/2017 10:45 AM, Markus Scherer wrote:
>> On Tue, May 23, 2017 at 7:05 AM, Asmus Freytag via Unicode wrote:
>>     So, if the proposal for Unicode really was more of a "feels right"
>>     and not a "deviate at your peril" situation (or necessary escape
>>     hatch), then we are better off not making a RECOMMENDATION that
>>     goes against collective practice.
>> I think the standard is quite clear about this:
>>     Although a UTF-8 conversion process is required to never consume
>>     well-formed subsequences as part of its error handling for
>>     ill-formed subsequences, such a process is not otherwise
>>     constrained in how it deals with any ill-formed subsequence
>>     itself. An ill-formed subsequence consisting of more than one code
>>     unit could be treated as a single error or as multiple errors.
> And why add a recommendation that changes that from completely up to the 
> implementation (or groups of implementations) to something where one way 
> of doing it now has to justify itself?
> If the thread has made one thing clear, it is that there's no consensus in 
> the wider community that one approach is obviously better. When it comes 
> to ill-formed sequences, all bets are off. Simple as that.
> Adding a "recommendation" this late in the game is just bad standards 
> policy.
> A./

Unless I misunderstand, you are missing the point.  There is already a 
recommendation listed in TUS, and that recommendation appears to have 
been added without much thought.  There is no proposal to add a 
recommendation "this late in the game".
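
To make the quoted passage from the standard concrete, here is a minimal
sketch of the latitude it describes: two decoders that both leave
well-formed subsequences intact but count errors differently. The function
names and the `collapse` parameter are made up for illustration, and neither
policy is claimed to be the recommendation in TUS or the one in the
proposal; they only show "a single error" versus "multiple errors".

```python
def well_formed_length(data: bytes, i: int) -> int:
    """Length of the well-formed UTF-8 sequence starting at i, or 0 if none."""
    for n in (1, 2, 3, 4):
        chunk = data[i:i + n]
        if len(chunk) < n:
            break
        try:
            chunk.decode("utf-8")  # strict decoding rejects ill-formed input
            return n
        except UnicodeDecodeError:
            continue
    return 0


def decode(data: bytes, collapse: bool) -> str:
    """Decode UTF-8, substituting U+FFFD for ill-formed bytes.

    collapse=False: one U+FFFD per ill-formed byte (multiple errors).
    collapse=True:  one U+FFFD per run of ill-formed bytes (single error).
    Both are illustrative policies, not the one recommended in TUS.
    """
    out = []
    i = 0
    in_error = False
    while i < len(data):
        n = well_formed_length(data, i)
        if n:
            # Well-formed subsequences are never swallowed by error handling.
            out.append(data[i:i + n].decode("utf-8"))
            i += n
            in_error = False
        else:
            if not (collapse and in_error):
                out.append("\ufffd")
            in_error = True
            i += 1
    return "".join(out)


ill_formed = b"A\xe0\x80\x80B"  # E0 must be followed by A0..BF, so no byte here is valid
per_byte = decode(ill_formed, collapse=False)
per_run = decode(ill_formed, collapse=True)
```

Both results keep the "A" and the "B"; they differ only in how many U+FFFD
appear between them, which is exactly the part the standard leaves open.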
