Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
Karl Williamson via Unicode
unicode at unicode.org
Tue May 23 15:57:24 CDT 2017
On 05/23/2017 12:20 PM, Asmus Freytag (c) via Unicode wrote:
> On 5/23/2017 10:45 AM, Markus Scherer wrote:
>> On Tue, May 23, 2017 at 7:05 AM, Asmus Freytag via Unicode
>> <unicode at unicode.org> wrote:
>> So, if the proposal for Unicode really was more of a "feels right"
>> and not a "deviate at your peril" situation (or necessary escape
>> hatch), then we are better off not making a RECOMMENDATION that
>> goes against collective practice.
>> I think the standard is quite clear about this:
>> Although a UTF-8 conversion process is required to never consume
>> well-formed subsequences as part of its error handling for
>> ill-formed subsequences, such a process is not otherwise
>> constrained in how it deals with any ill-formed subsequence
>> itself. An ill-formed subsequence consisting of more than one code
>> unit could be treated as a single error or as multiple errors.
> And why add a recommendation that changes that from completely up to the
> implementation (or groups of implementations) to something where one way
> of doing it now has to justify itself?
> If the thread has made one thing clear, it is that there's no consensus in
> the wider community that one approach is obviously better. When it comes
> to ill-formed sequences, all bets are off. Simple as that.
> Adding a "recommendation" this late in the game is just bad standards
Unless I misunderstand, you are missing the point. There is already a
recommendation listed in TUS, and that recommendation appears to have
been added without much thought. There is no proposal to add a
recommendation "this late in the game".
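To make the TUS passage quoted above concrete ("could be treated as a
single error or as multiple errors"), here is a minimal Python sketch
that decodes the same ill-formed input under two policies. It assumes
CPython 3, whose built-in "replace" handler emits one U+FFFD per
reported ill-formed span. The handler name "fffd_per_byte" and both
function names are hypothetical, made up for this illustration; neither
policy is claimed to be the one any particular library must use.

```python
import codecs

REPLACEMENT = "\ufffd"


def _fffd_per_byte(exc):
    """Hypothetical error handler: one U+FFFD for every offending byte."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # exc.start and exc.end delimit the ill-formed span the decoder reported.
    return (REPLACEMENT * (exc.end - exc.start), exc.end)


# "fffd_per_byte" is an illustrative name, not a standard handler.
codecs.register_error("fffd_per_byte", _fffd_per_byte)


def decode_single_error(data: bytes) -> str:
    """Treat each reported ill-formed span as a single error."""
    return data.decode("utf-8", errors="replace")


def decode_multiple_errors(data: bytes) -> str:
    """Treat each byte of an ill-formed span as its own error."""
    return data.decode("utf-8", errors="fffd_per_byte")


if __name__ == "__main__":
    # F0 90 80 is a truncated four-byte sequence followed by ASCII "A".
    sample = b"\xf0\x90\x80A"
    for decode in (decode_single_error, decode_multiple_errors):
        text = decode(sample)
        print(decode.__name__, text.count(REPLACEMENT), repr(text))
```

Both functions leave the well-formed "A" untouched, which is the one
hard requirement in the quoted text. They differ only in how many
U+FFFD they emit for the ill-formed bytes before it, and that is
exactly the part the standard leaves to the implementation.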