Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
Mark Davis ☕️ via Unicode
unicode at unicode.org
Thu Aug 3 19:34:15 CDT 2017
FYI, the UTC retracted the following.
the section on "Best Practices for Using FFFD" in section "3.9 Encoding
Forms" of TUS per the recommendation in L2/17-168
<http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/17-168>, for Unicode
On Wed, May 24, 2017 at 3:56 PM, Karl Williamson via Unicode <
unicode at unicode.org> wrote:
> On 05/24/2017 12:46 AM, Martin J. Dürst wrote:
>> On 2017/05/24 05:57, Karl Williamson via Unicode wrote:
>>> On 05/23/2017 12:20 PM, Asmus Freytag (c) via Unicode wrote:
>>>> Adding a "recommendation" this late in the game is just bad standards
>>> Unless I misunderstand, you are missing the point. There is already a
>>> recommendation listed in TUS,
>> That's indeed correct.
>>> and that recommendation appears to have
>>> been added without much thought.
>> That's wrong. There was a public review issue with various options and
>> with feedback, and the recommendation has been implemented and widely
>> used (among others, in major programming languages and browsers) without
>> problems for quite some time.
> Could you supply a reference to the PRI and its feedback?
> The recommendation in TUS 5.2 is "Replace each maximal subpart of an
> ill-formed subsequence by a single U+FFFD."
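A minimal Python sketch of that recommendation, assuming the well-formed UTF-8 byte ranges given in section 3.9 of TUS (for example, `E0` must be followed by `A0..BF` and `F4` by `80..8F`). The function names are hypothetical, and this is an illustration of the rule, not any particular library's decoder: the decoder advances through a candidate sequence only while each byte is allowed, and replaces whatever prefix it has consumed (the maximal subpart) with one U+FFFD.

```python
def _second_byte_range(lead: int) -> tuple:
    """Allowed range for the byte right after `lead` (TUS 3.9 well-formed ranges)."""
    if lead == 0xE0:
        return (0xA0, 0xBF)  # rules out 3-byte overlongs
    if lead == 0xED:
        return (0x80, 0x9F)  # rules out surrogates U+D800..U+DFFF
    if lead == 0xF0:
        return (0x90, 0xBF)  # rules out 4-byte overlongs
    if lead == 0xF4:
        return (0x80, 0x8F)  # rules out code points above U+10FFFF
    return (0x80, 0xBF)


def decode_maximal_subpart(data: bytes) -> str:
    """Decode UTF-8, replacing each maximal subpart of an ill-formed
    subsequence with a single U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            # C0, C1, F5..FF or a stray continuation byte can never start a
            # well-formed sequence, so the byte is a maximal subpart by itself.
            out.append("\uFFFD")
            i += 1
            continue
        lo, hi = _second_byte_range(b)
        j = i + 1
        while j < i + length and j < n:
            # Only the second byte has a restricted range; later ones are 80..BF.
            lo_j, hi_j = (lo, hi) if j == i + 1 else (0x80, 0xBF)
            if not lo_j <= data[j] <= hi_j:
                break
            j += 1
        if j == i + length:
            cp = b & (0x7F >> length)  # payload bits of the lead byte
            for k in range(i + 1, j):
                cp = (cp << 6) | (data[k] & 0x3F)
            out.append(chr(cp))
        else:
            # data[i:j] is the maximal subpart: one U+FFFD for all of it.
            out.append("\uFFFD")
        i = j
    return "".join(out)
```

Under this rule the overlong `C0 AF` becomes two U+FFFD characters, because `C0` can never begin a well-formed sequence, while the truncated `E1 80` is a single maximal subpart and becomes one U+FFFD.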
> And I agree with that. And I view an overlong sequence as a maximal
> ill-formed subsequence that should be replaced by a single FFFD. There's
> nothing in the text of 5.2 that immediately follows that recommendation
> that indicates to me that my view is incorrect.
> Perhaps my view is colored by the fact that I now maintain code that was
> written to parse UTF-8 back when overlongs were still considered legal
> input. An overlong was a single unit. When they became illegal, the code
> still considered them a single unit.
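For contrast, a sketch of the "overlong is a single unit" behaviour described above, assuming the lead byte alone fixes the sequence length (as it did before overlongs were made illegal) and that a complete sequence which turns out to be overlong, a surrogate, or above U+10FFFF is rejected as a whole. The function name `decode_overlong_as_unit` is hypothetical, limiting lead bytes to `C0..F7` (no 5- or 6-byte forms) is a simplifying assumption, and this is not claimed to be the code being maintained.

```python
# Smallest code point each sequence length may legally encode.
_MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_overlong_as_unit(data: bytes) -> str:
    """Decode UTF-8 where the lead byte determines the unit length, so an
    overlong sequence is replaced by one U+FFFD for the whole unit."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:  # C0 and C1 accepted as lead bytes here
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF7:
            length = 4
        else:
            out.append("\uFFFD")  # stray continuation byte or F8..FF
            i += 1
            continue
        j = i + 1
        # Consume any continuation bytes, without checking their values.
        while j < i + length and j < n and 0x80 <= data[j] <= 0xBF:
            j += 1
        if j < i + length:
            out.append("\uFFFD")  # truncated unit: one U+FFFD
            i = j
            continue
        cp = b & (0x7F >> length)
        for k in range(i + 1, j):
            cp = (cp << 6) | (data[k] & 0x3F)
        if cp < _MIN_FOR_LENGTH[length] or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            out.append("\uFFFD")  # overlong, surrogate or out of range: one unit
        else:
            out.append(chr(cp))
        i = j
    return "".join(out)
```

With this policy `C0 AF` yields one U+FFFD, whereas the maximal-subpart rule quoted from TUS 5.2 yields two; that difference in the count of replacement characters is what the thread is debating.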
> I can understand how someone who comes along later could say C0 can't be
> followed by any continuation character that doesn't yield an overlong,
> therefore C0 is a maximal subsequence.
> But I assert that my interpretation is just as valid as that one. And
> perhaps more so, because of historical precedent.
> It appears to me that little thought was given to the fact that these
> changes would cause overlongs to now be at least two units instead of one,
> making long-existing code no longer best practice. You are effectively
> saying I'm wrong about this. I thought I had been paying attention to
> PRIs since the 5.x series, and I don't remember anything about this. If
> you have evidence to the contrary, please give it. However, I would have
> thought Markus would have dug any up and given it in his proposal.
>>> There is no proposal to add a
>>> recommendation "this late in the game".
>> True. The proposal isn't for an addition; it's for a change. The "late in
>> the game" however, still applies.
>> Regards, Martin.