Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
Karl Williamson via Unicode
unicode at unicode.org
Wed May 24 17:56:39 CDT 2017
On 05/24/2017 12:46 AM, Martin J. Dürst wrote:
> On 2017/05/24 05:57, Karl Williamson via Unicode wrote:
>> On 05/23/2017 12:20 PM, Asmus Freytag (c) via Unicode wrote:
>>> Adding a "recommendation" this late in the game is just bad standards
>> Unless I misunderstand, you are missing the point. There is already a
>> recommendation listed in TUS,
> That's indeed correct.
>> and that recommendation appears to have
>> been added without much thought.
> That's wrong. There was a public review issue with various options and
> with feedback, and the recommendation has been implemented and in use
> widely (among others, in major programming languages and browsers) without
> problems for quite some time.
Could you supply a reference to the PRI and its feedback?
The recommendation in TUS 5.2 is "Replace each maximal subpart of an
ill-formed subsequence by a single U+FFFD."
I agree with that, and I view an overlong sequence as a maximal
ill-formed subsequence that should be replaced by a single U+FFFD.
There's nothing in the text of 5.2 that immediately follows that
recommendation that indicates to me that my view is incorrect.
Perhaps my view is colored by the fact that I now maintain code that was
written to parse UTF-8 back when overlongs were still considered legal
input. An overlong was a single unit. When they became illegal, the
code still considered them a single unit.
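To make the "single unit" reading concrete, here is a minimal Python sketch. It is not the code referred to above; the names expected_length and decode_structural are hypothetical, and the rule it applies (lead byte plus the continuation bytes it structurally calls for form one unit, and an ill-formed unit becomes one U+FFFD) is an assumption about what such older parsers do.

```python
def expected_length(lead: int) -> int:
    """Sequence length implied by the lead byte alone (structural view)."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 0  # stray continuation byte, or a byte that never starts a sequence


def decode_structural(data: bytes) -> str:
    """Hypothetical decoder: one U+FFFD per structurally delimited bad unit."""
    out = []
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            out.append("\ufffd")
            i += 1
            continue
        # Take the lead byte plus up to n-1 following continuation bytes.
        j = i + 1
        while j < len(data) and j < i + n and 0x80 <= data[j] <= 0xBF:
            j += 1
        unit = data[i:j]
        try:
            # Strict decoding rejects overlongs, surrogates, and truncation.
            out.append(unit.decode("utf-8"))
        except UnicodeDecodeError:
            out.append("\ufffd")
        i = j
    return "".join(out)


print(len(decode_structural(b"\xc0\x80")))      # overlong U+0000
print(len(decode_structural(b"\xe0\x80\xaf")))  # overlong U+002F
```

Under this assumed rule, each of the two overlong inputs collapses to a single replacement character, because the whole structurally delimited unit is rejected at once.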
I can understand how someone who comes along later could say that C0
can't be followed by any continuation byte that doesn't yield an
overlong, and that C0 is therefore a maximal subsequence by itself.
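As an illustration of that other reading, the sketch below counts the replacement characters that CPython 3's built-in UTF-8 codec produces with errors="replace". The helper name count_replacements is hypothetical, and the sample inputs are chosen here for illustration; the codec is used only as one example of an implementation, not as a statement of what the standard requires.

```python
def count_replacements(data: bytes) -> int:
    """Decode with CPython's built-in codec and count U+FFFD in the result."""
    return data.decode("utf-8", errors="replace").count("\ufffd")


# Overlong encodings of U+0000 (C0 80) and U+002F (E0 80 AF).
samples = {
    "C0 80": b"\xc0\x80",
    "E0 80 AF": b"\xe0\x80\xaf",
}

for label, raw in samples.items():
    print(label, count_replacements(raw))
```

With CPython 3 this should report 2 and 3 respectively: the lead byte is rejected on its own, and each following continuation byte is then treated as a separate ill-formed byte.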
But I assert that my interpretation is just as valid as that one. And
perhaps more so, because of historical precedent.
It appears to me that little thought was given to the fact that these
changes would cause overlongs to now be at least two units instead of
one, so that long-existing code is no longer best practice. You are
effectively saying I'm wrong about this. I thought I had been paying
attention to PRIs since the 5.x series, and I don't remember anything
about this. If you have evidence to the contrary, please give it.
However, I would have thought Markus would have dug any up and given it
in his proposal.
>> There is no proposal to add a
>> recommendation "this late in the game".
> True. The proposal isn't for an addition, it's for a change. The "late
> in the game" however, still applies.
> Regards, Martin.