Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Henri Sivonen via Unicode unicode at unicode.org
Thu May 18 01:18:48 CDT 2017


On Thu, May 18, 2017 at 2:41 AM, Asmus Freytag via Unicode
<unicode at unicode.org> wrote:
> On 5/17/2017 2:31 PM, Richard Wordingham via Unicode wrote:
>
> > There's some sort of rule that proposals should be made seven days in
> > advance of the meeting.  I can't find it now, so I'm not sure whether
> > the actual rule was followed, let alone what authority it has.
>
> Ideally, proposals that update algorithms or properties of some significance
> should be required to be reviewed in more than one pass. The procedures of
> the UTC are a bit weak in that respect, at least compared to other standards
> organizations. The PRI process addresses that issue to some extent.

What do I need to do to have proposals considered by the UTC?

I'd like to make two:

 1) Substantive: Reverse the decision to modify U+FFFD best practice
when decoding UTF-8. (I think the decision lacked a truly compelling
reason to change behavior that a number of prominent implementations
already share, and it complicates U+FFFD generation when validating
UTF-8 with a state machine. Aesthetic considerations in error
handling shouldn't outweigh multiple prominent implementations and
shouldn't introduce implementation complexity.)

 2) Procedural: To be considered in the future, proposals to change
what the standard suggests or requires implementations to do should
consider different implementation strategies and discuss the impact of
the change in light of each. (In the matter at hand, I think the
proposal should have included a discussion of the impact on UTF-8
validation state machines.) Such proposals should also include a
review of what prominent implementations, including major browser
engines, operating system libraries, and standard libraries of
well-known programming languages, already do. (The more established
the presently specced behavior is among prominent implementations,
the more compelling the reason to change the spec should have to be.
An implementation hosted by the Consortium itself shouldn't have
special weight compared to other prominent implementations.)
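
To make the state-machine point in item 1 concrete, here is a minimal
sketch in Python of a decoder that emits one U+FFFD per maximal
ill-formed subpart, which is my reading of the practice item 1 asks to
keep. It is an illustration only and is not taken from any of the
implementations mentioned in this thread. The names decode_utf8_lossy
and _lead_info are hypothetical, and the byte ranges are the
well-formed UTF-8 ranges from Chapter 3 of the Unicode Standard.

```python
REPLACEMENT = "\ufffd"


def _lead_info(b):
    """Classify a non-ASCII lead byte.

    Returns (continuation bytes needed, lowest and highest value allowed
    for the *next* byte, payload bits of the lead byte), or None if the
    byte can never start a well-formed sequence.
    """
    if 0xC2 <= b <= 0xDF:
        return 1, 0x80, 0xBF, b & 0x1F
    if b == 0xE0:                      # excludes overlong 3-byte forms
        return 2, 0xA0, 0xBF, b & 0x0F
    if b == 0xED:                      # excludes surrogates
        return 2, 0x80, 0x9F, b & 0x0F
    if 0xE1 <= b <= 0xEF:
        return 2, 0x80, 0xBF, b & 0x0F
    if b == 0xF0:                      # excludes overlong 4-byte forms
        return 3, 0x90, 0xBF, b & 0x07
    if b == 0xF4:                      # excludes values above U+10FFFF
        return 3, 0x80, 0x8F, b & 0x07
    if 0xF1 <= b <= 0xF3:
        return 3, 0x80, 0xBF, b & 0x07
    return None


def decode_utf8_lossy(data: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD per maximal ill-formed subpart."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        i += 1
        if b < 0x80:
            out.append(chr(b))
            continue
        info = _lead_info(b)
        if info is None:
            # Stray continuation byte or a byte that is never valid.
            out.append(REPLACEMENT)
            continue
        needed, lo, hi, cp = info
        while needed:
            if i >= n or not (lo <= data[i] <= hi):
                break                  # reject; do NOT consume data[i]
            cp = (cp << 6) | (data[i] & 0x3F)
            i += 1
            needed -= 1
            lo, hi = 0x80, 0xBF        # later bytes use the full range
        # The error is reported at the exact point the validator rejects,
        # so no extra lookahead or lead-byte bookkeeping is required.
        out.append(REPLACEMENT if needed else chr(cp))
    return "".join(out)
```

With this sketch, the sample sequence used in the Unicode Standard's
discussion of U+FFFD substitution, bytes 61 F1 80 80 E1 80 C2 62 80 63
80 BF 64, decodes to "a", three U+FFFD, "b", U+FFFD, "c", two U+FFFD,
"d". The decoder never looks past the byte at which validation fails,
which is the property that keeps U+FFFD generation simple when the
validator is a state machine.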

-- 
Henri Sivonen
hsivonen at hsivonen.fi
https://hsivonen.fi/

