Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Shawn Steele via Unicode unicode at
Tue May 30 12:05:23 CDT 2017

> I think nobody is debating that this is *one way* to do things, and that some code does it.

Except that they sort of are. The premise is that the "old language was wrong" and the "new language is right." The evidence offered that the old language was wrong is that a bug was filed against an implementation because it did not conform to the old language. The response to that application bug was to change the standard's recommendation.

If this language is adopted, then the opposite is going to happen: bugs will be filed against applications that conform to the old recommendation and not the new one. They will say "your code could be better, it is not following the recommendation." Eventually that will escalate to a level where it has to be considered. Regardless of any improvement, however, it will be a "breaking change".

Changing code from one recommendation to another will change behavior. For applications or SDKs with enough visibility, that will break *someone*, because that's how these things work. For applications that choose not to change, someone responding to an RFP is going to say "you don't fully conform to Unicode, we'll go with a different vendor." I'm not saying that these things make sense; that's just the way the world works.
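
To make the behavior change concrete, here is a minimal Python sketch. It rests on assumptions this message does not spell out: that the "old" recommendation is the one-U+FFFD-per-maximal-subpart practice (which CPython's built-in decoder follows with errors="replace"), and that the "new" language would replace a whole lead-byte-plus-trail-bytes sequence with a single U+FFFD. The names decode_maximal_subpart, decode_structural and _expected_length are hypothetical, and decode_structural is only one possible reading of the proposal, not its definition.

```python
REPLACEMENT = "\ufffd"

def decode_maximal_subpart(data: bytes) -> str:
    # CPython's built-in decoder: one U+FFFD per maximal subpart
    # of an ill-formed subsequence.
    return data.decode("utf-8", errors="replace")

def _expected_length(lead: int) -> int:
    # Sequence length implied by the lead byte's bit pattern alone
    # (hypothetical helper; ignores which trail bytes are actually valid).
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 1

def decode_structural(data: bytes) -> str:
    # Assumed alternative policy: take the lead byte plus as many
    # continuation bytes (0x80..0xBF) as it announces, and emit a single
    # U+FFFD if that whole chunk is not well-formed UTF-8.
    out = []
    i = 0
    while i < len(data):
        n = _expected_length(data[i])
        j = i + 1
        while j < len(data) and j < i + n and 0x80 <= data[j] <= 0xBF:
            j += 1
        chunk = data[i:j]
        try:
            out.append(chunk.decode("utf-8"))  # strict decode of one chunk
        except UnicodeDecodeError:
            out.append(REPLACEMENT)
        i = j
    return "".join(out)

ill_formed = b"A\xe0\x80\x80B"  # E0 must be followed by A0..BF, so this is ill-formed
print(decode_maximal_subpart(ill_formed).count(REPLACEMENT))  # 3
print(decode_structural(ill_formed).count(REPLACEMENT))       # 1
```

Both functions agree on well-formed input; they differ only in how many U+FFFD characters ill-formed input produces. Any caller that compares, counts, or stores the decoded output would see that difference after a switch.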

In some situations one form is better; in others, another form is. If the intent is truly that there is not "one way to do things," then the language should reflect that.

