Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Wed May 31 14:24:04 CDT 2017

> For implementations that emit FFFD while handling text conversion and repair (ie, converting ill-formed
> UTF-8 to well-formed), it is best for interoperability if they get the same results, so that indices within the
> resulting strings are consistent across implementations for all the correct characters thereafter.

That seems optimistic :)

If interoperability is the goal, then changing the recommendation seems to me to work against that goal.  There are systems that will not or cannot change to a new recommendation.  Even where such systems are updated, adoption of the updated versions will likely take some time.

In other words, I cannot see how “consistency across implementations” would be achievable anytime in the near future.
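
To make the index concern concrete, here is a minimal Python sketch. The two repair policies and their function names are hypothetical, chosen only for illustration; neither is meant to represent the current or the proposed recommendation. Both start from Python's "surrogateescape" error handler, which marks each undecodable byte with a lone surrogate.

```python
import re

# Python's "surrogateescape" handler maps each undecodable byte to a lone
# surrogate in U+DC80..U+DCFF, so this class matches one bad byte.
ESCAPED_BYTE = "[\udc80-\udcff]"


def repair_one_per_byte(data: bytes) -> str:
    """Hypothetical policy A: one U+FFFD for every undecodable byte."""
    text = data.decode("utf-8", errors="surrogateescape")
    return re.sub(ESCAPED_BYTE, "\ufffd", text)


def repair_one_per_run(data: bytes) -> str:
    """Hypothetical policy B: one U+FFFD for each run of undecodable bytes."""
    text = data.decode("utf-8", errors="surrogateescape")
    return re.sub(ESCAPED_BYTE + "+", "\ufffd", text)


# A truncated three-byte sequence (E2 82) followed by a well-formed "A".
damaged = b"\xe2\x82A"

per_byte = repair_one_per_byte(damaged)
per_run = repair_one_per_run(damaged)

# The same correct character lands at a different index under each policy.
print(per_byte.index("A"), per_run.index("A"))  # 2 1
```

Two implementations that differ in this way disagree about the position of every correct character after the damage, however long either policy has been recommended.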

It seems to me that, for a data stream of ambiguous quality to be usable in another application with predictable results, that stream should be “repaired” prior to being handed over.  Then both endpoints would be using the same set of FFFDs, whether the repair produced single or multiple FFFDs.
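
A rough sketch of that idea in Python, assuming the sender does the repair with the built-in "replace" error handler; the function name repair_before_handoff is made up for this example:

```python
def repair_before_handoff(data: bytes) -> bytes:
    """Decode with replacement, then re-encode as well-formed UTF-8.

    Every U+FFFD is now an ordinary encoded character in the stream, so
    the receiver never has to make its own replacement decisions.
    """
    return data.decode("utf-8", errors="replace").encode("utf-8")


# 0xFF can never appear in well-formed UTF-8.
damaged = b"ab\xffcd"
repaired = repair_before_handoff(damaged)

# Both endpoints can now decode strictly and will see identical text.
sender_view = repaired.decode("utf-8", errors="strict")
receiver_view = repaired.decode("utf-8", errors="strict")

print(sender_view == receiver_view)  # True
print(sender_view.index("c"))        # 3
```

Whichever policy the repairing side happens to use, the FFFDs are fixed once, and the indices of the characters that follow are the same at both ends.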

-Shawn