Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Wed May 31 01:08:37 CDT 2017

On Fri, 26 May 2017 21:41:49 +0000
Shawn Steele via Unicode <unicode at unicode.org> wrote:

> I totally get the forward/backward scanning in sync without decoding
> reasoning for some implementations, however I do not think that the
> practices that benefit those should extend to other applications that
> are happy with a different practice.

> In either case, the bad characters are garbage, so neither approach
> is "better" - except that one or the other may be more conducive to
> the requirements of the particular API/application.

There's a potential issue with input methods that indirectly edit the
backing store.  For example, GTK input methods can delete text (e.g.
via the function gtk_im_context_delete_surrounding()) in an amount
specified in characters, not storage units.  (Deletion by storage
units is not available in this interface.)  If the backing store
starts out corrupt, the number of characters an ill-formed sequence
counts as depends on the decoding practice in use, so such a deletion
might cause utter confusion or worse.  A corrupt backing store is
normally manually correctable if most of the text is ASCII.
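
To make the concern concrete, here is a minimal sketch in Python (not
GTK code) of deleting by character count from a corrupt UTF-8 store.
The helper names char_spans() and delete_last_chars() are hypothetical,
and the two counting rules shown (one character per rejected byte
range as reported by Python's decoder, versus one character per
rejected byte) are illustrative assumptions, not necessarily the exact
alternatives under discussion.

```python
def char_spans(store: bytes, per_byte: bool) -> list[bytes]:
    """Split a UTF-8 byte store into the byte spans of its 'characters'.

    Well-formed sequences are one span each.  An ill-formed range, as
    reported by Python's strict UTF-8 decoder, is either one span
    (per_byte=False) or one span per byte (per_byte=True).
    """
    spans = []
    pos = 0
    while pos < len(store):
        try:
            good = store[pos:].decode("utf-8")
            bad_len = 0
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets within the slice we decoded.
            good = store[pos:pos + exc.start].decode("utf-8")
            bad_len = exc.end - exc.start
        for ch in good:
            n = len(ch.encode("utf-8"))
            spans.append(store[pos:pos + n])
            pos += n
        if bad_len == 0:
            break
        if per_byte:
            spans.extend(store[k:k + 1] for k in range(pos, pos + bad_len))
        else:
            spans.append(store[pos:pos + bad_len])
        pos += bad_len
    return spans


def delete_last_chars(store: bytes, n_chars: int, per_byte: bool) -> bytes:
    """Delete the last n_chars 'characters' under the chosen counting rule."""
    spans = char_spans(store, per_byte)
    return b"".join(spans[:max(0, len(spans) - n_chars)])


# "café " followed by a truncated EURO SIGN (E2 82, missing AC).
store = b"caf\xc3\xa9 \xe2\x82"
print(delete_last_chars(store, 1, per_byte=False))  # b'caf\xc3\xa9 '
print(delete_last_chars(store, 1, per_byte=True))   # b'caf\xc3\xa9 \xe2'
```

The same request, "delete one character", removes two bytes under one
rule and one byte under the other, so an input method and an
application that count differently would edit different parts of the
store.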

Richard.