Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Tue May 16 03:00:13 CDT 2017

On Tue, 16 May 2017 10:01:03 +0300
Henri Sivonen via Unicode <unicode at unicode.org> wrote:

> Even so, I think even changing a recommendation of "best practice"
> needs way better rationale than "feels right" or "ICU already does it"
> when a) major browsers (which operate in the most prominent
> environment of broken and hostile UTF-8) agree with the
> currently-recommended best practice and b) the currently-recommended
> best practice makes more sense for implementations where "UTF-8
> decoding" is actually mere "UTF-8 validation".

There was originally an attempt to prescribe rather than to recommend
the interpretation of ill-formed 8-bit Unicode strings.  It may even
briefly have been an issued prescription, until common sense prevailed.
I do remember a sinking feeling when I thought I would have to change
my own handling of bogus UTF-8, only to be relieved later when the
rule was downgraded to mere best practice.  However, it is not uncommon
for coding standards to prescribe 'best practice'.

Richard.