Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Richard Wordingham via Unicode unicode at unicode.org
Tue May 16 03:00:13 CDT 2017


On Tue, 16 May 2017 10:01:03 +0300
Henri Sivonen via Unicode <unicode at unicode.org> wrote:

> Even so, I think even changing a recommendation of "best practice"
> needs way better rationale than "feels right" or "ICU already does it"
> when a) major browsers (which operate in the most prominent
> environment of broken and hostile UTF-8) agree with the
> currently-recommended best practice and b) the currently-recommended
> best practice makes more sense for implementations where "UTF-8
> decoding" is actually mere "UTF-8 validation".

There was originally an attempt to prescribe rather than to recommend
the interpretation of ill-formed 8-bit Unicode strings.  It may even
briefly have been an issued prescription, until common sense prevailed.
I do remember a sinking feeling when I thought I would have to change
my own handling of bogus UTF-8, only to be relieved later when it
became mere best practice.  However, it is not uncommon for coding
standards to prescribe 'best practice'.

Richard.
