Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
Asmus Freytag via Unicode
unicode at unicode.org
Mon May 15 15:49:05 CDT 2017
On 5/15/2017 11:33 AM, Henri Sivonen via Unicode wrote:
>>> ICU uses UTF-16 as its in-memory Unicode representation, so ICU isn't
>>> representative of implementation concerns of implementations that use
>>> UTF-8 as their in-memory Unicode representation.
>>> ICU, etc.) that are stuck with UTF-16 as their in-memory
>>> representation, which makes concerns of such implementation very
>>> relevant, I think the Unicode Consortium should acknowledge that
>>> UTF-16 was, in retrospect, a mistake
>> You may think that. There are those of us who do not.
> My point is:
> The proposal seems to arise from the "UTF-16 as the in-memory
> representation" mindset. While I don't expect that case in any way to
> go away, I think the Unicode Consortium should recognize the serious
> technical merit of the "UTF-8 as the in-memory representation" case as
> having significant enough merit that proposals like this should
> consider impact to both cases equally despite "UTF-8 as the in-memory
> representation" case at present appearing to be the minority case.
> That is, I think it's wrong to view things only or even primarily
> through the lens of the "UTF-16 as the in-memory representation" case
> that ICU represents.
UTF-16 has some nice properties and there's no need to brand it a
"mistake". UTF-8 has different nice properties, but there's equally no
reason to treat it as more special than UTF-16.
The UTC should adopt a position of perfect neutrality when it comes to
assumptions about in-memory representation; in other words, it should not
assume that optimizing for any one encoding form will benefit implementers.
The UTC, where ICU is strongly represented, needs to guard against basing
encoding, property, or algorithm decisions (mostly on edge cases) solely or
primarily on the needs of the particular implementation that the ICU project
happens to have chosen.