Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Doug Ewell via Unicode unicode at unicode.org
Wed May 17 17:31:56 CDT 2017


Richard Wordingham wrote:

> So it was still a legal way for a non-UTF-8-compliant process!

Anything is possible if you are non-compliant. You can encode U+263A
with 9,786 FF bytes followed by a terminating FE byte and call that
"UTF-8," if you are willing to be non-compliant enough.
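
As a rough illustration of the point (a sketch only, using CPython's
built-in codec as a stand-in for "a compliant decoder"; the names
bogus, is_well_formed_utf8 and replaced are made up for this example),
the bytes FF and FE never occur in well-formed UTF-8, so a strict
decoder rejects the sequence outright, and a lenient one can only
substitute U+FFFD:

```python
# The joke "encoding" described above: 9,786 FF bytes (0x263A == 9786),
# followed by a terminating FE byte. This is NOT UTF-8.
bogus = b"\xff" * 0x263A + b"\xfe"

# What a conformant encoder actually produces for U+263A.
real = "\u263a".encode("utf-8")  # three bytes: E2 98 BA


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the byte string."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# With errors="replace", CPython substitutes U+FFFD for the bytes it
# cannot decode; it never recovers U+263A from this sequence.
replaced = bogus.decode("utf-8", errors="replace")

print(is_well_formed_utf8(real))   # True
print(is_well_formed_utf8(bogus))  # False
print("\u263a" in replaced)        # False
```

How many U+FFFD a decoder emits for such input is a policy choice of
the decoder; that a compliant one will not hand back U+263A is not.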

> Note for example that a compliant implementation of full upper-casing
> shall convert the canonically equivalent strings <U+1FB3 GREEK SMALL
> LETTER ALPHA WITH YPOGEGRAMMENI, U+0313 COMBINING COMMA ABOVE> and
> <U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI, U+0345 COMBINING GREEK
> YPOGEGRAMMENI> to the canonically inequivalent strings <U+0391 GREEK
> CAPITAL LETTER ALPHA, U+0399 GREEK CAPITAL LETTER IOTA, U+0313> and
> <U+1F08 GREEK CAPITAL LETTER ALPHA WITH PSILI, U+0399 GREEK CAPITAL
> LETTER IOTA>. A compliant Unicode process may not assume that this is
> the right thing to do. (Or are some compliant Unicode processes
> required to incorrectly believe that they are doing something they
> mustn't do?)

I'm afraid I don't get the analogy.
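
For what it's worth, the factual part of the quoted claim can be
checked mechanically. The following is only a sketch: it assumes that
Python's str.upper() applies the unconditional full case mappings from
the Unicode Character Database and that comparing NFD forms is an
adequate test for canonical equivalence. The helper name
canonically_equivalent is invented for this example.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The two lowercase strings from the quoted text.
lower_a = "\u1fb3\u0313"  # ALPHA WITH YPOGEGRAMMENI + COMBINING COMMA ABOVE
lower_b = "\u1f00\u0345"  # ALPHA WITH PSILI + COMBINING GREEK YPOGEGRAMMENI

# Full upper-casing as implemented by str.upper().
upper_a = lower_a.upper()
upper_b = lower_b.upper()

print(canonically_equivalent(lower_a, lower_b))  # True
print([f"U+{ord(c):04X}" for c in upper_a])      # U+0391, U+0399, U+0313
print([f"U+{ord(c):04X}" for c in upper_b])      # U+1F08, U+0399
print(canonically_equivalent(upper_a, upper_b))  # False
```

That confirms the two inputs are equivalent and the two outputs are
not; it says nothing about whether the case is analogous to decoding
ill-formed UTF-8.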
 
--
Doug Ewell | Thornton, CO, US | ewellic.org



