Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
Karl Williamson via Unicode
unicode at unicode.org
Tue May 30 17:32:34 CDT 2017
On 05/30/2017 02:30 PM, Doug Ewell via Unicode wrote:
> L2/17-168 says:
>
> "For UTF-8, recommend evaluating maximal subsequences based on the
> original structural definition of UTF-8, without ever restricting trail
> bytes to less than 80..BF. For example: <C0 AF> is a single maximal
> subsequence because C0 was originally a lead byte for two-byte
> sequences."
>
> When was it ever true that C0 was a valid lead byte? And what does that
> have to do with (not) restricting trail bytes?
Until The Unicode Standard (TUS) 3.1, it was legal for UTF-8 parsers to
treat the sequence <C0 AF> as U+002F.
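
To illustrate, here is a minimal sketch of a purely structural two-byte decode, i.e. one that looks only at the 110xxxxx 10xxxxxx bit layout and skips the shortest-form check. The helper name decode_two_byte_structural is hypothetical and is not taken from any specification or library; it only shows why <C0 AF> comes out as U+002F under that reading, and that a current strict decoder (Python's built-in one is used here as an example) rejects the same bytes.

```python
def decode_two_byte_structural(lead: int, trail: int) -> int:
    """Decode a two-byte sequence from its bit layout alone.

    Hypothetical helper for illustration: it deliberately omits the
    shortest-form check that conformant decoders perform today.
    """
    if lead & 0xE0 != 0xC0:
        raise ValueError("not a structural two-byte lead (110xxxxx)")
    if trail & 0xC0 != 0x80:
        raise ValueError("not a trail byte (10xxxxxx)")
    # Five payload bits from the lead byte, six from the trail byte.
    return ((lead & 0x1F) << 6) | (trail & 0x3F)


code_point = decode_two_byte_structural(0xC0, 0xAF)
print(f"U+{code_point:04X}")  # U+002F

# Python's strict UTF-8 decoder treats the same bytes as ill-formed.
try:
    bytes([0xC0, 0xAF]).decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
print(strict_rejects)  # True
```

C0 contributes zero payload bits and AF contributes 0x2F, so the structural reading yields the same code point as the one-byte form <2F>.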