Running out of code points, redux (was: Re: Feedback on the proposal...)
Richard Wordingham via Unicode
unicode at unicode.org
Thu Jun 1 20:45:29 CDT 2017
On Thu, 1 Jun 2017 17:10:54 -0700
Ken Whistler via Unicode <unicode at unicode.org> wrote:
> Well, working from the *current* specification:
>
> FC 80 80 80 80 80
> and
> FF FF FF FF FF FF
>
> are equal trash, uninterpretable as *anything* in UTF-8.
>
> By definition D93b, either sequence of bytes, if encountered by a
> conformant UTF-8 conversion process, would be interpreted as a
> sequence of 6 maximal subparts of an ill-formed subsequence.
There is a very good argument that 0xFC and 0xFF are not code units
(D77): they are not used in the representation of any Unicode scalar
value. By that argument, the two sequences together give you 5 maximal
subparts (the five 0x80 bytes) and seven garbage bytes (one 0xFC and
six 0xFF).
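
To make the two counts concrete, here is a minimal sketch in Python. It
is only an illustration, not a reference implementation. The names
split_units, tally and NEVER_USED are hypothetical, and it assumes that
the bytes which never occur in well-formed UTF-8 are C0, C1 and F5..FF
(my reading of Table 3-7). The "strict" count treats those bytes as
non-code-units, which is the argument above and not something the
standard states.

```python
# Assumption: bytes that never occur in any well-formed UTF-8 sequence.
NEVER_USED = frozenset([0xC0, 0xC1]) | frozenset(range(0xF5, 0x100))


def _lead_spec(lead):
    """Return (length, lo, hi) for a valid multi-byte lead byte, else None.

    lo..hi is the allowed range of the second byte; later trailing
    bytes are always 80..BF.
    """
    if 0xC2 <= lead <= 0xDF:
        return 2, 0x80, 0xBF
    if lead == 0xE0:
        return 3, 0xA0, 0xBF
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return 3, 0x80, 0xBF
    if lead == 0xED:
        return 3, 0x80, 0x9F
    if lead == 0xF0:
        return 4, 0x90, 0xBF
    if 0xF1 <= lead <= 0xF3:
        return 4, 0x80, 0xBF
    if lead == 0xF4:
        return 4, 0x80, 0x8F
    return None


def split_units(data):
    """Split bytes into (chunk, well_formed) pairs.

    Ill-formed chunks are maximal subparts: either the longest prefix
    of a well-formed sequence, or a single byte.
    """
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        spec = _lead_spec(b)
        if b < 0x80 or spec is None:
            # ASCII is well formed; anything else here is a lone bad byte.
            out.append((data[i:i + 1], b < 0x80))
            i += 1
            continue
        length, lo, hi = spec
        j, ok = i + 1, True
        for _ in range(1, length):
            if j < n and lo <= data[j] <= hi:
                j += 1
                lo, hi = 0x80, 0xBF
            else:
                ok = False
                break
        out.append((data[i:j], ok))
        i = j
    return out


def tally(data):
    """Return (subparts per D93b, subparts under the strict reading, garbage bytes)."""
    ill = [chunk for chunk, ok in split_units(data) if not ok]
    garbage = sum(1 for chunk in ill if chunk[0] in NEVER_USED)
    return len(ill), len(ill) - garbage, garbage
```

With this sketch, tally(bytes.fromhex("FC8080808080")) gives (6, 5, 1)
and tally(b"\xff" * 6) gives (6, 0, 6): six maximal subparts each under
the current wording, but 5 maximal subparts and 7 garbage bytes in
total under the strict reading.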
Richard.