Running out of code points, redux (was: Re: Feedback on the proposal...)

Thu Jun 1 16:39:12 CDT 2017

On Thu, 01 Jun 2017 12:54:45 -0700
Doug Ewell via Unicode <unicode at unicode.org> wrote:

> Richard Wordingham wrote:
> 
> > even supporting 6-byte patterns just in case 20.1 bits eventually
> > turn out not to be enough,  
> 
> Oh, gosh, here we go with this.

You were implicitly invited to argue that there was no need to handle
5- and 6-byte invalid sequences.
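
For readers unfamiliar with the 5- and 6-byte forms: the original UTF-8
design allowed lead bytes up to 0xFD, and current UTF-8 stops at 4 bytes.
Below is a minimal sketch of classifying a lead byte by bit pattern alone.
The function names are hypothetical, and it does not check overlong forms
or the U+10FFFF limit.

```python
def lead_byte_length(b: int) -> int:
    """Length implied by a lead byte under the original 1-to-6-byte
    UTF-8 bit pattern, or 0 if the byte cannot start a sequence."""
    if b < 0x80:
        return 1          # 0xxxxxxx
    if b < 0xC0:
        return 0          # 10xxxxxx is a continuation byte
    if b < 0xE0:
        return 2          # 110xxxxx
    if b < 0xF0:
        return 3          # 1110xxxx
    if b < 0xF8:
        return 4          # 11110xxx
    if b < 0xFC:
        return 5          # 111110xx, invalid in current UTF-8
    if b < 0xFE:
        return 6          # 1111110x, invalid in current UTF-8
    return 0              # 0xFE and 0xFF had no assigned meaning


def is_legacy_only_lead(b: int) -> bool:
    """True for lead bytes that only the 5- and 6-byte forms used."""
    return lead_byte_length(b) in (5, 6)
```

A decoder that recognises these patterns can treat a whole 5- or 6-byte
run as one invalid sequence; one that does not has to reject the lead
byte by itself.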

> What will we do if 31 bits turn out not to be enough?

A compatible extension of UTF-16 to unbounded length has already been
designed.  Prefix bytes 0xFF can be used to extend the length for UTF-8
by 8 bytes at a time.  Extending UTF-32 is not beyond the wit of man,
and we know that UTF-16 could have been done better if the need had
been foreseen.
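
The "20.1 bits" and "31 bits" figures quoted above can be checked with a
little arithmetic. This is only a sketch, with hypothetical function
names; it covers the existing 1-to-6-byte patterns and does not model
the 0xFF-prefix extension.

```python
import math


def payload_bits(n: int) -> int:
    """Payload bits carried by an n-byte sequence (1 <= n <= 6) in the
    original UTF-8 bit layout."""
    if not 1 <= n <= 6:
        raise ValueError("only the 1-to-6-byte forms are modelled")
    if n == 1:
        return 7                      # 0xxxxxxx
    # The lead byte carries (7 - n) bits; each trailing byte carries 6.
    return (7 - n) + 6 * (n - 1)


def bits_for_code_space(size: int) -> float:
    """Bits needed to number `size` distinct code points."""
    return math.log2(size)


# The current code space is U+0000..U+10FFFF, i.e. 0x110000 values.
current_bits = bits_for_code_space(0x110000)
```

Here round(current_bits, 1) gives 20.1, payload_bits(4) gives 21, and
payload_bits(6) gives 31.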

While it seems natural to hold a Unicode scalar value in a single
machine word of some length, this is not necessary, just highly
convenient.

In short, it won't be a big problem intrinsically.  The UCD may get a
bit unwieldy, which may be a problem for small systems without Internet
access.

Richard.