Running out of code points, redux (was: Re: Feedback on the proposal...)

Mon Jun 5 07:37:16 CDT 2017

On Mon, 5 Jun 2017 13:08:06 +0900
"Martin J. Dürst via Unicode" <unicode at unicode.org> wrote:

> On 2017/06/02 04:54, Doug Ewell via Unicode wrote:
> > Richard Wordingham wrote:
> >   
> >> even supporting 6-byte patterns just in case 20.1 bits eventually
> >> turn out not to be enough,  
> 
> Sorry to be late with this, but if 20.1 bits turn out to not be
> enough, what about 21 bits?
> 
> That would still limit UTF-8 to four bytes, but would almost double
> the code space. Assuming (conservatively) that it will take about a
> century to fill up all 17 (well, actually 15, because two are
> private) planes, this would give us another century.
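
For reference, the figures above can be checked with a few lines of
Python. This is only a sketch of the arithmetic; the constant names are
illustrative, and the 21-bit range U+0000..U+1FFFFF is the hypothetical
extension under discussion, not anything in the current standard.

```python
import math

PLANE_SIZE = 0x10000                  # 65,536 code points per plane

# Current code space: 17 planes, U+0000..U+10FFFF.
CODE_POINTS_NOW = 17 * PLANE_SIZE     # 1,114,112

# Hypothetical 21-bit code space: U+0000..U+1FFFFF.
CODE_POINTS_21_BIT = 1 << 21          # 2,097,152

# Bits needed to index the current code space (the "20.1 bits").
bits_now = math.log2(CODE_POINTS_NOW)

# Growth factor from going to a full 21 bits ("almost double").
growth = CODE_POINTS_21_BIT / CODE_POINTS_NOW

# Number of planes a full 21-bit space would hold.
planes_21_bit = CODE_POINTS_21_BIT // PLANE_SIZE

print(f"{bits_now:.4f} bits now, x{growth:.3f} growth, {planes_21_bit} planes")
```

The growth factor is 32/17, so a 21-bit space would hold 32 planes
against today's 17.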

It all depends on how the lead byte is parsed.  With a block-if
construct written in ignorance of the original design, or with a
look-up table, it may be simplest to treat F5 onwards as out-and-out
errors and not expect any trailing bytes.  The code for handling
attempted 6-byte sequences was the most complex case.  Of course, one
**might** want to handle a list of mostly small positive integers, at
which point the old UTF-8 design might be useful.
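
A minimal sketch of the look-up-table approach, in Python. The names
build_table and last_lead are illustrative and not taken from any
existing decoder. The table maps each byte value to the length of the
sequence it introduces, with 0 meaning "error, expect no trailing
bytes". Moving the cut-off from F4 to F7 gives the hypothetical 21-bit
variant, and moving it to FD gives the old 6-byte design. C0 and C1 are
rejected in every table here because they can only start overlong
forms; that is a choice made for this sketch.

```python
def build_table(last_lead: int = 0xF4) -> bytes:
    """Map each byte value to the sequence length it introduces.

    A value of 0 marks a byte that cannot start a sequence.
    last_lead is the highest byte accepted as a lead byte:
    0xF4 for current UTF-8, 0xF7 for a hypothetical 21-bit
    extension, 0xFD for the old 6-byte design.
    """
    table = bytearray(256)
    for b in range(256):
        if b < 0x80:
            n = 1            # ASCII
        elif b < 0xC2:
            n = 0            # continuation bytes 80..BF, overlong leads C0/C1
        elif b > last_lead or b >= 0xFE:
            n = 0            # beyond the chosen cut-off; FE/FF never valid
        elif b < 0xE0:
            n = 2
        elif b < 0xF0:
            n = 3
        elif b < 0xF8:
            n = 4
        elif b < 0xFC:
            n = 5            # old design only
        else:
            n = 6            # FC..FD, old design only
        table[b] = n
    return bytes(table)


STRICT = build_table(0xF4)       # F5 onwards are out-and-out errors
WIDE_21_BIT = build_table(0xF7)  # hypothetical: still at most four bytes
OLD_DESIGN = build_table(0xFD)   # up to six bytes

print(STRICT[0xF4], STRICT[0xF5], WIDE_21_BIT[0xF7], OLD_DESIGN[0xFD])
```

With the strict table a decoder that sees F5 or above reports one bad
byte and resumes at the next byte, so no 5- or 6-byte handling is
needed at all.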

Richard.