Question about Perl5 extended UTF-8 design

Thu Nov 5 09:57:16 CST 2015

Hi,

Several of us are wondering why bits were reserved in perl5's extended 
UTF-8.  I'm asking you because you appear to be the author of the 
commits that did this.

To refresh your memory, in perl5's UTF-8, a start byte of 0xFF means 
that the character is encoded as a 13-byte sequence.  This allows code 
points up to 2**72 - 1 to be represented.  Had the length instead been 
12 bytes, code points up to 2**66 - 1 could be represented, which is 
enough for any value that fits in a 64-bit word.
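A small sketch of the arithmetic above, assuming that the 0xFF start 
byte itself carries no payload bits and that every following byte is a 
standard 10xxxxxx continuation byte contributing 6 bits.  The function 
names are hypothetical and this is not perl's actual encoder (which 
lives in the C core); it only shows why 13 bytes give 72 bits while 12 
bytes would already give 66, enough for a 64-bit value.

```python
# Assumption: 0xFF carries no payload; each continuation byte (10xxxxxx)
# contributes 6 payload bits.
BITS_PER_CONTINUATION = 6


def payload_bits(total_len: int) -> int:
    """Payload bits in an 0xFF-led sequence of total_len bytes."""
    return BITS_PER_CONTINUATION * (total_len - 1)


def max_code_point(total_len: int) -> int:
    """Largest value representable in an 0xFF-led sequence of total_len bytes."""
    return (1 << payload_bits(total_len)) - 1


def encode_ff(cp: int, total_len: int) -> bytes:
    """Hypothetical encoder: 0xFF, then total_len - 1 continuation bytes,
    most significant 6-bit group first."""
    if not 0 <= cp <= max_code_point(total_len):
        raise ValueError("code point does not fit in this length")
    out = [0xFF]
    for shift in range(payload_bits(total_len) - BITS_PER_CONTINUATION, -1,
                       -BITS_PER_CONTINUATION):
        out.append(0x80 | ((cp >> shift) & 0x3F))
    return bytes(out)


if __name__ == "__main__":
    for n in (12, 13):
        print(n, "bytes ->", payload_bits(n), "bits, max", hex(max_code_point(n)))
    # The largest 64-bit value fits in the shorter, 12-byte form.
    print(encode_ff(2**64 - 1, 12).hex())
```

With these assumptions, the top continuation byte of the 12-byte form 
would only ever need 4 of its 6 bits for a 64-bit value, so even that 
length leaves 2 spare bits.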

The comments indicate that these extra bits are "reserved".  So we're 
wondering what potential use you had thought of for these bits.

Thanks

Karl Williamson