Question about Perl5 extended UTF-8 design
public at khwilliamson.com
Thu Nov 5 09:57:16 CST 2015
Several of us are wondering why bits were reserved in perl5's extended
UTF-8. I'm asking you because you appear to be the author of the
commits that did this.
To refresh your memory: in perl5 UTF-8, a start byte of 0xFF means that
the sequence of bytes making up a single character is 13 bytes long.
This allows code points up to 2**72 - 1 to be represented.
If the length had instead been 12 bytes, code points up to 2**66 - 1
could have been represented, which is enough for any code point that
fits in a 64-bit word.
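The arithmetic above can be checked with a small sketch. It assumes, as in standard UTF-8, that the 0xFF start byte itself carries no payload bits and that each continuation byte (10xxxxxx) contributes 6 bits. The helper names (`payload_bits`, `max_code_point`, `encode_ff`) are hypothetical, and the straightforward most-significant-bits-first packing in `encode_ff` is an illustration, not a claim about perl's exact internals.

```python
# Sketch: payload capacity of a perl5 extended-UTF-8 sequence led by 0xFF.
# Assumption: 0xFF carries no payload bits; each continuation byte carries 6.

CONT_BITS = 6  # payload bits per continuation byte (10xxxxxx)


def payload_bits(seq_len: int) -> int:
    """Code point bits available in a 0xFF-led sequence of seq_len bytes."""
    return (seq_len - 1) * CONT_BITS


def max_code_point(seq_len: int) -> int:
    """Largest code point representable in seq_len bytes."""
    return 2 ** payload_bits(seq_len) - 1


def encode_ff(cp: int, seq_len: int = 13) -> bytes:
    """Hypothetical encoder: 0xFF followed by seq_len - 1 continuation
    bytes, most significant 6-bit group first."""
    if not 0 <= cp <= max_code_point(seq_len):
        raise ValueError("code point does not fit in this sequence length")
    out = bytearray([0xFF])
    # Walk the payload from the top 6-bit group down to the lowest one.
    for shift in range(payload_bits(seq_len) - CONT_BITS, -1, -CONT_BITS):
        out.append(0x80 | ((cp >> shift) & 0x3F))
    return bytes(out)


if __name__ == "__main__":
    for n in (12, 13):
        # prints "12 66 True" then "13 72 True"
        print(n, payload_bits(n), max_code_point(n) >= 2**64 - 1)
```

Under these assumptions, encoding 2**64 - 1 in 13 bytes gives 0xFF 0x80 0x8F followed by ten 0xBF bytes: the top 8 of the 72 payload bits (72 - 64) are always zero for any 64-bit value, which is the slack in question.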
The comments indicate that these extra bits are "reserved". So we're
wondering what potential use you had thought of for these bits.