Question about Perl5 extended UTF-8 design
Karl Williamson
public at khwilliamson.com
Thu Nov 5 09:57:16 CST 2015
Hi,
Several of us are wondering about the reason for reserving bits for the
extended UTF-8 in perl5. I'm asking you because you are the apparent
author of the commits that did this.
To refresh your memory, in perl5 UTF-8, a start byte of 0xFF indicates
that the sequence of bytes making up a single character is 13 bytes
long. This allows code points up to 2**72 - 1 to be represented.
If the length had instead been 12 bytes, code points up to 2**66 - 1
could be represented, which is enough to represent any code point
possible in a 64-bit word.
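
The arithmetic above can be checked with a short sketch. It assumes that
the 0xFF start byte carries no payload bits and that every following
continuation byte carries 6 payload bits, as in ordinary UTF-8. The
function name and parameters are illustrative only and are not taken
from the perl5 source.

```python
def max_code_point(total_bytes: int,
                   start_payload_bits: int = 0,
                   bits_per_continuation: int = 6) -> int:
    """Return the largest code point a sequence of total_bytes can hold.

    Assumption: one start byte (0xFF, contributing start_payload_bits,
    taken to be 0 here) followed by continuation bytes that each
    contribute bits_per_continuation payload bits.
    """
    continuation_bytes = total_bytes - 1
    payload_bits = start_payload_bits + continuation_bytes * bits_per_continuation
    return (1 << payload_bits) - 1


# 13-byte form: 12 continuation bytes * 6 bits = 72 payload bits.
assert max_code_point(13) == 2**72 - 1

# Hypothetical 12-byte form: 11 continuation bytes * 6 bits = 66 bits,
# which still covers every value of a 64-bit word.
assert max_code_point(12) == 2**66 - 1
assert max_code_point(12) >= 2**64 - 1
```

Under these assumptions the 13-byte form has 72 - 64 = 8 payload bits
beyond what a 64-bit word needs, while a 12-byte form would have 2.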
The comments indicate that these extra bits are "reserved". So we're
wondering what potential use you had thought of for these bits.
Thanks
Karl Williamson