Custom characters (was: Re: Private Use Area in Use)
richard.wordingham at ntlworld.com
Thu Jun 4 14:36:26 CDT 2015
On Thu, 04 Jun 2015 14:39:27 +0000
David Starner <prosfilaes at gmail.com> wrote:
> On Thu, Jun 4, 2015 at 6:09 AM John <idou747 at gmail.com> wrote:
> > Mostly just a matter of upgrading the character size.
> Which totally blows any concern with text size out of the water.
> Using 30 bytes to define certain very rare characters and 1 byte to
> define ASCII is way better than using 8 bytes to define all
The character size can be increased to 64 bits in such a way that no
new surrogates are required, current UTF-8 text remains UTF-8, current
UTF-16 text remains UTF-16 and current UTF-32 remains UTF-32, the
extended UTF-8 still has 8-bit code units, the extended UTF-16 still has
16-bit units, and the extended UTF-32 still has 32-bit code units. In
fact, the character size can be made unbounded.
The trick is to extend UTF-8 indefinitely, and then for UTF-16 and
UTF-32 repeat the idea of the UTF-8 scheme using sequences of two or
more low surrogates (or two or more high surrogates - one must choose)
much as UTF-8 uses bytes. Tom Bishop publicised the idea.
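
To make the UTF-8 half of this concrete, here is a rough sketch in Python of the lead-byte/continuation-byte pattern carried past U+10FFFF. It only goes as far as the 6-byte, 31-bit forms that early UTF-8 definitions (e.g. RFC 2279) already allowed; how to continue beyond that, and the exact layout of the surrogate-based UTF-16/UTF-32 extension, are not spelled out above, so nothing here should be read as the scheme Tom Bishop described. The function name encode_extended_utf8 is made up for illustration.

```python
def encode_extended_utf8(cp: int) -> bytes:
    """Encode an integer using the UTF-8 bit pattern, allowing up to 6 bytes.

    Illustrative only: assumes the pre-RFC 3629 layout (1 to 6 bytes,
    at most 31 payload bits). It does not reject surrogate code points.
    """
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp < 0x80:
        # Single byte: identical to ASCII.
        return bytes([cp])

    # An n-byte sequence carries (7 - n) bits in the lead byte
    # plus 6 bits in each of the (n - 1) continuation bytes.
    for n in range(2, 7):
        payload_bits = (7 - n) + 6 * (n - 1)
        if cp < (1 << payload_bits):
            break
    else:
        raise ValueError("value needs more than 31 bits; not covered by this sketch")

    # Lead byte marker: n one-bits followed by a zero (0xC0, 0xE0, 0xF0, 0xF8, 0xFC).
    lead_marker = (0xFF << (8 - n)) & 0xFF

    # Peel off 6 bits at a time, least significant first.
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6

    return bytes([lead_marker | cp] + tail[::-1])
```

For anything up to U+10FFFF (surrogates aside) this gives the same bytes as ordinary UTF-8, which is the sense in which current UTF-8 text remains UTF-8; larger values simply use the longer 5- and 6-byte forms.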