Corrigendum #9

Richard Wordingham richard.wordingham at ntlworld.com
Mon Jun 2 18:33:58 CDT 2014


On Mon, 2 Jun 2014 15:09:21 -0700
David Starner <prosfilaes at gmail.com> wrote:

> So certain programs can't use noncharacters internally because some
> people want to interchange them? That doesn't seem like what
> noncharacters should be used for.

Much as I dislike their uninvited use, it is possible to pass them
and other undesirables through most applications with a little
recoding at the application's boundaries.  Using 99 = (3 + 32 + 64) PUA
characters (35 leads and 64 trails), one can ape UTF-16 surrogates and
encode:

32 × 64 pairs for lone surrogates
 1 × 64 pairs to replace some of the PUA characters
 1 × 35 pairs to replace the rest of the PUA characters
 1 ×  4 pairs for incoming FFFC to FFFF
 1 × 32 pairs for the other BMP noncharacters
 1 × 32 pairs for the supplementary plane noncharacters.

This then frees up noncharacters for the application's use.
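
A rough sketch of the boundary recoding described above, in Python. The
choice of U+E000..U+E062 as the 99 PUA characters, the split into 35
leads followed by 64 trails, the contiguous packing of the escaped code
points, and the names encode_in and decode_out are all assumptions made
for illustration; nothing above fixes them.

```python
# Assumed layout: 35 lead code points U+E000..U+E022,
# then 64 trail code points U+E023..U+E062 (99 PUA characters in all).
LEAD_BASE = 0xE000
N_LEAD, N_TRAIL = 35, 64
TRAIL_BASE = LEAD_BASE + N_LEAD

def _escapable():
    """Code points that must not reach the application unescaped."""
    cps = list(range(0xD800, 0xE000))                        # 2048 lone surrogates
    cps += range(LEAD_BASE, LEAD_BASE + N_LEAD + N_TRAIL)    # the 99 PUA escapes themselves
    cps += range(0xFFFC, 0x10000)                            # incoming FFFC to FFFF
    cps += range(0xFDD0, 0xFDF0)                             # the other 32 BMP noncharacters
    for plane in range(1, 17):                               # 32 supplementary noncharacters
        cps += (plane << 16 | 0xFFFE, plane << 16 | 0xFFFF)
    return cps

ESCAPABLE = _escapable()
INDEX = {cp: i for i, cp in enumerate(ESCAPABLE)}
assert len(ESCAPABLE) <= N_LEAD * N_TRAIL   # 2215 code points fit in 2240 pairs

def encode_in(text):
    """Recode incoming text: each escapable code point becomes a lead/trail pair."""
    out = []
    for ch in text:
        i = INDEX.get(ord(ch))
        if i is None:
            out.append(ch)
        else:
            lead, trail = divmod(i, N_TRAIL)
            out.append(chr(LEAD_BASE + lead) + chr(TRAIL_BASE + trail))
    return "".join(out)

def decode_out(text):
    """Undo encode_in on outgoing text; assumes every lead is followed by a trail."""
    out = []
    it = iter(text)
    for ch in it:
        lead = ord(ch) - LEAD_BASE
        if 0 <= lead < N_LEAD:
            trail = ord(next(it)) - TRAIL_BASE
            out.append(chr(ESCAPABLE[lead * N_TRAIL + trail]))
        else:
            out.append(ch)
    return "".join(out)
```

Between encode_in and decode_out the text contains no noncharacters or
lone surrogates of external origin, so the application can use them as
it pleases, provided it removes its own before calling decode_out.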

Richard.


