Corrigendum #9
Richard Wordingham
richard.wordingham at ntlworld.com
Mon Jun 2 18:33:58 CDT 2014
On Mon, 2 Jun 2014 15:09:21 -0700
David Starner <prosfilaes at gmail.com> wrote:
> So certain programs can't use noncharacters internally because some
> people want to interchange them? That doesn't seem like what
> noncharacters should be used for.
Much as I don't like their uninvited use, it is possible to pass them
and other undesirables through most applications by a slight bit of
recoding at the application's boundaries. Using 99 = (3 + 32 + 64) PUA
characters, one can ape UTF-16 surrogates and encode:
32 × 64 pairs for lone surrogates
1 × 64 pairs to replace some of the PUA characters
1 × 35 pairs to replace the rest of the PUA characters
1 × 4 pairs for incoming FFFC to FFFF
1 × 32 pairs for the other BMP noncharacters
1 × 32 pairs for the supplementary plane noncharacters.
This then frees up noncharacters for the application's use.
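
A rough Python sketch of the recoding described above, not a tested design. Several things here are my own assumptions rather than part of the scheme as stated: the 99 PUA characters are read as 35 (= 3 + 32) lead and 64 trail characters, their positions at U+E000 and U+E040 are picked arbitrarily, and the categories are packed consecutively across the leads instead of each getting its own lead. The names escape_incoming and restore_outgoing are hypothetical.

```python
# Hypothetical layout: 35 lead and 64 trail code points taken from the BMP PUA.
LEAD_BASE, LEAD_COUNT = 0xE000, 35      # 32 + 3 leads (assumed split)
TRAIL_BASE, TRAIL_COUNT = 0xE040, 64    # 64 trails

# The 99 PUA characters that the scheme reserves for itself.
RESERVED = ([LEAD_BASE + i for i in range(LEAD_COUNT)]
            + [TRAIL_BASE + i for i in range(TRAIL_COUNT)])

# Code points to hide from the application, in the order of the list above.
UNDESIRABLES = (
    list(range(0xD800, 0xE000))         # 32 x 64 lone surrogates
    + RESERVED                          # 64 + 35 reserved PUA characters
    + list(range(0xFFFC, 0x10000))      # incoming FFFC to FFFF
    + list(range(0xFDD0, 0xFDF0))       # the other BMP noncharacters
    + [plane * 0x10000 + low            # supplementary plane noncharacters
       for plane in range(1, 17) for low in (0xFFFE, 0xFFFF)]
)
# Everything must fit into the available lead/trail combinations.
assert len(UNDESIRABLES) <= LEAD_COUNT * TRAIL_COUNT

# Entry number i is written as lead (i // 64) followed by trail (i % 64).
TO_PAIR = {
    cp: chr(LEAD_BASE + i // TRAIL_COUNT) + chr(TRAIL_BASE + i % TRAIL_COUNT)
    for i, cp in enumerate(UNDESIRABLES)
}
FROM_PAIR = {pair: chr(cp) for cp, pair in TO_PAIR.items()}


def escape_incoming(text):
    """Replace each undesirable code point with its PUA lead/trail pair."""
    return "".join(TO_PAIR.get(ord(ch), ch) for ch in text)


def restore_outgoing(text):
    """Turn PUA lead/trail pairs back into the code points they stand for.

    Anything that is not a known pair is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in FROM_PAIR:
            out.append(FROM_PAIR[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Text would go through escape_incoming on the way in and restore_outgoing on the way out. In between, the reserved PUA characters occur only as lead/trail pairs, so the application never sees an incoming noncharacter or lone surrogate and can use the noncharacters for its own purposes.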
Richard.