Private Use areas

Thu Aug 23 04:31:34 CDT 2018

On Wed, 22 Aug 2018 11:58:58 +0200
Philippe Verdy via Unicode <unicode at unicode.org> wrote:

> For now there's still no way to have variant sequences unless they are
> registered and standardized by Unicode but registration should be not
> needed (forbidden) for sequences containing PUV.

I believe this scheme is no worse than hack encodings that use Latin
character codes for other characters.  These schemes often work.
(Indeed, the best method I can currently find of getting Tai Tham
displayed as rich text is to use a transliteration-type encoding and a
special font, though I can now get pretty close using the proper
character codes in the order laid down in the proposals.)

The major problems I can see with appropriating variation sequences
are:
(1) Renderer support might be restricted to base characters - I have no
experimental evidence on whether this would happen.  Fonts can happily
convert base characters to combining characters, though this works
best if Latin line-breaking rules take effect.

(2) The appropriated variation sequence might be assigned a meaning -
but this is no worse than the general ambiguity of PUA characters.

(3) Some base characters get special treatment.  For example, I had
to change my transliteration scheme because hyphen-minus is treated
specially by MS Edge - I was using it as a digraph disjunctor - and
so clusters were not being formed.  In this case, I would have come
unstuck as soon as line-wrapping started, so it was a bad choice anyway.

Or are there significant renderers that deliberately ignore variation
selectors in unregistered, unstandardised variation sequences?  I don't
recall any problems from when we were discussing variation
sequences for chess pieces.

For supplementing a script, it might be best to start at
VARIATION SELECTOR-256 (U+E01EF), and work down if need be with
specialist characters.
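
To make the "start at the top and work down" numbering concrete, here is
a minimal sketch in Python 3.  The helper names are my own invention for
illustration, not part of any standard API; the only facts relied on are
the code point ranges of the variation selectors (U+FE00..U+FE0F for
1 to 16, U+E0100..U+E01EF for 17 to 256).  Whether any renderer honours
the resulting unregistered sequences is exactly the open question above.

```python
# Illustrative only: helper names are hypothetical, not a standard API.

def variation_selector(n):
    """Return the character VARIATION SELECTOR-n for 1 <= n <= 256."""
    if 1 <= n <= 16:
        return chr(0xFE00 + (n - 1))      # VS1..VS16: U+FE00..U+FE0F
    if 17 <= n <= 256:
        return chr(0xE0100 + (n - 17))    # VS17..VS256: U+E0100..U+E01EF
    raise ValueError(f"no VARIATION SELECTOR-{n}")


def private_sequences(base, count):
    """Pair `base` with VS256, VS255, ... working down, `count` times."""
    if not 0 <= count <= 256:
        raise ValueError("count must be between 0 and 256")
    return [base + variation_selector(256 - i) for i in range(count)]


if __name__ == "__main__":
    # U+1A20 is a Tai Tham letter, used here only as a sample base.
    for seq in private_sequences("\u1A20", 3):
        print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

Allocating from the top keeps private assignments as far as possible
from the low-numbered selectors, which are the ones most likely to
acquire standardised meanings first.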

Richard.