Private Use areas

Ken Whistler via Unicode unicode at unicode.org
Tue Aug 21 13:03:41 CDT 2018


On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote:
> On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote:
>> On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
>>> Is there a block of RTL PUA also?
>> No.
> Perhaps there should be?

This is a periodic suggestion that never goes anywhere--for good reason. 
(You can search the email archives and see that it keeps coming up.)

Presuming that this question was asked in good faith...

>
> What about designating a part of the PUA to have a specific property?

The problem with that is that assigning *any* non-default property to 
any PUA code point would break existing implementations' assumptions 
about PUA character properties and potentially create havoc with 
existing use.
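
For readers who want to see the defaults that implementations currently rely on, here is a minimal Python sketch. It assumes only the standard `unicodedata` module; the helper name `default_properties` is made up for illustration. It reads three properties for one sample code point from each of the three Private Use ranges.

```python
import unicodedata

def default_properties(cp):
    """Return (General_Category, Bidi_Class, Canonical_Combining_Class)."""
    ch = chr(cp)
    return (
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.combining(ch),
    )

# One sample from the BMP PUA, Plane 15, and Plane 16.
for cp in (0xE000, 0xF0000, 0x100000):
    print(f"U+{cp:04X}", default_properties(cp))
```

Every sample should report the same triple (gc=Co, bc=L, ccc=0), which is the uniformity that a non-default assignment to part of the PUA would disturb.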

> Only certain properties matter enough:

That is an un-demonstrated assertion that I don't think you have thought 
through sufficiently.

> * wide
> * RTL

RTL is not some binary counterpart of LTR. There are 23 values of 
Bidi_Class, and anyone who wanted to implement a right-to-left script in 
PUA might well have to make use of multiple values of Bidi_Class. Also, 
there are two major types of strong right-to-leftness: Bidi_Class=R and 
Bidi_Class=AL. Should a "RTL PUA" zone favor Arabic type behavior or 
non-Arabic type behavior?
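
As a rough illustration of how many Bidi_Class values even ordinary right-to-left text touches, the following Python sketch looks up a handful of standard characters by name. The selection of characters is mine, chosen only as an example; the calls are from the standard `unicodedata` module.

```python
import unicodedata

# Characters that routinely appear in right-to-left text.
NAMES = [
    "HEBREW LETTER ALEF",       # strong right-to-left, non-Arabic
    "ARABIC LETTER ALEF",       # strong right-to-left, Arabic
    "ARABIC-INDIC DIGIT ZERO",  # Arabic number
    "DIGIT ONE",                # European number
    "HEBREW POINT PATAH",       # nonspacing mark
    "SPACE",                    # whitespace
]

bidi_classes = {
    name: unicodedata.bidirectional(unicodedata.lookup(name)) for name in NAMES
}

for name, value in bidi_classes.items():
    print(f"{value:4} {name}")
```

Six characters yield six different Bidi_Class values, so a single "RTL" zone in the PUA would cover only one of the roles a real script needs.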

> * combining

Also not a binary switch. Canonical_Combining_Class is a numeric value, 
and any value but ccc=0 for a PUA character would break normalization. 
Then for the General_Category, there are three types of "marks" that 
count as combining: gc=Mn, gc=Mc, gc=Me. Which of those would be favored 
in any PUA assignment?
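
A small Python sketch may make the normalization point concrete. It assumes only the standard `unicodedata` module; the helper `mark_info` and the sample strings are illustrative. It shows the three mark categories with their combining classes, and then shows that a PUA character, being a starter (ccc=0), currently blocks canonical reordering of the marks on either side of it.

```python
import unicodedata

def mark_info(ch):
    """Return (General_Category, Canonical_Combining_Class) for ch."""
    return unicodedata.category(ch), unicodedata.combining(ch)

acute = "\u0301"      # COMBINING ACUTE ACCENT
cedilla = "\u0327"    # COMBINING CEDILLA
visarga = "\u0903"    # DEVANAGARI SIGN VISARGA
enclosing = "\u20DD"  # COMBINING ENCLOSING CIRCLE

for ch in (acute, cedilla, visarga, enclosing):
    print(unicodedata.name(ch), mark_info(ch))

# Adjacent marks are sorted by combining class: cedilla moves before acute.
reordered = unicodedata.normalize("NFD", "e" + acute + cedilla)

# A PUA code point between the marks is a starter, so nothing moves.
with_pua = "e" + acute + "\uE000" + cedilla
unchanged = unicodedata.normalize("NFD", with_pua)
```

If the PUA code point in the second string were given a non-zero combining class, already-normalized data like `with_pua` could stop being normalized, which is the breakage described above.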

> as most others are better represented in the font itself.

Really? Suppose someone wants to implement a bicameral script in PUA. 
They would need case mappings for that, and how would those be "better 
represented in the font itself"? Or how about digits? Would numeric 
values for digits be "better represented in the font itself"? How about 
implementation of punctuation? Would segmentation properties and 
behavior be "better represented in the font itself"?
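
To illustrate that case mappings and numeric values live in property data rather than in fonts, here is a hedged Python sketch comparing a PUA code point with encoded characters that have those properties. The helper `case_and_number` is a made-up name; Deseret and Arabic-Indic digits are used only as convenient examples of a bicameral script and a digit set.

```python
import unicodedata

def case_and_number(ch):
    """Return (uppercase mapping, numeric value or None) for ch."""
    return ch.upper(), unicodedata.numeric(ch, None)

pua = "\uE000"
deseret_small = "\U00010428"  # DESERET SMALL LETTER LONG I
arabic_three = "\u0663"       # ARABIC-INDIC DIGIT THREE

for ch in (pua, deseret_small, arabic_three):
    upper, number = case_and_number(ch)
    print(f"U+{ord(ch):04X} upper=U+{ord(upper):04X} numeric={number}")
```

The PUA character maps to itself and has no numeric value, and no font installed on the system changes what these library calls return.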

>
> This could be done either by parceling one of existing PUA ranges: planes 15
> and 16 are virtually unused thus any damage would be negligible;

That is simply an assertion -- and not the kind of assertion that the 
UTC tends to accept on spec. I rather suspect that there are multiple 
participants on this email list, for example, who *do* have 
implementations making extensive use of Planes 15/16 PUA code points for 
one thing or another.

>   or perhaps
> by allocating a new range elsewhere.

See:

https://www.unicode.org/policies/stability_policy.html

The General_Category property value Private_Use (Co) is immutable: the 
set of code points with that value will never change.

That guarantee has been in place since 1996, and is a rule that binds 
the UTC. So nope, sorry, no more PUA ranges.
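
For anyone who wants to check the fixed set against their own platform's data, the following Python sketch enumerates every code point with gc=Co and collapses them into ranges. It assumes only the standard `unicodedata` module; the function name `private_use_ranges` is illustrative.

```python
import unicodedata

def private_use_ranges():
    """Return inclusive (first, last) ranges of code points with gc=Co."""
    ranges = []
    start = None
    for cp in range(0x110000):
        is_co = unicodedata.category(chr(cp)) == "Co"
        if is_co and start is None:
            start = cp                        # a new run begins
        elif not is_co and start is not None:
            ranges.append((start, cp - 1))    # the run ended at cp - 1
            start = None
    if start is not None:
        ranges.append((start, 0x10FFFF))
    return ranges

for first, last in private_use_ranges():
    print(f"U+{first:04X}..U+{last:04X} ({last - first + 1} code points)")
```

The result should be the three familiar ranges U+E000..U+F8FF, U+F0000..U+FFFFD, and U+100000..U+10FFFD, regardless of which Unicode version the library was built against.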

> Meow!

Grrr! ;-)

As I see it, the only feasible way for people to get specialized 
behavior for PUA ranges involves first ceasing to assume that somehow 
they can jawbone the UTC into *standardizing* some ranges for some 
particular use or another. That simply isn't going to happen. People who 
assume this is somehow easy, and that the UTC are a bunch of boneheads 
who stand in the way of obvious solutions, do not -- I contend -- 
understand the complicated interplay of character properties, stability 
guarantees, and implementation behavior baked into system support 
libraries for the Unicode Standard.

The way forward for folks who want to do this kind of thing is:

1. Define a *protocol* for reliable interchange of custom character 
property information about PUA code points.

2. Convince more than one party to actually *use* that protocol to 
define sets of interchangeable character property definitions.

3. Convince at least one implementer to support that protocol to create 
some relevant interchangeable *behavior* for those PUA characters.

And if the goal for #3 is to get some *system* implementer to support 
the protocol in widespread software, then before starting any of #1, #2, 
or #3, you had better start instead with:

0. Create a consortium (or other ongoing organization) with a 10-year 
time horizon and participation by at least one major software 
implementer, to define, publicize, and advocate for support of the 
protocol. (And if you expect a major software implementer to 
participate, you might need to make sure you have a business case 
defined that would warrant such a 10-year effort!)
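
Purely as an illustration of what step 1 might involve, here is a minimal Python sketch of a property agreement and a lookup that honors it. Everything in it is hypothetical: no such protocol exists in this message or in the standard, and the JSON layout, the names `PuaProperties`, `load_agreement`, and `bidi_class`, and the sample data are all invented for the example. It covers only two properties.

```python
import json
import unicodedata
from dataclasses import dataclass

@dataclass(frozen=True)
class PuaProperties:
    # Hypothetical record; defaults mirror the standard's PUA defaults.
    general_category: str = "Co"
    bidi_class: str = "L"

def load_agreement(text):
    """Parse a hypothetical JSON agreement into {code point: PuaProperties}."""
    table = {}
    for key, props in json.loads(text).items():
        cp = int(key, 16)
        # Overrides are only meaningful for Private Use code points.
        if unicodedata.category(chr(cp)) != "Co":
            raise ValueError(f"U+{cp:04X} is not a Private Use code point")
        table[cp] = PuaProperties(**props)
    return table

def bidi_class(ch, table):
    """Bidi_Class for ch, preferring the agreement over the standard data."""
    override = table.get(ord(ch))
    return override.bidi_class if override else unicodedata.bidirectional(ch)

# Invented sample agreement: one PUA letter with right-to-left behavior.
AGREEMENT = '{"E000": {"general_category": "Lo", "bidi_class": "R"}}'
table = load_agreement(AGREEMENT)
print(bidi_class("\uE000", table), bidi_class("\uE001", table))
```

The sketch only shows the data side. Steps 2 and 3 remain the hard part, since a bidi or line-breaking engine has to consult such a table before any behavior changes.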

--Ken
