Encoding character information for characters of a Private Use Area use (from Re: UCD in XML or in CSV?)

William_J_G Overington via Unicode unicode at unicode.org
Mon Sep 3 04:26:38 CDT 2018


Janusz S. Bien wrote:

> Last but not least, let me remind that the thread was started by a question what is the most convenient way to describe the properties of PUA characters.

From what I have learned during the discussion, it seems to me that using JSON would be a good idea.

http://www.unicode.org/mail-arch/unicode-ml/y2018-m08/0144.html

http://www.unicode.org/mail-arch/unicode-ml/y2018-m08/0145.html

It appears that all that is needed is a top-level JSON object with a member whose name is PUAINFO, written inside quotation marks as "PUAINFO", with the value of that member defined in whatever JSON way one chooses.

For example, one could have an array of values, one or more of which could be a string listing a PUA (Private Use Area) code point or a range of PUA code points, such as "$E001" and "$E100..$E17F", together with strings containing other information.

One such string, maybe the first after the colon, whether or not within an array, could be a description of the particular Private Use Area use that the particular file supports. 
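
As a minimal sketch of how such an array might be read, assuming Python and its standard json module: the sample content, the choice to put the description first, and the helper name parse_range are illustrative assumptions only, not part of any agreed format.

```python
import json

# Hypothetical file content: the first string is a description, the rest are
# PUA code points or ranges in the "$XXXX" notation mentioned above.
text = '{"PUAINFO": ["Example private use assignments", "$E001", "$E100..$E17F"]}'

def parse_range(item):
    """Turn "$E001" or "$E100..$E17F" into an inclusive (first, last) pair."""
    first, _, last = item.partition("..")
    start = int(first.lstrip("$"), 16)
    # A single code point is treated as a range of length one.
    end = int(last.lstrip("$"), 16) if last else start
    return start, end

data = json.loads(text)
description, *entries = data["PUAINFO"]
ranges = [parse_range(entry) for entry in entries]
```

Here ranges holds pairs of integers, so an application could test whether a given code point falls within any of the listed ranges.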

Using JSON would mean that the format would be independent of any particular programming language and could be designed to be straightforwardly read by humans as well.

From reading the documents, I think that the structure may start as follows, though I am not entirely sure of the matter at this time.

{"PUAINFO":
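
The opening shown above is not yet complete JSON on its own; it needs a value after the colon and a closing brace. A small check, sketched here with Python's standard json module and using an empty array purely as a placeholder value, illustrates this:

```python
import json

fragment = '{"PUAINFO":'

# The bare opening is rejected by a JSON parser because no value follows.
try:
    json.loads(fragment)
    fragment_ok = True
except json.JSONDecodeError:
    fragment_ok = False

# Adding any JSON value and the closing brace makes it well-formed.
# The empty array is only a placeholder, not a proposed design.
complete = fragment + " []}"
parsed = json.loads(complete)
```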

There are then various ways to proceed, for example having everything in one array, or having many names, each of which has data.

Having many names, each of which has data, may well look more elegant in a printout and be more easily read by humans. Yet having everything in one array in a known order may make the format easier to implement in software applications, and thus more likely to be implemented.
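
To make the comparison concrete, here is a rough sketch of the two layouts and a reader for each, assuming Python's standard json module. The member names "description" and "ranges" and the sample values are invented for illustration and are not a proposal for the actual names.

```python
import json

# Layout 1: everything in one array, in a known order (description first).
as_array = '{"PUAINFO": ["Example use", "$E001", "$E100..$E17F"]}'

# Layout 2: several names, each of which has data (names are hypothetical).
as_names = (
    '{"PUAINFO": {"description": "Example use",'
    ' "ranges": ["$E001", "$E100..$E17F"]}}'
)

def read_array(text):
    """Rely on position: the first item is the description."""
    description, *ranges = json.loads(text)["PUAINFO"]
    return description, ranges

def read_names(text):
    """Rely on names: order within the object does not matter."""
    info = json.loads(text)["PUAINFO"]
    return info["description"], info["ranges"]

# Both layouts carry the same information in this example.
same = read_array(as_array) == read_names(as_names)
```

In this sketch neither reader is much longer than the other; the difference lies in whether the meaning of each item comes from its position or from its name.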

Whichever way it is done, provided it is done rigorously, a format that becomes widely implemented in applications would be a contribution of lasting value.

William Overington

Monday 3 September 2018



