Private Use areas

Mark E. Shoulson via Unicode unicode at unicode.org
Fri Aug 31 15:11:44 CDT 2018


On 08/28/2018 04:26 AM, William_J_G Overington via Unicode wrote:
> Hi
>   
> Mark E. Shoulson wrote:
>   
>> I'm not sure what the advantage is of using circled characters instead of plain old ascii.
>   
> My thinking is that "plain old ascii" might be used in the text encoded in the file. Sometimes a file containing Private Use Area characters is a mix of regular Unicode Latin characters with just a few Private Use Area characters mixed in with them. So my suggestion of using circled characters is for disambiguation purposes. The circled characters in the PUAINFO sequence would not be displayed if a special software program were being used to read in the text file, then act upon the information that is encoded using the circled characters.

What if circled characters are used in the text encoded in the file?  
They're characters too, people use them and all.  Whenever you designate 
some characters to be used in a way outside their normal meaning, you 
have the problem of how to use them *with* their normal meaning.  So 
there are various escaping schemes and all.  So in XML, all characters 
have their normal meanings—except <, >, and &, which mean something 
special and change the interpretations of other nearby characters (so 
"bold" is a word in English that appears in the text, but "<bold>" is 
part of an instruction to the renderer that doesn't appear in the 
text.)  And the price is that those three characters have to be 
expressed differently (&lt; &gt; &amp;).  I don't really see what you 
gain by branding some large swath of Unicode ("circled characters") as 
"special" and not meaning their usual selves, and for that matter making 
these hard-to-type characters *necessary* for using your scheme, when 
you could do something like what XML does, and say "everything between < 
and > is to be interpreted specially, and there, these characters have 
the following meanings" and then have some other way of expressing those 
two reserved characters.  (not saying you need to do it XML's way, but 
something like that: reserve a small number of characters that have to 
be escaped, not some huge chunk.)
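
To make the XML comparison concrete, here is a minimal sketch in Python 
of the "reserve three characters and escape them" idea, using the 
standard library's xml.sax.saxutils.  The helper name wrap_bold and the 
sample string are made up for illustration; this is not a proposal for 
any particular format, just a demonstration that literal text containing 
the reserved characters survives a round trip.

```python
from xml.sax.saxutils import escape, unescape

OPEN, CLOSE = "<bold>", "</bold>"

def wrap_bold(text: str) -> str:
    """Hypothetical helper: mark literal text as bold.

    The reserved characters &, < and > inside the text are escaped,
    so they cannot be mistaken for markup.
    """
    return OPEN + escape(text) + CLOSE

# Literal text that happens to contain all three reserved characters,
# plus a circled letter that needs no special treatment at all.
literal = 'if a < b & b > c, write "<bold>" \u24d0'

marked_up = wrap_bold(literal)

# Strip the tags and undo the escaping to get the literal text back.
recovered = unescape(marked_up[len(OPEN):-len(CLOSE)])

print(marked_up)
print(recovered == literal)
```

Only three characters pay the escaping price here; everything else, 
circled letters included, keeps its normal meaning.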
>   
> My thinking is that using this method just adds some encoded information at the start of the text file and does not require the whole document to become designated as a file conformant to a particular markup format.

That's another way of saying that this is a markup format which accepts 
a large variety of plain texts.  Because you ARE talking about making a 
"particular markup format," just a different and new one.

I guess there's not even any reason for me to argue the point, though, 
since it is up to you how to design your markup language, and you can 
take advice (or not) from anyone you like.  Draw up some design, find 
some interested people, start a discussion, and work it out.  (but not 
here; this list is for discussing Unicode.)

~mark

