Private Use areas
Mark E. Shoulson via Unicode
unicode at unicode.org
Fri Aug 31 15:11:44 CDT 2018
On 08/28/2018 04:26 AM, William_J_G Overington via Unicode wrote:
> Mark E. Shoulson wrote:
>> I'm not sure what the advantage is of using circled characters instead of plain old ascii.
> My thinking is that "plain old ascii" might be used in the text encoded in the file. Sometimes a file containing Private Use Area characters is a mix of regular Unicode Latin characters with just a few Private Use Area characters mixed in with them. So my suggestion of using circled characters is for disambiguation purposes. The circled characters in the PUAINFO sequence would not be displayed if a special software program were being used to read in the text file, then act upon the information that is encoded using the circled characters.
What if circled characters are used in the text encoded in the file?
They're characters too; people use them and all. Whenever you designate
some characters to be used in a way outside their normal meaning, you
have the problem of how to use them *with* their normal meaning. So
there are various escaping schemes and all. So in XML, all characters
have their normal meanings—except <, >, and &, which mean something
special and change the interpretations of other nearby characters (so
"bold" is a word in English that appears in the text, but "<bold>" is
part of an instruction to the renderer that doesn't appear in the
text.) And the price is that those three characters have to be
expressed differently (&lt; &gt; &amp;). I don't really see what you
gain by branding some large swath of Unicode ("circled characters") as
"special" and not meaning their usual selves, and for that matter making
these hard-to-type characters *necessary* for using your scheme, when
you could do something like what XML does, and say "everything between <
and > is to be interpreted specially, and there, these characters have
the following meanings" and then have some other way of expressing those
two reserved characters. (not saying you need to do it XML's way, but
something like that: reserve a small number of characters that have to
be escaped, not some huge chunk.)
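To make the escaping point concrete, here is a minimal sketch in Python using the standard library's xml.sax.saxutils and xml.etree.ElementTree. The sample text and the "note" element name are made up for illustration and are not part of any proposal in this thread. It only shows that reserving three characters is enough: everything else, circled letters included, passes through with its normal meaning.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

# Ordinary text that happens to contain the three reserved characters,
# plus a circled letter that needs no special treatment at all.
text = 'Use <bold> & "Ⓐ" freely; 1 < 2 > 0'

# escape() rewrites only &, < and > (as &amp;, &lt; and &gt;).
escaped = escape(text)

# unescape() reverses exactly that substitution.
restored = unescape(escaped)

# Wrap the escaped text in a hypothetical element; "note" is an
# illustrative name, not something defined in this discussion.
element = ET.fromstring("<note>" + escaped + "</note>")

print(escaped)
print(restored == text)
print(element.text == text)
```

The circled letter survives the round trip untouched, and the word "bold" inside the text is never mistaken for an instruction, because only the escaped forms of the reserved characters appear in the body.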
> My thinking is that using this method just adds some encoded information at the start of the text file and does not require the whole document to become designated as a file conformant to a particular markup format.
That's another way of saying that this is a markup format which accepts
a large variety of plain texts. Because you ARE talking about making a
"particular markup format," just a different and new one.
I guess there's not even any reason for me to argue the point, though,
since it is up to you how to design your markup language, and you can
take advice (or not) from anyone you like. Draw up some design, find
some interested people, start a discussion, and work it out. (but not
here; this list is for discussing Unicode.)