Private Use areas
Mark E. Shoulson via Unicode
unicode at unicode.org
Mon Aug 27 19:44:57 CDT 2018
But there's nothing wrong with proposing a higher-level protocol;
indeed, that's what Ken Whistler was saying: you need a protocol to
transmit this information. It's metadata, so it will perforce be a
higher-level protocol of some kind, whether transmitting actually
out-of-band or reserving a piece of the file for metadata. That's
fine. I'm not sure what advantage circled characters offer over plain
old ASCII. You have to set off your reserved area
somehow, and I don't think using circled chars is the least obtrusive
way to do it. You could use XML; that would be pretty well-suited to
the task, but maybe it's overkill. If all you need is to reference some
"standard" PUA interpretation (per James Kass' take on this, not William
Overington's), then just a header like "[PUA00001]" would work just
fine. (Compare Emacs, with header lines like "-*- encoding: utf-8 -*-".)

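As a rough illustration of how little machinery such a header needs, here is a minimal Python sketch. The exact syntax is an assumption extrapolated from the "[PUA00001]" example: the first line must consist solely of "[PUA", one or more decimal digits, and "]". The function name `split_pua_header` is hypothetical, and what an identifier like "00001" maps to would be defined by whatever registry of PUA interpretations the protocol relies on.

```python
import re
from typing import Optional, Tuple

# Hypothetical header syntax modeled on the "[PUA00001]" example:
# "[PUA", one or more decimal digits, "]", alone on the first line.
HEADER_RE = re.compile(r"\[PUA(\d+)\]")


def split_pua_header(text: str) -> Tuple[Optional[str], str]:
    """Return (interpretation_id, body), or (None, text) if there is no header."""
    first, _, rest = text.partition("\n")
    match = HEADER_RE.fullmatch(first.strip())  # strip() also removes a CRLF's "\r"
    if match is None:
        # No header: treat the whole file as ordinary plain text.
        return None, text
    return match.group(1), rest


print(split_pua_header("[PUA00001]\nHello \ue000"))
print(split_pua_header("Hello"))
```

Files without the header pass through untouched, which is what makes an opt-in header less intrusive than markers embedded in the text itself.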
For larger chunks of meta-info, XML might be a good choice, but even
then, it could be an XML *header* to an otherwise ordinary text file.
Yes, you'd have to delimit it somehow, and probably have a top header (a
"magic number") to signal the protocol, but that's doable. For
applications not supporting this protocol, such a setup is probably
easier for the eye to skip past (even if it's long) than a bunch of
circled characters.

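The XML-header variant, with a magic first line, can be sketched the same way. Everything specific here is hypothetical: the magic line `%PUA-META`, the root element `<pua-meta>`, and the `<char cp=... name=...>` child in the usage example. The sketch also assumes the literal text `</pua-meta>` does not occur earlier inside the header (for example in a comment or CDATA section); a real protocol would need to pin down its delimiting rule more carefully.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

MAGIC = "%PUA-META"       # hypothetical "magic number" line announcing the protocol
END_TAG = "</pua-meta>"   # hypothetical closing tag that ends the XML header


def split_xml_header(text: str) -> Tuple[Optional[ET.Element], str]:
    """Return (parsed_header, body), or (None, text) if the magic line is absent."""
    first, _, rest = text.partition("\n")
    if first.rstrip("\r") != MAGIC:
        # Not this protocol: the whole file is ordinary plain text.
        return None, text
    end = rest.find(END_TAG)
    if end == -1:
        raise ValueError("metadata header is not terminated by " + END_TAG)
    cut = end + len(END_TAG)
    header = ET.fromstring(rest[:cut])
    body = rest[cut:]
    # Drop the single line break separating the header from the body.
    if body.startswith("\r\n"):
        body = body[2:]
    elif body.startswith("\n"):
        body = body[1:]
    return header, body


doc = '%PUA-META\n<pua-meta><char cp="E000" name="EXAMPLE"/></pua-meta>\nHello \ue000'
meta, body = split_xml_header(doc)
print(meta.find("char").get("cp"), repr(body))
```

An application that knows nothing of the protocol still sees one recognizable header block followed by the ordinary text, rather than markup interleaved with it.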
A protocol like that is outside of Unicode's scope (just like XML is),
but it's certainly something you could write up and try to standardize
and get used, with or without the support of ISO. People are coming up
with file formats all the time (and if you really want to use circled
characters, go ahead. That's something for you to consider in the
design phase of the project).
On 08/27/2018 05:20 PM, Rebecca Bettencourt via Unicode wrote:
> > That sounds like a non-conformant use of characters in the U+24xx block.
> Well, you are an expert on these things and I do not
> understand as to with what it would be non-conformant.
> A conformant process must interpret ⓅⓊⒶⒹⒶⓉⒶ as the characters ⓅⓊⒶⒹⒶⓉⒶ
> and not as a signal to process what follows as anything other than
> plain text.
> What you are proposing is a higher-level protocol, whether you realize
> it or not. Unfortunately your higher-level protocol has a serious flaw
> in that it cannot represent the string "ⓅⓊⒶⒹⒶⓉⒶ". Also, seeing a bunch
> of circled alphanumeric characters in a document ⓘⓢ◯ⓕⓐⓡ◯ⓕⓡⓞⓜ◯ⓤⓝⓞⓑⓣⓡⓤⓢⓘⓥⓔ.
> There are plenty of already-existing higher-level protocols (you
> mentioned one: XML) that could be used to provide information about
> PUA characters, and they are all much better suited to that purpose
> than what you are proposing.
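Rebecca's objection can be made concrete. The sketch below assumes a hypothetical in-band rule, "everything after ⓅⓊⒶⒹⒶⓉⒶ is metadata", implemented by a made-up `naive_split` function. A conformant process sees the marker only as seven ordinary characters (U+24C5, U+24CA, U+24B6, ...), so any text that merely mentions the marker gets misread, and the rule offers no way to write the marker literally.

```python
import unicodedata
from typing import Optional, Tuple

# ⓅⓊⒶⒹⒶⓉⒶ spelled out by code point: ordinary Enclosed Alphanumerics characters.
MARKER = "\u24C5\u24CA\u24B6\u24B9\u24B6\u24C9\u24B6"


def naive_split(text: str) -> Tuple[str, Optional[str]]:
    """Hypothetical in-band rule: everything after MARKER is metadata."""
    body, found, meta = text.partition(MARKER)
    return body, (meta if found else None)


print(unicodedata.name(MARKER[0]))  # CIRCLED LATIN CAPITAL LETTER P
# A sentence that only talks about the marker is wrongly split in two.
print(naive_split("The string " + MARKER + " is unusual."))
```

An out-of-band or clearly delimited header avoids this, because ordinary text never has to be inspected for reserved character sequences.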