Private Use areas

Mark E. Shoulson via Unicode unicode at unicode.org
Mon Aug 27 19:44:57 CDT 2018


But there's nothing wrong with proposing a higher-level protocol; 
indeed, that's what Ken Whistler was saying: you need a protocol to 
transmit this information.  It's metadata, so it will perforce be a 
higher-level protocol of some kind, whether transmitting actually 
out-of-band or reserving a piece of the file for metadata.  That's 
fine.  I'm not sure what the advantage is of using circled characters 
instead of plain old ASCII.  You have to set off your reserved area 
somehow, and I don't think using circled chars is the least obtrusive 
way to do it.  You could use XML; that would be pretty well-suited to 
the task, but maybe it's overkill.  If all you need is to reference some 
"standard" PUA interpretation (per James Kass' take on this, not William 
Overington's), then just a header like "[PUA00001]" would work just 
fine.  (Compare Emacs with things like "-*- coding: utf-8 -*-" or 
whatever.)
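
As a rough sketch of how little machinery such a header needs, here is 
one way a reader could peel it off.  Everything specific here is my 
assumption, not an existing convention: the exact "[PUA" + five digits 
+ "]" syntax, the rule that it sits alone on the first line, and the 
name split_pua_header.

```python
import re

# Hypothetical header syntax: "[PUA" + five digits + "]" alone on line one.
HEADER_RE = re.compile(r"\[PUA(\d{5})\]")


def split_pua_header(text):
    """Return (scheme_id, body); scheme_id is None if there is no header."""
    first_line, _, rest = text.partition("\n")
    # strip() also tolerates a trailing "\r" from CRLF line endings.
    match = HEADER_RE.fullmatch(first_line.strip())
    if match is None:
        # No recognizable header: the whole file is ordinary plain text.
        return None, text
    return match.group(1), rest
```

An application that doesn't know the convention just shows one odd 
bracketed line at the top; one that does can look up scheme "00001" 
wherever such schemes end up being registered.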

For larger chunks of meta-info, XML might be a good choice, but even 
then, it could be an XML *header* to an otherwise ordinary text file.  
Yes, you'd have to delimit it somehow, and probably have a top header (a 
"magic number") to signal the protocol, but that's doable.  For 
applications not supporting this protocol, such a setup is probably 
easier for the eye to skip past (even if it's long) than a bunch of 
circled letters.
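
To make that concrete, here is a minimal sketch of reading such a file.  
The magic line "%PUA-META 1", the closing delimiter "%END-PUA-META", 
and the <pua>/<char cp="..."> vocabulary are all invented for 
illustration; a real design would have to pick its own.

```python
import xml.etree.ElementTree as ET

MAGIC = "%PUA-META 1"   # hypothetical "magic number" on the first line
END = "%END-PUA-META"   # hypothetical delimiter that closes the header


def read_pua_metadata(text):
    """Return (mapping, body); mapping is {PUA code point: description}."""
    lines = text.split("\n")
    if lines[0].strip() != MAGIC:
        return {}, text  # no magic line: ordinary plain text
    try:
        end = next(i for i, line in enumerate(lines) if line.strip() == END)
    except StopIteration:
        return {}, text  # unterminated header: treat it all as plain text
    # Everything between the magic line and the delimiter is the XML header.
    root = ET.fromstring("\n".join(lines[1:end]))
    mapping = {
        int(el.get("cp"), 16): (el.text or "").strip()
        for el in root.iter("char")
    }
    return mapping, "\n".join(lines[end + 1:])
```

Because the magic line only counts when it is the very first line, the 
body after the delimiter can contain any text at all, including the 
magic string itself.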

A protocol like that is outside of Unicode's scope (just like XML is), 
but it's certainly something you could write up and try to standardize 
and get used, with or without the support of ISO.  People are coming up 
with file formats all the time.  (And if you really want to use circled 
characters, go ahead.  That's something for you to consider in the 
design phase of the project.)

~mark


On 08/27/2018 05:20 PM, Rebecca Bettencourt via Unicode wrote:
>
>             > That sounds like a non-conformant use of characters in
>             the U+24xx block.
>
>             Well, you are an expert on these things and I do not
>             understand as to with what it would be non-conformant.
>
>
> A conformant process must interpret ⓅⓊⒶⒹⒶⓉⒶ as the characters ⓅⓊⒶⒹⒶⓉⒶ 
> and not as a signal to process what follows as anything other than 
> plain text.
>
> What you are proposing is a higher-level protocol, whether you realize 
> it or not. Unfortunately your higher-level protocol has a serious flaw 
> in that it cannot represent the string "ⓅⓊⒶⒹⒶⓉⒶ". Also, seeing a bunch 
> of circled alphanumeric characters in a document ⓘⓢ◯ⓕⓐⓡ◯ⓕⓡⓞⓜ◯ⓤⓝⓞⓑⓣⓡⓤⓢⓘⓥⓔ.
>
> There are plenty of already-existing higher-level protocols (you 
> mentioned one: XML) that could be used to provide information about 
> PUA characters, and they are all much better suited to that purpose 
> than what you are proposing.
>
