Use of tag characters in a private encoding - is it valid please?
Peter Constable
pgcon6 at msn.com
Wed May 1 19:47:05 CDT 2024
A “private agreement” can be as simple as one party saying, “Use [such-and-such] font to view this content,” and another party using that font to view the content. There doesn’t even need to be any direct interaction between the two parties.
Peter
From: Unicode <unicode-bounces at corp.unicode.org> on behalf of Erik Carvalhal Miller via Unicode <unicode at corp.unicode.org>
Date: Tuesday, April 30, 2024 at 9:29 PM
To: William_J_G Overington <wjgo_10009 at btinternet.com>
Cc: unicode at corp.unicode.org <unicode at corp.unicode.org>
Subject: Re: Use of tag characters in a private encoding - is it valid please?
On Mon, Apr 29, 2024 at 2:13 PM William_J_G Overington via Unicode <unicode at corp.unicode.org> wrote:
> I consider that the phrase "private agreement" in The Unicode Standard is, well, not the whole situation, as it is perfectly possible for one person to produce and publish a document declaring some meanings and/or glyphs. So while for anyone else to apply those meanings and/or glyphs does imply at least a tacit, temporary sort of agreement, like the suspension of disbelief when watching a science fiction movie, it is not the almost formal contractual situation that The Unicode Standard could reasonably be thought to be writing about.
>
> https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf page 23 of the PDF document
The section you cite does not imply any need for an “almost formal contractual situation”. One of Unicodeʼs online FAQ pages (https://www.unicode.org/faq/private_use.html) has this to say:
>> Q: What does "private agreement among cooperating parties" mean?
>>
>> A "private agreement" simply refers to the fact that agreement about the interpretation of some set of private-use characters is done privately, outside the context of the standard. The Unicode Standard does not specify any particular interpretation for any private-use character. There is no implication that a private agreement necessarily has any contractual or other legal status—it is simply an agreement between two or more parties about how a particular set of private-use characters should be interpreted.
>>
>> Q: How would I define a private agreement?
>>
>> One can share, or even publish, documentation containing particular assignments for private-use characters, their glyphs, and other relevant information about their interpretation. One can then ask others to use those private-use characters as documented. One can create appropriate fonts and IMEs, or request that others do so.
On Mon, Apr 29, 2024 at 2:13 PM William_J_G Overington via Unicode <unicode at corp.unicode.org> wrote this too:
> A font with visible glyphs for tag characters will be helpful for composing sequences and could also be useful for finding the meaning of sequences that are not supported by any font available to the particular end user.
>
> > since in this case itʼs not likely that the PUA character would even be recognized as an emoji, the fallback you saw is the best‐case scenario one can expect in the absence of a private‐use agreement.
>
> Well, I was not restricting myself to emoji in applying the technique of using U+10FFFD followed by a sequence of tag characters of which the final one is a CANCEL TAG. Emoji sometimes, yet other things too.
That same chapter you linked to, in §23.9 (“Tag Characters”), specifies two usages for tag characters: (1) the now‐deprecated language tagging that was their original purpose and (2) emoji tag sequences, as further specified in UTS #51 (as I brought up earlier). You began this thread by asking about validity; my reading is no, a non‐emoji private‐use tag sequence is not valid according to the Standard. (Nevertheless, you might get it to function anyway.)
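For illustration only: UTS #51 describes an emoji tag sequence as a tag_base, then one or more tag characters in U+E0020..U+E007E (tag_spec), then CANCEL TAG U+E007F (tag_end), where the tag_base has to be an emoji. A minimal Python sketch, using hypothetical helper names (make_tag_sequence, split_tag_sequence, base_is_private_use), shows that a U+10FFFD-based sequence can have exactly the right shape while its base is a private-use character rather than an emoji. It checks structure only, does not decide emoji status, and simplifies the base to a single code point (UTS #51 also allows modifier and presentation sequences as bases).

```python
import unicodedata

TAG_OFFSET = 0xE0000   # tag characters mirror ASCII at U+E0000 + ASCII code
TAG_FIRST = 0xE0020    # TAG SPACE
TAG_LAST = 0xE007E     # TAG TILDE
CANCEL_TAG = 0xE007F   # CANCEL TAG, the tag_end of a sequence


def make_tag_sequence(base: str, spec: str) -> str:
    """Return base + tag characters mirroring the ASCII text `spec` + CANCEL TAG."""
    if not spec or not all(0x20 <= ord(c) <= 0x7E for c in spec):
        raise ValueError("spec must be non-empty printable ASCII")
    return base + "".join(chr(ord(c) + TAG_OFFSET) for c in spec) + chr(CANCEL_TAG)


def split_tag_sequence(s: str):
    """Return (base, spec) if `s` has the shape base tag_spec tag_end, else None.

    Simplification: the base is taken to be exactly one code point, and only the
    shape of the tags is checked, not whether the base qualifies as an emoji.
    """
    if len(s) < 3 or ord(s[-1]) != CANCEL_TAG:
        return None
    base, tags = s[0], s[1:-1]
    if not all(TAG_FIRST <= ord(c) <= TAG_LAST for c in tags):
        return None
    return base, "".join(chr(ord(c) - TAG_OFFSET) for c in tags)


def base_is_private_use(base: str) -> bool:
    """True for private-use code points (General_Category Co), e.g. U+10FFFD."""
    return unicodedata.category(base) == "Co"


if __name__ == "__main__":
    flag = make_tag_sequence("\U0001F3F4", "gbeng")   # flag of England (RGI)
    private = make_tag_sequence("\U0010FFFD", "abc")  # hypothetical private sequence
    for seq in (flag, private):
        base, spec = split_tag_sequence(seq)
        print(f"U+{ord(base):04X} {spec!r} private-use base: {base_is_private_use(base)}")
```

Both sequences pass the shape check, but only the first has an emoji base; the second has a base in General_Category Co, so it falls outside the emoji tag sequence definition.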
Itʼs not clear why you would want to use tag sequences (emoji or otherwise). The 137,468 private‐use code points available are well suited for specialty characters. The fallback of having your specialty font(s) visibly display the tag characters of a (private‐use) well‐formed but unrecognized tag sequence, though possibly useful, not only perverts the notion that tag characters are supposed to be invisible in normal rendering but also sets up a needlessly inconsistent system. If itʼs important and appropriate for end users to see a fallback display resembling the Basic Latin repertoire, then why not use the Basic Latin characters, so that end users without the benefit of a special font can see them? If itʼs not appropriate or important, then why make the sequence characters visible in fallback at all (outside special modes such as composition or “show hidden”)? And if the sequence pieces arenʼt to be seen, why use a sequence at all (especially an invalid one), instead of individual private‐use code points? The tag characters seem like a needless complication.
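The figure of 137,468 private-use code points is the sum of the three Private Use Areas (U+E000..U+F8FF, U+F0000..U+FFFFD, U+100000..U+10FFFD). The two renderings contrasted above can be sketched in Python: tag characters dropped in normal display, versus tag characters shown as the Basic Latin characters they mirror in a fallback or "show hidden" mode. This is an illustration under stated assumptions. render_hidden, render_visible and the "|" placeholder for CANCEL TAG are hypothetical choices, not the behaviour of any real font or renderer.

```python
TAG_OFFSET = 0xE0000
CANCEL_TAG = 0xE007F

# Private Use Areas: BMP PUA plus Supplementary PUA-A and PUA-B.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]
PUA_TOTAL = sum(hi - lo + 1 for lo, hi in PUA_RANGES)  # 6,400 + 65,534 + 65,534


def render_hidden(s: str) -> str:
    """Normal rendering: drop everything in the Tags block (U+E0000..U+E007F)."""
    return "".join(ch for ch in s if not 0xE0000 <= ord(ch) <= 0xE007F)


def render_visible(s: str, cancel_marker: str = "|") -> str:
    """'Show hidden' rendering: each tag character becomes the Basic Latin
    character it mirrors; CANCEL TAG becomes an arbitrary placeholder."""
    out = []
    for ch in s:
        cp = ord(ch)
        if 0xE0020 <= cp <= 0xE007E:
            out.append(chr(cp - TAG_OFFSET))
        elif cp == CANCEL_TAG:
            out.append(cancel_marker)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    seq = "\U0010FFFD" + "".join(chr(ord(c) + TAG_OFFSET) for c in "abc") + chr(CANCEL_TAG)
    print(PUA_TOTAL)                            # 137468
    print(render_hidden(seq) == "\U0010FFFD")   # True
    print(render_visible(seq)[1:])              # abc|
```

In the visible mode the reader sees "abc", which plain Basic Latin text would have shown without any special font. In the hidden mode only the single private-use character remains, which one private-use code point could have carried on its own.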
More information about the Unicode mailing list