Use of tag characters in a private encoding - is it valid please?
Erik Carvalhal Miller
ecm.unicode at gmail.com
Tue Apr 30 23:19:42 CDT 2024
On Mon, Apr 29, 2024 at 2:13 PM William_J_G Overington via Unicode <
unicode at corp.unicode.org> wrote:
> I consider that the phrase "private agreement" in The Unicode Standard
is, well, not the whole situation, as it is perfectly possible for one
person to produce and publish a document declaring some meanings and/or
glyphs. So while for anyone else to apply those meanings and/or glyphs does
imply at least a tacit, temporary, like watching a science fiction movie
suspension of disbelief, sort of agreement, it is not the almost formal
contractual situation that The Unicode Standard could be reasonably thought
to be writing about.
>
> https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf page 23 of the
PDF document
The section you cite does not require anything like an “almost formal
contractual situation”. One of Unicodeʼs online FAQ pages (
https://www.unicode.org/faq/private_use.html) has this to say:
>> Q: What does "private agreement among cooperating parties" mean?
>>
>> A "private agreement" simply refers to the fact that agreement about the
interpretation of some set of private-use characters is done privately,
outside the context of the standard. The Unicode Standard does not specify
any particular interpretation for any private-use character. There is no
implication that a private agreement necessarily has any contractual or
other legal status—it is simply an agreement between two or more parties
about how a particular set of private-use characters should be interpreted.
>>
>> Q: How would I define a private agreement?
>>
>> One can share, or even publish, documentation containing particular
assignments for private-use characters, their glyphs, and other relevant
information about their interpretation. One can then ask others to use
those private-use characters as documented. One can create appropriate
fonts and IMEs, or request that others do so.
On Mon, Apr 29, 2024 at 2:13 PM William_J_G Overington via Unicode <
unicode at corp.unicode.org> wrote this too:
> A font with visible glyphs for tag characters will be helpful for
composing sequences and could also be useful for finding the meaning of
sequences that are not supported by any font available to the particular
end user.
>
> > since in this case itʼs not likely that the PUA character would even be
recognized as an emoji, the fallback you saw is the best‐case scenario one
can expect in the absence of a private‐use agreement.
>
> Well, I was not restricting myself to emoji in applying the technique of
using U+10FFFD followed by a sequence of tag characters of which the final
one is a CANCEL TAG. Emoji sometimes, yet other things too.
That same chapter you linked to, in §23.9 (“Tag Characters”), specifies two
usages for tag characters: (1) the now‐deprecated language tagging that was
their original purpose and (2) emoji tag sequences, as further specified in
UTS #51 (as I brought up earlier). You began this thread by asking about
validity; my reading is no, a non‐emoji private‐use tag sequence is not
valid according to the Standard. (Nevertheless, you might get it to
function anyway.)
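
To make the shape of the sequence under discussion concrete, here is a minimal Python sketch. It builds U+10FFFD followed by tag characters and a final CANCEL TAG, then checks only the structural pattern (base, one or more characters in U+E0020..U+E007E, then U+E007F). The function names are hypothetical, chosen for illustration. Passing this check says nothing about validity: UTS #51 requires an emoji base, which U+10FFFD is not.

```python
# Sketch only: checks the *shape* of a tag sequence, not its validity.
TAG_BLOCK_FIRST = 0xE0000   # start of the Tags block
TAG_SPEC_FIRST = 0xE0020    # TAG SPACE
TAG_SPEC_LAST = 0xE007E     # TAG TILDE
CANCEL_TAG = 0xE007F        # CANCEL TAG


def make_tag_sequence(base: str, ascii_payload: str) -> str:
    """Return base + tag characters mirroring ascii_payload + CANCEL TAG."""
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    tags = []
    for ch in ascii_payload:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"no tag character mirrors U+{cp:04X}")
        # Each tag character sits at 0xE0000 + its ASCII counterpart.
        tags.append(chr(TAG_BLOCK_FIRST + cp))
    return base + "".join(tags) + chr(CANCEL_TAG)


def has_tag_sequence_shape(s: str) -> bool:
    """True if s is: non-tag base, 1+ tag spec characters, CANCEL TAG."""
    if len(s) < 3:
        return False
    base, spec, end = s[0], s[1:-1], s[-1]
    base_is_tag = TAG_BLOCK_FIRST <= ord(base) <= CANCEL_TAG
    spec_ok = all(TAG_SPEC_FIRST <= ord(c) <= TAG_SPEC_LAST for c in spec)
    return (not base_is_tag) and spec_ok and ord(end) == CANCEL_TAG


def tag_payload_as_ascii(s: str) -> str:
    """Map the tag characters back to ASCII, as a visible fallback might."""
    return "".join(chr(ord(c) - TAG_BLOCK_FIRST) for c in s[1:-1])


seq = make_tag_sequence("\U0010FFFD", "abc")
print([f"U+{ord(c):04X}" for c in seq])
print(has_tag_sequence_shape(seq), tag_payload_as_ascii(seq))
```

A renderer that has no private agreement about such a sequence is free to ignore the tag characters entirely, which is why the structural check above is the most that can be said about it.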
Itʼs not clear why you would want to use tag sequences (emoji or
otherwise). The 137,468 private‐use code points available are well suited
for specialty characters. The fallback of having your specialty font(s)
visibly display the tag characters of a (private‐use) well‐formed but
unrecognized tag sequence, though possibly useful, not only perverts the
notion that tag characters are supposed to be invisible in normal rendering
but also sets up a needlessly inconsistent system. If itʼs important and
appropriate for end users to see a fallback display resembling the Basic
Latin repertoire, then why not use the Basic Latin characters, so that end
users without the benefit of a special font can see them? If itʼs not
appropriate or important, then why make the sequence characters visible in
fallback at all (outside special modes such as composition or “show
hidden”)? And if the sequence pieces arenʼt to be seen, why use a sequence
at all (especially an invalid one), instead of individual private‐use code
points? The tag characters seem like a needless complication.
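
For reference, the 137,468 figure can be checked with a short Python sketch. It sums the three private-use ranges (the BMP Private Use Area plus planes 15 and 16) and cross-checks the total against the General_Category value Co as reported by the standard library's unicodedata module. The function names are hypothetical.

```python
import unicodedata

# The three private-use ranges, inclusive on both ends.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),       # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),     # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),   # Supplementary Private Use Area-B (plane 16)
]


def count_private_use() -> int:
    """Count private-use code points from the range boundaries."""
    return sum(last - first + 1 for first, last in PRIVATE_USE_RANGES)


def count_by_category() -> int:
    """Cross-check: count code points whose General_Category is Co."""
    return sum(
        1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Co"
    )


print(count_private_use())   # 6,400 + 65,534 + 65,534
print(count_by_category())
```

The last two code points of planes 15 and 16 are noncharacters, not private-use characters, which is why each supplementary range ends at xFFFD.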