Use of tag characters in a private encoding - is it valid please?
James Kass
jameskass at code2001.com
Mon May 6 18:21:54 CDT 2024
On 2024-05-06 7:29 PM, Peter Constable via Unicode wrote:
> Perhaps what Asmus was reacting to was the mention of "higher-level". I understand you to mean _defined externally to Unicode_. But I think more common use of that term would be in relation to some _application of Unicode text encoding_ involving more than plain text. So, in relation to Unicode PUA, a private agreement on semantics of PUA code points would comprise a protocol, but not a _higher-level_ protocol.
>
My phrasing may have been inept. For single PUA characters, or even
strings of PUA characters, private agreements are not higher level
because PUA characters are supposed to be defined by private agreement.
It's when PUA (or even non-PUA) characters are modified by tag
characters as part of a private agreement that the scheme becomes higher
level. As Asmus pointed out, this is essentially a private agreement
for mark-up.
Asmus wrote, "If you create elaborate conventions for the use of tag
characters you are creating a markup language. It's no different from
re-using ASCII characters for syntax in addition to text."
The question posed in the thread subject seems to have been answered by
Asmus Freytag.
PUA(1) + ZWJ + PUA(2) = a ligature glyph combining PUA(1) with PUA(2)
- that's legit. Not higher level.
PUA(1) + a string of tag characters = something completely different.
- higher level, even though this can be handled at the font/font engine
level.
So, if we're on the same page,
1) U+10FFFD followed by the tag versions of !313125 and a CANCEL TAG.
2) COMET plus CIRCUMFLEX followed by the ASCII string "!313125"
... both examples represent a private agreement mark-up, and Unicode
shouldn't care.
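For anyone who wants to see the two sequences concretely, here is a rough Python sketch. It relies only on the fact that the tag characters U+E0020..U+E007E mirror printable ASCII at an offset of 0xE0000, and that CANCEL TAG is U+E007F. The helper names (to_tags, from_tags) are made up for illustration, and I am assuming CIRCUMFLEX means U+005E here; substitute U+0302 if the combining form was intended.

```python
TAG_BASE = 0xE0000          # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG


def to_tags(ascii_text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) to tag characters (hypothetical helper)."""
    out = []
    for ch in ascii_text:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"no tag character for U+{cp:04X}")
        out.append(chr(TAG_BASE + cp))
    return "".join(out)


def from_tags(text: str) -> str:
    """Recover the ASCII payload from any tag characters in text (hypothetical helper)."""
    return "".join(
        chr(ord(ch) - TAG_BASE)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )


# 1) U+10FFFD followed by the tag versions of !313125 and a CANCEL TAG
example_1 = "\U0010FFFD" + to_tags("!313125") + CANCEL_TAG

# 2) COMET (U+2604) plus CIRCUMFLEX (assumed U+005E) plus the ASCII string
example_2 = "\u2604" + "^" + "!313125"

for label, s in (("1", example_1), ("2", example_2)):
    print(label, " ".join(f"U+{ord(c):04X}" for c in s))
```

Either way, the meaning of "!313125" comes from the private agreement and not from anything in the code points themselves, which is the point made above.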