Use of tag characters in a private encoding - is it valid please?
Peter Constable
pgcon6 at msn.com
Mon May 6 14:29:01 CDT 2024
In general (in my understanding at least), "protocol" means a documented specification for data representation or process interaction (APIs, file formats, structured message content...) that different parties can use for interoperability. (For example, see https://learn.microsoft.com/en-us/openspecs/windows_protocols). In that sense, for example, SIL's documentation of their use of PUA (https://scripts.sil.org/cms/scripts/page.php?id=pua_home&site_id=nrsi) would be considered protocol documentation.
Perhaps what Asmus was reacting to was the mention of "higher-level". I understand you to mean _defined externally to Unicode_. But I think the more common use of that term would be in relation to some _application of Unicode text encoding_ involving more than plain text. So, in relation to Unicode PUA, a private agreement on the semantics of PUA code points would comprise a protocol, but not a _higher-level_ protocol.
Peter
-----Original Message-----
From: Unicode <unicode-bounces at corp.unicode.org> On Behalf Of James Kass via Unicode
Sent: Friday, May 3, 2024 12:59 PM
To: unicode at corp.unicode.org
Subject: Re: Use of tag characters in a private encoding - is it valid please?
On 2024-05-03 12:29 AM, Asmus Freytag via Unicode wrote:
> On 5/2/2024 4:25 PM, James Kass via Unicode wrote:
>> Wouldn’t this kind of private use agreement be considered a higher
>> level protocol?
>
> No. You can agree to use a font that displays a certain glyph at a
> certain PUA position. That's a private agreement, but not a "higher
> level protocol". The way I like to think about it, PUA characters, in
> contrast to images inserted into the flown text, constitute plain text
> (as long as you don't append the font selection instructions via some
> private tag, e.g. <font pua="use-this.ttf">).
Maybe we're talking about different things. Of course PUA characters are plain-text by definition. Even when people map all kinds of non-textual items to the PUA. But I'm referring to the substitution of a glyph/image for a string of plain-text characters. This sort of thing is very common in fonts.
Any private agreement is an alternate protocol regardless of its altitude. I consider this kind of agreement (substitution of a text string with something different) to be "higher level" because it's over-and-above.
>>
>> [HTML]
>> Yadda yadda <img src="aardvark.jpg"> et cetera.
>>
>> [tags shown using encircled alphanumerics]
>> Yadda yadda 🆔Ⓠ④⑥②①② et cetera.
> The minute you agree to show different glyphs for non-PUA characters,
> you are no longer simply conforming to Unicode.
Sorry for not understanding this. Both examples above involve the computer system substituting an image/glyph for a string of text. Both examples should be considered conformant. In either case, the underlying encoded text does not get changed. The higher level protocol only affects how that text is displayed.
> If you create elaborate conventions for the use of tag
> characters you are creating a markup language. It's no
> different from re-using ASCII characters for syntax
> in addition to text.
It's also true when re-using any text characters, public or private, for the same purpose.
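As a rough illustration of the distinction discussed in this thread, here is a minimal Python sketch, assuming the convention from the example above (a visible symbol followed by a run of tag characters spelling an identifier such as Q46212). The function names `classify`, `to_tags`, and `from_tags` are hypothetical and not part of any standard API; only `unicodedata.category` comes from the Python standard library. It shows that private-use and tag code points remain ordinary code points in the string, whatever a renderer chooses to display for them.

```python
import unicodedata

# Tag characters U+E0020..U+E007E mirror printable ASCII 0x20..0x7E.
TAG_BASE = 0xE0000


def classify(ch: str) -> str:
    """Return 'private-use', 'tag', or 'other' for a single character."""
    cp = ord(ch)
    # General category "Co" covers the BMP PUA and planes 15 and 16.
    if unicodedata.category(ch) == "Co":
        return "private-use"
    # Assigned tag characters: U+E0001 and U+E0020..U+E007F.
    if cp == 0xE0001 or 0xE0020 <= cp <= 0xE007F:
        return "tag"
    return "other"


def to_tags(ascii_text: str) -> str:
    """Map printable ASCII to the corresponding tag characters."""
    out = []
    for c in ascii_text:
        if not 0x20 <= ord(c) <= 0x7E:
            raise ValueError(f"not printable ASCII: {c!r}")
        out.append(chr(TAG_BASE + ord(c)))
    return "".join(out)


def from_tags(text: str) -> str:
    """Recover the ASCII spelling of the tag characters in text, ignoring the rest."""
    return "".join(
        chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


if __name__ == "__main__":
    # Hypothetical private convention: U+1F194 SQUARED ID followed by a tag string.
    message = "Yadda yadda \U0001F194" + to_tags("Q46212") + " et cetera."
    print([classify(c) for c in "\uE000A" + to_tags("Q")])
    print(from_tags(message))
```

Whether a receiving system shows an image, a glyph, or nothing at all for the tag run is a matter for whatever agreement the parties have made; the sketch only demonstrates that the encoded text itself can be inspected and round-tripped.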