Use of tag characters in a private encoding - is it valid please?
Asmus Freytag
asmusf at ix.netcom.com
Thu May 2 19:29:36 CDT 2024
On 5/2/2024 4:25 PM, James Kass via Unicode wrote:
>
> On 2024-05-02 9:05 PM, Asmus Freytag via Unicode wrote:
>> PS: you are free to solicit other parties to join such private
>> agreements and you may even choose to write them down. However, it's
>> up to you to resolve any issues due to non-compliance with your
>> private agreements. Unicode doesn't care -- as long as you don't
>> agree to things that conflict with conformance to the Standard. In
>> which case, any conformance by participants in your agreement
>> may no longer be valid.
>
> Wouldn’t this kind of private use agreement be considered a higher
> level protocol?
No. You can agree to use a font that displays a certain glyph at a
certain PUA position. That's a private agreement, but not a "higher
level protocol". The way I like to think about it, PUA characters, in
contrast to images inserted into the flow of text, constitute plain text
(as long as you don't append the font selection instructions via some
private tag, e.g. <font pua="use-this.ttf">).
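To make the plain-text point concrete, here is a minimal Python sketch that finds private-use characters in a string. It relies on the General Category value "Co" (Private Use) reported by the standard unicodedata module. The function name pua_positions and the sample string are only illustrative assumptions. What glyph a given font shows at that position is exactly the part that the private agreement, not the text, has to supply.

```python
import unicodedata


def pua_positions(text):
    """Return (index, code point label) for every private-use character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        # General Category "Co" covers U+E000..U+F8FF and planes 15 and 16.
        if unicodedata.category(ch) == "Co"
    ]


# Hypothetical sample: one BMP private-use character inside ordinary text.
sample = "Yadda yadda \uE000 et cetera."
print(pua_positions(sample))  # [(12, 'U+E000')]
```

The string remains plain text throughout: nothing in it names a font or carries any syntax.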
>
> [HTML]
> Yadda yadda <img src="aardvark.jpg"> et cetera.
>
> [tags shown using encircled alphanumerics]
> Yadda yadda 🆔Ⓠ④⑥②①② et cetera.
The minute you agree to show different glyphs for non-PUA characters,
you are no longer simply conforming to Unicode, at least as long as
those glyphs aren't already established by ordinary practice as
alternate glyphs for the given character. Using Fraktur glyphs for
Latin characters is very much conformant for that reason.
>
> There’s nothing stopping folks from putting out fonts with glyphs
> covering large sets of images using QID numbers expressed as tag
> characters (or even as enclosed alphanumerics) and treating them as
> ligature substitutions. The same goes for any non-QID strings, as well.
>
> Yet both of the examples above can be considered mark-up languages
> which use elements of text. Which may explain why “Unicode doesn’t
> care” about such private agreements. Because they are beyond the
> realm of plain-text.
>
If you create elaborate conventions for the use of tag characters, you
are creating a markup language. It's no different from re-using ASCII
characters for syntax in addition to text. The same is true for
repurposing the control codes, especially if your syntax allows
parameters that use non-control-code characters. These are not
SGML-style markup, but they constitute markup in the most general sense.
The way markup languages conform to Unicode is that they identify
which text runs are plain-text Unicode and which are runs where code
points have syntactic functions.
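As a rough illustration of that separation, the following Python sketch splits a string into plain-text runs and runs of tag characters (the Tags block, U+E0000..U+E007F), decoding U+E0020..U+E007E to the ASCII characters they mirror. The helper names split_runs and to_tags are hypothetical, and the QID in the sample is the one from the quoted example, here encoded with real tag characters rather than enclosed alphanumerics. This is a sketch of the idea, not a description of any existing convention.

```python
import itertools

TAG_BLOCK = range(0xE0000, 0xE0080)  # Tags block, U+E0000..U+E007F


def is_tag(ch):
    return ord(ch) in TAG_BLOCK


def to_tags(ascii_text):
    """Encode printable ASCII (0x20..0x7E) as the mirroring tag characters."""
    return "".join(chr(0xE0000 + ord(c)) for c in ascii_text)


def split_runs(text):
    """Return (kind, run) pairs, where kind is 'text' or 'tags'."""
    runs = []
    for tagged, group in itertools.groupby(text, key=is_tag):
        run = "".join(group)
        if tagged:
            # Show the ASCII mirror where one exists, else a code point label.
            run = "".join(
                chr(ord(c) - 0xE0000)
                if 0xE0020 <= ord(c) <= 0xE007E
                else f"<U+{ord(c):04X}>"
                for c in run
            )
        runs.append(("tags" if tagged else "text", run))
    return runs


sample = "Yadda yadda " + to_tags("Q46212") + " et cetera."
print(split_runs(sample))
# [('text', 'Yadda yadda '), ('tags', 'Q46212'), ('text', ' et cetera.')]
```

Only the 'text' runs are plain text in the sense of the Standard; what the 'tags' runs mean is defined entirely by the private convention.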
A./