Use of tag characters in a private encoding - is it valid please?

Asmus Freytag asmusf at ix.netcom.com
Thu May 2 19:29:36 CDT 2024


On 5/2/2024 4:25 PM, James Kass via Unicode wrote:
>
> On 2024-05-02 9:05 PM, Asmus Freytag via Unicode wrote:
>> PS: you are free to solicit other parties to join such private 
>> agreements and you may even choose to write them down. However, it's 
>> up to you to resolve any issues due to non-compliance with your 
>> private agreements. Unicode doesn't care -- as long as you don't 
>> agree to things that conflict with conformance to the Standard. In 
>> which case, any such conformance by participants in your agreement 
>> may no longer be valid.
>

> Wouldn’t this kind of private use agreement be considered a higher 
> level protocol?

No. You can agree to use a font that displays a certain glyph at a 
certain PUA position. That's a private agreement, but not a "higher 
level protocol". The way I like to think about it, PUA characters, in 
contrast to images inserted into the flowed text, constitute plain text 
(as long as you don't append the font selection instructions via some 
private tag, e.g. <font pua="use-this.ttf">).
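
To illustrate the point that PUA characters are ordinary code points in a
plain-text stream, here is a minimal Python sketch that locates them by
their general category (Co). The function name and the sample string are
made up for illustration, and the choice of U+E000 stands in for whatever
position a private agreement might pick.

```python
import unicodedata

def private_use_chars(text):
    """Return (index, 'U+XXXX') for each private-use code point in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # Co = Other, private use
    ]

# Hypothetical agreement: both parties use a font with a glyph at U+E000.
sample = "Yadda yadda \uE000 et cetera."
print(private_use_chars(sample))
```

Nothing in the string itself says which font to use; that knowledge lives
only in the private agreement, which is why the text stays plain text.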
>
> [HTML]
> Yadda yadda <img src="aardvark.jpg"> et cetera.
>
> [tags shown using encircled alphanumerics]
> Yadda yadda 🆔Ⓠ④⑥②①② et cetera.

The minute you agree to show different glyphs for non-PUA characters, 
you are no longer simply conforming to Unicode. At least, that holds as 
long as those glyphs aren't already associated with the given character 
as alternate glyphs by ordinary practice. Using Fraktur glyphs for Latin 
characters is very much conformant for that reason.
>
> There’s nothing stopping folks from putting out fonts with glyphs 
> covering large sets of images using QID numbers expressed as tag 
> characters (or even as enclosed alphanumerics) and treating them as 
> ligature substitutions.  The same goes for any non-QID strings, as well.
>
> Yet both of the examples above can be considered mark-up languages 
> which use elements of text.  Which may explain why “Unicode doesn’t 
> care” about such private agreements.  Because they are beyond the 
> realm of plain-text.
>
If you create elaborate conventions for the use of tag characters, you 
are creating a markup language. It's no different from re-using ASCII 
characters for syntax in addition to text. The same is true for 
repurposing the control codes, especially if your syntax allows 
parameters that use non-control-code characters. Such conventions are 
not SGML-style markup, but they constitute markup in the most general 
sense.

The way markup languages are conformant with Unicode is that they 
identify those text runs that are plain-text Unicode and those text runs 
where code points have syntactic functions.
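
As a rough sketch of what that identification could look like for the
tag-character example quoted above, the Python below splits a string into
plain-text runs and runs of tag characters. It relies only on the fact
that the tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E, with
U+E007F as CANCEL TAG. The function names, the "text"/"syntax" labels, and
the idea of a QID payload after U+1F194 are assumptions for illustration,
not an established convention.

```python
from itertools import groupby

TAG_BASE = 0xE0000    # tag character = TAG_BASE + ASCII code point
CANCEL_TAG = 0xE007F

def is_tag(ch):
    """True for TAG SPACE..TAG TILDE and CANCEL TAG."""
    return 0xE0020 <= ord(ch) <= CANCEL_TAG

def to_tags(ascii_text):
    """Map printable ASCII to the corresponding tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in ascii_text):
        raise ValueError("only printable ASCII has tag counterparts")
    return "".join(chr(TAG_BASE + ord(c)) for c in ascii_text)

def split_runs(text):
    """Split text into ('text', run) and ('syntax', decoded tag run) pairs."""
    runs = []
    for tagged, group in groupby(text, key=is_tag):
        chunk = "".join(group)
        if tagged:
            # Decode tag characters back to ASCII; drop CANCEL TAG.
            chunk = "".join(
                chr(ord(c) - TAG_BASE) for c in chunk if ord(c) != CANCEL_TAG
            )
            runs.append(("syntax", chunk))
        else:
            runs.append(("text", chunk))
    return runs

# Hypothetical private convention: U+1F194 followed by a tag-encoded QID.
sample = "Yadda yadda \U0001F194" + to_tags("Q46212") + " et cetera."
for kind, run in split_runs(sample):
    print(kind, repr(run))
```

A receiver that has not joined the agreement sees only default-ignorable
tag characters after the base character; the "syntax" reading exists only
for parties that share the convention.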

A./



