Use of tag characters in a private encoding - is it valid please?

William_J_G Overington wjgo_10009 at btinternet.com
Mon Apr 29 13:06:44 CDT 2024


Erik Carvalhal Miller wrote as follows.
 
> Although the angle brackets have Unicode names containing the word 
> “mathematical” and reside in the Miscellaneous Mathematical Symbols-A 
> block, I was thinking of their linguistic use for denoting characters 
> qua characters.
 
I was unaware of that usage. Thank you for explaining.
 
> The single missing‐glyph glyph you originally saw between them was the 
> fallback display I expected in accordance with the Standard.
 
> Note that UTS #51 encourages any implementation that supports emoji 
> tag sequences but has difficulty with a particular sequence to fall 
> back by displaying the base emoji either followed by or overlaid by a 
> “missing‐emoji glyph”;
 
That situation arises because the character that is used as the base
character of the tag sequence can also be used on its own with its
original meaning. I am not suggesting using U+10FFFD other than as the
base character of a tag sequence (within the limits of the usage being
discussed here, since anyone may use a Private Use character for their
own purpose). If the OpenType font recognizes a particular sequence of
the base character and some tag characters as if it were a ligature,
and displays a substituted glyph accordingly, then no glyph for U+10FFFD
is displayed. A glyph for U+10FFFD will therefore be displayed only if
the font in use does not recognize a particular sequence of the base
character and some tag characters. So, for example, if a font that has
the suggested glyph for U+10FFFD and recognizes, say, twenty sequences
of the base character and some tag characters is used to display some
text, then the font responds to each sequence in the text individually,
displaying a substituted glyph or the U+10FFFD glyph as appropriate.
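The fallback behaviour described here can be modelled in a few lines.
The following Python sketch does not show how OpenType substitution is
actually implemented (a real font would use GSUB ligature lookups); it
only simulates the visible outcome, assuming a font that knows a
hypothetical set of tag payloads. The names `render`, `FALLBACK` and the
glyph names are invented for illustration.

```python
BASE = 0x10FFFD        # Private Use base character proposed in the post
CANCEL_TAG = 0xE007F   # U+E007F CANCEL TAG ends each sequence
FALLBACK = "<U+10FFFD glyph>"  # hypothetical name for the base character's glyph


def is_tag(cp: int) -> bool:
    """True for TAG SPACE (U+E0020) through TAG TILDE (U+E007E)."""
    return 0xE0020 <= cp <= 0xE007E


def encode(payload: str) -> str:
    """BASE + one tag character per printable ASCII character + CANCEL TAG."""
    if not all(" " <= c <= "~" for c in payload):
        raise ValueError("tag characters only mirror printable ASCII")
    return chr(BASE) + "".join(chr(0xE0000 + ord(c)) for c in payload) + chr(CANCEL_TAG)


def render(text: str, known: dict) -> list:
    """Simulate a font that recognizes only the payloads in `known`.

    A recognized sequence becomes one substituted glyph, like a ligature.
    An unrecognized one shows only the base glyph; the tag characters are
    default ignorable, so they are not drawn.
    """
    glyphs, i = [], 0
    while i < len(text):
        if ord(text[i]) != BASE:
            glyphs.append(text[i])  # ordinary character: drawn as itself
            i += 1
            continue
        j = i + 1
        while j < len(text) and is_tag(ord(text[j])):
            j += 1
        payload = "".join(chr(ord(c) - 0xE0000) for c in text[i + 1:j])
        terminated = j < len(text) and ord(text[j]) == CANCEL_TAG
        if terminated and payload in known:
            glyphs.append(known[payload])  # substituted glyph
        else:
            glyphs.append(FALLBACK)        # base glyph shown instead
        i = j + 1 if terminated else j
    return glyphs


text = "A" + encode("star") + "B" + encode("moon")
print(render(text, {"star": "glyph:star"}))
# ['A', 'glyph:star', 'B', '<U+10FFFD glyph>']
```

Each sequence is decided independently, which is the point of the
paragraph: one run of text can mix substituted glyphs and fallback
glyphs.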
 
A font with visible glyphs for tag characters will be helpful for 
composing sequences and could also be useful for finding the meaning of 
sequences that are not supported by any font available to the particular 
end user.
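A plain-text stand-in for such a font can help when checking a sequence
by eye. This Python sketch replaces each tag character with the ASCII
character it mirrors, so the hidden payload becomes readable; the
function name and the `<U+10FFFD>` and `<CANCEL TAG>` markers are
arbitrary choices, not any established convention.

```python
def make_tags_visible(text: str) -> str:
    """Show tag characters as the ASCII characters they mirror."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xE0020 <= cp <= 0xE007E:   # TAG SPACE .. TAG TILDE
            out.append(chr(cp - 0xE0000))
        elif cp == 0xE007F:            # CANCEL TAG
            out.append("<CANCEL TAG>")
        elif cp == 0x10FFFD:           # the proposed base character
            out.append("<U+10FFFD>")
        else:                          # anything else passes through
            out.append(ch)
    return "".join(out)


seq = "\U0010FFFD" + "".join(chr(0xE0000 + ord(c)) for c in "moon") + "\U000E007F"
print(make_tags_visible("x " + seq))
# x <U+10FFFD>moon<CANCEL TAG>
```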
 
> since in this case itʼs not likely that the PUA character would even 
> be recognized as an emoji, the fallback you saw is the best‐case 
> scenario one can expect in the absence of a private‐use agreement.
 
Well, I was not restricting myself to emoji in applying the technique of
using U+10FFFD followed by a sequence of tag characters of which the
final one is a CANCEL TAG. Sometimes emoji, but other things too.
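Because tag characters U+E0020 to U+E007E mirror printable ASCII (each
is the ASCII code plus 0xE0000), any short ASCII label can be carried
this way, not only emoji-related ones. A minimal Python sketch of
building such a sequence and listing its code points follows; the
function names and the example label are illustrative only.

```python
TAG_OFFSET = 0xE0000        # TAG X = U+E0000 + ord(X) for printable ASCII X
BASE = "\U0010FFFD"         # Private Use base character
CANCEL_TAG = "\U000E007F"   # terminates the sequence


def build_sequence(label: str) -> str:
    """Return BASE + tag characters spelling `label` + CANCEL TAG."""
    if not label or not all(" " <= c <= "~" for c in label):
        raise ValueError("label must be non-empty printable ASCII")
    return BASE + "".join(chr(TAG_OFFSET + ord(c)) for c in label) + CANCEL_TAG


def code_points(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


print(code_points(build_sequence("Ab1")))
# U+10FFFD U+E0041 U+E0062 U+E0031 U+E007F
```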
 
I had in mind a font where the glyph for U+10FFFD would be a rectangle
containing the top half of a question mark, with, in place of the dot,
a horizontal arrow pointing to the right as seen by the viewer.
 
I consider that the phrase "private agreement" in The Unicode
Standard is, well, not the whole situation, as it is perfectly possible
for one person to produce and publish a document declaring some meanings
and/or glyphs. So while anyone else applying those meanings and/or
glyphs does imply at least a tacit, temporary sort of agreement, like
the suspension of disbelief when watching a science fiction movie, it is
not the almost formal contractual situation that The Unicode Standard
could reasonably be thought to be writing about.
 
https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf (page 23 of
the PDF document)
 
William Overington
 
Monday 29 April 2024
 
 