Use of tag characters in a private encoding - is it valid please?
William_J_G Overington
wjgo_10009 at btinternet.com
Mon Apr 29 13:06:44 CDT 2024
Erik Carvalhal Miller wrote as follows.
> Although the angle brackets have Unicode names containing the word
> “mathematical” and reside in the Miscellaneous Mathematical Symbols-A
> block, I was thinking of their linguistic use for denoting characters
> qua characters.
I was unaware of that usage. Thank you for explaining.
> The single missing‐glyph glyph you originally saw between them was the
> fallback display I expected in accordance with the Standard.
> Note that UTS #51 encourages any implementation that supports emoji
> tag sequences but has difficulty with a particular sequence to fall
> back by displaying the base emoji either followed by or overlaid by a
> “missing‐emoji glyph”;
That situation is because the character that is used for the base
character of the tag sequence can also be used on its own for its
original meaning. I am not suggesting that U+10FFFD be used other than
as the base character for a tag sequence (within the usage being
discussed here, that is, since anyone may use a Private Use character
for their own purpose). If an OpenType font recognizes a particular
sequence of the base character and some tag characters as if it were a
ligature and displays a substituted glyph accordingly, then no glyph for
U+10FFFD will be displayed. So a glyph for U+10FFFD will be displayed
only if the font in use does not recognize a particular sequence of the
base character and some tag characters. So, for example, if a font with the
suggested glyph for U+10FFFD and recognizing, say, twenty sequences of
the base character and some tag characters, is used to display some
text, then the font could respond according to whatever sequences are in
the text that is displayed, substituting a glyph or displaying U+10FFFD
as appropriate for each sequence encountered.
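As a rough illustration of that behaviour, here is a small Python sketch that
simulates the substitution outside of any real font. It is not OpenType code:
the table LIGATURES, the glyph names and the function render are all
hypothetical, and it assumes that the tag characters of an unrecognized
sequence display as nothing, so that only the glyph for U+10FFFD shows.

```python
import re

BASE = "\U0010FFFD"        # private-use base character from this discussion
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG ends each sequence

# Base character, one or more tag characters U+E0020..U+E007E, then CANCEL TAG.
SEQUENCE = re.compile("\U0010FFFD[\U000E0020-\U000E007E]+\U000E007F")


def tags(ascii_text):
    """Map printable ASCII to the corresponding tag characters."""
    return "".join(chr(0xE0000 + ord(c)) for c in ascii_text)


# Hypothetical stand-in for the sequences a font recognizes as ligatures.
LIGATURES = {
    tags("a1"): "<glyph:example-one>",
    tags("b2"): "<glyph:example-two>",
}

# Hypothetical name for the font's own glyph for U+10FFFD.
FALLBACK = "<glyph:U+10FFFD>"


def render(text):
    """Replace each sequence by its substituted glyph, or by the fallback glyph."""
    def substitute(match):
        body = match.group(0)[1:-1]  # tag characters between base and CANCEL TAG
        return LIGATURES.get(body, FALLBACK)
    return SEQUENCE.sub(substitute, text)
```

A text containing one recognized and one unrecognized sequence would then come
out with one substituted glyph and one fallback glyph, each decided
independently of the other.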
A font with visible glyphs for tag characters will be helpful for
composing sequences and could also be useful for finding the meaning of
sequences that are not supported by any font available to the particular
end user.
> since in this case itʼs not likely that the PUA character would even
> be recognized as an emoji, the fallback you saw is the best‐case
> scenario one can expect in the absence of a private‐use agreement.
Well, I was not restricting myself to emoji in applying the technique of
using U+10FFFD followed by a sequence of tag characters of which the
final one is a CANCEL TAG. Emoji sometimes, yet other things too.
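For anyone wanting to experiment, the following Python sketch shows one way
such a sequence could be composed and read back. The function names
encode_sequence and decode_sequence are my own hypothetical choices, and the
sketch assumes the payload is limited to printable ASCII, which maps onto the
tag characters U+E0020 to U+E007E, with U+E007F CANCEL TAG as the terminator.

```python
BASE = "\U0010FFFD"        # private-use base character
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000       # tag characters mirror ASCII 0x20..0x7E at this offset


def encode_sequence(payload):
    """Return BASE + tag characters spelling the payload + CANCEL TAG."""
    for ch in payload:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError("payload must be printable ASCII: %r" % ch)
    return BASE + "".join(chr(ord(ch) + TAG_OFFSET) for ch in payload) + CANCEL_TAG


def decode_sequence(sequence):
    """Recover the ASCII payload from a sequence built by encode_sequence."""
    if not (sequence.startswith(BASE) and sequence.endswith(CANCEL_TAG)):
        raise ValueError("not a BASE ... CANCEL TAG sequence")
    body = sequence[len(BASE):-len(CANCEL_TAG)]
    for ch in body:
        if not 0xE0020 <= ord(ch) <= 0xE007E:
            raise ValueError("unexpected character inside the sequence")
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in body)
```

Decoding in this way is one means of finding out what a sequence spells when no
available font supports it.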
I had in mind a font where the glyph for U+10FFFD would be a rectangle
containing the top half of a question mark with, in place of the dot, a
horizontal arrow pointing to the right as seen by the viewer.
I consider that the phrase "private agreement" in The Unicode
Standard is, well, not the whole situation, as it is perfectly possible
for one person to produce and publish a document declaring some meanings
and/or glyphs. So while for anyone else to apply those meanings and/or
glyphs does imply at least a tacit, temporary sort of agreement, like
the suspension of disbelief when watching a science fiction movie, it is
not the almost formal contractual situation that The Unicode Standard
could reasonably be thought to be writing about.
https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf (page 23 of
the PDF document)
William Overington
Monday 29 April 2024