Use of tag characters in a private encoding - is it valid please?
Ken Whistler
kenwhistler at sonic.net
Mon May 6 22:28:15 CDT 2024
I'm not going to pile on about what constitutes "higher-level", but ...
On 5/6/2024 7:23 PM, Erik Carvalhal Miller via Unicode wrote:
> If emoji tag sequences, including the existing flag emoji tag
> sequences, categorically constitute markup, then this markup format is
> one which Unicode has paradoxically defined as part of its plain‐text
> standard.
This is erroneous. Emoji tag sequences are not "defined as part of [the
Unicode Consortium's] plain-text standard", i.e. the Unicode Standard.
Emoji tag sequences are defined in and by UTS #51, which is a *separate*
specification defined on top of the Unicode Standard. Emoji tag
sequences make use of the tag characters defined in the Unicode
Standard, but UTS #51 is defining a protocol for their use which is
built on top of the Unicode Standard, and not formally a part of it.
> If Unicodeʼs RGI sequence for a Welsh‐flag emoji is plain text, then
> an emoji tag sequence headed by a private‐use emoji can be too, as per
> TUS §23.5 and UTS #51. Why could it not be?
Well, because private use is private use. The formal definition of an
emoji_tag_sequence depends on the definition of a tag_base, which can
either be an emoji_character or an emoji_modifier_sequence or an
emoji_presentation_sequence. The problem, for extending any of those to
PUA, is that all of those entity sets are very clearly and precisely
defined by enumerations in data files associated with each version of
the publication of UTS #51. PUA characters are not included in any of
those lists. Therefore, a PUA character cannot be a tag_base, per UTS #51.
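
As a rough illustration of the point above, here is a minimal Python sketch of the emoji_tag_sequence syntax (tag_base, then one or more tag_spec characters U+E0020..U+E007E, then tag_end U+E007F). The set SAMPLE_TAG_BASES and the function names are hypothetical stand-ins: the real tag_base sets are enumerated in the UTS #51 data files, and this sketch only handles a single-code-point tag_base, not modifier or presentation sequences.

```python
import unicodedata

TAG_SPEC = range(0xE0020, 0xE007F)  # tag_spec characters, U+E0020..U+E007E
TAG_END = 0xE007F                   # tag_end, CANCEL TAG

# Hypothetical one-entry stand-in for the enumerations in the UTS #51
# data files. U+1F3F4 is the base of the Welsh flag sequence.
SAMPLE_TAG_BASES = {0x1F3F4}


def is_private_use(cp):
    """True if the code point has General_Category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


def is_emoji_tag_sequence(text, tag_bases=SAMPLE_TAG_BASES):
    """Check text against: tag_base tag_spec+ tag_end (simplified)."""
    cps = [ord(ch) for ch in text]
    if len(cps) < 3:
        return False
    base, spec, end = cps[0], cps[1:-1], cps[-1]
    return (
        base in tag_bases                       # must be an enumerated base
        and all(cp in TAG_SPEC for cp in spec)  # only tag_spec characters
        and end == TAG_END                      # terminated by CANCEL TAG
    )


def tags(s):
    """Map printable ASCII to the corresponding tag characters."""
    return "".join(chr(0xE0000 + ord(ch)) for ch in s)


welsh_flag = chr(0x1F3F4) + tags("gbwls") + chr(TAG_END)
pua_headed = chr(0xF0000) + tags("gbwls") + chr(TAG_END)

print(is_emoji_tag_sequence(welsh_flag))  # True
print(is_private_use(0xF0000))            # True
print(is_emoji_tag_sequence(pua_headed))  # False: U+F0000 is not a listed base
```

The check fails for the PUA-headed string on membership in the enumerated set, not on the shape of the tag characters that follow.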
It doesn't suffice to say, well, I've decided that U+F0000 is going to
be an emoji character, so I can use it in an emoji_tag_sequence, per UTS
#51. Rather, what one would have to do is build out a private agreement
that 1) I am going to treat U+F0000 as an emoji, and 2) I am going to
be using a private extension of the concept of an emoji_tag_sequence
which allows my "emoji" U+F0000 as a tag_base. I can document that, and
if I can get somebody else to buy into that private agreement, then by
all means, interchange all you want. But anybody else who happens to
sample some of that text is under no obligation whatsoever to interpret
any of that, or even to recognize your private extension of the concept
of an emoji_tag_sequence as syntactically correct, let alone
interpretable.
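
To make the private-agreement scenario concrete, the following Python sketch models two receivers: one that has adopted a private extension admitting U+F0000 as a tag_base, and one that knows only the standard enumeration. Everything here is an assumption for illustration: STANDARD_BASES is a one-entry placeholder for the UTS #51 data files, PRIVATE_BASES stands for whatever the parties document between themselves, and the "demo" tag payload is invented.

```python
TAG_SPEC = range(0xE0020, 0xE007F)  # tag_spec characters, U+E0020..U+E007E
TAG_END = 0xE007F                   # tag_end, CANCEL TAG


def accepts(text, tag_bases):
    """True if text is tag_base tag_spec+ tag_end for the given base set."""
    cps = [ord(ch) for ch in text]
    return (
        len(cps) >= 3
        and cps[0] in tag_bases
        and all(cp in TAG_SPEC for cp in cps[1:-1])
        and cps[-1] == TAG_END
    )


# Placeholder for the standard enumeration (illustrative subset only).
STANDARD_BASES = {0x1F3F4}

# Hypothetical private agreement: the parties additionally treat
# U+F0000 as an "emoji" that may serve as a tag_base.
PRIVATE_BASES = STANDARD_BASES | {0xF0000}

# Invented payload: U+F0000 followed by tag characters spelling "demo".
message = (
    chr(0xF0000)
    + "".join(chr(0xE0000 + ord(ch)) for ch in "demo")
    + chr(TAG_END)
)

print(accepts(message, PRIVATE_BASES))   # True: party to the agreement
print(accepts(message, STANDARD_BASES))  # False: everyone else
```

The same string is well-formed for one receiver and not for the other, and the only difference is which set of bases each has agreed to.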
--Ken