Unicode Emoji 5.0 characters now final

Doug Ewell doug at ewellic.org
Tue Mar 28 13:41:38 CDT 2017

Mark Davis wrote:

> 3. Valid, but not recommended: "usca". Corresponds to the valid
> Unicode subdivision code for California according to
> http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
> and CLDR, but is not listed in http://unicode.org/Public/emoji/5.0/.

"Not recommended" is no better and no less disappointing than "not
standard." Both phrases imply strongly that the sequence, while
syntactically valid, should not be used.

Burying a disclaimer that "implementations can support them, but they
may not interoperate well" in the speaker's notes of slide 38 of a
53-page presentation does nothing to change this perception.

"Even though it is possible to support the US states, or any subset of
them, implementations don’t have to." Well, of course they don't.
Implementations don't have to support the three British flags either if
they don't want to, or any national flags or other emoji, or any
particular character for that matter. The superfluous statement is
easily reduced to "Don't do this."

Joan Montané's return to the list to comment on this issue was
interesting because of a post from February 2015, in which Andrea
Giammarchi reported [1] on Joan's request [2] for Twitter to support
flags for specific "active online communities" that happened to have a
TLD, by stringing three or more Regional Indicator Symbols together:

> [S][C][O][T] --> it shows Scottish flag
> [C][Y][M][R][U] --> it shows a Welsh flag
> [B][Z][H] --> it shows a Breton flag
> [C][A][T] --> it shows Catalan flag
> [E][U][S] --> it shows a Basque flag
> [G][A][L] --> it shows a Galician flag

[1] http://www.unicode.org/mail-arch/unicode-ml/y2015-m02/0039.html
[2] https://github.com/twitter/twemoji/issues/40

Of course this approach was incompatible with conformant use of RIS;
visit [2] with an RIS-conformant browser to see the inadvertently
displayed flags of Seychelles, Cyprus, Belize, Canada, etc.
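To make the breakage concrete: a conformant implementation pairs a run of Regional Indicator Symbols from left to right, so "SCOT" is read as SC + OT, not as one four-letter unit. The Python sketch below is only an illustration. The helper names to_ris and pair_ris are made up, and it assumes the input is a single unbroken run of RIS, ignoring everything else the grapheme-cluster rules in UAX #29 handle.

```python
# Regional Indicator Symbols occupy U+1F1E6 (A) through U+1F1FF (Z).
RIS_BASE = 0x1F1E6

def to_ris(letters: str) -> str:
    """Spell an ASCII word with Regional Indicator Symbols (hypothetical helper)."""
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in letters.upper())

def pair_ris(text: str) -> list[str]:
    """Group a run of RIS into pairs from left to right, as conformant
    implementations do; an odd trailing symbol is left on its own.
    Assumes every character in text is an RIS (hypothetical helper)."""
    letters = [chr(ord(ch) - RIS_BASE + ord("A")) for ch in text]
    return ["".join(letters[i:i + 2]) for i in range(0, len(letters), 2)]

# The sequences from the Twitter request quoted above.
for word in ["SCOT", "CYMRU", "BZH", "CAT", "EUS", "GAL"]:
    print(word, "->", pair_ris(to_ris(word)))
```

The leading pairs for SCOT, CYMRU, BZH and CAT come out as SC, CY, BZ and CA, the Seychelles, Cyprus, Belize and Canada flags mentioned above.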

I don't know if the ensuing thread helped inspire ESC to pursue the
present mechanism involving sequences of Plane 14 tags -- the earliest
mention I can find is PRI #299, just a few months later -- but the
intent seemed straightforward and sensible: provide an official,
conformant mechanism to support a recognized user need, with a suitable
fallback strategy, rather than encouraging users via inaction to adopt a
non-conformant and broken solution.
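For comparison, the Plane 14 mechanism spells a subdivision code such as "usca" as WAVING BLACK FLAG (U+1F3F4), followed by the tag characters that mirror each ASCII character (U+E0020..U+E007E), followed by CANCEL TAG (U+E007F). A minimal Python sketch, with hypothetical helper names: it checks only that the code is plain ASCII letters and digits, not whether CLDR lists it as valid, and the three-element set is just the England, Scotland and Wales flags from the Emoji 5.0 data files.

```python
BLACK_FLAG = 0x1F3F4   # WAVING BLACK FLAG, base of a flag tag sequence
TAG_OFFSET = 0xE0000   # tag chars U+E0020..U+E007E mirror ASCII 0x20..0x7E
CANCEL_TAG = 0xE007F   # terminates the tag sequence

# The only subdivision flags listed in the Emoji 5.0 data files.
RGI_SUBDIVISIONS = {"gbeng", "gbsct", "gbwls"}

def subdivision_flag(code: str) -> str:
    """Build a flag tag sequence from a subdivision code like 'usca'.
    Only the character repertoire is checked, not CLDR validity."""
    code = code.lower()
    if not code.isascii() or not code.isalnum():
        raise ValueError(f"unexpected subdivision code: {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in code)
    return chr(BLACK_FLAG) + tags + chr(CANCEL_TAG)

def decode_flag(seq: str) -> str:
    """Recover the subdivision code from a flag tag sequence."""
    if len(seq) < 3 or ord(seq[0]) != BLACK_FLAG or ord(seq[-1]) != CANCEL_TAG:
        raise ValueError("not a flag tag sequence")
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in seq[1:-1])

for code in ["gbsct", "usca"]:
    seq = subdivision_flag(code)
    listed = "listed" if code in RGI_SUBDIVISIONS else "not listed"
    print(code, " ".join(f"U+{ord(c):04X}" for c in seq), listed)
```

Because the tag characters are invisible, a renderer that doesn't recognize a given sequence can fall back to showing the black flag alone. The sequence for "usca" is exactly as well-formed as the one for "gbsct"; the only difference is which list it appears on.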

Unfortunately, the follow-up turned out to be "... and then discourage
THAT mechanism as well, except in a couple of selected cases, and tell
people to use stickers instead."

If this story sounds vaguely familiar to old-timers, it's exactly the
path that was followed the last time Plane 14 tag characters were under
discussion, between 1998 and 2000: someone wrote an RFC to embed
language tags in plain text using invalid UTF-8 sequences; Unicode
responded by introducing a proper, conformant mechanism to use Plane 14
characters instead; then the conformant replacement mechanism itself was
deprecated and users were told to use out-of-band tagging, exactly what
the original RFC sought to avoid.

"Not recommended," "not standard," "not interoperable," or any other
term ESC settles on for the 5000+ valid flag sequences that are not
England, Scotland, and Wales is just a short, easy step away from
deprecation for these as well.

Doug Ewell | Thornton, CO, US | ewellic.org
