Unicode Emoji 5.0 characters now final

Doug Ewell doug at ewellic.org
Wed Mar 29 15:12:11 CDT 2017

Martin J. Dürst wrote:

> I think there is some missing information here. First, the original
> proposal that used invalid UTF-8 sequences never was an RFC, only an
> Internet Draft.

Yes, you're right. I realized that a minute after "Send" but didn't
think it changed the story enough to justify a correction. For the
curious, the I-D is at
https://www.ietf.org/archive/id/draft-ietf-acap-mlsf-01.txt .

> But what's more important, the protocol that motivated all this work
> (ACAP) never went anywhere. Nor did any other use of the plane 14
> language tag characters get any kind of significant traction. That
> led to deprecation, because it would have been a bad idea to let
> people think that the information in these taggings would actually be
> used.

Is that common practice in Unicode, that if something doesn't gain
significant traction in the comparatively short term, it becomes a
candidate for deprecation?

> For some people (including me), that was always seen as the likely
> outcome; the language tag characters were mostly introduced as a
> defensive mechanism (way better than invalid UTF-8) rather than
> something we hoped everybody would jump on. Putting them on plane 14
> (which meant that it would be four bytes for each character, and
> therefore quite a lot of bytes for each tag) was part of that message.

I understand the "defensive" aspect of trying to prevent people from
abusing Unicode, especially in the 1997–1998 time frame when UTF-8 was
still new and people didn't realize the cost of tampering with it.

But if you're going to build a mechanism at all, it seems peculiar to
define it in full but then discourage its intended use at the outset, or
to build it in such a way that users will find it difficult or
unpalatable to use.

> I think the situation is vastly different here. First, the Consortium
> never officially 'activated' any subdivision flags, so it would be
> impossible to deprecate them.

The Emoji 5.0 mechanism of using tag sequences for three subdivision
flags was announced earlier this week. The specification grudgingly
allows, but does not recommend, use of the mechanism for any other flags. It
is that grudging allowance that could be deprecated, not any of the
specific flags.
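
For readers who have not looked at the mechanism, here is a minimal
Python sketch of how such a tag sequence is put together, and of the
four-bytes-per-tag-character cost in UTF-8 that Martin mentions. The
function name `subdivision_flag` and the constant names are my own, for
illustration only. The sketch assumes the Emoji 5.0 layout as I
understand it: U+1F3F4 as the base, one plane 14 tag character per
letter or digit of the subdivision code, and U+E007F CANCEL TAG to end
the sequence.

```python
BLACK_FLAG = "\U0001F3F4"  # U+1F3F4 WAVING BLACK FLAG, base of the sequence
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG, terminates the sequence
TAG_OFFSET = 0xE0000       # tag characters sit at U+E0000 + ASCII code point


def subdivision_flag(code: str) -> str:
    """Build a tag sequence from a subdivision code such as 'GB-ENG'.

    Hypothetical helper: it only assembles code points. It does not check
    that the code names a real subdivision or that any font supports it.
    """
    code = code.replace("-", "").lower()
    if not code.isascii() or not code.isalnum():
        raise ValueError(f"not a usable subdivision code: {code!r}")
    # Map each ASCII letter or digit to its plane 14 tag counterpart.
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in code)
    return BLACK_FLAG + tags + CANCEL_TAG


england = subdivision_flag("GB-ENG")
print(" ".join(f"U+{ord(c):04X}" for c in england))
# Every code point here is outside the BMP, so each takes 4 bytes in UTF-8.
print(len(england), "code points,", len(england.encode("utf-8")), "UTF-8 bytes")
```

Nothing in the encoding itself distinguishes a recommended flag from any
other: `subdivision_flag("US-TX")` yields an equally well-formed
sequence. Whether it displays as a flag is up to the vendor.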

> Second, we already see some pressure (on this list) to 'recommend'
> more of these, and I guess the vendors and the Consortium will give in
> to this pressure, even if slowly and to some extent quite reluctantly.
> It's anyone's bet in what time frame and order e.g. the flags of
> California and Texas will be 'recommended'. But I have personally no
> doubt that these (and quite a few others) will eventually make it,
> even if I have mixed feelings about that.

Then what was the benefit of "not recommending" them in the first place?
Why is it a problem if vendors look at the list of 5100 or so
subdivisions, or even the small subset that actually have flags, and
think, "OMG, look at all those new flags we'll be forced to support"? Is
this any different from when a new CJK extension or other large block of
characters is added?

I would think vendors could make their own business decisions about what
flags to support. "Hmm, yeah, definitely Texas, maybe Lombardy, not so
sure about Colorado, probably not Guna Yala." I don't see why they had
to be essentially told what to support and what not to.

--
Doug Ewell | Thornton, CO, US | ewellic.org
