Unicode encoding policy

Asmus Freytag asmusf at ix.netcom.com
Mon Dec 29 13:46:34 CST 2014

On 12/29/2014 10:32 AM, Doug Ewell wrote:
> Asmus Freytag wrote:
>> The "critical mass" of support is now assumed for currency symbols,
>> some special symbols like emoji, and should be granted to additional
>> types of symbols, punctuations and letters, whenever there is an
>> "authority" that controls normative orthography or notation.
>> Whether this is for an orthography reform in some country or addition
>> to the standard math symbols supported by AMS journals, such external
>> adoption can signify immediate "critical need" and "critical mass of
>> adoption" for the relevant characters.
> To me, it is remarkable that the "critical mass of support" argument 
> that is applied, entirely appropriately, to new currency symbols 
> (however misguided the motives for such might be) and math symbols and 
> characters for people's names, is now also applied to BURRITO and
> UNICORN FACE.

Does it - in principle - matter what a symbol is used for? If millions 
of happy users choose to communicate by peppering their messages with 
BURRITO and UNICORN FACE is that any less worthy of standardization than 
if thousands (or hundreds) of linguists use some arcane letterform to 
mark pronunciation differences between neighboring dialects on the 
Scandinavian peninsula?

The "critical mass" argument does not (and should not) make value
judgements; instead it focuses on whether the infrastructure exists to
make a character code widely available pretty much directly after
publication, and whether there is implicit or explicit demand that would
guarantee that such a code is actually widely used the minute it becomes
available.

For currency symbols, or for a new letter form demanded by a new or 
revised, but standard, orthography, the demand is created by some 
"authority" creating a requirement for conforming users. Because of 
that, the evaluation of the "critical mass" requirement is straightforward.

Emoji lack an "authority", but they do not lack demand. For better or 
for worse, they have grabbed significant mind share; the number of news 
reports, blogs, social media posts, shared videos and what not that were 
devoted to Emoji simply dwarfs anything reported on currency symbols in 
a comparative time frame. With tracking applications devoted to them, 
anyone can convince themselves, in real time, that the entire repertoire 
is being used, even, as appropriate for such a collection, with a clear 
differentiation by frequency.

Nevertheless, the indication is clear that any emoji added
by the relevant vendors will be used as soon as it becomes
available. Further, as no vendor has a closed ecosystem, emoji are only
usable if there is agreement on how they are coded.

The critical question, and I fully understand that this gives you pause,
is one of selection. There are hundreds, if not thousands, of potential
additions to the emoji collection; some fear the set is, in principle,
endless. Lacking an "authority", how does one come to a principled
agreement on encoding any emoji now, rather than later?

One could run an experiment, which is to say, create an alternate
environment where users can use non-standard emoji and then the 
Uni-scientists in white lab coats could count the frequency of usage and 
promote the cream off the top to standardized codes.

Or one could run an experiment where one defines a small number of 
slots, say 40, and opens them up for public discussion, and proceeds on 
that basis. Yes, that would turn the UTC into the "authority".

My personal take is that the former approach is inappropriate for 
something that is in high demand and actively supported; the latter I 
can accept, provisionally, as an experiment to try to deal with an 
evolving system. Because of the ability to track, in real time, the use 
or non-use of any of the new additions it would be a true experiment, 
the outcome of which can be accurately measured. If it should lead to 
the standardization of a few dozen symbols that prove not as popular as
predicted, then we would conclude a failure of the experiment, and 
retire this process. Otherwise, I'd have no problem cautiously 
continuing with it.

> But then, I remember when folks used to cite the WG2 "Principles and 
> Procedures" document for examples of what was and was not a good 
> candidate for encoding. That seems so long ago now.

The P&P, like most by-laws and constitutions, are living documents. In 
this case, they try to capture best practice, without taking from the 
UTC (or WG2) the ability to deal with new or changed situations.

The degree to which emoji have captured the popular imagination is 
unprecedented. It means the game has changed. Let's give the UTC the 
space to work out appropriate coping mechanisms.


PS: this does not mean that, for all other types of code points, the 
existing wording in the P&P can simply be disregarded. In fact, the end
result will be to see them updated with additional criteria explicitly 
geared towards the kind of high-profile use case we are discussing here.
