[UTR#51-8] 1.4.3 Emoji Variation Sequences: Female/Venus and Male/Mars Signs

Christoph Päper christoph.paeper at crissov.de
Thu Aug 25 09:52:11 CDT 2016


TL;DR: Unicode properties should reflect user expectations, not vendor choices.

Mark Davis ☕️ <mark at macchiato.com>:
> On Mon, Aug 22, 2016 at 11:26 PM, Christoph Päper <christoph.paeper at crissov.de> wrote:
>> 1. it’s incomplete without an explicit neutral/ambiguous alternative and
> 
> As I said, people are actively investigating what to do about such cases. It may be that the solution is to add ⚲ U+26B2 Neuter, but maybe not. We'll see as they develop further.

As a native speaker of a language that can explicitly mark any actor noun with a morpheme as female/feminine, but neither as neutral nor as male/masculine – a generic version of English ‘actor/actress’, ‘waiter/waitress’, ‘prince/princess’ – and having dealt intensively with guidelines for corporate language and public speech, I can assure you that a feminist/LGBT shitstorm will be heading for the UTC and vendors if binary gender becomes mandatory for profession emojis. You should not approve Google’s and Apple’s ZWJ sequences without a neutral option.

JFTR, I know that ☿ U+263F Mercury is also being proposed to denote androgynous/asexual emoji sequences.

>> 2. if they need `Emoji=yes` as a result, this must also be applied to a bunch of related characters.
> 
> As I said, that is absolutely not a criterion.

As I said, it absolutely should be to honor user expectations.

> If one were to apply that principle (…), then because we have one playing-card emoji, we should make all of the playing cards be emoji; because of one Mahjong tile, one would add all of them. And then add all the chess pieces, and other game pieces.

It’s an open secret that all characters for game notations will have to become emojis sooner or later, regardless of whether one of them already has the Emoji property. (I’m not sure I would have supported them being encoded in the first place, though, especially as lots of precomposed characters.) One big problem at the moment is, I think, that vendors anticipate another user demand: that every emoji font and UI should cover all emojis.

> And because we have a few circled or squared ideograph and katakana emoji, make all the others emoji. And there are squared or negative ASCII emoji, so add all of the others as emoji.

I already addressed that strawman argument in my previous mail, regarding blood types. Precomposed characters with enclosing shapes are just there for compatibility reasons, so their Emoji property reflects compatibility needs.

> And alchemical symbols, and ... I suspect the transitive closure of this process could end up marking essentially all Unicode characters with the Emoji property.

No, but many, perhaps most, of the characters with ‘General Category = Other_Symbol (So), Script = Common, Bidirectional Category = Other_Neutral (ON)’, and probably a few others (e.g. with ‘Bidirectional Category = L’). That’s little more than 3000 characters as of Unicode 9.0, which includes most existing emojis. Some of them, like reversed or rotated glyphs, would be simple for font designers to support; others could use identical emoji glyphs, e.g. lots of the Light/Medium/Bold/Heavy compatibility dingbat arrows, asterisks etc. Overall, the number of emojis (not counting Fitzpatrick and ZWJ variants) would less than double.
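
As a rough sanity check of that estimate, here is a minimal Python sketch that counts code points with General Category So and Bidirectional Category ON. It is only an approximation under stated assumptions: the standard `unicodedata` module does not expose the Script property, so the ‘Script = Common’ filter is not applied and the result is an upper bound; and the count depends on the Unicode version bundled with the interpreter, not necessarily 9.0. The names `symbol_candidates` and `candidates` are made up for this illustration.

```python
import sys
import unicodedata


def symbol_candidates():
    """Yield code points with General_Category=So and Bidi_Class=ON.

    Note: Script=Common is NOT checked here, because the standard
    library has no Script property lookup (assumption: an upper bound
    is good enough for a rough estimate).
    """
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "So" and unicodedata.bidirectional(ch) == "ON":
            yield cp


candidates = list(symbol_candidates())

# The figure printed depends on the Unicode data shipped with Python.
print("Unicode", unicodedata.unidata_version, "->", len(candidates), "So/ON code points")
```

To get closer to the set described above, one would additionally filter by Script using the Scripts.txt data file of the matching Unicode version.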

> The committee has and does consider related characters when looking at properties. But this case was not an oversight. Those particular characters were deliberately chosen. It is always possible to add other characters in the future; it will depend on whether they are deemed to be necessary.

The problem lies in the “deemed to be necessary”.

> The purpose for character properties is to promote interoperability. That has always been the case.

Sure, but for almost all characters and properties this has mostly been a descriptive approach, based upon existing texts. Whether a certain character will be included in emoji fonts and IMEs very strongly depends on whether it has the Emoji property (and how it reacts to VS-15/16). Unicode is hence wandering into prescriptive territory here.
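
For readers less familiar with VS-15/16: a variation selector is simply appended to the base character to request text or emoji presentation. A minimal sketch, assuming the base character is one for which such sequences are registered in the Unicode version at hand (the Female Sign from the subject line is used as the example; `with_presentation` is a made-up helper name):

```python
TEXT_VS = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_VS = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation


def with_presentation(base, emoji):
    """Append VS-16 (emoji=True) or VS-15 (emoji=False) to a base character."""
    return base + (EMOJI_VS if emoji else TEXT_VS)


female_sign = "\u2640"  # FEMALE SIGN
text_form = with_presentation(female_sign, emoji=False)
emoji_form = with_presentation(female_sign, emoji=True)

# Show the code points that make up the emoji-style sequence.
print([f"U+{ord(c):04X}" for c in emoji_form])  # ['U+2640', 'U+FE0F']
```

Whether a font or input method actually honours the selector is up to the implementation, which is the point being made above.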

In the Rifle case, for instance, vendors even removed emoji glyphs after the character, which like similar ones had been proposed specifically for emoji purposes, became non-emoji late in the standardization process. On the other hand, there are lots of legacy emojis that no one uses (or at least not with the originally intended meaning) but that every emoji font supports. Since emojis are often input on mobile devices, and some OSes are quite restrictive about installing alternative fonts or keyboards, this problem becomes even more serious.

> The goal of the emoji properties is to have structure that promotes the highest degree of interoperability among the major implementations supporting emoji.

What’s that, a “major implementation[] supporting emoji”? Is it a font, an OS component, a GUI picker, a soft keyboard, a text/input prediction algorithm, a text substitution feature …? You seem to be talking about the default setup on stock iOS (and Mac OS) and Android, maybe Windows (Phone). This effectively means that a few US-based multi-billion-dollar companies – Apple, Google, Microsoft and Facebook, basically – decide which character can be used as an emoji and which one cannot (while making money on “stickers” at the same time), and unlike the Japanese telcos Docomo, KDDI and Softbank they increasingly do so with an agenda. This is a problem. The UTC could be the voice of the global multi-billion-head user base here, but, alas, it’s largely funded and staffed by the aforementioned companies and others like them.

You see, if I were an ancient Egyptian chiseling an ejaculating/peeing penis 𓂺 or a 19th-century typographer drawing a heart-shaped exclamation mark ❣ or a late 20th-century Japanese engineer encoding brothels 🏩 as POIs in my mobile map application, these would be considered characters and become part of the Unicode standard in the 21st century. If there are millions or even billions of people who use pictograms for human genitalia in electronic textual communication today (as their ancestors had been doing in analog media for millennia), they have to rely on conventionalized linguistic �� or graphical �� metaphors, or they must abuse punctuation marks, digits and letters to “draw” body parts inline, ({|}) 3==D (.Y.) (_!_) (and *many* variants thereof), if they don’t want to resort to actual pictures. Most users are bad at drawing those and thus must acquire them elsewhere, which means additional effort, costs and legal issues.

The chance of these pictographs being encoded as single, unambiguous (see ��) characters is basically nil due to the mentioned gatekeepers. Even if they ever made it into the standard, there would still be font vendors who would not ship any glyph for such characters (see U+130BA etc.), ship only an inferior one (see ��) or, perhaps worst, ship a misleading/wrong one (see ��); OS vendors might exclude them from input methods (see ��), and search engines might ignore them (see #��), on religious, political or other non-technical grounds.

And yes, I’m preparing a proper proposal for missing body part emojis nevertheless, but maybe someone beats me to it.

> It doesn't do any good for Unicode to mark a character as being emoji unless that would result in it being widely deployed as such.

Sorry, but you got that backwards. There are some characters that have non-intuitive or unsystematic properties in Unicode, due to mistakes in the standardization process or bugs in widespread implementations. This may apply as well to some existing emojis (or all of them, for some people), which shouldn’t have been in i-mode phones in the first place. It does not apply, however, to future emojis, whether made from existing characters or new ones.

If a character is a pictogram that is less abstracted than sinograms and other signs used for writing proper, people will want to use it as an emoji (or would at least find a use for it if it were available). They can only do so if fonts and software treat it as such. Most vendors will not make them do so unless the standard says they should, because only then can they expect the competition (i.e. potential partners in communication interchange) to do so, too.

A major part of standardization is to document existing (best) practice, but another is to synthesize general concepts from this and to develop new solutions based on them for better interoperability and user experience in the future. Denying some characters the Emoji property on arbitrary grounds (incl. demands of high-profile stakeholders), or not including taboo characters, fails the latter.

> So the committee has to consider carefully what implementations will do. That is nothing new; we have to consider carefully what the impact of any change in property (such as Line_Break) will do in implementations. 

�� What major implementers want.
�� Effect of change on (existing) implementations.

> You can certainly propose (…), that any particular set of additional characters should get the Emoji property, and try to make a case for it. 

Will do, but I’m trying to find out here beforehand whether I’m just wasting my time and everyone else’s, because I’m afraid that could indeed be the case.

> But I'd advise you to make a convincing case for your proposal — without using grounds that would apply to hundreds or thousands of other characters. In particular, you should address the question — for each of those characters — of whether there is a strong expectation that it would be frequently used.

That’s trying to scare away useful input from small and independent parties. The Unicode process is good at that, but at least it allows for it, unlike many other standardization bodies.

Sorry, this got long.

