On emoji and the two rightwards black arrows
js_choi at icloud.com
Fri Oct 30 13:51:45 CDT 2015
# On emoji and the two rightwards black arrows
This is a long post, and I apologize for that; it’s a somewhat complicated topic. The post is about two encoded characters:
U+27A1 Black Rightwards Arrow <http://www.unicode.org/charts/PDF/U2700.pdf>
and U+2B95 Rightwards Black Arrow <http://www.unicode.org/charts/PDF/U2B00.pdf>.
• The post first reviews their encodings’ respective histories, as I currently understand them; hopefully I’m not mistaken about anything.
• It then informally suggests that U+2B95 be added to emoji-data.txt (and possibly be given standardized text/emoji variants), as U+27A1 already has been, on the basis that U+2B95 is at least as well suited as U+27A1 to serve as a general rightwards arrow symbol.
• It also proposes that clarification be added to their entries in the code charts about the differences between their intended functions, explaining when to use one versus the other, as per their contrasting histories.
I don’t intend to be making anything like a formal proposal yet, but I might in the future. For now, I’d like to clarify the characters’ respective intended purposes and see how feasible or likely the proposed changes would be before investing time, etc. in a formal proposal.
The history below is taken from the following posts:
• Ken Whistler:
• 2015-05 <http://www.unicode.org/mail-arch/unicode-ml/y2015-m05/0272.html>.
• 2015-10 <http://www.unicode.org/mail-arch/unicode-ml/y2015-m10/0223.html>.
• Mark Davis:
• 2015-10 <http://www.unicode.org/mail-arch/unicode-ml/y2015-m10/0226.html>.
• Michel Suignard:
• 2015-05 <http://www.unicode.org/mail-arch/unicode-ml/y2015-m05/0268.html>. (Note that this post contains paragraphs quoted from another person that are not marked as quotations, with Suignard’s replies below each one.)
• 1993: The glyphs from ITC Zapf Dingbats typeface were encoded in the Unicode Standard 1.1 for compatibility with PostScript printers that use them. This included U+27A1 Black Rightwards Arrow.
The Zapf Dingbat arrows all face rightwards, as generically rotatable arrow glyphs. No leftwards, upwards, or downwards versions of arrows were encoded because PostScript printers were assumed to rotate generic rightwards arrows in original Zapf Dingbats fonts. U+27A1’s representative glyph is taken from Zapf Dingbats.
• 2003: Representatives of North Korea (the DPRK) submitted a proposal to add compatibility characters for a DPRK encoding standard <http://www.unicode.org/L2/L2001/01349-N2374-DPRK-AddSymbols.pdf>. The DPRK standard included black-filled arrows in the four cardinal directions.
The proposal only included leftwards, upwards, and downwards black arrows, apparently because the representatives believed that U+27A1 fit their purposes for compatibility with their rightwards black arrow.
The former three were encoded as U+2B05–U+2B07 in the Unicode Standard 4.0. Their representative glyphs and names were taken from the DPRK proposal; the glyphs and names thus did not align with U+27A1 (e.g., U+2B05 Leftwards Black Arrow vs. U+27A1 Black Rightwards Arrow). Whistler states that “…nobody commented on” them and “nobody much cared, because because these were compatibility additions for a DPRK standard, and weren't mapped to any commercial sets at the time, anyway” (2015-05).
The unification of the new DPRK compatibility arrows U+2B05–U+2B07 with rotations of the Zapf Dingbat arrow U+27A1 was implied by the Standard but not explicit. For the next decade, most fonts implementing all four characters used glyphs matching those in the code charts (i.e., the mismatching Zapf Dingbat glyph for the right arrow, and the DPRK glyphs for the other black arrows).
• 2011–2013: Google, Apple, and Microsoft begin to support emoji characters from Japanese cellular carriers using characters from the Unicode Standard 6.0. Four of those Japanese-carrier characters are black arrows in the four cardinal directions (UTR #51).
The three companies use the DPRK-compatibility black arrows U+2B05–U+2B07 for three of them. Presumably because it was assumed to be part of their set and there was no better alternative, they use the Zapf Dingbat U+27A1 for the final, rightward black arrow from the Japanese-carrier emoji.
Based on then-current usage, these four characters’ mappings, among others, are added to a new, separate Unicode data file for emoji data <http://www.unicode.org/Public/emoji/1.0/emoji-data.txt>. The data in this file have “not yet been formally rationalized into a coherent set of Unicode character properties” (Whistler 2015-10).
• 2014: A “complete re-rationalization of all the arrows symbols” occurred (Whistler 2015-05) in the Unicode Standard 7.0 due to addition of arrows from Wingdings, Wingdings 2, and Webdings <http://www.unicode.org/L2/L2012/12130-n4239.pdf>.
The DPRK-compatibility black arrows U+2B05–U+2B07 are unified with similar Wingding black arrows, and their representative glyphs are modified thus to harmonize. However, the glyph of Zapf Dingbat arrow U+27A1 is deemed to be unmodifiable, because its identity is strongly coupled to the original arrow glyph in the ITC Zapf Dingbat typefaces.
The now-generic black arrows U+2B05–U+2B07 are thus disunified from rotations of U+27A1. A new character, U+2B95 Rightwards Black Arrow, is added with the intention of completing the U+2B05–U+2B07 set; it receives a correspondingly matching representative glyph.
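The naming mismatch described in the history above can be seen directly in the character database. Here is a minimal sketch in Python, assuming an interpreter whose `unicodedata` module covers Unicode 7.0 or later (otherwise U+2B95 is reported as unknown); the origin labels and the `describe` helper are my own shorthand, not anything from the Standard:

```python
import unicodedata

# Code points discussed in this post. The origin labels are informal
# summaries of the history above, not official Unicode properties.
ARROWS = {
    0x27A1: "Zapf Dingbats, Unicode 1.1",
    0x2B05: "DPRK compatibility, Unicode 4.0",
    0x2B06: "DPRK compatibility, Unicode 4.0",
    0x2B07: "DPRK compatibility, Unicode 4.0",
    0x2B95: "completes the U+2B05-U+2B07 set, Unicode 7.0",
}


def describe(cp):
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    # The second argument is a fallback for interpreters whose Unicode
    # data predates the character.
    name = unicodedata.name(chr(cp), "<not in this Unicode database>")
    return f"U+{cp:04X} {name}"


for cp, origin in ARROWS.items():
    print(f"{describe(cp)}  [{origin}]")
```

Note the word order in the names: only U+27A1 puts “BLACK” before the direction, whereas U+2B95 follows the pattern of U+2B05–U+2B07.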
## Present issues
The new U+2B95 Rightwards Black Arrow together with the now-generic U+2B05–U+2B07 are supposed to form a single set of arrows, with correspondingly matching representative glyphs, as Mr. Suignard has said. It will take time for U+2B95 to be implemented by new fonts, but “the explicit glyph updates for U+2B00..U+2B0D…were clearly intentional” (Whistler 2015-05). In other words, according to the Standard since version 7.0, the rightward counterpart of U+2B05–U+2B07 is now clearly U+2B95, not U+27A1, which has been disunified from the set and is now merely a Zapf Dingbat.
However, this is not yet completely true: UTR #51 and emoji-data.txt currently define the rightwards version of U+2B05–U+2B07 to be the Zapf Dingbat U+27A1. UTR #51 currently does not define U+2B95 to be an emoji character. Furthermore, there are no text/emoji standardized variants of U+2B95 yet, unlike U+27A1.
Upon reviewing the history above, it becomes apparent that this is due to missed timing between the advent of Unicode emoji (in 2011–2013) and the advent of U+2B95 (in 2014). Apple, Google, and Microsoft had no character other than U+27A1 that they could use for the Japanese carriers’ rightward black arrow; at that time U+27A1 was still implicitly unified with the other black arrows.
It seems to be possible to change the emoji data to more logically match the intended usage of the new U+2B95. My questions are thus:
1. Should U+2B95 be added to the set of emoji characters as given by UTR #51 and emoji-data.txt, with the intent to complete the harmonization with U+2B05–U+2B07 in 2014?
2. If #1’s answer is yes, then should U+2B95 be given text/emoji standardized variation sequences, just as U+2B05–U+2B07 already have?
3. Regardless of the answers to the above, should clarification on the conceptual differences between U+2B95 (the right black arrow completing U+2B05–U+2B07) and U+27A1 (the Zapf Dingbat) be added to their entries in the Standard’s code charts? This might clear up a lot of confusion from users and font creators, and would only make clearer what has already been made explicit by 7.0’s glyph changes.
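To make question 2 concrete, here is a rough sketch in Python of what text/emoji variation sequences look like at the code-point level. A sequence is simply the base character followed by U+FE0E (text style) or U+FE0F (emoji style). The helper names `variants` and `show` are hypothetical, and the sequences built for U+2B95 are only what such sequences would look like if they were ever standardized; they are not defined today:

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def variants(cp):
    """Build the text-style and emoji-style sequences for a base code point."""
    base = chr(cp)
    return {"text": base + VS15, "emoji": base + VS16}


def show(seq):
    """Render a string as space-separated U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)


# U+2B05-U+2B07 and U+27A1 are the arrows currently treated as emoji;
# U+2B95 is included only to illustrate the hypothetical sequences.
for cp in (0x2B05, 0x2B06, 0x2B07, 0x27A1, 0x2B95):
    v = variants(cp)
    print(show(v["text"]), "|", show(v["emoji"]))
```

Whether a font or renderer honors such a sequence is a separate matter; the sketch only shows how the sequences are formed.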
## Possible objections
There are two objections to #1 and #2 that I could foresee:
First is that, when using emoji, a user might perceive redundancy between an emoji form of U+2B95 and the already existent emoji form of U+27A1, and this might cause user confusion over which one to use. However, this redundancy has already existed since Unicode 7.0, when U+2B95 was added in the first place. The Consortium apparently decided at the time that the risk of user confusion between U+2B95 and U+27A1 was worth it in regular-text contexts; I don’t see why it would be significantly different in emoji contexts. Vendors’ emoji input palettes could simply present only U+2B95, rather than U+27A1, to the user, with little disadvantage.
Second is that compatibility mappings with Japanese carrier sets already use U+27A1, and mappings should generally be stable across versions of Unicode. However, the Unicode emoji data are not yet formally set in stone; there has only been preliminary discussion and the initial publication of UTR #51 (Whistler 2015-10; Davis 2015-10). The mappings with the carrier sets are probably thus not under the same stability guarantees that other formal mappings are under (and, even if they are, I could find no policy in <http://www.unicode.org/policies/stability_policy.html> that prohibits modifying formal mappings in general).
In any case, I might make a formal proposal in the future, but I first want to determine here how probable it is that such a proposal would be discussed. What would you say the answers to those three questions are?
J. S. Choi