Emoji data in UCD XML?

Mark Davis ☕️ mark at macchiato.com
Thu Oct 29 13:20:50 CDT 2015

As Ken said, there's been some preliminary discussion, but we wanted to get
initial information out in connection with UTR #51 first, and take more
time to consider what UCD properties would look like, and which are needed.

The basic information that people want to access for implementations is:

   - Is a character emoji or not?
   - Which emoji have default text presentation? (The others have default
   emoji presentation.)
   - Which emoji are modifiers, and which are modifier bases? (The others
   are neither.)
   - Which sequences of emoji (ZWJ and/or combining marks) are recommended
   for those who support them?
   - Flags and modifier sequences are specified algorithmically and don't
   need to be listed.
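A minimal Python sketch of how an implementation might answer these questions from a data file. It assumes a property file in the "code point(s) ; property # comment" layout used by other UCD files, with hypothetical property names Emoji, Emoji_Presentation, Emoji_Modifier and Emoji_Modifier_Base. The names and the sample lines are illustrative assumptions, not the contents or format of the current emoji-data.txt. Flag and modifier sequences are checked algorithmically rather than listed: a flag is taken to be a pair of regional indicator symbols (U+1F1E6..U+1F1FF), and a modifier sequence a modifier base followed by a modifier.

```python
# Hypothetical excerpt in a UCD-style "code point(s) ; property" layout.
# NOT the real emoji-data.txt content; property names are assumptions.
SAMPLE = """\
# hypothetical excerpt
231A..231B    ; Emoji                # watch..hourglass
263A          ; Emoji                # white smiling face
1F1E6..1F1FF  ; Emoji                # regional indicator symbols
1F466         ; Emoji                # boy
1F3FB..1F3FF  ; Emoji                # skin tone modifiers
231A..231B    ; Emoji_Presentation
1F1E6..1F1FF  ; Emoji_Presentation
1F466         ; Emoji_Presentation
1F3FB..1F3FF  ; Emoji_Presentation
1F3FB..1F3FF  ; Emoji_Modifier
1F466         ; Emoji_Modifier_Base
"""


def parse_properties(text):
    """Map each property name to the set of code points that have it."""
    props = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        cps, prop = fields[0], fields[1]
        if ".." in cps:
            lo, hi = cps.split("..")
            code_points = range(int(lo, 16), int(hi, 16) + 1)
        else:
            code_points = [int(cps, 16)]
        props.setdefault(prop, set()).update(code_points)
    return props


PROPS = parse_properties(SAMPLE)
REGIONAL_INDICATORS = range(0x1F1E6, 0x1F1FF + 1)


def has_prop(ch, name):
    return ord(ch) in PROPS.get(name, set())


def is_emoji(ch):
    return has_prop(ch, "Emoji")


def default_text_presentation(ch):
    # Emoji without the (assumed) Emoji_Presentation property default to text.
    return is_emoji(ch) and not has_prop(ch, "Emoji_Presentation")


def is_flag_sequence(s):
    # Two regional indicators; does not check that they form a valid region code.
    return len(s) == 2 and all(ord(c) in REGIONAL_INDICATORS for c in s)


def is_modifier_sequence(s):
    # A modifier base followed by an emoji modifier.
    return (len(s) == 2 and has_prop(s[0], "Emoji_Modifier_Base")
            and has_prop(s[1], "Emoji_Modifier"))
```

With this sample, U+263A is an emoji with default text presentation, while U+1F466 followed by U+1F3FD is recognized as a modifier sequence without being listed anywhere.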

The levels, the distinction between primary and secondary, and the carrier
sources were useful in developing the emoji data and UTR #51, but aren't
really necessary for implementations.


On Thu, Oct 29, 2015 at 9:14 AM, Ken Whistler <kenwhistler at att.net> wrote:

> There has been some preliminary discussion of this. The problem is that
> the data in emoji-data.txt has not yet been formally rationalized into a
> coherent set of Unicode character properties. The UTC would first need to
> determine exactly what property (or list of properties) is involved, before
> incorporating it (or them) formally into the Unicode Character Database
> (UCD) and into the XML version of the UCD, and the documentation of it (or
> them) formally into UAX #44.
> --Ken
> On 10/26/2015 10:39 AM, Daniel Bünzli wrote:
>> If I read UTR #51 correctly, the way to determine whether a scalar value
>> is an emoji character is to consult this data file [1]. Are there any plans
>> to integrate this data into the UCD XML?