Emoji and Annotation data

Takao Fujiwara tfujiwar at redhat.com
Mon Jun 27 02:48:20 CDT 2016


On 06/27/16 16:01, Peter Edberg-san wrote:
> I had suggested that you check
> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/en.xml
> which has the line
> <annotation cp='[😀]' tts='grinning face'>face; grin</annotation>
>
> Is that not what you want?

I'm sorry. I missed that.
OK, it seems emoji-list.html is a combination of en.xml and /Public/emoji/3.0/emoji-*.txt
However, I cannot find some annotations, e.g. "america".

BTW, I think more categories, like "animal" and "country", would be useful for the annotations.
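
For reference, here is a minimal Python sketch of how a dictionary from annotations to Emoji characters could be built from en.xml instead of the HTML chart. It is only an illustration under some assumptions: the sample text contains just the one annotation line quoted in this thread, the <ldml>/<annotations> wrapper elements are my assumption about the file layout, the helper name build_annotation_index is hypothetical, and the cp attribute is treated as a plain bracketed list of single characters (multi-character sequences are not handled).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample input: the single annotation line quoted in this thread.
# The <ldml>/<annotations> wrapper is an assumed layout, not copied from en.xml.
# \U0001F600 is the grinning face character.
SAMPLE = """<ldml>
  <annotations>
    <annotation cp='[\U0001F600]' tts='grinning face'>face; grin</annotation>
  </annotations>
</ldml>"""


def build_annotation_index(xml_text):
    """Map each keyword (and the tts name) to the list of characters it labels."""
    index = defaultdict(list)
    root = ET.fromstring(xml_text)
    for node in root.iter("annotation"):
        # Assumption: cp is a simple bracketed set such as '[X]' or '[XY]'.
        chars = node.get("cp", "").strip("[]")
        # Keywords in the element text are separated by semicolons.
        keywords = [k.strip() for k in (node.text or "").split(";")]
        tts = node.get("tts")
        if tts:
            keywords.append(tts)
        for ch in chars:
            for kw in keywords:
                if kw and ch not in index[kw]:
                    index[kw].append(ch)
    return dict(index)


index = build_annotation_index(SAMPLE)
print(index)
```

With the sample above, "face", "grin" and "grinning face" each map to the one character, and a keyword such as "america" is simply absent from the result.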

Fujiwara

>
> - Peter
>
>
> On Jun 26, 2016, at 10:34 PM, Takao Fujiwara <tfujiwar at redhat.com> wrote:
>>
>> Hi,
>>
>> E.g. http://unicode.org/emoji/charts/emoji-list.html
>> "😀" has the annotations of "face" and "grin".
>>
>> The data is available only in the HTML files.
>>
>> Fujiwara
>>
>> On 06/27/16 14:16, Peter Edberg-san wrote:
>>> Fujiwara-san,
>>> If you follow the information indicated by UTR 51 (as Mark had suggested), you will see that:
>>>
>>> 1. The annotations data is available in CLDR here, in English:
>>> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/en.xml
>>> (or in many other languages, such as Japanese:)
>>> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/ja.xml
>>>
>>> The description of the format for those xml files is here:
>>> http://www.unicode.org/reports/tr35/tr35-general.html#Annotations
>>>
>>> 2. Other emoji data files are here:
>>> http://www.unicode.org/Public/emoji/latest/
>>>
>>> These data files are what drive the generation of the charts.
>>>
>>> Best regards,
>>> Peter Edberg
>>>
>>>
>>>
>>>> On Jun 26, 2016, at 9:09 PM, Takao Fujiwara <tfujiwar at redhat.com> wrote:
>>>>
>>>> On 06/25/16 01:04, Mark Davis ☕️-san wrote:
>>>>> You should never be scraping /any/ Unicode HTML files. They are not made for that, and there is no guarantee of stability.
>>>>
>>>> I cannot find a license or description for the HTML files.
>>>>
>>>>>
>>>>> The emoji files are built from data which is described in http://www.unicode.org/reports/tr51/
>>>>> (plus CLDR annotations and collation)
>>>>
>>>> OK, I need data that pairs the Emoji characters with their annotations.
>>>> It would be great if the data could be provided in a form other than the HTML files.
>>>>
>>>> Thanks,
>>>> Fujiwara
>>>>
>>>>>
>>>>> Mark
>>>>> //////
>>>>>
>>>>> On Fri, Jun 24, 2016 at 7:21 AM, Takao Fujiwara <tfujiwar at redhat.com> wrote:
>>>>>
>>>>>   Hi,
>>>>>
>>>>>   I'm working on IBus - the input method framework for Linux.
>>>>>   I parse http://unicode.org/emoji/charts/emoji-list.html and create a dictionary between the annotations and the Emoji characters.
>>>>>   Since the file is large and often updated, I'm thinking about how to maintain it.
>>>>>
>>>>>   I copied the file as http://ibus.github.io/files/ibus/emoji-list.html for the build at the moment.
>>>>>
>>>>>   I have questions:
>>>>>    - Does unicode.org provide a tarball of the stable HTML files, or other data?
>>>>>    - What is the license of the HTML files?
>>>>>
>>>>>   Do you have any ideas?
>>>>>
>>>>>   Thanks,
>>>>>   Fujiwara
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>


