Emoji and Annotation data

Mark Davis ☕️ mark at macchiato.com
Fri Jun 24 11:04:40 CDT 2016


You should never be scraping *any* Unicode HTML files. They are not made
for that, and there is no guarantee of stability.

The emoji files are built from data which is described in
http://www.unicode.org/reports/tr51/
(plus CLDR annotations and collation)
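
As a rough sketch of reading annotation data instead of scraping HTML, the
following Python builds a keyword-to-emoji dictionary from CLDR-style
annotation XML. The embedded sample is illustrative only: it assumes the
element layout of recent CLDR releases (common/annotations/en.xml), which
may differ in the release you actually download. The function name
build_keyword_index is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: modeled on the layout of recent CLDR
# annotation files; not copied from any actual release.
SAMPLE = """<ldml>
  <annotations>
    <annotation cp="😀">face | grin | grinning face</annotation>
    <annotation cp="😀" type="tts">grinning face</annotation>
    <annotation cp="🐱">cat | face | pet</annotation>
    <annotation cp="🐱" type="tts">cat face</annotation>
  </annotations>
</ldml>"""


def build_keyword_index(xml_text):
    """Map each keyword to the list of characters annotated with it."""
    index = {}
    root = ET.fromstring(xml_text)
    for node in root.iter("annotation"):
        # Assumption: type="tts" marks the spoken name, not a keyword list.
        if node.get("type") == "tts":
            continue
        cp = node.get("cp")
        for keyword in (node.text or "").split("|"):
            keyword = keyword.strip()
            if keyword and cp not in index.setdefault(keyword, []):
                index[keyword].append(cp)
    return index


index = build_keyword_index(SAMPLE)
print(index["face"])
```

An input method could look up index[typed_word] to offer candidate
characters; a real build would read the downloaded XML file rather than
an inline string.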

Mark

On Fri, Jun 24, 2016 at 7:21 AM, Takao Fujiwara <tfujiwar at redhat.com> wrote:

> Hi,
>
> I'm working on IBus - the input method framework for Linux.
> I parse http://unicode.org/emoji/charts/emoji-list.html and build a
> dictionary that maps the annotations to the emoji characters.
> Since the file is large and updated often, I'm wondering how best to
> maintain it.
>
> I copied the file as http://ibus.github.io/files/ibus/emoji-list.html for
> the build at the moment.
>
> I have two questions:
>  - Does unicode.org provide a tarball of the stable HTML files or other
>    data?
>  - What is the license of the HTML files?
>
> Do you have any ideas?
>
> Thanks,
> Fujiwara
>