Another UAX #29 bug: property tables need updating

Manish Goregaokar manish at mozilla.com
Thu Dec 22 12:35:55 CST 2016


The spec lists GraphemeBreakProperty.txt[1] and
WordBreakProperty.txt[2] as the normative sources for grapheme and word
break categorization, respectively.

However, the spec also gives non-normative definitions of these
properties. In particular, it defines Glue_After_Zwj[3] as

> Emoji characters that do not break from a previous ZWJ in a defined emoji zwj sequence, and are not listed as Emoji_Modifier_Base=Yes in emoji-data.txt. See [UTR51].

Going through emoji-zwj-sequences.txt[4], many emoji characters
satisfy this definition. The kiss and heart emoji do, and so does
every object emoji in the "Gendered Role, with object" section.
However, the property table only lists the kiss, heart, and speech
bubble emoji as GAZ.
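The comparison above can be automated. What follows is a minimal Python sketch, assuming the usual `;`-separated, `#`-commented layout of GraphemeBreakProperty.txt, emoji-data.txt and emoji-zwj-sequences.txt, and reading "does not break from a previous ZWJ" as "appears immediately after U+200D in some defined sequence". The function names are hypothetical, and the sample strings are tiny made-up inputs in the files' format, not excerpts of the real data.

```python
# Sketch: which code points follow U+200D in emoji ZWJ sequences but are
# not Glue_After_Zwj? All helper names here are hypothetical.

ZWJ = 0x200D


def parse_property(text, value):
    """Code points whose second ';' field equals `value` in a UCD-style file.

    Handles single code points ('2764 ; ...') and ranges ('1F466..1F469 ; ...').
    """
    result = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != value:
            continue
        start, _, end = fields[0].partition("..")
        result.update(range(int(start, 16), int(end or start, 16) + 1))
    return result


def follows_zwj(text):
    """Code points that immediately follow a ZWJ in emoji-zwj-sequences.txt lines."""
    found = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        cps = [int(cp, 16) for cp in data.split(";", 1)[0].split()]
        for prev, cur in zip(cps, cps[1:]):
            if prev == ZWJ:
                found.add(cur)
    return found


def missing_gaz(zwj_text, gbp_text, emoji_data_text):
    """Candidates per the non-normative definition, minus what the table lists."""
    candidates = follows_zwj(zwj_text) - parse_property(emoji_data_text, "Emoji_Modifier_Base")
    return sorted(candidates - parse_property(gbp_text, "Glue_After_Zwj"))


# Tiny inputs in the files' format (not the real file contents).
zwj_sample = """\
1F469 200D 2764 FE0F 200D 1F48B 200D 1F468 ; Emoji_ZWJ_Sequence ; kiss: woman, man
1F469 200D 1F52C ; Emoji_ZWJ_Sequence ; woman scientist
"""
gbp_sample = """\
2764          ; Glue_After_Zwj # So       HEAVY BLACK HEART
1F48B         ; Glue_After_Zwj # So       KISS MARK
"""
emoji_data_sample = "1F466..1F469  ; Emoji_Modifier_Base  # boy..woman\n"

print([f"U+{cp:04X}" for cp in missing_gaz(zwj_sample, gbp_sample, emoji_data_sample)])
# prints ['U+1F52C']
```

With the samples, the microscope in "woman scientist" is flagged while the heart and kiss mark are not. Pointed at the full files, the same check should list the characters the non-normative definition covers but the table omits; the sketch does no downloading or version matching.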

The property table should include all role and gender modifiers as GAZ.

Could this be updated?

 [1]: http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakProperty.txt
 [2]: http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/WordBreakProperty.txt
 [3]: http://www.unicode.org/reports/tr29/proposed.html#Glue_After_Zwj
 [4]: http://unicode.org/Public/emoji/4.0/emoji-zwj-sequences.txt

Thanks,
-Manish