Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

Manish Goregaokar via Unicode unicode at unicode.org
Mon Jan 1 01:54:29 CST 2018

In UAX 29, the GB10 rule[1] (and the WB14 rule[2]) states that we should
not break before an E_Modifier character when it follows an emoji base
(with optional Extend characters in between).
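
For concreteness, here is a minimal Python sketch of that check. It works on
a list of already-assigned Grapheme_Cluster_Break class names rather than on
real code points; the function name and the string constants are mine, for
illustration only, and a real implementation would look the classes up in
GraphemeBreakProperty.txt.

```python
# Class names as plain strings; hand-assigned here, not read from the UCD.
E_BASE, E_BASE_GAZ = "E_Base", "E_Base_GAZ"
E_MODIFIER, EXTEND, OTHER = "E_Modifier", "Extend", "Other"


def gb10_forbids_break(classes, i):
    """Return True if GB10 forbids a break before classes[i].

    GB10: (E_Base | E_Base_GAZ) Extend* x E_Modifier
    """
    if classes[i] != E_MODIFIER:
        return False
    # Walk left over any run of Extend characters.
    j = i - 1
    while j >= 0 and classes[j] == EXTEND:
        j -= 1
    # The rule only fires if what precedes that run is an emoji base.
    return j >= 0 and classes[j] in (E_BASE, E_BASE_GAZ)
```

The backwards scan over Extend is the part that would go away if E_Modifier
were simply treated as Extend.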

Given that the spec is allowed to ignore degenerate cases, is there any value
lost by merging E_Modifier and Extend into a single category? This means we
can completely get rid of the E_Base category, and the EBG (E_Base_GAZ)
category gets merged with GAZ (Glue_After_Zwj).

<random non-emoji, skin tone modifier> sounds very much like a degenerate
case to me. <GAZ emoji, skin tone> also feels rather degenerate. There are
only three GAZes (heart (U+2764), kiss (U+1F48B), speech bubble (U+1F5E8))
and I can't see why you'd end up with a skin tone modifier on them except
by accident. (Unless we plan to support lip colors or something, but in that
case the kiss emoji would switch to EBG anyway.)
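
To make the comparison explicit, here is a rough Python sketch that asks,
for a handful of preceding classes, whether there is a break before a skin
tone modifier under the current rules versus under the proposed merge. It is
deliberately simplified: it only looks at the immediately preceding class,
the list of classes is hand-picked rather than exhaustive, and the function
name is mine.

```python
# Hand-picked subset of Grapheme_Cluster_Break classes, for illustration.
CLASSES = ["E_Base", "E_Base_GAZ", "Glue_After_Zwj", "Other", "Control"]


def breaks_before_modifier(prev_class, merged):
    """Is there a break between prev_class and a following skin tone modifier?

    merged=False: current rules (modifier is E_Modifier, handled by GB10).
    merged=True:  proposal (modifier is treated as Extend, handled by GB9).
    """
    if prev_class in ("CR", "LF", "Control"):
        return True  # GB4: always break after controls, either way
    if merged:
        return False  # GB9: never break before Extend
    # GB10 keeps the modifier attached to an emoji base; otherwise break.
    return prev_class not in ("E_Base", "E_Base_GAZ")


# Preceding classes for which the two rule sets disagree.
differing = [
    c for c in CLASSES
    if breaks_before_modifier(c, merged=False)
    != breaks_before_modifier(c, merged=True)
]
print(differing)
```

Under these assumptions the only disagreements are the two cases described
above: a non-emoji character followed by a modifier, and a GAZ emoji
followed by a modifier.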


 [1]: http://www.unicode.org/reports/tr29/#GB10
 [2]: http://www.unicode.org/reports/tr29/#WB14
