Line breaking status of emoji modifiers

Simon Cozens simon at simon-cozens.org
Sat Dec 5 18:08:59 CST 2015


My renderer just got hit with an interesting, if possibly obscure, bug.

UTR#51 says "A supported emoji modifier sequence should be treated as a
single grapheme cluster for editing purposes (cursor movement, deletion,
etc.); word break, line break, etc." However, the modifier codepoints
have line break category AL.

So you have an emoji (line break ID) and its modifier (line break AL),
and ICU (quite correctly) inserts a line break opportunity between the
two. That split the cluster, and everything went downhill from there.

If no line break is expected here, wouldn't the modifiers be better off
as CM for line breaking purposes rather than AL?
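
To make the difference concrete, here is a minimal Python sketch. It is a
toy model, not ICU and not the full UAX #14 algorithm: it only knows the
"never break before CM" rule and otherwise allows a break. The function
names and the modifier_class parameter are made up for illustration, and
treating every non-modifier character as ID is an assumption that only
holds for the emoji in this example.

```python
# Toy model of the two cases discussed above. This is NOT the full
# UAX #14 algorithm and does not call ICU; names here are hypothetical.

# Emoji modifier codepoints U+1F3FB..U+1F3FF.
EMOJI_MODIFIERS = range(0x1F3FB, 0x1F400)


def line_break_class(ch, modifier_class="AL"):
    """Return a line break class for ch.

    Assumption for illustration: modifiers get `modifier_class`, and
    every other character is treated as ID (true for the base emoji
    used below, not true in general).
    """
    return modifier_class if ord(ch) in EMOJI_MODIFIERS else "ID"


def break_opportunities(text, modifier_class="AL"):
    """Return each index i where a break is allowed before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        if line_break_class(text[i], modifier_class) == "CM":
            # Never break before a combining mark.
            continue
        # Simplified default: a break is allowed, which covers the
        # ID followed by AL case described in this message.
        breaks.append(i)
    return breaks


# U+1F466 BOY followed by U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4.
boy_with_modifier = "\U0001F466\U0001F3FD"

print(break_opportunities(boy_with_modifier, "AL"))  # modifier as AL
print(break_opportunities(boy_with_modifier, "CM"))  # modifier as CM
```

With the modifier as AL the sketch reports a break opportunity at index 1,
between the emoji and its modifier; with the modifier as CM it reports
none, so the cluster stays together.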

