Line breaking status of emoji modifiers

Mark Davis ☕️ mark at macchiato.com
Sun Dec 6 11:25:19 CST 2015


Yes. This was discussed at the last UTC, and for line break (and other
segmentation, e.g. UAX #29), there is an action to propose appropriate rules
for Unicode 9.0. There are three types of emoji sequences that need to be
handled:

   - flag sequences
   - modifier sequences
   - ZWJ sequences

In the meantime, people are customizing their implementations to deal with
the emoji sequences. For now, it may be simpler for some to just use the
complete list of current sequences as exceptions, and disallow breaking
within them.
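
As a rough illustration of that stopgap, here is a minimal Python sketch. It
assumes your line breaker already produces a list of break opportunities as
code point offsets. The function name and the three-entry sequence list are
hypothetical, chosen only for illustration; a real list would be taken from
the published emoji data, and nothing here is ICU's API.

```python
# Hypothetical helper, not part of ICU or any Unicode specification.
# Illustrative subset only: one example of each of the three sequence types.
EMOJI_SEQUENCES = [
    "\U0001F1FA\U0001F1F8",                        # flag sequence (regional indicators U, S)
    "\U0001F44D\U0001F3FD",                        # modifier sequence (thumbs up + skin tone)
    "\U0001F468\u200D\U0001F469\u200D\U0001F467",  # ZWJ sequence (man, woman, girl)
]


def suppress_breaks_in_sequences(text, breaks, sequences=EMOJI_SEQUENCES):
    """Drop break opportunities that fall strictly inside a listed sequence.

    `breaks` holds code point offsets into `text`, as produced by whatever
    line breaker is in use (assumed to exist; it is not implemented here).
    """
    forbidden = set()
    for seq in sequences:
        start = text.find(seq)
        while start != -1:
            # Offsets start+1 .. start+len(seq)-1 lie between code points of seq.
            forbidden.update(range(start + 1, start + len(seq)))
            start = text.find(seq, start + 1)
    return [b for b in breaks if b not in forbidden]
```

For example, with text = "ok \U0001F44D\U0001F3FD yes" and breaks [3, 4, 6],
the offset 4 (between the emoji and its modifier) is dropped and the other two
are kept.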

Mark

On Sun, Dec 6, 2015 at 1:08 AM, Simon Cozens <simon at simon-cozens.org> wrote:

> My renderer just got hit with an interesting, if possibly obscure, bug.
>
> UTR#51 says "A supported emoji modifier sequence should be treated as a
> single grapheme cluster for editing purposes (cursor movement, deletion,
> etc.); word break, line break, etc." However, the modifier codepoints
> have line break category AL.
>
> So you have an emoji (line break ID) and its modifier (line break AL),
> and ICU (quite correctly) inserts a line break opportunity between the
> two. This split the cluster, and then everything went downhill after that.
>
> If you don't expect a line break here, shouldn't they be better as CM
> for line breaking purposes rather than AL?
>