UAX 29 9.0.0 new emoji flag rules questions and comments

Laurentiu Iancu liancu at microsoft.com
Tue Jun 21 19:32:08 CDT 2016


Hello,



Re #1, the ^ symbol indeed denotes a start-of-line anchor, in usual regex notation, and the corresponding rules could use sot instead.



Re #2, that was an oversight, and will be addressed in the Proposed Update of UAX #29 for Unicode 10.0.



Re #3 and #4, both were addressed before the release of Version 9.0.



For suggestions such as #1, which require review by the UTC, please remember to use the feedback reporting form.
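
As an illustration of the ambiguity raised in #1 (this is not part of UAX #29), the following minimal Python sketch contrasts the two readings of an anchor. It assumes Python's `re` module, where `^` under `re.MULTILINE` matches at every line start, while `\A` matches only at the start of the text, which is the reading that sot would make explicit. The sample string is made up for the example.

```python
import re

text = "ab\nab"

# With re.MULTILINE, ^ matches at the start of every line.
line_starts = [m.start() for m in re.finditer(r"^ab", text, flags=re.MULTILINE)]

# \A matches only at the very start of the text (the "sot" reading),
# regardless of the MULTILINE flag.
text_starts = [m.start() for m in re.finditer(r"\Aab", text, flags=re.MULTILINE)]

print(line_starts)  # [0, 3]
print(text_starts)  # [0]
```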



Thank you,

L.



-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Daniel Bünzli
Sent: Tuesday, June 21, 2016 9:02 AM
To: Unicode Public <unicode at unicode.org>
Subject: UAX 29 9.0.0 new emoji flag rules questions and comments



I have a few questions/comments about the new emoji segmentation rules in 9.0.0:



1. I have trouble understanding what the ^ symbol means in these rules:



http://www.unicode.org/reports/tr29/proposed.html#GB8a

http://www.unicode.org/reports/tr29/proposed.html#WB15



Does it correspond to the regexp start-of-line (SOL) symbol? If that is the case, SOL is a bit ambiguous in that context: it could also mean that you need to match the start of lines, which is a whole different business. Couldn't it simply be replaced by sot?



2. Besides, given that with the GB8* rules you need to be able to tell whether an odd number of RI precedes a position, it seems to me that the sentence "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters." is no longer accurate.



3. There are two rules named GB8c.



4. In §1.1 the link to UTS18 is broken (#RegEx does not exist in UAX 41).
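
To make point 2 concrete, here is a rough Python sketch of the RI pairing logic as I understand it; it is only an illustration, not the UAX #29 algorithm as a whole. The function names are hypothetical, and the sketch handles only positions between two RI characters (U+1F1E6..U+1F1FF), ignoring every other rule.

```python
# Illustrative sketch only; function names are hypothetical, not from UAX #29.

def is_regional_indicator(ch: str) -> bool:
    """True if ch is in the Regional Indicator range U+1F1E6..U+1F1FF."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def ri_break_positions(text: str) -> list[int]:
    """Indices i such that text[i-1] and text[i] are both RI and a
    boundary falls between them, i.e. an even number of RI precede i."""
    breaks = []
    run = 0  # length of the RI run ending just before index i
    for i, ch in enumerate(text):
        if is_regional_indicator(ch):
            # An odd run means text[i] completes a pair: no break.
            if run > 0 and run % 2 == 0:
                breaks.append(i)
            run += 1
        else:
            run = 0
    return breaks


flags = "\U0001F1EB\U0001F1F7\U0001F1E9\U0001F1EA"  # RI letters F, R, D, E
print(ri_break_positions(flags))        # [2]
print(ri_break_positions("a" + flags))  # [3]
```

The scan has to carry the length of the current RI run, which is state that cannot be recovered from the two characters adjacent to a position alone.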



Best,



Daniel





