UAX 29 9.0.0 new emoji flag rules questions and comments

Daniel Bünzli daniel.buenzli at erratique.ch
Tue Jun 21 11:02:15 CDT 2016


I have a few questions and comments about the new emoji segmentation rules in 9.0.0.

1. I have trouble understanding what the ^ symbol means in these rules:  

http://www.unicode.org/reports/tr29/proposed.html#GB8a
http://www.unicode.org/reports/tr29/proposed.html#WB15

Does it correspond to the regexp SOL symbol? If so, SOL is a bit ambiguous in that context: it could also mean that you need to match at the start of lines, which is a whole different business. Couldn't it simply be replaced by sot?

2. Besides, given that the GB8* rules require you to be able to count an odd number of RI, it seems to me that the sentence "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters." is no longer accurate.

3. There are two rules named GB8c.

4. In §1.1 the link to UTS18 is broken (#RegEx does not exist in UAX 41).  
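
To illustrate point 2, here is a minimal Python sketch of why two adjacent characters are not enough. It assumes the GB8* rules have the general shape "(RI RI)* RI × RI", i.e. no break inside a pair of RI, and that RI is the range U+1F1E6..U+1F1FF. The function names are hypothetical and not taken from any library.

```python
def is_regional_indicator(ch: str) -> bool:
    # Assumption: RI is exactly the range U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def break_between_ri(text: str, i: int) -> bool:
    """Return True if there is a boundary between text[i - 1] and text[i].

    Both characters must be RI. The answer depends on the length of the
    whole run of RI ending at text[i - 1], not just on the two neighbours.
    """
    if not 0 < i < len(text):
        raise ValueError("i must point between two characters")
    if not (is_regional_indicator(text[i - 1]) and is_regional_indicator(text[i])):
        raise ValueError("both adjacent characters must be RI")

    # Count the run of RI characters ending at text[i - 1], scanning backwards.
    run = 0
    j = i - 1
    while j >= 0 and is_regional_indicator(text[j]):
        run += 1
        j -= 1

    # Odd run: text[i - 1] opens a pair, so text[i] completes it (no break).
    # Even run: text[i - 1] closed a pair, so a new pair starts at text[i].
    return run % 2 == 0


# Two flags back to back: RI F, RI R, RI C, RI H.
flags = "\U0001F1EB\U0001F1F7\U0001F1E8\U0001F1ED"
for pos in range(1, len(flags)):
    print(pos, break_between_ri(flags, pos))
```

In the example the pairs (text[0], text[1]) and (text[1], text[2]) are both "RI RI", yet only the second position is a boundary, so the decision needs an unbounded backward scan.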

Best,  

Daniel  




