Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))
Mark Davis ☕️ via Unicode
unicode at unicode.org
Wed Jan 3 02:29:14 CST 2018
Thanks for your comments; you raise an excellent issue. There are valid
sequences that are not RGI; a vendor can support additional emoji sequences
(in particular, flags). So the wording in the doc isn't correct.
The doc should instead replace the use of "testing for RGI" with
"testing for validity". The key areas involved in that are checking for
valid base+modifier combinations, valid RI pairs, and valid TAG sequences.
The latter two involve testing based on the information supplied in the
appendix, while the valid base+modifier combinations are more regular and
can be tested based on properties.
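
As a rough illustration of the three validity checks described above, here
is a minimal Python sketch. It is not taken from UTS #51. The function
names and the three SAMPLE_* sets are hypothetical stand-ins for the real
data (valid region codes, valid subdivision ids, and the
Emoji_Modifier_Base property), so treat it as a sketch under those
assumptions rather than a definitive implementation.

```python
# Sketch only: these sets are tiny illustrative samples (assumptions),
# not the real tables from the UTS #51 data files or CLDR.
SAMPLE_REGIONS = {"US", "FR", "JP"}                      # assumed valid region codes
SAMPLE_SUBDIVISIONS = {"gbeng", "gbsct", "gbwls", "esct", "frbre"}  # assumed subdivision ids
SAMPLE_MODIFIER_BASES = {0x1F44D, 0x1F44B, 0x261D}       # assumed Emoji_Modifier_Base sample

RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF      # REGIONAL INDICATOR SYMBOL LETTER A..Z
MOD_FIRST, MOD_LAST = 0x1F3FB, 0x1F3FF    # emoji modifiers (Fitzpatrick types)
BLACK_FLAG, CANCEL_TAG = 0x1F3F4, 0xE007F


def is_valid_ri_pair(s):
    """Two regional indicators whose letters spell a listed region code."""
    cps = [ord(c) for c in s]
    if len(cps) != 2 or not all(RI_FIRST <= cp <= RI_LAST for cp in cps):
        return False
    code = "".join(chr(cp - RI_FIRST + ord("A")) for cp in cps)
    return code in SAMPLE_REGIONS


def is_valid_tag_sequence(s):
    """BLACK FLAG + tag digits/letters + CANCEL TAG, spelling a listed id."""
    cps = [ord(c) for c in s]
    if len(cps) < 3 or cps[0] != BLACK_FLAG or cps[-1] != CANCEL_TAG:
        return False
    tags = cps[1:-1]
    # Tag digits are U+E0030..U+E0039; tag small letters are U+E0061..U+E007A.
    if not all(0xE0030 <= cp <= 0xE0039 or 0xE0061 <= cp <= 0xE007A for cp in tags):
        return False
    ident = "".join(chr(cp - 0xE0000) for cp in tags)
    return ident in SAMPLE_SUBDIVISIONS


def is_valid_modifier_sequence(s):
    """A listed modifier base followed by exactly one emoji modifier."""
    cps = [ord(c) for c in s]
    return (len(cps) == 2
            and cps[0] in SAMPLE_MODIFIER_BASES
            and MOD_FIRST <= cps[1] <= MOD_LAST)


def make_tag_flag(subdivision_id):
    """Hypothetical helper: build the tag sequence for a subdivision id."""
    tags = "".join(chr(0xE0000 + ord(c)) for c in subdivision_id)
    return chr(BLACK_FLAG) + tags + chr(CANCEL_TAG)
```

With this split, a sequence such as make_tag_flag("esct") passes the
validity test whenever its id is in the data, whether or not it is RGI.
The first two checks are pure table lookups, while the third would be
driven by a character property in a real implementation.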
On Tue, Jan 2, 2018 at 9:55 PM, Doug Ewell via Unicode <unicode at unicode.org> wrote:
> Mark Davis wrote:
>> BTW, relevant to this discussion is a proposal filed
>> http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf (The
>> date is wrong, should be 2017-12-22)
> The phrase "emoji regex" had caused me to ignore this document, but I took
> a look based on this thread. It says "we still depend on the RGI test to
> filter the set of emoji sequences" and proposes that the EBNF in UTS #51 be
> simplified on the basis that only RGI sequences will pass the "possible
> emoji" test anyway.
> Thus it is true, as some people have said (i.e. in L2/17‐382), that
> non-RGI sequences do not actually count as emoji, and therefore there is no
> way — not merely no "recommended" way — to represent the flags of entities
> such as Catalonia and Brittany.
> In 2016 I had asked for the emoji tag sequence mechanism for flags to be
> available for all CLDR subdivisions, not just three, with the understanding
> that the vast majority would not be supported by vendor glyphs. It is
> unfortunate that, while the conciliatory name "recommended" was adopted for
> the three, the intent of "exclusively permitted" was retained.
> Doug Ewell | Thornton, CO, US | ewellic.org