Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))

Mark Davis ☕️ via Unicode unicode at unicode.org
Fri Jan 5 05:30:55 CST 2018


Doug, I modified my working draft, at
https://docs.google.com/document/d/1EuNjbs0XrBwqlvCJxra44o3EVrwdBJUWsPf8Ec1fWKY

If that looks ok, I'll submit.

Thanks again for your comments.

Mark

On Wed, Jan 3, 2018 at 9:29 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:

> Thanks for your comments; you raise an excellent issue. There are valid
> sequences that are not RGI; a vendor can support additional emoji sequences
> (in particular, flags). So the wording in the doc isn't correct.
>
> The doc should instead replace the use of "testing for RGI" with
> "testing for validity". The key areas involved in that are checking for
> valid base+modifier combinations, valid RI pairs, and TAG sequences. The
> latter two involve testing based on the information supplied in the
> appendix, while the valid base+modifiers are more regular and can be tested
> based on properties.
>
>
> Mark
>
> On Tue, Jan 2, 2018 at 9:55 PM, Doug Ewell via Unicode <
> unicode at unicode.org> wrote:
>
>> Mark Davis wrote:
>>
>>> BTW, relevant to this discussion is a proposal filed
>>> http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf (The
>>> date is wrong, should be 2017-12-22)
>>>
>>
>> The phrase "emoji regex" had caused me to ignore this document, but I
>> took a look based on this thread. It says "we still depend on the RGI test
>> to filter the set of emoji sequences" and proposes that the EBNF in UTS #51
>> be simplified on the basis that only RGI sequences will pass the "possible
>> emoji" test anyway.
>>
>> Thus it is true, as some people have said (i.e. in L2/17‐382), that
>> non-RGI sequences do not actually count as emoji, and therefore there is no
>> way — not merely no "recommended" way — to represent the flags of entities
>> such as Catalonia and Brittany.
>>
>> In 2016 I had asked for the emoji tag sequence mechanism for flags to be
>> available for all CLDR subdivisions, not just three, with the understanding
>> that the vast majority would not be supported by vendor glyphs. It is
>> unfortunate that, while the conciliatory name "recommended" was adopted for
>> the three, the intent of "exclusively permitted" was retained.
>>
>> --
>> Doug Ewell | Thornton, CO, US | ewellic.org
>>
>>
>
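
As a rough illustration of the three validity checks named in the quoted
message above (base+modifier combinations, RI pairs, and TAG sequences), here
is a minimal Python sketch. All function names and the small SAMPLE_* tables
are hypothetical stand-ins chosen for illustration; a real implementation
would presumably load the full region, subdivision, and Emoji_Modifier_Base
data rather than hard-code a handful of entries.

```python
# Hypothetical sketch: helper names and the SAMPLE_* tables are illustrative
# assumptions, not data or APIs taken from UTS #51 or the working draft.

RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF      # regional indicator symbols A..Z
MOD_FIRST, MOD_LAST = 0x1F3FB, 0x1F3FF    # emoji (skin tone) modifiers
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E    # tag characters mirroring ASCII
CANCEL_TAG = 0xE007F
BLACK_FLAG = 0x1F3F4

# Tiny illustrative subsets (assumed); real code would use complete tables.
SAMPLE_REGIONS = {"US", "FR", "JP"}
SAMPLE_SUBDIVISIONS = {"gbeng", "gbsct", "gbwls", "esct", "frbre"}
SAMPLE_MODIFIER_BASES = {0x1F44B, 0x1F44D}  # waving hand, thumbs up


def is_valid_flag_pair(s):
    """Two regional indicators whose letters form a known region code."""
    if len(s) != 2 or not all(RI_FIRST <= ord(c) <= RI_LAST for c in s):
        return False
    region = "".join(chr(ord(c) - RI_FIRST + ord("A")) for c in s)
    return region in SAMPLE_REGIONS


def is_valid_modifier_sequence(s):
    """A modifier base followed by exactly one emoji modifier."""
    return (len(s) == 2
            and ord(s[0]) in SAMPLE_MODIFIER_BASES
            and MOD_FIRST <= ord(s[1]) <= MOD_LAST)


def tag_flag(spec):
    """Build a tag sequence: black flag, tag characters, cancel tag."""
    tags = "".join(chr(0xE0000 + ord(c)) for c in spec)
    return chr(BLACK_FLAG) + tags + chr(CANCEL_TAG)


def is_valid_tag_flag(s):
    """Black flag + tag characters + cancel tag, spelling a known subdivision."""
    if len(s) < 3 or ord(s[0]) != BLACK_FLAG or ord(s[-1]) != CANCEL_TAG:
        return False
    body = s[1:-1]
    if not all(TAG_FIRST <= ord(c) <= TAG_LAST for c in body):
        return False
    spec = "".join(chr(ord(c) - 0xE0000) for c in body)
    return spec in SAMPLE_SUBDIVISIONS
```

Under these assumptions, a sequence such as tag_flag("esct") passes the
validity test because the subdivision is in the table, whether or not it is
RGI, which is the distinction the thread is about.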