Unicode Emoji 5.0 characters now final

Mark Davis ☕️ mark at macchiato.com
Tue Mar 28 05:49:39 CDT 2017


Thanks. Probably best as:

unicode_locale_id = unicode_language_id
                    ( transformed_extensions unicode_locale_extensions?
                    | unicode_locale_extensions transformed_extensions? )?
;

Even clearer would be to split it into two steps:

unicode_locale_id = unicode_language_id extensions? ;

extensions        = transformed_extensions unicode_locale_extensions?
                  | unicode_locale_extensions transformed_extensions? ;
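
As a rough check on the rewritten rule, the sketch below collapses each extension block to a single letter (t for transformed_extensions, u for unicode_locale_extensions) and enumerates which orderings each formulation admits. The letters, the pattern names, and the regular expressions are stand-ins introduced here, not part of TR35, and the comparison covers only the accepted orderings, not how many ways a rule can derive the same string.

```python
import itertools
import re

# Assumption: each extension block is collapsed to one letter,
# "t" = transformed_extensions, "u" = unicode_locale_extensions.
FORMULATIONS = {
    # EBNF as quoted further down: mandatory group, second branch fully optional.
    "quoted_ebnf": r"(?:tu?|u?t?)",
    # ABNF as quoted further down (the misspelled rule name aside).
    "quoted_abnf": r"(?:(?:tu?)?|(?:ut?)?)",
    # Rewritten rule: the whole group is optional and each branch
    # starts with a required block.
    "rewritten": r"(?:tu?|ut?)?",
}


def accepted(pattern, max_len=3):
    """Return every string over {t, u} up to max_len that the pattern fully matches."""
    found = set()
    for n in range(max_len + 1):
        for letters in itertools.product("tu", repeat=n):
            candidate = "".join(letters)
            if re.fullmatch(pattern, candidate):
                found.add(candidate)
    return found


results = {name: accepted(pattern) for name, pattern in FORMULATIONS.items()}
for name, orderings in results.items():
    print(name, sorted(orderings))
```

Under this abstraction the three patterns admit the same orderings (none, t, u, tu, ut). What the rewritten form changes is that the empty case and the lone t are each reachable through only one branch.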

Could you file a CLDR ticket on this?

Mark

On Tue, Mar 28, 2017 at 12:36 PM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> I note this in TR35:
> *3.2 Unicode Locale Identifier
> <http://unicode.org/reports/tr35/index.html#Unicode_locale_identifier>*
>
> EBNF:
>
> unicode_locale_id
> <http://unicode.org/reports/tr35/index.html#unicode_locale_id> =
>   unicode_language_id
>   ( transformed_extensions unicode_locale_extensions?
>   | unicode_locale_extensions? transformed_extensions? ) ;
>
> ABNF:
>
> unicode_locale_id = unicode_language_id
>   ( [trasformed_extensions [unicode_locale_extensions]]
>   / [unicode_locale_extensions [transformed_extensions]] )
>
> * first there's a typo in the ABNF syntax ("trasformed")
> * the syntax is not strictly equivalent, or the ABNF is unnecessarily not
> context-free
>
> It would be better as:
>
> EBNF:
>
> unicode_locale_id
> <http://unicode.org/reports/tr35/index.html#unicode_locale_id> =
>   unicode_language_id
>   ( transformed_extensions unicode_locale_extensions?
>   | unicode_locale_extensions transformed_extensions? )? ;
>
> ABNF:
>
> unicode_locale_id = unicode_language_id
>   [ transformed_extensions [unicode_locale_extensions]
>   / unicode_locale_extensions [transformed_extensions] ]
>
>
>
> 2017-03-28 11:56 GMT+02:00 Joan Montané <joan at montane.cat>:
>
>>
>>
>> 2017-03-28 7:57 GMT+02:00 Mark Davis ☕️ <mark at macchiato.com>:
>>
>>> To add to what Ken and Markus said: like many other identifiers, there
>>> are a number of different categories.
>>>
>>>    1. *Ill-formed: *"$1"
>>>    2. *Well-formed, but not valid: *"usx". Is *syntactic* according to
>>>    http://unicode.org/reports/tr51/proposed.html#def_emoji_tag_sequence,
>>>    but is not *valid* according to
>>>    http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences.
>>>    3. *Valid, but not recommended: *"usca". Corresponds to the valid
>>>    Unicode subdivision code for California according to
>>>    http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
>>>    and CLDR, but is not listed in http://unicode.org/Public/emoji/5.0/.
>>>    4. *Recommended:* "gbsct". Corresponds to the valid Unicode
>>>    subdivision code for Scotland, and *is* listed in
>>>    http://unicode.org/Public/emoji/5.0/.
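
The four categories above can be sketched as a small classifier. This is only an illustration: the regular expression is an assumed approximation of the subdivision-id shape (a two-letter or three-digit region followed by one to four alphanumerics), the two sets are tiny hand-picked samples rather than the real CLDR and Emoji 5.0 lists, and the names classify and flag_sequence are made up here.

```python
import re

# Assumption: shape of a Unicode subdivision id, i.e. a region code
# (2 letters or 3 digits) followed by a 1-4 character alphanumeric suffix.
WELL_FORMED = re.compile(r"(?:[a-z]{2}|[0-9]{3})[a-z0-9]{1,4}")

# Illustrative samples only; the real lists come from CLDR and the
# Emoji 5.0 data files, not from this snippet.
SAMPLE_VALID = {"usca", "ustx", "gbeng", "gbsct", "gbwls"}
SAMPLE_RECOMMENDED = {"gbeng", "gbsct", "gbwls"}


def classify(tag_spec):
    """Place a tag spec such as "gbsct" into one of the four categories."""
    if not WELL_FORMED.fullmatch(tag_spec):
        return "ill-formed"
    if tag_spec not in SAMPLE_VALID:
        return "well-formed, but not valid"
    if tag_spec not in SAMPLE_RECOMMENDED:
        return "valid, but not recommended"
    return "recommended"


def flag_sequence(tag_spec):
    """Encode a tag spec as U+1F3F4, one tag character per ASCII character, then U+E007F."""
    tags = "".join(chr(0xE0000 + ord(ch)) for ch in tag_spec)
    return "\U0001F3F4" + tags + "\U000E007F"


for sample in ["$1", "usx", "usca", "gbsct"]:
    print(repr(sample), "->", classify(sample))
```

The checks run from the loosest condition to the strictest, so each category is a subset of the one before it.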
>>>
>>> As Ken says, the terminology around the term 'recommended' is still a
>>> little bit in flux. TR51 is still open for comment, although we won't make
>>> any changes that would invalidate http://unicode.org/Public/emoji/5.0/.
>>>
>>
>> Just two remarks.
>>
>> First: point 4 (Unicode subdivision codes listed on the Unicode emoji site)
>> gives rise to something of a chicken-and-egg problem. Vendors don't readily
>> add new subdivision flags (because they aren't recommended), and Unicode
>> doesn't recommend new subdivision flags (because they aren't supported by
>> vendors).
>>
>> Second: what about "Adopt a Character" (AKA "Adopt an emoji")? Will
>> Unicode subdivision codes that are valid, but not recommended, be eligible?
>> For instance, could someone adopt the California, Texas, Pomerania, or
>> Catalonia flags?
>>
>>
>> Regards,
>> Joan Montané
>>
>>
>