Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

Manish Goregaokar via Unicode unicode at unicode.org
Tue Jan 2 03:41:48 CST 2018


In the current draft, GB11 reads `Extended_Pictographic Extend* ZWJ ×
Extended_Pictographic`.

Can this similarly be distilled to just `ZWJ × Extended_Pictographic`? This
does affect cases like <Indic letter, virama, ZWJ, emoji> or <Arabic
letter, ZWJ, emoji>, and I'm not certain whether that counts as a degenerate
case. If we do this, then all of the rules except the one for flag emoji can
be evaluated using only local information, which is nice for implementors.
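To make the locality point concrete, here is a minimal Python sketch that works on pre-computed class labels (an assumption: a real implementation would derive them from the UCD). It applies only GB9 (no break before Extend or ZWJ) plus one of two GB11 variants and treats every other boundary as a break, so it is not a full UAX 29 segmenter. `gb11_draft` has to scan backwards across the `Extend*` run, while `gb11_local` only inspects the preceding character. All names and labels here are hypothetical.

```python
from typing import Callable, List

# Simplified class labels (hypothetical; not the full GCB property).
EXT_PICT, EXTEND, ZWJ, OTHER = "ExtPict", "Extend", "ZWJ", "Other"

def gb11_draft(cls: List[str], i: int) -> bool:
    """Draft GB11: ExtPict Extend* ZWJ x ExtPict. Needs a backward scan."""
    if cls[i] != EXT_PICT or cls[i - 1] != ZWJ:
        return False
    j = i - 2
    while j >= 0 and cls[j] == EXTEND:  # skip over the Extend* run
        j -= 1
    return j >= 0 and cls[j] == EXT_PICT

def gb11_local(cls: List[str], i: int) -> bool:
    """Proposed GB11: ZWJ x ExtPict. Only the preceding class matters."""
    return cls[i - 1] == ZWJ and cls[i] == EXT_PICT

def segment(cls: List[str], gb11: Callable[[List[str], int], bool]) -> List[List[str]]:
    """Split a class sequence using GB9 plus the given GB11 variant.
    Any other boundary is a break (GB999); all other rules are omitted."""
    clusters: List[List[str]] = []
    for i, c in enumerate(cls):
        if i > 0 and (c in (EXTEND, ZWJ) or gb11(cls, i)):
            clusters[-1].append(c)  # no break before this character
        else:
            clusters.append([c])    # break before this character
    return clusters

emoji_seq = [EXT_PICT, EXTEND, ZWJ, EXT_PICT]  # e.g. emoji, variation selector, ZWJ, emoji
arabic_seq = [OTHER, ZWJ, EXT_PICT]            # <Arabic letter, ZWJ, emoji>

for name, seq in (("emoji", emoji_seq), ("arabic", arabic_seq)):
    print(name, len(segment(seq, gb11_draft)), len(segment(seq, gb11_local)))
```

Both variants keep the emoji sequence together; they differ only on the Arabic example, which the draft rule splits into two clusters and the local rule keeps as one. That is exactly the behavior change whose acceptability depends on whether such sequences count as degenerate.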

(Also, I think GB11 in the current draft needs an `E_Modifier?` somewhere,
but if we merge E_Modifier into Extend, that won't be necessary anyway.)
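The parenthetical point can be checked with a second sketch. It classifies code points from a few properties, once with E_Modifier kept as a separate class and once folded into Extend, and tests whether <MAN, skin tone modifier, ZWJ, PERSONAL COMPUTER> fits the draft pattern `ExtPict Extend* ZWJ ExtPict`. The property values in `SAMPLE` are a small hand-entered table (an assumption for illustration, not loaded from the UCD or emoji-data.txt), and `gcb_class` and `fits_draft_gb11` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Props:
    grapheme_extend: bool = False  # Grapheme_Extend = Yes
    emoji_modifier: bool = False   # Emoji_Modifier = Yes
    ext_pict: bool = False         # Extended_Pictographic = Yes

# Hand-entered sample values for illustration only (assumption).
SAMPLE: Dict[int, Props] = {
    0x1F468: Props(ext_pict=True),        # MAN
    0x1F3FD: Props(emoji_modifier=True),  # EMOJI MODIFIER FITZPATRICK TYPE-4
    0x1F4BB: Props(ext_pict=True),        # PERSONAL COMPUTER
    0xFE0F: Props(grapheme_extend=True),  # VARIATION SELECTOR-16
}

def gcb_class(cp: int, merge_modifier: bool) -> str:
    """Coarse class; with merge_modifier, Emoji_Modifier=Yes counts as Extend."""
    if cp == 0x200D:
        return "ZWJ"
    p = SAMPLE.get(cp, Props())
    if p.grapheme_extend or (merge_modifier and p.emoji_modifier):
        return "Extend"
    if p.emoji_modifier:
        return "E_Modifier"
    if p.ext_pict:
        return "ExtPict"
    return "Other"

def fits_draft_gb11(cps: List[int], merge_modifier: bool) -> bool:
    """True if the whole sequence matches ExtPict Extend* ZWJ ExtPict."""
    cls = [gcb_class(cp, merge_modifier) for cp in cps]
    return (len(cls) >= 3 and cls[0] == "ExtPict" and cls[-2] == "ZWJ"
            and cls[-1] == "ExtPict" and all(c == "Extend" for c in cls[1:-2]))

man_technologist = [0x1F468, 0x1F3FD, 0x200D, 0x1F4BB]
print(fits_draft_gb11(man_technologist, merge_modifier=False))  # False
print(fits_draft_gb11(man_technologist, merge_modifier=True))   # True
```

Without the merge, the draft rule would need an extra `E_Modifier?` term after the first Extended_Pictographic to accept this sequence; with the merge, the existing `Extend*` already covers it.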

-Manish

On Tue, Jan 2, 2018 at 3:02 PM, Manish Goregaokar <manish at mozilla.com>
wrote:

> > Note: we are already planning to get rid of the GAZ/EBG distinction (
> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event.
>
>
> This is great! I hadn't noticed this when I last saw that draft (I was
> focusing on the Virama stuff). Good to know!
>
>
> > Instead, we'd add one line to
> *Extend <http://www.unicode.org/reports/tr29/tr29-32.html#Extend>:*
>
> Yeah, this is essentially what I was hoping we could do.
>
> Is there any way to formally propose this? Or is bringing it up here good
> enough?
>
> Thanks,
>
> -Manish
>
> On Mon, Jan 1, 2018 at 9:17 PM, Mark Davis ☕️ via Unicode <
> unicode at unicode.org> wrote:
>
>> This is an interesting suggestion, Manish.
>>
>> <non-emoji-base, skin tone modifier> is a degenerate case, so if we
>> follow your suggestion, we could also drop E_Base, E_Modifier, and
>> rule GB10.
>>
>> Instead, we'd add one line to *Extend
>> <http://www.unicode.org/reports/tr29/tr29-32.html#Extend>:*
>>
>> OLD
>> Grapheme_Extend = Yes
>> *and not* GCB = Virama
>>
>> NEW
>> Grapheme_Extend = Yes, or
>> Emoji characters listed as Emoji_Modifier=Yes in emoji-data.txt. See
>> [UTS51 <http://www.unicode.org/reports/tr41/tr41-21.html#UTS51>].
>> *and not* GCB = Virama
>>
>> Note: we are already planning to get rid of the GAZ/EBG distinction (
>> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event.
>>
>> Mark
>>
>> On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode <
>> unicode at unicode.org> wrote:
>>
>>> On Mon, 1 Jan 2018 13:24:29 +0530
>>> Manish Goregaokar via Unicode <unicode at unicode.org> wrote:
>>>
>>> > <random non-emoji, skin tone modifier> sounds very much like a
>>> > degenerate case to me.
>>>
>>> Generally yes, but I'm not sure that they'd be inappropriate for
>>> Egyptian hieroglyphs showing human beings.  The choice of determinative
>>> can convey unpronounceable semantic information, though I'm not sure
>>> that that can be as sensitive as skin colour.  However, in such a case
>>> it would also be appropriate to give a skin tone modifier the property
>>> Extend.
>>>
>>> Richard.
>>>
>>
>>
>