From unicode at unicode.org Mon Jan 1 01:54:29 2018 From: unicode at unicode.org (Manish Goregaokar via Unicode) Date: Mon, 1 Jan 2018 13:24:29 +0530 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) Message-ID: In UAX 29, the GB10 rule[1] (and the WB14 rule[2]) states that we should not break before E_modifier characters in case it is after an emoji base (with optional Extend characters in between) Given that the spec is allowed to ignore degenerates, is there any value lost by merging E_Modifier and Extend into a single category? This means we can completely get rid of the Emoji_Base category, and the EBG category gets merged with GAZ. sounds very much like a degenerate case to me. also feels rather degenerate. There are only three GAZes (heart (U+2764), kiss (U+1F48B), speech bubble (U+1F5E8)) and I can't see why you'd end up with a skin tone modifier on them except by accident. (Unless we plan to support lip colors or something but in that case the kiss emoji would switch to EBG anyway) Thanks, -Manish [1]: http://www.unicode.org/reports/tr29/#GB10 [2]: http://www.unicode.org/reports/tr29/#WB14 -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Jan 1 02:06:27 2018 From: unicode at unicode.org (Jonathan Rosenne via Unicode) Date: Mon, 1 Jan 2018 08:06:27 +0000 Subject: Popular wordprocessors treating U+00A0 as fixed-width In-Reply-To: References: Message-ID: May we all please keep this discussion civil. People, being human, may sometimes make mistakes, but that does not necessarily justify calling them names. Best Regards, Jonathan Rosenne From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Philippe Verdy via Unicode Sent: Monday, January 01, 2018 5:43 AM To: Shriramana Sharma Cc: UnicoDe List Subject: Re: Popular wordprocessors treating U+00A0 as fixed-width Well it's unfortunate that Microsoft's own response (by its MSVP) is completely wrong, suggesting to use Narrow non-breaking space to get justification, which is exactly the reverse where these NNBSP should NOT be justified and keep their width. Microsoft's developers have absolutely misunderstood the standard where both SPACE and NBSP should really behave the same for justification (being different only for the existence of the break opportunity). This Microsoft response is completrrely supid, and it even breaks the classic typography for French use of NNBSP ("fine" in French) around some punctuations (before :;!?? or after ?) and as group separators in numbers (note that NNBSP was introduced in Unicode very late in the standard (and before that NBSP was used only because this was the only non-breaking space available but it was much too large!) Still many documents use NBSP instead of NNBSP around punctuations or as group separators (but in Word these contextual occurences of NBSP which are easy to detect, could have been autoreplaced when typesetting, or proposed as a correction in the integrated speller, at least for French). But the old behavior of old versions of Office (before NNBSP existed in Unicode) should have been cleaned up since long. It's clear that MS Office developers don't know the standards and do what they want (they also don't know the correct standards for maths in Excel and use a lot of very stupid assumptions, as if they were smarter than their users that suffer since long from these bugs !) and don't want to fix their past errors. 
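To make the width question concrete, here is a minimal sketch, in Python, of the UTS #14 guidance quoted just below. The classification sets are transcribed from that guidance only; the function name and example characters are illustrative and do not reflect any particular word processor's implementation.

# Which space characters participate in justification, per the UTS #14
# guidance quoted below: U+0020 and U+00A0 may be compressed; U+0020,
# U+00A0 and occasionally U+2009 may be expanded; everything else,
# including U+202F NARROW NO-BREAK SPACE, normally keeps a fixed width.
COMPRESSIBLE = {"\u0020", "\u00A0"}
EXPANDABLE = {"\u0020", "\u00A0", "\u2009"}

def justification_behavior(ch: str) -> str:
    if ch in COMPRESSIBLE:
        return "stretches and shrinks with interword spacing"
    if ch in EXPANDABLE:
        return "may occasionally be expanded, never compressed"
    return "fixed width"

if __name__ == "__main__":
    for cp in ("\u0020", "\u00A0", "\u202F", "\u2009"):
        print(f"U+{ord(cp):04X}: {justification_behavior(cp)}")
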
2018-01-01 3:14 GMT+01:00 Shriramana Sharma via Unicode >: While http://unicode.org/reports/tr14/ clearly states that: When expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width. ? really sad to see the misunderstanding around U+00A0: https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2016/nonbreakable-space-justification-in-word-2016/4fa1ad30-004c-454f-9775-a3beaa91c88b?auth=1 https://bugs.documentfoundation.org/show_bug.cgi?id=41652 -- Shriramana Sharma ???????????? ???????????? ???????????????????????? -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Jan 1 04:32:47 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Mon, 1 Jan 2018 11:32:47 +0100 Subject: Popular wordprocessors treating U+00A0 as fixed-width In-Reply-To: References: Message-ID: I do not call them by names, what I call is their reply, even when people explain them, and when they even suggest something else which is obviously wrong (and in fact absolutely not needed in Office which offers another way using styles for controling linebreaks without having to change the encoded character (a Word document has never been plain text, so I wonder why they even speak about compatibility by breaking another compatibility rule as a pseudo-workaround). 2018-01-01 9:06 GMT+01:00 Jonathan Rosenne via Unicode : > May we all please keep this discussion civil. People, being human, may > sometimes make mistakes, but that does not necessarily justify calling them > names. > > > > Best Regards, > > > > Jonathan Rosenne > > > > *From:* Unicode [mailto:unicode-bounces at unicode.org] *On Behalf Of *Philippe > Verdy via Unicode > *Sent:* Monday, January 01, 2018 5:43 AM > *To:* Shriramana Sharma > *Cc:* UnicoDe List > *Subject:* Re: Popular wordprocessors treating U+00A0 as fixed-width > > > > Well it's unfortunate that Microsoft's own response (by its MSVP) is > completely wrong, suggesting to use Narrow non-breaking space to get > justification, which is exactly the reverse where these NNBSP should NOT be > justified and keep their width. > > > > Microsoft's developers have absolutely misunderstood the standard where > both SPACE and NBSP should really behave the same for justification (being > different only for the existence of the break opportunity). > > > > This Microsoft response is completrrely supid, and it even breaks the > classic typography for French use of NNBSP ("fine" in French) around some > punctuations (before :;!?? or after ?) and as group separators in numbers > (note that NNBSP was introduced in Unicode very late in the standard (and > before that NBSP was used only because this was the only non-breaking space > available but it was much too large!) > > > > Still many documents use NBSP instead of NNBSP around punctuations or as > group separators (but in Word these contextual occurences of NBSP which are > easy to detect, could have been autoreplaced when typesetting, or proposed > as a correction in the integrated speller, at least for French). But the > old behavior of old versions of Office (before NNBSP existed in Unicode) > should have been cleaned up since long. 
> > > > It's clear that MS Office developers don't know the standards and do what > they want (they also don't know the correct standards for maths in Excel > and use a lot of very stupid assumptions, as if they were smarter than > their users that suffer since long from these bugs !) and don't want to fix > their past errors. > > > > 2018-01-01 3:14 GMT+01:00 Shriramana Sharma via Unicode < > unicode at unicode.org>: > > While http://unicode.org/reports/tr14/ clearly states that: > > > When expanding or compressing interword space according to common > typographical practice, only the spaces marked by U+0020 SPACE and > U+00A0 NO-BREAK SPACE are subject to compression, and only spaces > marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces > marked by U+2009 THIN SPACE are subject to expansion. All other space > characters normally have fixed width. > > > ? really sad to see the misunderstanding around U+00A0: > > https://answers.microsoft.com/en-us/msoffice/forum/msoffice_ > word-mso_windows8-mso_2016/nonbreakable-space-justification-in-word-2016/ > 4fa1ad30-004c-454f-9775-a3beaa91c88b?auth=1 > > https://bugs.documentfoundation.org/show_bug.cgi?id=41652 > > -- > Shriramana Sharma ???????????? ???????????? ???????????????????????? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Jan 1 08:52:20 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Mon, 1 Jan 2018 14:52:20 +0000 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) In-Reply-To: References: Message-ID: <20180101145220.7334ba83@JRWUBU2> On Mon, 1 Jan 2018 13:24:29 +0530 Manish Goregaokar via Unicode wrote: > sounds very much like a > degenerate case to me. Generally yes, but I'm not sure that they'd be inappropriate for Egyptian hieroglyphs showing human beings. The choice of determinative can convey unpronounceable semantic information, though I'm not sure that that can be as sensitive as skin colour. However, in such a case it would also be appropriate to give a skin tone modifier the property Extend. Richard. From unicode at unicode.org Mon Jan 1 09:47:59 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Mon, 1 Jan 2018 16:47:59 +0100 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) In-Reply-To: <20180101145220.7334ba83@JRWUBU2> References: <20180101145220.7334ba83@JRWUBU2> Message-ID: This is an interesting suggestion, Manish. is a degenerate case, so if we following your suggestion we also could drop E_Base and E_Modifier, and rule GB10. Instead, we'd add one line to *Extend :* OLD Grapheme_Extend = Yes *and not* GCB = Virama NEW Grapheme_Extend = Yes, or Emoji characters listed as Emoji_Modifier=Yes in emoji-data.txt. See [UTS51 ]. *and not* GCB = Virama Note: we are already planning to get rid of the GAZ/EBG distinction ( http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. Mark On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode < unicode at unicode.org> wrote: > On Mon, 1 Jan 2018 13:24:29 +0530 > Manish Goregaokar via Unicode wrote: > > > sounds very much like a > > degenerate case to me. > > Generally yes, but I'm not sure that they'd be inappropriate for > Egyptian hieroglyphs showing human beings. The choice of determinative > can convey unpronounceable semantic information, though I'm not sure > that that can be as sensitive as skin colour. 
However, in such a case > it would also be appropriate to give a skin tone modifier the property > Extend. > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Jan 2 03:21:37 2018 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Tue, 2 Jan 2018 01:21:37 -0800 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) In-Reply-To: <20180101145220.7334ba83@JRWUBU2> References: <20180101145220.7334ba83@JRWUBU2> Message-ID: <9b4df777-189c-287a-fdfd-d9bf4e750d0e@ix.netcom.com> On 1/1/2018 6:52 AM, Richard Wordingham via Unicode wrote: > On Mon, 1 Jan 2018 13:24:29 +0530 > Manish Goregaokar via Unicode wrote: > >> sounds very much like a >> degenerate case to me. > Generally yes, but I'm not sure that they'd be inappropriate for > Egyptian hieroglyphs showing human beings. The choice of determinative > can convey unpronounceable semantic information, though I'm not sure > that that can be as sensitive as skin colour. However, in such a case > it would also be appropriate to give a skin tone modifier the property > Extend. They would be inappropriate because it's not part of the hieroglyphic writing system to make those distinctions. "Over expressiveness" is sometimes a problem rather than a feature when it comes to Unicode. A./ > > Richard. > From unicode at unicode.org Tue Jan 2 03:32:27 2018 From: unicode at unicode.org (Manish Goregaokar via Unicode) Date: Tue, 2 Jan 2018 15:02:27 +0530 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) In-Reply-To: References: <20180101145220.7334ba83@JRWUBU2> Message-ID: > Note: we are already planning to get rid of the GAZ/EBG distinction ( http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. This is great! I hadn't noticed this when I last saw that draft (I was focusing on the Virama stuff). Good to know! > Instead, we'd add one line to *Extend :* Yeah, this is essentially what I was hoping we could do. Is there any way to formally propose this? Or is bringing it up here good enough? Thanks, -Manish On Mon, Jan 1, 2018 at 9:17 PM, Mark Davis ?? via Unicode < unicode at unicode.org> wrote: > This is an interesting suggestion, Manish. > > is a degenerate case, so if we > following your suggestion we also could drop E_Base and E_Modifier, and > rule GB10. > > Instead, we'd add one line to *Extend > :* > > OLD > Grapheme_Extend = Yes > *and not* GCB = Virama > > NEW > Grapheme_Extend = Yes, or > Emoji characters listed as Emoji_Modifier=Yes in emoji-data.txt. See [ > UTS51 ]. > *and not* GCB = Virama > > Note: we are already planning to get rid of the GAZ/EBG distinction ( > http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. > > Mark > > On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode < > unicode at unicode.org> wrote: > >> On Mon, 1 Jan 2018 13:24:29 +0530 >> Manish Goregaokar via Unicode wrote: >> >> > sounds very much like a >> > degenerate case to me. >> >> Generally yes, but I'm not sure that they'd be inappropriate for >> Egyptian hieroglyphs showing human beings. The choice of determinative >> can convey unpronounceable semantic information, though I'm not sure >> that that can be as sensitive as skin colour. However, in such a case >> it would also be appropriate to give a skin tone modifier the property >> Extend. >> >> Richard. >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Tue Jan 2 03:41:48 2018 From: unicode at unicode.org (Manish Goregaokar via Unicode) Date: Tue, 2 Jan 2018 15:11:48 +0530 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) In-Reply-To: References: <20180101145220.7334ba83@JRWUBU2> Message-ID: In the current draft GB11 mentions Extended_Pictographic Extend* ZWJ x Extended_Pictographic. Can this similarly be distilled to just ZWJ x Extended_Pictographic? This does affect cases like or and I'm not certain if that counts as a degenerate case. If we do this then all of the rules except the flag emoji one become things which can be easily calculated with local information, which is nice for implementors. (Also in the current draft I think GB11 needs a `E_Modifier?` somewhere but if we merge that with Extend that's not going to be necessary anyway) -Manish On Tue, Jan 2, 2018 at 3:02 PM, Manish Goregaokar wrote: > > Note: we are already planning to get rid of the GAZ/EBG distinction ( > http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. > > > This is great! I hadn't noticed this when I last saw that draft (I was > focusing on the Virama stuff). Good to know! > > > > Instead, we'd add one line to > *Extend :* > > Yeah, this is essentially what I was hoping we could do. > > Is there any way to formally propose this? Or is bringing it up here good > enough? > > Thanks, > > -Manish > > On Mon, Jan 1, 2018 at 9:17 PM, Mark Davis ?? via Unicode < > unicode at unicode.org> wrote: > >> This is an interesting suggestion, Manish. >> >> is a degenerate case, so if we >> following your suggestion we also could drop E_Base and E_Modifier, and >> rule GB10. >> >> Instead, we'd add one line to *Extend >> :* >> >> OLD >> Grapheme_Extend = Yes >> *and not* GCB = Virama >> >> NEW >> Grapheme_Extend = Yes, or >> Emoji characters listed as Emoji_Modifier=Yes in emoji-data.txt. See [ >> UTS51 ]. >> *and not* GCB = Virama >> >> Note: we are already planning to get rid of the GAZ/EBG distinction ( >> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. >> >> Mark >> >> On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode < >> unicode at unicode.org> wrote: >> >>> On Mon, 1 Jan 2018 13:24:29 +0530 >>> Manish Goregaokar via Unicode wrote: >>> >>> > sounds very much like a >>> > degenerate case to me. >>> >>> Generally yes, but I'm not sure that they'd be inappropriate for >>> Egyptian hieroglyphs showing human beings. The choice of determinative >>> can convey unpronounceable semantic information, though I'm not sure >>> that that can be as sensitive as skin colour. However, in such a case >>> it would also be appropriate to give a skin tone modifier the property >>> Extend. >>> >>> Richard. >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Jan 2 04:37:30 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Tue, 2 Jan 2018 11:37:30 +0100 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) In-Reply-To: References: <20180101145220.7334ba83@JRWUBU2> Message-ID: > Or is bringing it up here good enough? You should submit a proposal, which you can do at https://www.unicode.org/reporting.html. It doesn't have to be much more than what you put in email. (A reminder for everyone here: This is simply a discussion list, and has no effect whatsoever unless someone submits a proposal for the UTC.) 
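As a practical aside for implementers reading this thread: the cluster behavior under discussion (emoji modifier sequences and emoji ZWJ sequences) can already be observed with an off-the-shelf segmenter. A minimal sketch follows, assuming the third-party Python "regex" module, whose \X pattern matches one extended grapheme cluster per UAX #29; the sample strings are illustrative only, and the exact results depend on the Unicode version the module was built against.

# Minimal sketch: count extended grapheme clusters with the third-party
# "regex" module (pip install regex). \X matches one extended grapheme
# cluster per UAX #29.
import regex

samples = {
    "woman + skin tone modifier (GB10)": "\U0001F469\U0001F3FB",
    "man technologist ZWJ sequence (GB11)": "\U0001F468\u200D\U0001F4BB",
    "letter + combining mark (GB9)": "a\u0301",
}

for label, text in samples.items():
    clusters = regex.findall(r"\X", text)
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label}: {codepoints} -> {len(clusters)} cluster(s)")
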
Mark On Tue, Jan 2, 2018 at 10:32 AM, Manish Goregaokar wrote: > > Note: we are already planning to get rid of the GAZ/EBG distinction ( > http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. > > > This is great! I hadn't noticed this when I last saw that draft (I was > focusing on the Virama stuff). Good to know! > > > > Instead, we'd add one line to > *Extend :* > > Yeah, this is essentially what I was hoping we could do. > > Is there any way to formally propose this? Or is bringing it up here good > enough? > > Thanks, > > -Manish > > On Mon, Jan 1, 2018 at 9:17 PM, Mark Davis ?? via Unicode < > unicode at unicode.org> wrote: > >> This is an interesting suggestion, Manish. >> >> is a degenerate case, so if we >> following your suggestion we also could drop E_Base and E_Modifier, and >> rule GB10. >> >> Instead, we'd add one line to *Extend >> :* >> >> OLD >> Grapheme_Extend = Yes >> *and not* GCB = Virama >> >> NEW >> Grapheme_Extend = Yes, or >> Emoji characters listed as Emoji_Modifier=Yes in emoji-data.txt. See [ >> UTS51 ]. >> *and not* GCB = Virama >> >> Note: we are already planning to get rid of the GAZ/EBG distinction ( >> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. >> >> Mark >> >> On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode < >> unicode at unicode.org> wrote: >> >>> On Mon, 1 Jan 2018 13:24:29 +0530 >>> Manish Goregaokar via Unicode wrote: >>> >>> > sounds very much like a >>> > degenerate case to me. >>> >>> Generally yes, but I'm not sure that they'd be inappropriate for >>> Egyptian hieroglyphs showing human beings. The choice of determinative >>> can convey unpronounceable semantic information, though I'm not sure >>> that that can be as sensitive as skin colour. However, in such a case >>> it would also be appropriate to give a skin tone modifier the property >>> Extend. >>> >>> Richard. >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Jan 2 04:41:16 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Tue, 2 Jan 2018 11:41:16 +0100 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) In-Reply-To: References: <20180101145220.7334ba83@JRWUBU2> Message-ID: We had that originally, but some people objected that some languages (Arabic, as I recall) can end a string of letters with a ZWJ, and immediately follow it by an emoji (without an intervening space) without wanting it to be joined into a grapheme cluster with a following symbol. While I personally consider that a degenerate case, we tightened the definition to prevent that. Mark Mark On Tue, Jan 2, 2018 at 10:41 AM, Manish Goregaokar wrote: > In the current draft GB11 mentions Extended_Pictographic Extend* ZWJ x > Extended_Pictographic. > > Can this similarly be distilled to just ZWJ x Extended_Pictographic? This > does affect cases like or letter, zwj, emoji> and I'm not certain if that counts as a degenerate > case. If we do this then all of the rules except the flag emoji one become > things which can be easily calculated with local information, which is nice > for implementors. 
> > (Also in the current draft I think GB11 needs a `E_Modifier?` somewhere > but if we merge that with Extend that's not going to be necessary anyway) > > -Manish > > On Tue, Jan 2, 2018 at 3:02 PM, Manish Goregaokar > wrote: > >> > Note: we are already planning to get rid of the GAZ/EBG distinction ( >> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. >> >> >> This is great! I hadn't noticed this when I last saw that draft (I was >> focusing on the Virama stuff). Good to know! >> >> >> > Instead, we'd add one line to >> *Extend :* >> >> Yeah, this is essentially what I was hoping we could do. >> >> Is there any way to formally propose this? Or is bringing it up here good >> enough? >> >> Thanks, >> >> -Manish >> >> On Mon, Jan 1, 2018 at 9:17 PM, Mark Davis ?? via Unicode < >> unicode at unicode.org> wrote: >> >>> This is an interesting suggestion, Manish. >>> >>> is a degenerate case, so if we >>> following your suggestion we also could drop E_Base and E_Modifier, and >>> rule GB10. >>> >>> Instead, we'd add one line to *Extend >>> :* >>> >>> OLD >>> Grapheme_Extend = Yes >>> *and not* GCB = Virama >>> >>> NEW >>> Grapheme_Extend = Yes, or >>> Emoji characters listed as Emoji_Modifier=Yes in emoji-data.txt. See [ >>> UTS51 ]. >>> *and not* GCB = Virama >>> >>> Note: we are already planning to get rid of the GAZ/EBG distinction ( >>> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. >>> >>> Mark >>> >>> On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode < >>> unicode at unicode.org> wrote: >>> >>>> On Mon, 1 Jan 2018 13:24:29 +0530 >>>> Manish Goregaokar via Unicode wrote: >>>> >>>> > sounds very much like a >>>> > degenerate case to me. >>>> >>>> Generally yes, but I'm not sure that they'd be inappropriate for >>>> Egyptian hieroglyphs showing human beings. The choice of determinative >>>> can convey unpronounceable semantic information, though I'm not sure >>>> that that can be as sensitive as skin colour. However, in such a case >>>> it would also be appropriate to give a skin tone modifier the property >>>> Extend. >>>> >>>> Richard. >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Jan 2 04:52:27 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Tue, 2 Jan 2018 11:52:27 +0100 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) In-Reply-To: References: <20180101145220.7334ba83@JRWUBU2> Message-ID: BTW, relevant to this discussion is a proposal filed http://www.unicode.org/ L2/L2017/17434-emoji-rejex-uts51-def.pdf (The date is wrong, should be 2017-12-22) Mark On Tue, Jan 2, 2018 at 11:41 AM, Mark Davis ?? wrote: > We had that originally, but some people objected that some languages > (Arabic, as I recall) can end a string of letters with a ZWJ, and > immediately follow it by an emoji (without an intervening space) without > wanting it to be joined into a grapheme cluster with a following symbol. > While I personally consider that a degenerate case, we tightened the > definition to prevent that. > > Mark > > Mark > > On Tue, Jan 2, 2018 at 10:41 AM, Manish Goregaokar > wrote: > >> In the current draft GB11 mentions Extended_Pictographic Extend* ZWJ x >> Extended_Pictographic. >> >> Can this similarly be distilled to just ZWJ x Extended_Pictographic? This >> does affect cases like or > letter, zwj, emoji> and I'm not certain if that counts as a degenerate >> case. 
If we do this then all of the rules except the flag emoji one become >> things which can be easily calculated with local information, which is nice >> for implementors. >> >> (Also in the current draft I think GB11 needs a `E_Modifier?` somewhere >> but if we merge that with Extend that's not going to be necessary anyway) >> >> -Manish >> >> On Tue, Jan 2, 2018 at 3:02 PM, Manish Goregaokar >> wrote: >> >>> > Note: we are already planning to get rid of the GAZ/EBG distinction ( >>> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. >>> >>> >>> This is great! I hadn't noticed this when I last saw that draft (I was >>> focusing on the Virama stuff). Good to know! >>> >>> >>> > Instead, we'd add one line to >>> *Extend :* >>> >>> Yeah, this is essentially what I was hoping we could do. >>> >>> Is there any way to formally propose this? Or is bringing it up here >>> good enough? >>> >>> Thanks, >>> >>> -Manish >>> >>> On Mon, Jan 1, 2018 at 9:17 PM, Mark Davis ?? via Unicode < >>> unicode at unicode.org> wrote: >>> >>>> This is an interesting suggestion, Manish. >>>> >>>> is a degenerate case, so if we >>>> following your suggestion we also could drop E_Base and E_Modifier, and >>>> rule GB10. >>>> >>>> Instead, we'd add one line to *Extend >>>> :* >>>> >>>> OLD >>>> Grapheme_Extend = Yes >>>> *and not* GCB = Virama >>>> >>>> NEW >>>> Grapheme_Extend = Yes, or >>>> Emoji characters listed as Emoji_Modifier=Yes in emoji-data.txt. See [ >>>> UTS51 ]. >>>> *and not* GCB = Virama >>>> >>>> Note: we are already planning to get rid of the GAZ/EBG distinction ( >>>> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. >>>> >>>> Mark >>>> >>>> On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode < >>>> unicode at unicode.org> wrote: >>>> >>>>> On Mon, 1 Jan 2018 13:24:29 +0530 >>>>> Manish Goregaokar via Unicode wrote: >>>>> >>>>> > sounds very much like a >>>>> > degenerate case to me. >>>>> >>>>> Generally yes, but I'm not sure that they'd be inappropriate for >>>>> Egyptian hieroglyphs showing human beings. The choice of determinative >>>>> can convey unpronounceable semantic information, though I'm not sure >>>>> that that can be as sensitive as skin colour. However, in such a case >>>>> it would also be appropriate to give a skin tone modifier the property >>>>> Extend. >>>>> >>>>> Richard. >>>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Jan 2 14:30:34 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Tue, 2 Jan 2018 20:30:34 +0000 Subject: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10) In-Reply-To: <9b4df777-189c-287a-fdfd-d9bf4e750d0e@ix.netcom.com> References: <20180101145220.7334ba83@JRWUBU2> <9b4df777-189c-287a-fdfd-d9bf4e750d0e@ix.netcom.com> Message-ID: <20180102203034.1f25bbe2@JRWUBU2> On Tue, 2 Jan 2018 01:21:37 -0800 Asmus Freytag via Unicode wrote: > On 1/1/2018 6:52 AM, Richard Wordingham via Unicode wrote: > > Generally yes, but I'm not sure that they'd be inappropriate for > > Egyptian hieroglyphs showing human beings. The choice of > > determinative can convey unpronounceable semantic information, > > though I'm not sure that that can be as sensitive as skin colour. > > However, in such a case it would also be appropriate to give a skin > > tone modifier the property Extend. > They would be inappropriate because it's not part of the hieroglyphic > writing system to make those distinctions. 
If the distinction is kept to indisputable pictures, then that does keep it out of scope. It just occurred to me that the painter might choose the ethnically appropriate skin colour rather than just using the Egyptian skin colour. Richard. From unicode at unicode.org Tue Jan 2 14:55:47 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Tue, 2 Jan 2018 13:55:47 -0700 Subject: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)) In-Reply-To: References: Message-ID: Mark Davis wrote: > BTW, relevant to this discussion is a proposal filed > http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf (The > date is wrong, should be 2017-12-22) The phrase "emoji regex" had caused me to ignore this document, but I took a look based on this thread. It says "we still depend on the RGI test to filter the set of emoji sequences" and proposes that the EBNF in UTS #51 be simplified on the basis that only RGI sequences will pass the "possible emoji" test anyway. Thus it is true, as some people have said (i.e. in L2/17?382), that non-RGI sequences do not actually count as emoji, and therefore there is no way ? not merely no "recommended" way ? to represent the flags of entities such as Catalonia and Brittany. In 2016 I had asked for the emoji tag sequence mechanism for flags to be available for all CLDR subdivisions, not just three, with the understanding that the vast majority would not be supported by vendor glyphs. II t is unfortunate that, while the conciliatory name "recommended" was adopted for the three, the intent of "exclusively permitted" was retained. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Wed Jan 3 02:29:14 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Wed, 3 Jan 2018 09:29:14 +0100 Subject: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)) In-Reply-To: References: Message-ID: Thanks for your comments; you raise an excellent issue. There are valid sequences that are not RGI; a vendor can support additional emoji sequences (in particular, flags). So the wording in the doc isn't correct. It should do something like replace the use of "testing for RGI" by "testing for validity". The key areas involved in that are checking for the valid base+modifier combinations, valid RI pairs, and TAG sequences. The latter two involve testing based on the information applied in the appendix, while the valid base+modifiers are more regular and can be tested based on properties. Mark On Tue, Jan 2, 2018 at 9:55 PM, Doug Ewell via Unicode wrote: > Mark Davis wrote: > > BTW, relevant to this discussion is a proposal filed >> http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf (The >> date is wrong, should be 2017-12-22) >> > > The phrase "emoji regex" had caused me to ignore this document, but I took > a look based on this thread. It says "we still depend on the RGI test to > filter the set of emoji sequences" and proposes that the EBNF in UTS #51 be > simplified on the basis that only RGI sequences will pass the "possible > emoji" test anyway. > > Thus it is true, as some people have said (i.e. in L2/17?382), that > non-RGI sequences do not actually count as emoji, and therefore there is no > way ? not merely no "recommended" way ? to represent the flags of entities > such as Catalonia and Brittany. 
> > In 2016 I had asked for the emoji tag sequence mechanism for flags to be > available for all CLDR subdivisions, not just three, with the understanding > that the vast majority would not be supported by vendor glyphs. II t is > unfortunate that, while the conciliatory name "recommended" was adopted for > the three, the intent of "exclusively permitted" was retained. > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 3 03:16:36 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Wed, 3 Jan 2018 10:16:36 +0100 Subject: Regex for Grapheme Cluster Breaks Message-ID: I had a UTC action to adjust http://www.unicode.org/reports/tr29/proposed.html#Table_Combining_Char_Sequences_and_Grapheme_Clusters to update the regex, and other necessary changes surrounding text. Here is what I've come up with for an EBNF formulation. The $x are the GCB properties. cluster = crlf | $Control | precore* core postcore* ; crlf = $CR $LF ; precore = $Prepend ; postcore = (?: virama-sequence | [$Extend $ZWJ $Virama $SpacingMark] ); core = (?: hangul-syllable | ri-sequence | xpicto-sequence | virama-sequence | [^$Control $CR $LF] ); hangul-syllable = $L* (?:$V+ | $LV $V* | $LVT) $T* | $L+ | $T+ ; ri-sequence = $RI $RI ; skin-sequence = $E_Base $E_Modifier ; xpicto-sequence = (?: skin-sequence | \p{Extended_Pictographic} ) (?: $Extend* $ZWJ (?: skin-sequence | \p{Extended_Pictographic} ))* ; virama-sequence = [$Virama $ZWJ] $LinkingConsonant ; ?I have tools to turn that into a (lovely) regex: \p{gcb=cr}\p{gcb=lf}|\p{gcb=control}|\p{gcb=Prepend}*(?:\p{gcb=l}*(?:\p{gcb=v}+|\p{gcb=lv}\p{gcb=v}*|\p{gcb=lvt})\p{gcb=t}*|\p{gcb=l}+|\p{gcb=t}+|\p{gcb=ri}\p{gcb=ri}|(?:\p{gcb=e_base}\p{gcb=E_Modifier}|\p{Extended_Pictographic})(?:\p{gcb=Extend}*\p{gcb=zwj}(?:\p{gcb=e_base}\p{gcb=E_Modifier}|\p{Extended_Pictographic}))*|[\p{gcb=Virama}\p{gcb=zwj}]\p{gcb=LinkingConsonant}|[^\p{gcb=control}\p{gcb=cr}\p{gcb=lf}])(?:[\p{gcb=Virama}\p{gcb=zwj}]\p{gcb=LinkingConsonant}|[\p{gcb=Extend}\p{gcb=zwj}\p{gcb=Virama}\p{gcb=SpacingMark}])* ? ?(It is a bit shorter if some more property names/values are abbreviated.) I then tested against the current test file: GraphemeBreakTest.txt. There is one outlying failure with that test file: 813) ???? hex: 261D 0308 1F3FB test: [0, 4] ebnf: [0, 2, 4] I believe that is a problem with the test rather than the BNF, but I need to track it down in any event. ?A regex is much easier for many applications to use than the current rule syntax, so I'm going to see if the other segmentations could be reformulated ?as ebnfs (ideally corresponding to regular grammars, or in the worst case, for PEGs). Feedback is welcome. ? Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 3 04:38:17 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Wed, 3 Jan 2018 11:38:17 +0100 Subject: Regex for Grapheme Cluster Breaks In-Reply-To: References: Message-ID: Quick update: Manish pointed out that I'd misstated one of the rules, should be: skin-sequence = $E_Base $Extend* $E_Modifier ; ?With that change, the test passes. (Thanks Manish!)? Mark On Wed, Jan 3, 2018 at 10:16 AM, Mark Davis ?? 
wrote: > I had a UTC action to adjust http://www.unicode.org/ > reports/tr29/proposed.html#Table_Combining_Char_Sequences_and_Grapheme_ > Clusters to update the regex, and other necessary changes surrounding > text. > > Here is what I've come up with for an EBNF formulation. The $x are the GCB > properties. > > cluster = crlf | $Control | precore* core postcore* ; > > > crlf = $CR $LF ; > > > precore = $Prepend ; > > > postcore = (?: virama-sequence | [$Extend $ZWJ $Virama $SpacingMark] ); > > > core = (?: hangul-syllable | ri-sequence | xpicto-sequence | virama-sequence > | [^$Control $CR $LF] ); > > > hangul-syllable = $L* (?:$V+ | $LV $V* | $LVT) $T* | $L+ | $T+ ; > > > ri-sequence = $RI $RI ; > > > > skin-sequence = $E_Base $E_Modifier ; > > > xpicto-sequence = (?: skin-sequence | \p{Extended_Pictographic} ) (?: > $Extend* $ZWJ (?: skin-sequence | \p{Extended_Pictographic} ))* ; > > > virama-sequence = [$Virama $ZWJ] $LinkingConsonant ; > > > ?I have tools to turn that into a (lovely) regex: > > \p{gcb=cr}\p{gcb=lf}|\p{gcb=control}|\p{gcb=Prepend}*(?:\ > p{gcb=l}*(?:\p{gcb=v}+|\p{gcb=lv}\p{gcb=v}*|\p{gcb=lvt})\p{ > gcb=t}*|\p{gcb=l}+|\p{gcb=t}+|\p{gcb=ri}\p{gcb=ri}|(?:\p{ > gcb=e_base}\p{gcb=E_Modifier}|\p{Extended_Pictographic})(?:\ > p{gcb=Extend}*\p{gcb=zwj}(?:\p{gcb=e_base}\p{gcb=E_Modifier}|\p{Extended_ > Pictographic}))*|[\p{gcb=Virama}\p{gcb=zwj}]\p{gcb= > LinkingConsonant}|[^\p{gcb=control}\p{gcb=cr}\p{gcb=lf}]) > (?:[\p{gcb=Virama}\p{gcb=zwj}]\p{gcb=LinkingConsonant}|[\p{ > gcb=Extend}\p{gcb=zwj}\p{gcb=Virama}\p{gcb=SpacingMark}])* > ? > ?(It is a bit shorter if some more property names/values are abbreviated.) > > I then tested against the current test file: GraphemeBreakTest.txt. There > is one outlying failure with that test file: > > 813) ???? > > hex: 261D 0308 1F3FB > > test: [0, 4] > > ebnf: [0, 2, 4] > > I believe that is a problem with the test rather than the BNF, but I need > to track it down in any event. > > ?A regex is much easier for many applications to use than the current rule > syntax, so I'm going to see if the other segmentations could be > reformulated ?as ebnfs (ideally corresponding to regular grammars, or in > the worst case, for PEGs). > > Feedback is welcome. > > ? > Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 3 11:15:39 2018 From: unicode at unicode.org (=?utf-8?Q? J.=C2=A0S._Choi ?= via Unicode) Date: Wed, 03 Jan 2018 10:15:39 -0700 Subject: W3C discussion: nullifying BCP47 tags for emoji presentation in HTML/XML Message-ID: <669D0643-49C3-44E2-86AF-6B59C43350DB@icloud.com> A discussion relevant to UTS 51: Unicode Emoji is occurring in the W3C?s CSS Working Group on GitHub at https://github.com/w3c/csswg-drafts/issues/2138. To review, the Consortium recently registered several BCP47 language-tag extension keys for specifying transliteration and text-vs.-emoji presentation such as ?en-u-em-emoji? (see http://blog.unicode.org/2016/03/cldr-version-29-released.html). Basically, the W3C and the major web-browser vendors are considering normatively forbidding any influence of Unicode?s BCP47 extensions on the presentation of emoji characters in HTML and XML, viewing them as currently little used and fully redundant to variation-selector characters and the CSS font-presentation property. 
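For reference, the per-character mechanism that the browser vendors consider sufficient is the variation selectors: U+FE0E (VS15) requests text presentation and U+FE0F (VS16) requests emoji presentation for the base character they follow, whereas the BCP47 extension (e.g. "en-u-em-emoji") applies to a whole element or document. A minimal sketch in Python, standard library only; the base character is just an example.

# VS15 (U+FE0E) asks for text presentation, VS16 (U+FE0F) for emoji
# presentation of the preceding base character. Illustrative only.
BASE = "\u2764"  # HEAVY BLACK HEART; default presentation varies by platform
variants = {
    "default": BASE,
    "text (VS15)": BASE + "\uFE0E",
    "emoji (VS16)": BASE + "\uFE0F",
}

for label, s in variants.items():
    print(label.ljust(12), " ".join(f"U+{ord(ch):04X}" for ch in s))
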
The Consortium was the the originator of the BCP47 extensions and may have insight into their use cases; thus, those involved in registering the extensions may be interested in participating in this discussion, which is occurring on GitHub at https://github.com/w3c/csswg-drafts/issues/2138. So far, representatives from Google Chrome / Blink (Sascha Brawer), Microsoft Edge / Chakra (Sergey Malkin), Apple Safari / WebKit (Myles C. Maxfield), and W3C (Chris Lilley) have been participating. J. S. Choi -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 5 05:30:55 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Fri, 5 Jan 2018 12:30:55 +0100 Subject: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)) In-Reply-To: References: Message-ID: Doug, I modified my working draft, at https://docs.google.com/document/d/1EuNjbs0XrBwqlvCJxra44o3EVrwdBJUWsPf8Ec1fWKY If that looks ok, I'll submit. Thanks again for your comments. Mark Mark On Wed, Jan 3, 2018 at 9:29 AM, Mark Davis ?? wrote: > Thanks for your comments; you raise an excellent issue. There are valid > sequences that are not RGI; a vendor can support additional emoji sequences > (in particular, flags). So the wording in the doc isn't correct. > > It should do something like replace the use of "testing for RGI" by > "testing for validity". The key areas involved in that are checking for the > valid base+modifier combinations, valid RI pairs, and TAG sequences. The > latter two involve testing based on the information applied in the > appendix, while the valid base+modifiers are more regular and can be tested > based on properties. > > > Mark > > On Tue, Jan 2, 2018 at 9:55 PM, Doug Ewell via Unicode < > unicode at unicode.org> wrote: > >> Mark Davis wrote: >> >> BTW, relevant to this discussion is a proposal filed >>> http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf (The >>> date is wrong, should be 2017-12-22) >>> >> >> The phrase "emoji regex" had caused me to ignore this document, but I >> took a look based on this thread. It says "we still depend on the RGI test >> to filter the set of emoji sequences" and proposes that the EBNF in UTS #51 >> be simplified on the basis that only RGI sequences will pass the "possible >> emoji" test anyway. >> >> Thus it is true, as some people have said (i.e. in L2/17?382), that >> non-RGI sequences do not actually count as emoji, and therefore there is no >> way ? not merely no "recommended" way ? to represent the flags of entities >> such as Catalonia and Brittany. >> >> In 2016 I had asked for the emoji tag sequence mechanism for flags to be >> available for all CLDR subdivisions, not just three, with the understanding >> that the vast majority would not be supported by vendor glyphs. II t is >> unfortunate that, while the conciliatory name "recommended" was adopted for >> the three, the intent of "exclusively permitted" was retained. >> >> -- >> Doug Ewell | Thornton, CO, US | ewellic.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sat Jan 6 18:08:45 2018 From: unicode at unicode.org (Paul Hoffman via Unicode) Date: Sat, 6 Jan 2018 16:08:45 -0800 Subject: Printed versions of Unicode v1 through v4 available Message-ID: Greetings. I am cleaning out my closet, and have printed versions of TUS v1 through v4 that I'm no longer interested in. 
If you want them and are willing to pay postage (US media mail rates are lowest), send me a note off-list. Otherwise, they will go the way of so many things in this world... --Paul Hoffman -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Jan 7 18:32:47 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Mon, 8 Jan 2018 01:32:47 +0100 Subject: Printed versions of Unicode v1 through v4 available In-Reply-To: References: Message-ID: If you don't know what to do with your books (any kind), donate them to your local public library, or give them to a school; they may interest students. Such books are rarely found in primary schools, but they could interest pupils as supporting material; the earlier versions are simpler to study than the recent ones, and not all children have Internet access that works better than a poor smartphone. You should only throw away daily newspapers or old magazines. Students could even use them for creating art, and would be amazed to discover that there are more scripts than they think or are taught; they might also take an interest in learning foreign languages because of these books. 2018-01-07 1:08 GMT+01:00 Paul Hoffman via Unicode : > Greetings. I am cleaning out my closet, and have printed versions of TUS > v1 through v4 that I'm no longer interested in. If you want them and are > willing to pay postage (US media mail rates are lowest), send me a note > off-list. Otherwise, they will go the way of so many things in this world... > > --Paul Hoffman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Jan 7 19:19:21 2018 From: unicode at unicode.org (Paul Hoffman via Unicode) Date: Sun, 7 Jan 2018 17:19:21 -0800 Subject: Printed versions of Unicode v1 through v4 available In-Reply-To: References: Message-ID: Thanks, but folks have already spoken for them. Also, my local library is shedding this type of historical book, which is why I was looking for active Unicoders. --Paul Hoffman On Sun, Jan 7, 2018 at 4:32 PM, Philippe Verdy wrote: > If you don't know what to do with your books (any kind), donate them to your local > public library, or give them to a school; they may interest students. Such books are > rarely found in primary schools, but they could interest pupils as supporting material; > the earlier versions are simpler to study than the recent ones, and not all children > have Internet access that works better than a poor smartphone. > You should only throw away daily newspapers or old magazines. > Students could even use them for creating art, and would be amazed to discover that > there are more scripts than they think or are taught; they might also take an interest > in learning foreign languages because of these books. > > 2018-01-07 1:08 GMT+01:00 Paul Hoffman via Unicode : > >> Greetings. I am cleaning out my closet, and have printed versions of TUS >> v1 through v4 that I'm no longer interested in. If you want them and are >> willing to pay postage (US media mail rates are lowest), send me a note >> off-list. Otherwise, they will go the way of so many things in this world... >> >> --Paul Hoffman >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Tue Jan 9 21:44:28 2018 From: unicode at unicode.org (Karl Sanders via Unicode) Date: Wed, 10 Jan 2018 04:44:28 +0100 Subject: Whitespace-related characters Message-ID: Hi all, I was looking at this page: https://en.wikipedia.org/wiki/Whitespace_character specifically at the 'Related whitespace characters without Unicode character property "WSpace=Y"' table. I was wondering: 1) Is there an official source for this table in the standard? I think not and hence the following two questions. 2) Are there any characters that you think are missing from the table or maybe there are some that don't belong there? 3) I wouldn't put the U+2800 and U+2063 code points into such a table. Would you? Regards, Karl From unicode at unicode.org Wed Jan 10 21:44:25 2018 From: unicode at unicode.org (jillian mestel via Unicode) Date: Wed, 10 Jan 2018 22:44:25 -0500 Subject: =?utf-8?Q?Emoji=E2=80=99s?= Message-ID: To whom it may concern, I was very disappointed to learn that there are no emojis of portraying a dominant left hand. I feel this is rude, and is setting this group of people apart, and disregarding them. There are emojis of all different races of right dominant hands, yet not left dominant hands are portrayed. I hope this can be fixed, and that leftys and rightys can be equals. ??????????????? From unicode at unicode.org Wed Jan 10 23:35:01 2018 From: unicode at unicode.org (Pierpaolo Bernardi via Unicode) Date: Thu, 11 Jan 2018 06:35:01 +0100 Subject: =?UTF-8?B?UmU6IEVtb2pp4oCZcw==?= In-Reply-To: References: Message-ID: On Thu, Jan 11, 2018 at 4:44 AM, jillian mestel via Unicode wrote: > To whom it may concern, > I was very disappointed to learn that there are no emojis of portraying a dominant left hand. I feel this is rude, and is setting this group of people apart, and disregarding them. There are emojis of all different races of right dominant hands, yet not left dominant hands are portrayed. I hope this can be fixed, and that leftys and rightys can be equals. > ??????????????? Then people with no hands will be discriminated. From unicode at unicode.org Thu Jan 11 03:56:06 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Thu, 11 Jan 2018 10:56:06 +0100 Subject: =?UTF-8?B?UmU6IEVtb2pp4oCZcw==?= In-Reply-To: References: Message-ID: 2018-01-11 6:35 GMT+01:00 Pierpaolo Bernardi via Unicode < unicode at unicode.org>: > On Thu, Jan 11, 2018 at 4:44 AM, jillian mestel via Unicode > wrote: > > To whom it may concern, > > I was very disappointed to learn that there are no emojis of portraying > a dominant left hand. I feel this is rude, and is setting this group of > people apart, and disregarding them. There are emojis of all different > races of right dominant hands, yet not left dominant hands are portrayed. I > hope this can be fixed, and that leftys and rightys can be equals. > > ??????????????? > > Then people with no hands will be discriminated. Do you suggest those unable to use their hands should have their emojis with their right or left foot holding the pen ? Or with the pen in their mouth ? Or with their eyes followed by a camera and blinking to select letters/words to compose on a display ? or using seech-to-text processors ? 
There are a lot of different handicaps with different solutions, and the first one is severe visual deficiency (or blindness), along with severe intellectual deficiencies (from birth, or after health accidents), where people can't read or distinguish the emojis or understand their differences, and will need assistance from equipment or a third party. Think about the symbol for wheelchair: do you want to distinguish a "left-hand" and "right-hand" version (by mirroring), or a motorized version for those that can't push it with their hands, or a wheeled bed for those that can't sit up? These omissions in existing emojis are not "rude" or "discriminatory"; they are just not requested for actual use. What is really "rude" is the experienced handicaps, and what is "discriminatory" is how we accept (or refuse) to adapt our social life, common equipment and laws, to improve the coexistence of people with and without these deficiencies. -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 11 04:07:23 2018 From: unicode at unicode.org (Pierpaolo Bernardi via Unicode) Date: Thu, 11 Jan 2018 11:07:23 +0100 Subject: =?UTF-8?Q?Re:_Emoji=E2=80=99s?= Message-ID: <2icg7ha1v1k4uanc63s3p3l9.1515665243457@email.android.com> On 11 January 2018, at 10:56, Philippe Verdy wrote: > > >2018-01-11 6:35 GMT+01:00 Pierpaolo Bernardi via Unicode : > >On Thu, Jan 11, 2018 at 4:44 AM, jillian mestel via Unicode > wrote: >> To whom it may concern, >> I was very disappointed to learn that there are no emojis of portraying a dominant left hand. I feel this is rude, and is setting this group of people apart, and disregarding them. There are emojis of all different races of right dominant hands, yet not left dominant hands are portrayed. I hope this can be fixed, and that leftys and rightys can be equals. >> ??????????????? > >Then people with no hands will be discriminated. > >? > >Do you suggest those unable to use their hands should have their emojis with their right or left foot holding the pen ? No. Where did you get this idea from? -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 11 05:30:39 2018 From: unicode at unicode.org (=?UTF-8?Q?Christoph_P=C3=A4per?= via Unicode) Date: Thu, 11 Jan 2018 12:30:39 +0100 (CET) Subject: =?UTF-8?Q?Re:_Emoji=E2=80=99s?= In-Reply-To: References: Message-ID: <1258322087.13885.1515670239525@ox.hosteurope.de> jillian mestel: > > I was very disappointed to learn that there are no emojis of portraying a dominant left hand. See for the general emoji proposal process. This would actually not need a new character being assigned a code point, because existing ?? U+1F58E could be reused to contrast with ?? U+270D. It would just need the Emoji property being set, which can be done with any update to UTS#51. UTS#51 11.0 (beta) introduces ZWJ sequences with left and right arrows (?? U+2B05, ?? U+27A1) as suffixed determiners to explicitly indicate directional orientation, but this would be an inappropriate solution for this case. The custom emoji sets by Samsung and LG already include colorful graphics for U+1F58E. UTC should adopt a policy that grants any pictographic character the Emoji property if it is supported by at least two major vendors. ("Major vendor" would need a proper definition.) 
These 20 characters would be affected at the moment if my records are correct and complete: - U+2610 ?: BALLOT BOX - U+2612 ?: BALLOT BOX WITH X - U+261C ?: WHITE LEFT POINTING INDEX [L2/17-421] - U+261E ?: WHITE RIGHT POINTING INDEX [L2/17-421] - U+261F ?: WHITE DOWN POINTING INDEX [L2/17-421] - U+1F323 ??: WHITE SUN - U+1F544 ??: NOTCHED RIGHT SEMICIRCLE WITH THREE DOTS - U+1F546 ??: WHITE LATIN CROSS - U+1F547 ??: HEAVY LATIN CROSS - U+1F568 ??: RIGHT SPEAKER - U+1F569 ??: RIGHT SPEAKER WITH ONE SOUND WAVE - U+1F56A ??: RIGHT SPEAKER WITH THREE SOUND WAVES - U+1F56D ??: RINGING BELL [L2/17-240] - U+1F58E ??: LEFT WRITING HAND - U+1F591 ??: REVERSED RAISED HAND WITH FINGERS SPLAYED - U+1F592 ??: REVERSED THUMBS UP SIGN - U+1F593 ??: REVERSED THUMBS DOWN SIGN - U+1F5E2 ??: LIPS - U+1F6C6 ??: TRIANGLE WITH ROUNDED CORNERS - U+1F6C7 ??: PROHIBITED SIGN [L2/17-240]: http://www.unicode.org/L2/L2017/17240-ringing-bell-chg.pdf [L2/17-421]: http://www.unicode.org/L2/L2017/17421r-emoji-changes.pdf From unicode at unicode.org Thu Jan 11 22:53:26 2018 From: unicode at unicode.org (Manish Goregaokar via Unicode) Date: Fri, 12 Jan 2018 10:23:26 +0530 Subject: =?UTF-8?B?UmU6IEVtb2pp4oCZcw==?= In-Reply-To: <1258322087.13885.1515670239525@ox.hosteurope.de> References: <1258322087.13885.1515670239525@ox.hosteurope.de> Message-ID: I submitted a proposal to emojify the left writing hand code point. -Manish On Thu, Jan 11, 2018 at 5:00 PM, Christoph P?per via Unicode < unicode at unicode.org> wrote: > jillian mestel: > > > > I was very disappointed to learn that there are no emojis of portraying > a dominant left hand. > > See for the general emoji > proposal process. This would actually not need a new character being > assigned a code point, because existing ?? U+1F58E could be reused to > contrast with ?? U+270D. It would just need the Emoji property being set > which can be done with any update to UTS#51. > > UTS#51 11.0 (beta) introduces ZWJ sequences with left and right arrows (?? > U+2B05, ?? U+27A1) as suffixed determiners to explicitly indicate > directional orientation, but this would be an inappropriate solution for > this case. > > The custom emoji sets by Samsung and LG already include colorful graphics > for U+1F58E. UTC should adopt a policy that grants any pictographic > character the Emoji property if it is supported by at least two major > vendors. ("Major vendor" would need a proper definition.) 
These 20 > characters would be affected at the moment if my records are correct and > complete: > > - U+2610 ?: BALLOT BOX > - U+2612 ?: BALLOT BOX WITH X > - U+261C ?: WHITE LEFT POINTING INDEX [L2/17-421] > - U+261E ?: WHITE RIGHT POINTING INDEX [L2/17-421] > - U+261F ?: WHITE DOWN POINTING INDEX [L2/17-421] > - U+1F323 ??: WHITE SUN > - U+1F544 ??: NOTCHED RIGHT SEMICIRCLE WITH THREE DOTS > - U+1F546 ??: WHITE LATIN CROSS > - U+1F547 ??: HEAVY LATIN CROSS > - U+1F568 ??: RIGHT SPEAKER > - U+1F569 ??: RIGHT SPEAKER WITH ONE SOUND WAVE > - U+1F56A ??: RIGHT SPEAKER WITH THREE SOUND WAVES > - U+1F56D ??: RINGING BELL [L2/17-240] > - U+1F58E ??: LEFT WRITING HAND > - U+1F591 ??: REVERSED RAISED HAND WITH FINGERS SPLAYED > - U+1F592 ??: REVERSED THUMBS UP SIGN > - U+1F593 ??: REVERSED THUMBS DOWN SIGN > - U+1F5E2 ??: LIPS > - U+1F6C6 ??: TRIANGLE WITH ROUNDED CORNERS > - U+1F6C7 ??: PROHIBITED SIGN > > [L2/17-240]: http://www.unicode.org/L2/L2017/17240-ringing-bell-chg.pdf > [L2/17-421]: http://www.unicode.org/L2/L2017/17421r-emoji-changes.pdf > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sat Jan 13 11:14:13 2018 From: unicode at unicode.org (Henri Sivonen via Unicode) Date: Sat, 13 Jan 2018 19:14:13 +0200 Subject: PDF restrictions on the Unicode Standard 10.0 Message-ID: I was reading https://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf on a Sony Digital Paper device and tried to scribble some notes and make highlights but I couldn't. I still couldn't after ensuring that the pen was charged and could write on other PDFs. Since Evince told me just "Security: No", since the Digital Paper's UI for designating non-editability is easy to miss and since there's no password required to open the file, it took me non-trivial time to figure out what was going on. Upon examining the PDF in Acrobat Reader, it turned out that even though the PDF can be viewed, printed and copied from without artificial restrictions, there are various restriction bits set for modifying the file. (Screenshot: https://hsivonen.fi/screen/unicode-pdf-restrictions.png ) It doesn't make sense to me that the Consortium restricts me from adding highlights or handwriting if I open the Standard on an e-Ink device even though I can do those things if I print the PDF. I'd like to request that going forward the Consortium refrain from using restriction bits or any "security" on the PDFs it publishes. -- Henri Sivonen hsivonen at hsivonen.fi https://hsivonen.fi/ From unicode at unicode.org Mon Jan 15 21:25:01 2018 From: unicode at unicode.org (Eric Muller via Unicode) Date: Mon, 15 Jan 2018 19:25:01 -0800 Subject: 0027, 02BC, 2019, or a new character? Message-ID: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> https://www.nytimes.com/2018/01/15/world/asia/kazakhstan-alphabet-nursultan-nazarbayev.html Eric. From unicode at unicode.org Mon Jan 15 21:57:13 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Mon, 15 Jan 2018 20:57:13 -0700 Subject: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)) In-Reply-To: References: Message-ID: On January 5, Mark Davis wrote: > Doug, I modified my working draft, at > https://docs.google.com/document/d/1EuNjbs0XrBwqlvCJxra44o3EVrwdBJUWsPf8Ec1fWKY > > If that looks ok, I'll submit. Sorry for the delay. The text substitutions look fine. 
-- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Mon Jan 15 22:16:21 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 15 Jan 2018 20:16:21 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> Message-ID: It will probably be the ASCII apostrophe. The stated intent favors the apostrophe over diacritics or special characters to ensure that the language can be input to computers with standard keyboards. From unicode at unicode.org Mon Jan 15 23:55:32 2018 From: unicode at unicode.org (Pravin Jain via Unicode) Date: Tue, 16 Jan 2018 11:25:32 +0530 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> Message-ID: new characters can always be left to proper input methods being available, I am not sure, but I feel over use of apostrophes can lead to ambiguity. On Tue, Jan 16, 2018 at 9:46 AM, James Kass via Unicode wrote: > It will probably be the ASCII apostrophe. The stated intent favors > the apostrophe over diacritics or special characters to ensure that > the language can be input to computers with standard keyboards. > -- Pravin Jain (M)+91-9426054269 -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Jan 16 01:40:15 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 15 Jan 2018 23:40:15 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> Message-ID: It's possible that the ruler of Kazakhstan, who is guiding this script change movement, is unaware of modern, proper input methods. Avoiding ambiguity was the reason given for the government's rejection of ASCII-Latin digraphs; it was thought that, for example, English language students might become confused by a phonetic difference between the same digraph as used in English versus Kazakh. On a side note, wouldn't most of the "standard keyboards" currently in Kazakhstan be labelled in Cyrillic anyway? More info on Kazakh's writing system history: http://www.omniglot.com/writing/kazakh.htm From unicode at unicode.org Tue Jan 16 02:00:19 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Tue, 16 Jan 2018 08:00:19 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> Message-ID: <20180116080019.2738554a@JRWUBU2> On Mon, 15 Jan 2018 20:16:21 -0800 James Kass via Unicode wrote: > It will probably be the ASCII apostrophe. The stated intent favors > the apostrophe over diacritics or special characters to ensure that > the language can be input to computers with standard keyboards. Typing U+0027 into a word processor takes planning. Of the three, it should obviously be the modifier letter U+02BC, but I think what gets stored will be U+0027 or the single quotation mark U+2019. However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA ABOVE RIGHT. Richard. From unicode at unicode.org Tue Jan 16 02:10:19 2018 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Tue, 16 Jan 2018 13:40:19 +0530 Subject: 0027, 02BC, 2019, or a new character? 
In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> Message-ID: Rejecting the digraph method (which is probably the simplest) doesn't have much meaning, because digraphs have different sounds in different languages all the time, like "ch" in English and German. Anyhow, it certainly can be difficult convincing non-technical political people. Modifier letters are more legible than modifier punctuation IMO, so that may be an option. And the labels on keycaps don't mean anything at all. We in India use the plain QWERTY keyboard all the time for our scripts. In any case, the linguistic committee should present their recommendation along with a new set of actual keycaps and an MSKLC or similar input method, to show the president that what is recommended can be input using "a standard keyboard". -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Jan 16 02:46:19 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Tue, 16 Jan 2018 08:46:19 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> Message-ID: <20180116084619.58d0c5a7@JRWUBU2> On Mon, 15 Jan 2018 23:40:15 -0800 James Kass via Unicode wrote: > On a side note, wouldn't most of the "standard keyboards" currently in > Kazakhstan be labelled in Cyrillic anyway? They're probably already labelled in Cyrillic *and* printable ASCII (US QWERTY). Using the Cyrillic labels for non-ASCII Latin Kazakh would cause utter confusion. Richard. From unicode at unicode.org Wed Jan 17 07:06:26 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Wed, 17 Jan 2018 14:06:26 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> Message-ID: Excessive digraphs based on (non-combining) apostrophes will create numerous problems. The only case I know of that uses an apostrophe within a multigraph is the trigraph "c'h" used in Breton, where it serves to differentiate it from "ch" (though here too it would have been simpler to use another digraph, such as "sh", or a diacritic; but Bretons wanted to use only the diacritics available in French, which has no diacritic on consonants except "ç" with the cedilla, which could have been used there, and the tilde in "ñ"). The "c'h" trigraph in Breton causes fewer problems, however, because it is not final and sits within a pair where it is unlikely to mark an elision between two words. But now Kazakh will have difficulties marking elisions, and will also have problems allowing distinctive quotations. I hope they will never have cases like 's'a'n'd'' with pairs of apostrophes at the end; it would have been more readable to see '????'. Using the caron diacritic, typical of Eastern European languages, would also have done the trick over consonants, while preserving the possibility of capitalizing letters: a single diacritic is easy to map on keyboards. Adding the diaeresis or macron, or even the acute for the long vowels, would also have done the trick as the second diacritic. Kazakh has a Turkic origin, and solutions based on other Turkic alphabets could have been used, but maybe they did not like the complexity of Turkish's dotless vs. dotted "i". Still, a few diacritics could have helped without having to use custom ligatures or digraphs.
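The difference in normalization behaviour between a true precomposable diacritic and these apostrophe-like candidates is easy to check; here is a minimal Python sketch using only the standard unicodedata module (the base letter "a" is just an illustrative sample):

    import unicodedata

    # U+030C COMBINING CARON composes with the base letter under NFC:
    s1 = unicodedata.normalize("NFC", "a\u030C")
    print(len(s1), [hex(ord(c)) for c in s1])    # 1 ['0x1ce']  -> precomposed U+01CE

    # U+02BC MODIFIER LETTER APOSTROPHE and U+0315 COMBINING COMMA ABOVE RIGHT
    # have no precomposed forms, so they always remain separate code points:
    for mark in ("\u02BC", "\u0315"):
        s = unicodedata.normalize("NFC", "a" + mark)
        print(len(s), [hex(ord(c)) for c in s])  # 2 code points each time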
Now I think that these proposed non-combining apostrophes will evolve to combining acute accents (the most widely used diacritic in Latin in most languages): it will make the texts actually more readable. 2018-01-16 9:10 GMT+01:00 Shriramana Sharma via Unicode : > Rejecting the digraph method (which is probably the simplest) doesn't have > much meaning because they have different sounds in different languages all > the time like ch in English and German. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 17 16:11:09 2018 From: unicode at unicode.org (=?UTF-8?Q?Christoph_P=C3=A4per?= via Unicode) Date: Wed, 17 Jan 2018 23:11:09 +0100 (CET) Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> Message-ID: <1563876561.75365.1516227071367@ox.hosteurope.de> James Kass via Unicode : > > It will probably be the ASCII apostrophe. The stated intent favors > the apostrophe over diacritics or special characters to ensure that > the language can be input to computers with standard keyboards. Yes, this can only mean U+0027, but apparently official material, in MS Word format, shows the curly apostrophe punctuation mark U+2019 instead. There is probably no doubt among list subscribers that U+02BC should be used for any apostrophe that works like a proper letter. embeds , and both are quoted in . Cyrl Latn-kz Latn ? A?/A' ?/? ? G?/G' ?/? ?/? I?/I' ?/? ? N?/N' ?/? ? O?/O' ?/? ? Y?/Y' W ? U?/U' ? ? C?/C' ?/Ch ? S?/S' ?/Sh I sympathize with the ease of input argument, but input (keys) does neither have to equate storage (characters) nor output (glyphs). Furthermore, all orthographies should (and many constructed ones don't) respect that almost all text is read more often and by more people than it is written by, thus reader experience is more important than writer experience. Whether you use - a single dead key that has to be typed before the corresponding letter without diacritic marks or - a combinator key (e.g. AltGr) that must be kept pressed while the base is typed or - a secondary selection that appears when the base letter's key is hold down longer or - separate keys for each letter outside the MRA, the best solution depends on the hardware, software and, of course, the writing system, i.e. how frequently and prominently these letters occur. From unicode at unicode.org Wed Jan 17 20:30:57 2018 From: unicode at unicode.org (Mark E. Shoulson via Unicode) Date: Wed, 17 Jan 2018 21:30:57 -0500 Subject: Observations and rants Message-ID: <289f25b5-8754-4b9e-4256-51667ece2948@kli.org> An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 02:21:27 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Thu, 18 Jan 2018 08:21:27 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <20180116080019.2738554a@JRWUBU2> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> Message-ID: <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode > wrote: On Mon, 15 Jan 2018 20:16:21 -0800 James Kass via Unicode > wrote: It will probably be the ASCII apostrophe. The stated intent favors the apostrophe over diacritics or special characters to ensure that the language can be input to computers with standard keyboards. Typing U+0027 into a word processor takes planning. 
Of the three, it should obviously be the modifier letter U+02BC, but I think what gets stored will be U+0027 or the single quotation mark U+2019. However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA ABOVE RIGHT. Richard. I have just tested twitter hashtags and as one would expect, U+02BC does not break hashtags. See twitter.com/andreschappo/status/953903964722024448 Andr? Schappo -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 05:00:35 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Thu, 18 Jan 2018 11:00:35 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> Message-ID: <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> On 18 Jan 2018, at 08:21, Andre Schappo via Unicode > wrote: On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode > wrote: On Mon, 15 Jan 2018 20:16:21 -0800 James Kass via Unicode > wrote: It will probably be the ASCII apostrophe. The stated intent favors the apostrophe over diacritics or special characters to ensure that the language can be input to computers with standard keyboards. Typing U+0027 into a word processor takes planning. Of the three, it should obviously be the modifier letter U+02BC, but I think what gets stored will be U+0027 or the single quotation mark U+2019. However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA ABOVE RIGHT. Richard. I have just tested twitter hashtags and as one would expect, U+02BC does not break hashtags. See twitter.com/andreschappo/status/953903964722024448 ...and, just in case twitter.com/andreschappo/status/953944089896083456 Andr? Schappo -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 08:55:52 2018 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Thu, 18 Jan 2018 20:25:52 +0530 Subject: Emoji for major planets at least? Message-ID: Hello people. We have sun, earth and moon emoji (3 for the earth and more for the moon's phases). But we don't have emoji for the rest of the planets. We have astrological symbols for all the planets and a few non-existent imaginary "planets" as well. Given this, would it be impractical to encode proper emoji characters for the rest of the planets, at least the major ones whose physical characteristics are well known and identifiable? I mean for example identifying Sedna and Quaoar (https://en.wikipedia.org/wiki/File:EightTNOs.png) is probably not going to be practical for all those other than astronomy buffs but the physical shapes of the major planets are known to all high school students? -- Shriramana Sharma ???????????? ???????????? ???????????????????????? From unicode at unicode.org Thu Jan 18 09:38:06 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Thu, 18 Jan 2018 15:38:06 +0000 Subject: 0027, 02BC, 2019, or a new character? 
In-Reply-To: <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> Message-ID: On 18 Jan 2018, at 08:21, Andre Schappo via Unicode > wrote: On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode > wrote: On Mon, 15 Jan 2018 20:16:21 -0800 James Kass via Unicode > wrote: It will probably be the ASCII apostrophe. The stated intent favors the apostrophe over diacritics or special characters to ensure that the language can be input to computers with standard keyboards. Typing U+0027 into a word processor takes planning. Of the three, it should obviously be the modifier letter U+02BC, but I think what gets stored will be U+0027 or the single quotation mark U+2019. However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA ABOVE RIGHT. Richard. I have just tested twitter hashtags and as one would expect, U+02BC does not break hashtags. See twitter.com/andreschappo/status/953903964722024448 I have done a bit more investigation and as a result have written a short blog article ? schappo.blogspot.co.uk/2018/01/computer-science-internationalization_18.html Andr? Schappo -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 11:44:05 2018 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Thu, 18 Jan 2018 09:44:05 -0800 Subject: Emoji for major planets at least? In-Reply-To: References: Message-ID: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 12:01:43 2018 From: unicode at unicode.org (John H. Jenkins via Unicode) Date: Thu, 18 Jan 2018 11:01:43 -0700 Subject: Emoji for major planets at least? In-Reply-To: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> Message-ID: Well, you can go with Venus = white planet, Mercury = grey planet, Uranus = greenish planet, Neptune = bluish planet, Jupiter = striped planet. As you say, though, without a context, none of them convey much and Venus, at least, would just be a circle. Plus there's the question of the context in which someone would want to send little pictures of the planets. This sounds like it would be adding emoji just because. > On Jan 18, 2018, at 10:44 AM, Asmus Freytag via Unicode wrote: > > On 1/18/2018 6:55 AM, Shriramana Sharma via Unicode wrote: >> Hello people. >> >> We have sun, earth and moon emoji (3 for the earth and more for the >> moon's phases). But we don't have emoji for the rest of the planets. >> >> We have astrological symbols for all the planets and a few >> non-existent imaginary "planets" as well. >> >> Given this, would it be impractical to encode proper emoji characters >> for the rest of the planets, at least the major ones whose physical >> characteristics are well known and identifiable? >> >> I mean for example identifying Sedna and Quaoar >> (https://en.wikipedia.org/wiki/File:EightTNOs.png ) is probably not >> going to be practical for all those other than astronomy buffs but the >> physical shapes of the major planets are known to all high school >> students? >> > Earth = blue planet (with clouds) > > Mars = red planet > > Saturn = planet with rings > > I don't think any of the other ones are identifiable in a context-free setting, unless you draw a "big planet with red dot" for Jupiter. 
> > Earth would have to be depicted in a way that doesn't focus on "hemispheres", or you miss the idea of it as "planet". > > > > A./ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 12:46:24 2018 From: unicode at unicode.org (Asmus Freytag (c) via Unicode) Date: Thu, 18 Jan 2018 10:46:24 -0800 Subject: Emoji for major planets at least? In-Reply-To: References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> Message-ID: <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> On 1/18/2018 10:01 AM, John H. Jenkins wrote: > Well, you can go with Venus = white planet, Mercury = grey planet, > Uranus = greenish planet, Neptune = bluish planet, Jupiter = striped > planet. > > As you say, though, without a context, none of them convey much and > Venus, at least, would just be a circle. > > Plus there's the question of the context in which someone would want > to send little pictures of the planets. This sounds like it would be > adding emoji just because. "Earth" as in "a blue ball in space" is something that reached iconic status after the famous photo taken during the early Apollo missions. I could definitely see that used in a variety of possible contexts. And the recognition value is higher than for many recent emoji. Saturn, with its rings (even though it's no longer the only one known with rings) also is iconic and highly recognizable. I lack imagination as to when someone would want to use it in communication, but I have the same issue with quite a few recent emoji, some of which are far less iconic or recognizable. I think it does lend itself to describe a "non-earth" type planet, or even the generic idea of a planet (as opposed to a star/sun). Mars and Venus have tons of connotations, which could be expressed by using an emoji (as opposed to the astrological symbol for each), but only Mars is reasonably recognizable without lots of pre-established context. That red color. In a detailed enough rendering, Jupiter, as a shaded "ball" with stripes and red dot would more recognizable than any of the remaining planets (on par or better with many recent emoji), but I see even less scope for using it metaphorically or in extended contexts. If someone were to make a proposal, I would suggest to them to limit it to these four and to provide more of a suggestion as to how these might show up in use. A./ > >> On Jan 18, 2018, at 10:44 AM, Asmus Freytag via Unicode >> > wrote: >> >> On 1/18/2018 6:55 AM, Shriramana Sharma via Unicode wrote: >>> Hello people. >>> >>> We have sun, earth and moon emoji (3 for the earth and more for the >>> moon's phases). But we don't have emoji for the rest of the planets. >>> >>> We have astrological symbols for all the planets and a few >>> non-existent imaginary "planets" as well. >>> >>> Given this, would it be impractical to encode proper emoji characters >>> for the rest of the planets, at least the major ones whose physical >>> characteristics are well known and identifiable? >>> >>> I mean for example identifying Sedna and Quaoar >>> (https://en.wikipedia.org/wiki/File:EightTNOs.png) is probably not >>> going to be practical for all those other than astronomy buffs but the >>> physical shapes of the major planets are known to all high school >>> students? 
>>> >> Earth = blue planet (with clouds) >> >> Mars = red planet >> >> Saturn = planet with rings >> >> I don't think any of the other ones are identifiable in a >> context-free setting, unless you draw a "big planet with red dot" for >> Jupiter. >> >> Earth would have to be depicted in a way that doesn't focus on >> "hemispheres", or you miss the idea of it as "planet". >> >> >> A./ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 12:51:39 2018 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Thu, 18 Jan 2018 10:51:39 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 13:04:09 2018 From: unicode at unicode.org (Anshuman Pandey via Unicode) Date: Thu, 18 Jan 2018 13:04:09 -0600 Subject: Emoji for major planets at least? In-Reply-To: <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> Message-ID: <10D65683-2738-4A2A-831B-E27DF665B52A@umich.edu> Proposals for planet emoji were submitted in April 2017: https://www.unicode.org/L2/L2017/17100-planet-emoji-seq.pdf http://www.unicode.org/L2/L2017/17100r-planet-emoji-seq.pdf I?m not sure what the result was. Anshu > On Jan 18, 2018, at 12:46 PM, Asmus Freytag (c) via Unicode wrote: > >> On 1/18/2018 10:01 AM, John H. Jenkins wrote: >> Well, you can go with Venus = white planet, Mercury = grey planet, Uranus = greenish planet, Neptune = bluish planet, Jupiter = striped planet. >> >> As you say, though, without a context, none of them convey much and Venus, at least, would just be a circle. >> >> Plus there's the question of the context in which someone would want to send little pictures of the planets. This sounds like it would be adding emoji just because. > > "Earth" as in "a blue ball in space" is something that reached iconic status after the famous photo taken during the early Apollo missions. I could definitely see that used in a variety of possible contexts. And the recognition value is higher than for many recent emoji. > > Saturn, with its rings (even though it's no longer the only one known with rings) also is iconic and highly recognizable. I lack imagination as to when someone would want to use it in communication, but I have the same issue with quite a few recent emoji, some of which are far less iconic or recognizable. I think it does lend itself to describe a "non-earth" type planet, or even the generic idea of a planet (as opposed to a star/sun). > > Mars and Venus have tons of connotations, which could be expressed by using an emoji (as opposed to the astrological symbol for each), but only Mars is reasonably recognizable without lots of pre-established context. That red color. > > In a detailed enough rendering, Jupiter, as a shaded "ball" with stripes and red dot would more recognizable than any of the remaining planets (on par or better with many recent emoji), but I see even less scope for using it metaphorically or in extended contexts. 
> > If someone were to make a proposal, I would suggest to them to limit it to these four and to provide more of a suggestion as to how these might show up in use. > > A./ >> >>> On Jan 18, 2018, at 10:44 AM, Asmus Freytag via Unicode wrote: >>> >>>> On 1/18/2018 6:55 AM, Shriramana Sharma via Unicode wrote: >>>> Hello people. >>>> >>>> We have sun, earth and moon emoji (3 for the earth and more for the >>>> moon's phases). But we don't have emoji for the rest of the planets. >>>> >>>> We have astrological symbols for all the planets and a few >>>> non-existent imaginary "planets" as well. >>>> >>>> Given this, would it be impractical to encode proper emoji characters >>>> for the rest of the planets, at least the major ones whose physical >>>> characteristics are well known and identifiable? >>>> >>>> I mean for example identifying Sedna and Quaoar >>>> (https://en.wikipedia.org/wiki/File:EightTNOs.png) is probably not >>>> going to be practical for all those other than astronomy buffs but the >>>> physical shapes of the major planets are known to all high school >>>> students? >>>> >>> Earth = blue planet (with clouds) >>> >>> Mars = red planet >>> >>> Saturn = planet with rings >>> >>> I don't think any of the other ones are identifiable in a context-free setting, unless you draw a "big planet with red dot" for Jupiter. >>> >>> Earth would have to be depicted in a way that doesn't focus on "hemispheres", or you miss the idea of it as "planet". >>> >>> >>> >>> A./ >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 15:10:46 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Thu, 18 Jan 2018 22:10:46 +0100 Subject: Emoji for major planets at least? In-Reply-To: <10D65683-2738-4A2A-831B-E27DF665B52A@umich.edu> References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> <10D65683-2738-4A2A-831B-E27DF665B52A@umich.edu> Message-ID: Well I can think of a popular pseudo-planet, the "Death Star" or "Black Star" (for the "Star Wars" series), which is easily recognized by its color and shape (with the deep built crater, and optionally its destroyed half part) which also looks like a real planet, the Saturnian moon Mimas with its very wide crater (to avoid the copyright issue)... 2018-01-18 20:04 GMT+01:00 Anshuman Pandey via Unicode : > Proposals for planet emoji were submitted in April 2017: > > https://www.unicode.org/L2/L2017/17100-planet-emoji-seq.pdf > > http://www.unicode.org/L2/L2017/17100r-planet-emoji-seq.pdf > > I?m not sure what the result was. > > Anshu > > > On Jan 18, 2018, at 12:46 PM, Asmus Freytag (c) via Unicode < > unicode at unicode.org> wrote: > > On 1/18/2018 10:01 AM, John H. Jenkins wrote: > > Well, you can go with Venus = white planet, Mercury = grey planet, Uranus > = greenish planet, Neptune = bluish planet, Jupiter = striped planet. > > As you say, though, without a context, none of them convey much and Venus, > at least, would just be a circle. > > Plus there's the question of the context in which someone would want to > send little pictures of the planets. This sounds like it would be adding > emoji just because. > > > "Earth" as in "a blue ball in space" is something that reached iconic > status after the famous photo taken during the early Apollo missions. I > could definitely see that used in a variety of possible contexts. And the > recognition value is higher than for many recent emoji. 
> > Saturn, with its rings (even though it's no longer the only one known with > rings) also is iconic and highly recognizable. I lack imagination as to > when someone would want to use it in communication, but I have the same > issue with quite a few recent emoji, some of which are far less iconic or > recognizable. I think it does lend itself to describe a "non-earth" type > planet, or even the generic idea of a planet (as opposed to a star/sun). > > Mars and Venus have tons of connotations, which could be expressed by > using an emoji (as opposed to the astrological symbol for each), but only > Mars is reasonably recognizable without lots of pre-established context. > That red color. > > In a detailed enough rendering, Jupiter, as a shaded "ball" with stripes > and red dot would more recognizable than any of the remaining planets (on > par or better with many recent emoji), but I see even less scope for using > it metaphorically or in extended contexts. > > If someone were to make a proposal, I would suggest to them to limit it to > these four and to provide more of a suggestion as to how these might show > up in use. > > A./ > > > On Jan 18, 2018, at 10:44 AM, Asmus Freytag via Unicode < > unicode at unicode.org> wrote: > > On 1/18/2018 6:55 AM, Shriramana Sharma via Unicode wrote: > > Hello people. > > We have sun, earth and moon emoji (3 for the earth and more for the > moon's phases). But we don't have emoji for the rest of the planets. > > We have astrological symbols for all the planets and a few > non-existent imaginary "planets" as well. > > Given this, would it be impractical to encode proper emoji characters > for the rest of the planets, at least the major ones whose physical > characteristics are well known and identifiable? > > I mean for example identifying Sedna and Quaoar > (https://en.wikipedia.org/wiki/File:EightTNOs.png) is probably not > going to be practical for all those other than astronomy buffs but the > physical shapes of the major planets are known to all high school > students? > > > Earth = blue planet (with clouds) > > Mars = red planet > > Saturn = planet with rings > > I don't think any of the other ones are identifiable in a context-free > setting, unless you draw a "big planet with red dot" for Jupiter. > > Earth would have to be depicted in a way that doesn't focus on > "hemispheres", or you miss the idea of it as "planet". > > > A./ > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 15:59:12 2018 From: unicode at unicode.org (Walter Tross via Unicode) Date: Thu, 18 Jan 2018 22:59:12 +0100 Subject: Emoji for major planets at least? In-Reply-To: References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> <10D65683-2738-4A2A-831B-E27DF665B52A@umich.edu> Message-ID: Sorry guys if I step in uninvited, but I must say that I had hoped that the subject of this thread was ironical. Do you guys want to have an emoji for every entry of some encyclopaedia? You need JPEG, PNG, etc., not Unicode. Sorry Walter 2018-01-18 22:10 GMT+01:00 Philippe Verdy via Unicode : > Well I can think of a popular pseudo-planet, the "Death Star" or "Black > Star" (for the "Star Wars" series), which is easily recognized by its color > and shape (with the deep built crater, and optionally its destroyed half > part) which also looks like a real planet, the Saturnian moon Mimas with > its very wide crater (to avoid the copyright issue)... 
> > 2018-01-18 20:04 GMT+01:00 Anshuman Pandey via Unicode < > unicode at unicode.org>: > >> Proposals for planet emoji were submitted in April 2017: >> >> https://www.unicode.org/L2/L2017/17100-planet-emoji-seq.pdf >> >> http://www.unicode.org/L2/L2017/17100r-planet-emoji-seq.pdf >> >> I?m not sure what the result was. >> >> Anshu >> >> >> On Jan 18, 2018, at 12:46 PM, Asmus Freytag (c) via Unicode < >> unicode at unicode.org> wrote: >> >> On 1/18/2018 10:01 AM, John H. Jenkins wrote: >> >> Well, you can go with Venus = white planet, Mercury = grey planet, Uranus >> = greenish planet, Neptune = bluish planet, Jupiter = striped planet. >> >> As you say, though, without a context, none of them convey much and >> Venus, at least, would just be a circle. >> >> Plus there's the question of the context in which someone would want to >> send little pictures of the planets. This sounds like it would be adding >> emoji just because. >> >> >> "Earth" as in "a blue ball in space" is something that reached iconic >> status after the famous photo taken during the early Apollo missions. I >> could definitely see that used in a variety of possible contexts. And the >> recognition value is higher than for many recent emoji. >> >> Saturn, with its rings (even though it's no longer the only one known >> with rings) also is iconic and highly recognizable. I lack imagination as >> to when someone would want to use it in communication, but I have the same >> issue with quite a few recent emoji, some of which are far less iconic or >> recognizable. I think it does lend itself to describe a "non-earth" type >> planet, or even the generic idea of a planet (as opposed to a star/sun). >> >> Mars and Venus have tons of connotations, which could be expressed by >> using an emoji (as opposed to the astrological symbol for each), but only >> Mars is reasonably recognizable without lots of pre-established context. >> That red color. >> >> In a detailed enough rendering, Jupiter, as a shaded "ball" with stripes >> and red dot would more recognizable than any of the remaining planets (on >> par or better with many recent emoji), but I see even less scope for using >> it metaphorically or in extended contexts. >> >> If someone were to make a proposal, I would suggest to them to limit it >> to these four and to provide more of a suggestion as to how these might >> show up in use. >> >> A./ >> >> >> On Jan 18, 2018, at 10:44 AM, Asmus Freytag via Unicode < >> unicode at unicode.org> wrote: >> >> On 1/18/2018 6:55 AM, Shriramana Sharma via Unicode wrote: >> >> Hello people. >> >> We have sun, earth and moon emoji (3 for the earth and more for the >> moon's phases). But we don't have emoji for the rest of the planets. >> >> We have astrological symbols for all the planets and a few >> non-existent imaginary "planets" as well. >> >> Given this, would it be impractical to encode proper emoji characters >> for the rest of the planets, at least the major ones whose physical >> characteristics are well known and identifiable? >> >> I mean for example identifying Sedna and Quaoar >> (https://en.wikipedia.org/wiki/File:EightTNOs.png) is probably not >> going to be practical for all those other than astronomy buffs but the >> physical shapes of the major planets are known to all high school >> students? 
>> >> >> Earth = blue planet (with clouds) >> >> Mars = red planet >> >> Saturn = planet with rings >> >> I don't think any of the other ones are identifiable in a context-free >> setting, unless you draw a "big planet with red dot" for Jupiter. >> >> Earth would have to be depicted in a way that doesn't focus on >> "hemispheres", or you miss the idea of it as "planet". >> >> >> A./ >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 17:25:02 2018 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Thu, 18 Jan 2018 15:25:02 -0800 Subject: Emoji for major planets at least? In-Reply-To: References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> <10D65683-2738-4A2A-831B-E27DF665B52A@umich.edu> Message-ID: <94ad5a07-5b6c-1582-9be8-a2ea97b58e84@ix.netcom.com> An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 18:19:14 2018 From: unicode at unicode.org (Aleksey Tulinov via Unicode) Date: Fri, 19 Jan 2018 02:19:14 +0200 Subject: Emoji for major planets at least? In-Reply-To: <94ad5a07-5b6c-1582-9be8-a2ea97b58e84@ix.netcom.com> References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> <10D65683-2738-4A2A-831B-E27DF665B52A@umich.edu> <94ad5a07-5b6c-1582-9be8-a2ea97b58e84@ix.netcom.com> Message-ID: Perhaps we all shall stop being ironical to each other, calm down, sit and discuss how to encode 3D animated emojies (animojies) in Unicode. Adopting something like COLLADA would be sweet. I guess COLLADA, being XML-based standard, already can be encoded by Unicode, so it shouldn't be a lot of hustle, just some paper work, right? 2018-01-19 1:25 GMT+02:00 Asmus Freytag via Unicode : > On 1/18/2018 1:59 PM, Walter Tross via Unicode wrote: > > Sorry guys if I step in uninvited, but I must say that I had hoped that > the subject of this thread was ironical. > > > Of course not, how could you think that? > > Do you guys want to have an emoji for every entry of some encyclopaedia? > You need JPEG, PNG, etc., not Unicode. > > > Clearly, the natural progression of modern communication is away from > bothersome alphabetic recordings of spoken sound to the expressive power of > picture-writing. > > You can't possibly dream of standing in the way of this evolution! > > A./ > > > Sorry > Walter > > 2018-01-18 22:10 GMT+01:00 Philippe Verdy via Unicode >: > >> Well I can think of a popular pseudo-planet, the "Death Star" or "Black >> Star" (for the "Star Wars" series), which is easily recognized by its color >> and shape (with the deep built crater, and optionally its destroyed half >> part) which also looks like a real planet, the Saturnian moon Mimas with >> its very wide crater (to avoid the copyright issue)... >> >> 2018-01-18 20:04 GMT+01:00 Anshuman Pandey via Unicode < >> unicode at unicode.org>: >> >>> Proposals for planet emoji were submitted in April 2017: >>> >>> https://www.unicode.org/L2/L2017/17100-planet-emoji-seq.pdf >>> >>> http://www.unicode.org/L2/L2017/17100r-planet-emoji-seq.pdf >>> >>> I?m not sure what the result was. >>> >>> Anshu >>> >>> >>> On Jan 18, 2018, at 12:46 PM, Asmus Freytag (c) via Unicode < >>> unicode at unicode.org> wrote: >>> >>> On 1/18/2018 10:01 AM, John H. Jenkins wrote: >>> >>> Well, you can go with Venus = white planet, Mercury = grey planet, >>> Uranus = greenish planet, Neptune = bluish planet, Jupiter = striped >>> planet. 
>>> >>> As you say, though, without a context, none of them convey much and >>> Venus, at least, would just be a circle. >>> >>> Plus there's the question of the context in which someone would want to >>> send little pictures of the planets. This sounds like it would be adding >>> emoji just because. >>> >>> >>> "Earth" as in "a blue ball in space" is something that reached iconic >>> status after the famous photo taken during the early Apollo missions. I >>> could definitely see that used in a variety of possible contexts. And the >>> recognition value is higher than for many recent emoji. >>> >>> Saturn, with its rings (even though it's no longer the only one known >>> with rings) also is iconic and highly recognizable. I lack imagination as >>> to when someone would want to use it in communication, but I have the same >>> issue with quite a few recent emoji, some of which are far less iconic or >>> recognizable. I think it does lend itself to describe a "non-earth" type >>> planet, or even the generic idea of a planet (as opposed to a star/sun). >>> >>> Mars and Venus have tons of connotations, which could be expressed by >>> using an emoji (as opposed to the astrological symbol for each), but only >>> Mars is reasonably recognizable without lots of pre-established context. >>> That red color. >>> >>> In a detailed enough rendering, Jupiter, as a shaded "ball" with stripes >>> and red dot would more recognizable than any of the remaining planets (on >>> par or better with many recent emoji), but I see even less scope for using >>> it metaphorically or in extended contexts. >>> >>> If someone were to make a proposal, I would suggest to them to limit it >>> to these four and to provide more of a suggestion as to how these might >>> show up in use. >>> >>> A./ >>> >>> >>> On Jan 18, 2018, at 10:44 AM, Asmus Freytag via Unicode < >>> unicode at unicode.org> wrote: >>> >>> On 1/18/2018 6:55 AM, Shriramana Sharma via Unicode wrote: >>> >>> Hello people. >>> >>> We have sun, earth and moon emoji (3 for the earth and more for the >>> moon's phases). But we don't have emoji for the rest of the planets. >>> >>> We have astrological symbols for all the planets and a few >>> non-existent imaginary "planets" as well. >>> >>> Given this, would it be impractical to encode proper emoji characters >>> for the rest of the planets, at least the major ones whose physical >>> characteristics are well known and identifiable? >>> >>> I mean for example identifying Sedna and Quaoar >>> (https://en.wikipedia.org/wiki/File:EightTNOs.png) is probably not >>> going to be practical for all those other than astronomy buffs but the >>> physical shapes of the major planets are known to all high school >>> students? >>> >>> >>> Earth = blue planet (with clouds) >>> >>> Mars = red planet >>> >>> Saturn = planet with rings >>> >>> I don't think any of the other ones are identifiable in a context-free >>> setting, unless you draw a "big planet with red dot" for Jupiter. >>> >>> Earth would have to be depicted in a way that doesn't focus on >>> "hemispheres", or you miss the idea of it as "planet". >>> >>> >>> A./ >>> >>> >>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 18 19:12:04 2018 From: unicode at unicode.org (Pierpaolo Bernardi via Unicode) Date: Fri, 19 Jan 2018 02:12:04 +0100 Subject: Emoji for major planets at least? 
In-Reply-To: References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> <10D65683-2738-4A2A-831B-E27DF665B52A@umich.edu> <94ad5a07-5b6c-1582-9be8-a2ea97b58e84@ix.netcom.com> Message-ID: On Fri, Jan 19, 2018 at 1:19 AM, Aleksey Tulinov via Unicode wrote: > Perhaps we all shall stop being ironical to each other, calm down, sit and > discuss how to encode 3D animated emojies (animojies) in Unicode. Adopting > something like COLLADA would be sweet. I guess COLLADA, being XML-based > standard, already can be encoded by Unicode, so it shouldn't be a lot of > hustle, just some paper work, right? What??? No MPEG-4? COLLADA is a step in the right direction, but it doesn't encode sounds! From unicode at unicode.org Fri Jan 19 02:42:44 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Fri, 19 Jan 2018 08:42:44 +0000 Subject: Emoji for major planets at least? In-Reply-To: References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> <10D65683-2738-4A2A-831B-E27DF665B52A@umich.edu> <94ad5a07-5b6c-1582-9be8-a2ea97b58e84@ix.netcom.com> Message-ID: <20180119084244.7bb6e31a@JRWUBU2> On Fri, 19 Jan 2018 02:12:04 +0100 Pierpaolo Bernardi via Unicode wrote: > On Fri, Jan 19, 2018 at 1:19 AM, Aleksey Tulinov via Unicode > wrote: > > Perhaps we all shall stop being ironical to each other, calm down, > > sit and discuss how to encode 3D animated emojies (animojies) in > > Unicode. Adopting something like COLLADA would be sweet. I guess > > COLLADA, being XML-based standard, already can be encoded by > > Unicode, so it shouldn't be a lot of hustle, just some paper work, > > right? > > What??? No MPEG-4? > > COLLADA is a step in the right direction, but it doesn't encode > sounds! Isn't there the issue that Unicode is supposed to encode writing? I only see two secure precedents for the encoding of multimedia emoji: 1) Korean compatibility ideographs which differ only in pronunciation 2) Character-level mark-up by CGJ for collation to distinguish German umlaut and diaeresis (if they are truly indistinguishable in all styles) and by CGJ to distinguish strings for the purposes of collation. (Soft hyphens are also usable in this r?le.) Of course, multimedia *glyphs* are permitted. Of course, it's rather tricky to print animated glyphs using muggle inks. Richard. From unicode at unicode.org Fri Jan 19 03:16:25 2018 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Fri, 19 Jan 2018 14:46:25 +0530 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: Wow. Somebody really needs to convey this to the Kazhaks. Else a short-sighted decision would ruin their chances at native IDNs. Any Kazhaks on this list? On 19-Jan-2018 00:23, "Asmus Freytag via Unicode" wrote: > Top level IDN domain names can not contain 02BC, nor 0027 or 2019. > > (RFC 6912 gives the rationale and RZ-LGR the implementation, see MSR-3 > ) > > A./ > > On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote: > > > > On 18 Jan 2018, at 08:21, Andre Schappo via Unicode > wrote: > > > > On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode < > unicode at unicode.org> wrote: > > On Mon, 15 Jan 2018 20:16:21 -0800 > James Kass via Unicode wrote: > > It will probably be the ASCII apostrophe. 
The stated intent favors > the apostrophe over diacritics or special characters to ensure that > the language can be input to computers with standard keyboards. > > > Typing U+0027 into a word processor takes planning. Of the three, it > should obviously be the modifier letter U+02BC, but I think what gets > stored will be U+0027 or the single quotation mark U+2019. > > However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA > ABOVE RIGHT. > > Richard. > > > I have just tested twitter hashtags and as one would expect, U+02BC does > not break hashtags. See twitter.com/andreschappo/status/953903964722024448 > > > ...and, just in case twitter.com/andreschappo/status/953944089896083456 > > > Andr? Schappo > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 19 03:39:11 2018 From: unicode at unicode.org (Andrew West via Unicode) Date: Fri, 19 Jan 2018 09:39:11 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: On 19 January 2018 at 09:16, Shriramana Sharma via Unicode wrote: > Wow. Somebody really needs to convey this to the Kazhaks. Else a > short-sighted decision would ruin their chances at native IDNs. Any Kazhaks > on this list? There's only one Kazakh who counts, and I'm pretty sure he's not on this list. Andrew From unicode at unicode.org Fri Jan 19 07:19:53 2018 From: unicode at unicode.org (Michael Everson via Unicode) Date: Fri, 19 Jan 2018 13:19:53 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: I?d go talk with him :-) I published Alice in Kazakh. He might like that. Michael > On 19 Jan 2018, at 09:39, Andrew West via Unicode wrote: > > On 19 January 2018 at 09:16, Shriramana Sharma via Unicode > wrote: >> Wow. Somebody really needs to convey this to the Kazhaks. Else a >> short-sighted decision would ruin their chances at native IDNs. Any Kazhaks >> on this list? > > There's only one Kazakh who counts, and I'm pretty sure he's not on this list. > > Andrew From unicode at unicode.org Fri Jan 19 07:35:23 2018 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Fri, 19 Jan 2018 19:05:23 +0530 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: You can just mail him or Skype-call him no? ?? On 19-Jan-2018 18:53, "Michael Everson via Unicode" wrote: > I?d go talk with him :-) I published Alice in Kazakh. He might like that. > > Michael > > > On 19 Jan 2018, at 09:39, Andrew West via Unicode > wrote: > > > > On 19 January 2018 at 09:16, Shriramana Sharma via Unicode > > wrote: > >> Wow. Somebody really needs to convey this to the Kazhaks. Else a > >> short-sighted decision would ruin their chances at native IDNs. Any > Kazhaks > >> on this list? > > > > There's only one Kazakh who counts, and I'm pretty sure he's not on this > list. > > > > Andrew > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Fri Jan 19 07:37:40 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 19 Jan 2018 14:37:40 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: May be the IDN could accept a new combining diacritic (sort of right-side acute accent). After all the Kazakh intent is not to define a new separate character but a modification of base letter to create a single letter in their alphabet. So a proposal for COMBINING APOSTROPHE (whose spacing non-combining version is 02BC), so that SPACE+COMBINING APOSTROPHE will render exactly like 02BC. 2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode : > Top level IDN domain names can not contain 02BC, nor 0027 or 2019. > > (RFC 6912 gives the rationale and RZ-LGR the implementation, see MSR-3 > ) > > A./ > > > On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote: > > > > On 18 Jan 2018, at 08:21, Andre Schappo via Unicode > wrote: > > > > On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode < > unicode at unicode.org> wrote: > > On Mon, 15 Jan 2018 20:16:21 -0800 > James Kass via Unicode wrote: > > It will probably be the ASCII apostrophe. The stated intent favors > the apostrophe over diacritics or special characters to ensure that > the language can be input to computers with standard keyboards. > > > Typing U+0027 into a word processor takes planning. Of the three, it > should obviously be the modifier letter U+02BC, but I think what gets > stored will be U+0027 or the single quotation mark U+2019. > > However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA > ABOVE RIGHT. > > Richard. > > > I have just tested twitter hashtags and as one would expect, U+02BC does > not break hashtags. See twitter.com/andreschappo/status/953903964722024448 > > > ...and, just in case twitter.com/andreschappo/status/953944089896083456 > > > Andr? Schappo > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 19 07:42:52 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 19 Jan 2018 14:42:52 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: Hmmm.... that character exists already at 0+0315 (a combining comma above right). It would work for the new Kazah orthographic system, including for collation purpose. I don't think IDN rejects this combining version. 2018-01-19 14:37 GMT+01:00 Philippe Verdy : > May be the IDN could accept a new combining diacritic (sort of right-side > acute accent). After all the Kazakh intent is not to define a new separate > character but a modification of base letter to create a single letter in > their alphabet. > So a proposal for COMBINING APOSTROPHE (whose spacing non-combining > version is 02BC), so that SPACE+COMBINING APOSTROPHE will render exactly > like 02BC > > 2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode > : > >> Top level IDN domain names can not contain 02BC, nor 0027 or 2019. 
>> >> (RFC 6912 gives the rationale and RZ-LGR the implementation, see MSR-3 >> ) >> >> A./ >> >> >> On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote: >> >> >> >> On 18 Jan 2018, at 08:21, Andre Schappo via Unicode >> wrote: >> >> >> >> On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode < >> unicode at unicode.org> wrote: >> >> On Mon, 15 Jan 2018 20:16:21 -0800 >> James Kass via Unicode wrote: >> >> It will probably be the ASCII apostrophe. The stated intent favors >> the apostrophe over diacritics or special characters to ensure that >> the language can be input to computers with standard keyboards. >> >> >> Typing U+0027 into a word processor takes planning. Of the three, it >> should obviously be the modifier letter U+02BC, but I think what gets >> stored will be U+0027 or the single quotation mark U+2019. >> >> However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA >> ABOVE RIGHT. >> >> Richard. >> >> >> I have just tested twitter hashtags and as one would expect, U+02BC does >> not break hashtags. See twitter.com/andreschappo/s >> tatus/953903964722024448 >> >> >> ...and, just in case twitter.com/andreschappo/status/953944089896083456 >> >> >> Andr? Schappo >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 19 07:47:43 2018 From: unicode at unicode.org (Michael Everson via Unicode) Date: Fri, 19 Jan 2018 13:47:43 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: <3BF8C43A-297D-4E3F-82E2-B585614B3788@evertype.com> There?s no redeeming this orthography. > On 19 Jan 2018, at 13:42, Philippe Verdy via Unicode wrote: > > Hmmm.... that character exists already at 0+0315 (a combining comma above right). It would work for the new Kazah orthographic system, including for collation purpose. I don't think IDN rejects this combining version. > > > 2018-01-19 14:37 GMT+01:00 Philippe Verdy : > May be the IDN could accept a new combining diacritic (sort of right-side acute accent). After all the Kazakh intent is not to define a new separate character but a modification of base letter to create a single letter in their alphabet. > So a proposal for COMBINING APOSTROPHE (whose spacing non-combining version is 02BC), so that SPACE+COMBINING APOSTROPHE will render exactly like 02BC > > 2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode : > Top level IDN domain names can not contain 02BC, nor 0027 or 2019. > > (RFC 6912 gives the rationale and RZ-LGR the implementation, see MSR-3) > > A./ > > > On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote: >> >> >>> On 18 Jan 2018, at 08:21, Andre Schappo via Unicode wrote: >>> >>> >>> >>>> On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode wrote: >>>> >>>> On Mon, 15 Jan 2018 20:16:21 -0800 >>>> James Kass via Unicode wrote: >>>> >>>>> It will probably be the ASCII apostrophe. The stated intent favors >>>>> the apostrophe over diacritics or special characters to ensure that >>>>> the language can be input to computers with standard keyboards. >>>> >>>> Typing U+0027 into a word processor takes planning. Of the three, it >>>> should obviously be the modifier letter U+02BC, but I think what gets >>>> stored will be U+0027 or the single quotation mark U+2019. 
>>>> >>>> However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA >>>> ABOVE RIGHT. >>>> >>>> Richard. >>> >>> I have just tested twitter hashtags and as one would expect, U+02BC does not break hashtags. See twitter.com/andreschappo/status/953903964722024448 >>> >> >> ...and, just in case twitter.com/andreschappo/status/953944089896083456 >> >> Andr? Schappo >> > > > From unicode at unicode.org Fri Jan 19 07:51:35 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 19 Jan 2018 14:51:35 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: Also U+0315 is not part of any decomposition for canonical normalization purpose, so it would remain encoded separately (only subject to possible reordering if there are other diacritics) 2018-01-19 14:37 GMT+01:00 Philippe Verdy : > May be the IDN could accept a new combining diacritic (sort of right-side > acute accent). After all the Kazakh intent is not to define a new separate > character but a modification of base letter to create a single letter in > their alphabet. > So a proposal for COMBINING APOSTROPHE (whose spacing non-combining > version is 02BC), so that SPACE+COMBINING APOSTROPHE will render exactly > like 02BC. > > 2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode > : > >> Top level IDN domain names can not contain 02BC, nor 0027 or 2019. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 19 07:51:43 2018 From: unicode at unicode.org (Andrew West via Unicode) Date: Fri, 19 Jan 2018 13:51:43 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: On 19 January 2018 at 13:19, Michael Everson via Unicode wrote: > > I?d go talk with him :-) I published Alice in Kazakh. He might like that. Damn, you'll have to reprint it with apostrophes now. Andrew From unicode at unicode.org Fri Jan 19 07:56:48 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 19 Jan 2018 14:56:48 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <3BF8C43A-297D-4E3F-82E2-B585614B3788@evertype.com> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <3BF8C43A-297D-4E3F-82E2-B585614B3788@evertype.com> Message-ID: 2018-01-19 14:47 GMT+01:00 Michael Everson via Unicode : > There?s no redeeming this orthography. This is not a redeeming, the Kazakh government currently has not made any assesment of how to encode their proposed system. Who said that was was proposed by them was an "apostrophe" ? May be they jsut wanted to use the ASCII apostrophe for compatibility with their legacy systems (but it's like the hack used in legacy ASCII-only system to represent [?] as [e'] : it's a workaround but this caused enough serious problems that we then all used the correct encoding of an acute accent, as a separate combining character or precombined with letters). And here we were suggesting several other characters. 
For me U+0315 is the best match for what they propose. -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 19 08:16:05 2018 From: unicode at unicode.org (=?UTF-8?Q?Christoph_P=C3=A4per?= via Unicode) Date: Fri, 19 Jan 2018 15:16:05 +0100 (CET) Subject: Emoji for major planets at least? In-Reply-To: <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> References: <292f1f76-d469-04cf-6cca-26325084e68a@ix.netcom.com> <9c059394-b594-82c1-1c24-a68bf890ed5f@ix.netcom.com> Message-ID: <1796514713.89209.1516371365598@ox.hosteurope.de> Asmus Freytag: > > Saturn, with its rings (even though it's no longer the only one known > with rings) also is iconic and highly recognizable. I lack imagination > as to when someone would want to use it in communication, but I have the > same issue with quite a few recent emoji, some of which are far less > iconic or recognizable. I think it does lend itself to describe a > "non-earth" type planet, or even the generic idea of a planet (as > opposed to a star/sun). For what it's worth, the Sky Web logo was a planet with a ring or orbit and it was included in the J-Phone, later Vodafone then SoftBank, emoji set at position F-75 (next to the paperplane for their Skywalker service). As a proprietary logo, it was not included in the final proposal emerging from the emoji4unicode project, but it was documented as e-E78, EMOJI COMPATIBILITY SYMBOL-58. The image was animated where possible. -------------- next part -------------- A non-text attachment was scrubbed... Name: F75.gif Type: image/gif Size: 296 bytes Desc: not available URL: From unicode at unicode.org Fri Jan 19 08:23:29 2018 From: unicode at unicode.org (Michael Everson via Unicode) Date: Fri, 19 Jan 2018 14:23:29 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: I won?t. > On 19 Jan 2018, at 13:51, Andrew West via Unicode wrote: > > On 19 January 2018 at 13:19, Michael Everson via Unicode > wrote: >> >> I?d go talk with him :-) I published Alice in Kazakh. He might like that. > > Damn, you'll have to reprint it with apostrophes now. > > Andrew > From unicode at unicode.org Fri Jan 19 10:42:07 2018 From: unicode at unicode.org (Rick McGowan via Unicode) Date: Fri, 19 Jan 2018 08:42:07 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: <5A621FDF.5070503@unicode.org> Michael - Lemme know when you're ready to print. I have a huge bag of leftover apostrophes I can send you. On 1/19/2018 5:51 AM, Andrew West via Unicode wrote: > On 19 January 2018 at 13:19, Michael Everson via Unicode > wrote: >> I?d go talk with him :-) I published Alice in Kazakh. He might like that. > Damn, you'll have to reprint it with apostrophes now. > > Andrew > > From unicode at unicode.org Fri Jan 19 14:08:23 2018 From: unicode at unicode.org (Asmus Freytag (c) via Unicode) Date: Fri, 19 Jan 2018 12:08:23 -0800 Subject: 0027, 02BC, 2019, or a new character? 
In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: <162106ac-1a6e-5770-3e03-b81d763382bd@ix.netcom.com> On 1/19/2018 5:37 AM, Philippe Verdy wrote: > May be the IDN could accept a new combining diacritic (sort of > right-side acute accent). After all the Kazakh intent is not to define > a new separate character but a modification of base letter to create a > single letter in their alphabet. > So a proposal for COMBINING APOSTROPHE (whose spacing non-combining > version is 02BC), so that SPACE+COMBINING APOSTROPHE will render > exactly like 02BC. > In the case of TLD IDNs what is at issue is the fact that it "renders exactly like" 02BC (which renders exactly like 2019). You can see the issue when you look at Andre's twitter tags: you can create two strings that look the same, but the part that is a hashtag is different. That is deemed an unacceptable security risk for TLD IDNs. If you encoded such a combining character, it would also not be eligible for TLD IDNs. A./ > 2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode > >: > > Top level IDN domain names can not contain 02BC, nor 0027 or 2019. > > (RFC 6912 gives the rationale and RZ-LGR the implementation, see > MSR-3 ) > > A./ > > > On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote: >> >> >>> On 18 Jan 2018, at 08:21, Andre Schappo via Unicode >>> > wrote: >>> >>> >>> >>>> On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode >>>> > wrote: >>>> >>>> On Mon, 15 Jan 2018 20:16:21 -0800 >>>> James Kass via Unicode >>> > wrote: >>>> >>>>> It will probably be the ASCII apostrophe. The stated intent favors >>>>> the apostrophe over diacritics or special characters to ensure >>>>> that >>>>> the language can be input to computers with standard keyboards. >>>> >>>> Typing U+0027 into a word processor takes planning.? Of the >>>> three, it >>>> should obviously be the modifier letter U+02BC, but I think >>>> what gets >>>> stored will be U+0027 or the single quotation mark U+2019. >>>> >>>> However, we shouldn't overlook the diacritic mark U+0315 >>>> COMBINING COMMA >>>> ABOVE RIGHT. >>>> >>>> Richard. >>> >>> I have just tested twitter hashtags and as one would expect, >>> U+02BC does not break hashtags. See >>> twitter.com/andreschappo/status/953903964722024448 >>> >>> >> >> ...and, just in case >> twitter.com/andreschappo/status/953944089896083456 >> >> >> >> Andr? Schappo >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 19 14:10:33 2018 From: unicode at unicode.org (Asmus Freytag (c) via Unicode) Date: Fri, 19 Jan 2018 12:10:33 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: <33fa628d-7f2d-dfc0-a84f-d9b61afd4ffc@ix.netcom.com> On 1/19/2018 5:42 AM, Philippe Verdy wrote: > Hmmm.... that character exists already at 0+0315 (a combining comma > above right). It would work for the new Kazah?orthographic system, > including for collation purpose.? I don't think IDN rejects this > combining version. This is also ineligible for the Root Zone. 
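The hashtag behaviour Asmus describes can be reproduced with a Unicode-aware word pattern. A rough sketch in Python (the sample tag text is invented, and a word-character matcher is only an approximation of how real hashtag tokenizers behave, not a statement about Twitter's actual implementation):

    import re

    # U+02BC MODIFIER LETTER APOSTROPHE is a letter (gc=Lm); U+2019 and U+0027
    # are punctuation, so a naive '#\w+' matcher treats the three differently.
    variants = {
        'U+02BC': '#qazaqs\u02BCa',
        'U+2019': '#qazaqs\u2019a',
        'U+0027': "#qazaqs'a",
    }
    for label, text in variants.items():
        print(label, repr(re.search(r'#\w+', text).group()))
    # U+02BC '#qazaqsʼa'   (whole tag matched)
    # U+2019 '#qazaqs'     (tag stops before the quotation mark)
    # U+0027 '#qazaqs'     (same for the ASCII apostrophe)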
A./ > > > 2018-01-19 14:37 GMT+01:00 Philippe Verdy >: > > May be the IDN could accept a new combining diacritic (sort of > right-side acute accent). After all the Kazakh intent is not to > define a new separate character but a modification of base letter > to create a single letter in their alphabet. > So a proposal for COMBINING APOSTROPHE (whose spacing > non-combining version is 02BC), so that SPACE+COMBINING APOSTROPHE > will render exactly like 02BC > > 2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode > >: > > Top level IDN domain names can not contain 02BC, nor 0027 or > 2019. > > (RFC 6912 gives the rationale and RZ-LGR the implementation, > see MSR-3 > ) > > A./ > > > On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote: >> >> >>> On 18 Jan 2018, at 08:21, Andre Schappo via Unicode >>> > wrote: >>> >>> >>> >>>> On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode >>>> > wrote: >>>> >>>> On Mon, 15 Jan 2018 20:16:21 -0800 >>>> James Kass via Unicode >>> > wrote: >>>> >>>>> It will probably be the ASCII apostrophe.? The stated >>>>> intent favors >>>>> the apostrophe over diacritics or special characters to >>>>> ensure that >>>>> the language can be input to computers with standard >>>>> keyboards. >>>> >>>> Typing U+0027 into a word processor takes planning.? Of the >>>> three, it >>>> should obviously be the modifier letter U+02BC, but I think >>>> what gets >>>> stored will be U+0027 or the single quotation mark U+2019. >>>> >>>> However, we shouldn't overlook the diacritic mark U+0315 >>>> COMBINING COMMA >>>> ABOVE RIGHT. >>>> >>>> Richard. >>> >>> I have just tested twitter hashtags and as one would expect, >>> U+02BC does not break hashtags. See >>> twitter.com/andreschappo/status/953903964722024448 >>> >>> >> >> ...and, just in case >> twitter.com/andreschappo/status/953944089896083456 >> >> >> >> Andr? Schappo >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 19 18:41:54 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Sat, 20 Jan 2018 01:41:54 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <33fa628d-7f2d-dfc0-a84f-d9b61afd4ffc@ix.netcom.com> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <33fa628d-7f2d-dfc0-a84f-d9b61afd4ffc@ix.netcom.com> Message-ID: For the root zone may be, but not formally rejected by IDN, and the Kazakh zone could accept it without problem. It also has the advantage of allowing cleaner collation and contextual text extraction, and it also allows better placement of the combining character with its base in some dedicated pairs that will be suitable for Kazakh. But technically it would have been preferable to use the acute accent. Note that the acute accent is also looking much as apostrophes in Greek with capitals, and still it is not rejected ! (of course in IDN, case does not matter and there's a visual distinction in lowercase: but what would prohibit Kazakh pairs to have similar dictinctions at least for lowercase letters?). 
I don't understand the rationale: a combining accent (acute above, comma above, or comma above right) will always be better than the proposed reuse of punctuation apostrophes, and its intended semantic for Kazakh is certainly not a punctuation or elision mark, which will also occur in borrowed foreign names really using elision apostrophes or trigrams, and also not a separate modifier letter). Sometimes I think we should also discuss about the Breton trigram "c'h" which is not well represented with the legacy apostrophe, and a combining acute above or comma above or above right would be better : Breton keyboards or input methods can be improved to select the correct character to encode). 2018-01-19 21:10 GMT+01:00 Asmus Freytag (c) : > On 1/19/2018 5:42 AM, Philippe Verdy wrote: > > Hmmm.... that character exists already at 0+0315 (a combining comma above > right). It would work for the new Kazah orthographic system, including for > collation purpose. I don't think IDN rejects this combining version. > > > This is also ineligible for the Root Zone. > A./ > > > > 2018-01-19 14:37 GMT+01:00 Philippe Verdy : > >> May be the IDN could accept a new combining diacritic (sort of right-side >> acute accent). After all the Kazakh intent is not to define a new separate >> character but a modification of base letter to create a single letter in >> their alphabet. >> So a proposal for COMBINING APOSTROPHE (whose spacing non-combining >> version is 02BC), so that SPACE+COMBINING APOSTROPHE will render exactly >> like 02BC >> >> 2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode > >: >> >>> Top level IDN domain names can not contain 02BC, nor 0027 or 2019. >>> >>> (RFC 6912 gives the rationale and RZ-LGR the implementation, see MSR-3 >>> ) >>> >>> A./ >>> >>> >>> On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote: >>> >>> >>> >>> On 18 Jan 2018, at 08:21, Andre Schappo via Unicode >>> wrote: >>> >>> >>> >>> On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode < >>> unicode at unicode.org> wrote: >>> >>> On Mon, 15 Jan 2018 20:16:21 -0800 >>> James Kass via Unicode wrote: >>> >>> It will probably be the ASCII apostrophe. The stated intent favors >>> the apostrophe over diacritics or special characters to ensure that >>> the language can be input to computers with standard keyboards. >>> >>> >>> Typing U+0027 into a word processor takes planning. Of the three, it >>> should obviously be the modifier letter U+02BC, but I think what gets >>> stored will be U+0027 or the single quotation mark U+2019. >>> >>> However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA >>> ABOVE RIGHT. >>> >>> Richard. >>> >>> >>> I have just tested twitter hashtags and as one would expect, U+02BC does >>> not break hashtags. See twitter.com/andreschappo/s >>> tatus/953903964722024448 >>> >>> >>> ...and, just in case twitter.com/andreschappo/status/953944089896083456 >>> >>> >>> Andr? Schappo >>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 19 20:59:53 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Fri, 19 Jan 2018 18:59:53 -0800 Subject: 0027, 02BC, 2019, or a new character? 
In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <33fa628d-7f2d-dfc0-a84f-d9b61afd4ffc@ix.netcom.com> Message-ID: Philippe Verdy wrote, > I don't understand the rationale: ... Maybe there isn't any. As Shriramana Sharma wrote earlier, >> Anyhow, it certainly can be difficult convincing >> non technical political people. And that's an understatement. This article... https://boingboing.net/2018/01/17/the-war-over-apostrophes-in-ka.html ... may offer additional insight. From unicode at unicode.org Fri Jan 19 22:24:04 2018 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Sat, 20 Jan 2018 09:54:04 +0530 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <33fa628d-7f2d-dfc0-a84f-d9b61afd4ffc@ix.netcom.com> Message-ID: Announcing: Much ado about apostrophes A Play By William Codesphere Coming soon to a theatre near you... ?? -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sat Jan 20 00:45:18 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Fri, 19 Jan 2018 22:45:18 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <33fa628d-7f2d-dfc0-a84f-d9b61afd4ffc@ix.netcom.com> Message-ID: "Much ado about apostrophes" If the apostrophe thing doesn't work out, we might also look forward to "The Shaming of the Crew", a play in which the advisory panel gets blamed for not pointing out what they were pointing out all along. From unicode at unicode.org Sat Jan 20 14:04:49 2018 From: unicode at unicode.org (Simon Montagu via Unicode) Date: Sat, 20 Jan 2018 22:04:49 +0200 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> Message-ID: <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> On 19/01/18 15:37, Philippe Verdy via Unicode wrote: > May be the IDN could accept a new combining diacritic (sort of > right-side acute accent). After all the Kazakh intent is not to define a > new separate character but a modification of base letter to create a > single letter in their alphabet. Hardly. If they insist on using a modifier character available on "standard" keyboards instead of already-encoded letters and/or diacritics, they are unlikely to be interested in new characters. From unicode at unicode.org Sun Jan 21 06:49:46 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Sun, 21 Jan 2018 13:49:46 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> Message-ID: But there's NO standard keyboard in Kazakhstan with the Latin alphabet. 
Those you'll find are cyrillic keyboards with a way to type basic Latin. Or keyboards made for other countries. So this is not a good reason at all. In fact Kazakstan would have to create a keyboard standard for the Latin orthography, and there's no reason to not place diacritics or precombined common letters in their new alphabet. For now it looks they just decided to do nothing at all, meaning that people will use various foreign Latin layouts, and to be cost effective, they'll look at those keyboards already used in nearby countries using Latin orthographies (Romania, Moldavia, Poland, Turkey). I can understand they don't want the technical trick and difficulties of dotted vs undotted I used in Turkish, but a single diacritic would have solved it (and it is part of their solution which uses an apostrophe but could as well have been an acute), a diacritic which is present in precombined letters on many European keyboards). In my opinion they should still have based a Kazakh keyboard that uses the same location as existing Cyrillic letters, it would have sommethed the transition. Now typing separate apostrophes in Latin Kazakh will just slow down the input and strain a single finger too frequently to the same small key on the top keys row... People won't like it at all... they'll stick on using the Cyrillic keyboards, and only an IME will convert what they type to transliterate it to Latin: if ther's an IME, the fact it will be using apostrophes or diacritics will be equivalent (but corrections of text in Kazakh will be less problematic for what is really perceived as a single letter but now being two separate characters (press ony one key, generate two base characters, but need to press backspace twice for correction, and added complication when selecting text because now you have separate grapheme clusters and the apostrophe can play different roles as a modifier where it should form a cluster, or as an elision mark where it is a placeholder for separate letters, or as a separate punctuation sign for quotation...) 2018-01-20 21:04 GMT+01:00 Simon Montagu via Unicode : > On 19/01/18 15:37, Philippe Verdy via Unicode wrote: > > May be the IDN could accept a new combining diacritic (sort of > > right-side acute accent). After all the Kazakh intent is not to define a > > new separate character but a modification of base letter to create a > > single letter in their alphabet. > > Hardly. If they insist on using a modifier character available on > "standard" keyboards instead of already-encoded letters and/or > diacritics, they are unlikely to be interested in new characters. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Jan 21 00:14:15 2018 From: unicode at unicode.org (David Melik via Unicode) Date: Sat, 20 Jan 2018 22:14:15 -0800 Subject: superscripts & subscripts for science/mathematics? Message-ID: <95c3220d-85e2-96f9-0fde-e2f553bb4c22@gmail.com> I don't know if this was discussed, but it'd help scientists/mathematicians if all Greek and Hebrew were available as superscript & subscript.? Mathematicians use certain such letters in standard notation of important expressions/formulae (superscript ? in Euler's Identity, subscript base ?, superscript ? in cardinality of real numbers, etc.)... actually we use all Greek letters, and since a few Hebrew (since 1800s) have standard mathematical meanings, more are used for variables.? 
After any such alphabets' letters are used, the rest are considered normal/standard to use in standard script, superscript, and subscript, for any educational usage, and future standard notation. From unicode at unicode.org Sun Jan 21 00:15:35 2018 From: unicode at unicode.org (David Melik via Unicode) Date: Sat, 20 Jan 2018 22:15:35 -0800 Subject: superscripts & subscripts for science/mathematics? Message-ID: <9e90ac88-6a49-f00d-7f71-216a4fce023b@gmail.com> I don't know if this was discussed, but it'd help scientists/mathematicians if all Greek and Hebrew were available as superscript & subscript.? Mathematicians use certain such letters in standard notation of important expressions/formulae (superscript ? in Euler's Identity, subscript base ?, superscript ? in cardinality of real numbers, etc.)... actually we use all Greek letters, and since a few Hebrew (since 1800s) have standard mathematical meanings, more are used for variables.? After any such alphabets' letters are used, the rest are considered normal/standard to use in standard script, superscript, and subscript, for any educational usage, and future standard notation. From unicode at unicode.org Sun Jan 21 12:49:45 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sun, 21 Jan 2018 18:49:45 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> Message-ID: <20180121184945.2659a1ab@JRWUBU2> On Sun, 21 Jan 2018 13:49:46 +0100 Philippe Verdy via Unicode wrote: > But there's NO standard keyboard in Kazakhstan with the Latin > alphabet. Those you'll find are cyrillic keyboards with a way to type > basic Latin. Or keyboards made for other countries. I believe we're talking about physical keyboards here. From the Wikipedia web page https://kk.wikipedia.org/wiki/%D0%9F%D0%B5%D1%80%D0%BD%D0%B5%D1%82%D0%B0%D2%9B%D1%82%D0%B0 and the only credible pictures I can find - https://sabaqtar.kz/informatika/8876-pernetata-pernetatamen-tanysu.html (tolerable) and https://kaz.tengrinews.kz/gadgets/kazaksha-klaviatura-100-mektepte-syinaktan-ott-255562/ (poor) - I beg to differ. It seems that the available keyboards are labelled in Kazakh Cyrillic and US QWERTY. There is a different layout tagged as 'Kazakh national layout' at http://aitaber.kz/blog/komputer/3991.html - and again the keys are labelled for both writing systems. On-screen keyboards should not be an issue at all. So, what devices are you talking about? Richard. From unicode at unicode.org Sun Jan 21 21:35:16 2018 From: unicode at unicode.org (Phake Nick via Unicode) Date: Mon, 22 Jan 2018 11:35:16 +0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <20180121184945.2659a1ab@JRWUBU2> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: It's probably still too difficult to input a character with umlaut for general people in 2018, like the official Chinese romanization system used the character "?", but because it's so hard to be input or process many people in many occasion just use "v" instead and more recently standarised "yu" as a replacement for the character. 
There are language-dependent keyboards for French or German with special keys or deadkeys that help input these umlauts, but they are language dependent and it is not possible for e.g. a regular American user using Windows to simply type them out, at least not without prior knowledge about these umlauts. 2018-01-22 2:49 GMT+08:00 Richard Wordingham via Unicode < unicode at unicode.org>: > On Sun, 21 Jan 2018 13:49:46 +0100 > Philippe Verdy via Unicode wrote: > > > But there's NO standard keyboard in Kazakhstan with the Latin > > alphabet. Those you'll find are cyrillic keyboards with a way to type > > basic Latin. Or keyboards made for other countries. > > I believe we're talking about physical keyboards here. From the > Wikipedia web page > https://kk.wikipedia.org/wiki/%D0%9F%D0%B5%D1%80%D0%BD%D0%B5 > %D1%82%D0%B0%D2%9B%D1%82%D0%B0 > and the only credible pictures I can find - > https://sabaqtar.kz/informatika/8876-pernetata-pernetatamen-tanysu.html > (tolerable) and > https://kaz.tengrinews.kz/gadgets/kazaksha-klaviatura-100- > mektepte-syinaktan-ott-255562/ > (poor) > - I beg to differ. It seems that the available keyboards are labelled > in Kazakh Cyrillic and US QWERTY. > > There is a different layout tagged as 'Kazakh national layout' at > http://aitaber.kz/blog/komputer/3991.html - and again the keys are > labelled for both writing systems. > > On-screen keyboards should not be an issue at all. > > So, what devices are you talking about? > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Jan 22 00:34:12 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Sun, 21 Jan 2018 22:34:12 -0800 Subject: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues In-Reply-To: <20171211101631.44155a27@JRWUBU2> References: <20171208220619.3eb2fcbe@JRWUBU2> <20171211101631.44155a27@JRWUBU2> Message-ID: I was looking the feedback in http://www.unicode.org/review/pri355/, and didn't see yours there. Could you please file your feedback there? (Nothing on this list is tracked by the committee...) FYI, I'm thinking now that the change should be: GB9c: (Virama | ZWJ ) ? LinkingConsonant => GB9c: (Virama ViramaExtend* | ZWJ ) ? LinkingConsonant where ViramaExtend = [Extend - Virama - \p{ccc=0}] (This is pre-partitioning.) That is close to your formulation, but for for canonical equivalence, there shouldn't need to allow the ViramaExtend after ZWJ, because the ZWJ has ccc=0, and thus nothing reorders around it. Cibu also pointed out on a different thread that for Malayalam we need to consider a couple of other forms: ... Following contexts should be allowed for requesting reformed or traditional conjuncts as per Unicode10.0.0/ch12 page 505. ... /$L ZWNJ $V $L/ /$L ZWJ $V $L/ The ZWJ Virama sequence is already provided for by the combination of GB9 & GB9c. But not the ZWNJ. If we want to handle that, it would mean the addition of something like: GB9d: ? (ZWNJ ViramaExtend* Virama) Cibu also wrote: Also, when we disallow /$L $V ZWJ $D/, it is disallowing the sequences involving legacy chillus. That is, for example, is a valid sequence (Examples in Unicode10.0.0/ch12 Table 12.36). It's legacy equivalent would be . It might be OK to disallow this; but, we should be mindful of this side effect. ?To account for the legacy cases, the simplest approach might be to add some characters to GCB= LinkingConsonant ? 
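To make the effect of the proposed rule concrete, here is a deliberately tiny sketch in Python. It is not the UAX #29 rule set: GB9/GB9a are collapsed into "any combining mark extends the cluster", ZWJ/ZWNJ and the ViramaExtend refinement are ignored, and the Virama and LinkingConsonant sets are hand-picked Devanagari stand-ins used only for this example:

    import unicodedata

    VIRAMA = {'\u094D'}                                   # DEVANAGARI SIGN VIRAMA
    CONSONANT = {chr(c) for c in range(0x0915, 0x093A)}   # KA..HA, a LinkingConsonant stand-in

    def toy_clusters(text):
        out = []
        for ch in text:
            extend = unicodedata.category(ch).startswith('M')               # crude GB9/GB9a
            joins = bool(out) and ch in CONSONANT and out[-1][-1] in VIRAMA  # proposed GB9c
            if out and (extend or joins):
                out[-1] += ch
            else:
                out.append(ch)
        return out

    # KA, VIRAMA, SSA, VOWEL SIGN I: two extended grapheme clusters today,
    # but a single unit once Virama x LinkingConsonant is added.
    print(toy_clusters('\u0915\u094D\u0937\u093F'))   # ['क्षि']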
Note: ?The final date for deciding exactly what to do with #29 will be in April, so there is some more time to discuss this. But we have to have a pretty solid proposal going into that April meeting. ? The only test files that we have gotten from India so far include Devanagari, Malayalam and Bengali. I suspect that the UTC is likely to be conservative, and limit the GCB=Virama category to just those scripts that we have test files for ?, and that look complete.? ? Mark On Mon, Dec 11, 2017 at 2:16 AM, Richard Wordingham via Unicode < unicode at unicode.org> wrote: > On Sun, 10 Dec 2017 21:14:18 -0800 > Manish Goregaokar via Unicode wrote: > > > > GB9c: (Virama | ZWJ ) ? Extend* LinkingConsonant > > > > You can also explicitly request ligatureification with a ZWJ, so > > perhaps this rule should be something like > > > > (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant > > > > -Manish > > > > On Sat, Dec 9, 2017 at 7:16 AM, Mark Davis ?? via Unicode < > > unicode at unicode.org> wrote: > > > > > 1. You make a good point about the GB9c. It should probably instead > > > be something like: > > > > > > GB9c: (Virama | ZWJ ) ? Extend* LinkingConsonant > > This change is unnecessary. If we start from Draft 1 where there are: > > GB9: ? (Extend | ZWJ | Virama) > GB9c: (Virama | ZWJ ) ? LinkingConsonant > > If the classes used in the rules are to be disjoint, we then have to > split Extend into something like ViramaExtend and OtherExtend to allow > normalised (NFC/NFD) text, at which point we may as well continue to > have rules that work without any normalisation. Informally, > > ViramaExtend = Extend and ccc ? 0. > > OtherExtend = Extend and ccc = 0. > > (We might need to put additional characters in ViramaExtend.) > > This gives us rules: > > GB9': ? (OtherExtend | ViramaExtend | ZWJ | Virama) > > GB9c': (Virama | ZWJ ) ViramaExtend* ? LinkingConsonant > > So, for a sequence , GB9' gives us > > virama ? ZWJ ? nukta LinkingConsonant > > and GB9c' gives us > > virama ? ZWJ ? nukta ? LinkingConsonant > > --- > In Rule GB9c, what examples justify including ZWJ? Are they just the C1 > half-forms? My knowledge suggests that > > GB9c'': Virama (ZWJ | ViramaExtend)* ? LinkingConsonant > > might be more appropriate. > > Richard. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Jan 22 10:28:58 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Mon, 22 Jan 2018 09:28:58 -0700 Subject: SignWriting in U+40000 block Message-ID: <20180122092858.665a7a7059d7ee80bb4d670165c8327d.9b69fbafa0.wbe@email03.godaddy.com> The IETF is noting the progress of an updated draft: Formal SignWriting draft-slevinski-formal-signwriting-04 https://tools.ietf.org/html/draft-slevinski-formal-signwriting-04.html which continues to describe an implementation of SignWriting in the as-yet unassigned Plane 4, including a detailed breakdown of blocks for different types of characters. I know the struggle between Slevinski and Unicode is long and contentious, with Slevinski arguing for years that the Unicode encoding of SignWriting is useless because it doesn't encode position, and vowing that no implementation (under his aegis) will ever use it). Nevertheless, I wonder if it would be appropriate for Unicode or WG2, in some capacity, to protest in some formal way against this recommendation to arrogate an unassigned plane instead of using the PUA, which is the correct place for unassigned characters. 
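For what it is worth, the difference between squatting on an unassigned plane and using the PUA is visible to any conformant process. A two-line check with Python's unicodedata (whose character database reflects whichever Unicode version ships with the interpreter):

    import unicodedata

    print(unicodedata.category(chr(0x40000)))   # 'Cn' -- unassigned (Plane 4)
    print(unicodedata.category(chr(0xF0000)))   # 'Co' -- private use (Plane 15)
    print(unicodedata.category('\uE000'))       # 'Co' -- private use (BMP PUA block)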
-- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Mon Jan 22 10:39:57 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Mon, 22 Jan 2018 16:39:57 +0000 Subject: Internationalised Computer Science Exercises Message-ID: I continue my endeavours to get Unicode and Internationalisation into/onto (I am not sure which is correct) University and School Curricula. Here is another of my endeavours?? Yesterday, I drafted a final year student project specification for the 2018/2019 academic year. These projects will start in October but students will be choosing their project some time around June. The project involves producing a set of internationalised Computer Science exercises for both educators and students. Details at schappo.blogspot.co.uk/2018/01/computer-science-internationalization_21.html I am confident that more than one student will choose this project. By way of example, one programming challenge I set to students a couple of weeks ago involves diacritics. Please see jsfiddle.net/coas/wda45gLp There is huge potential for some really interesting and challenging Unicode exercises. If you have any suggestions for such exercises they would be most welcome. Email me direct or share on this list. TIA Andr? Schappo -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Jan 22 11:55:16 2018 From: unicode at unicode.org (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?= via Unicode) Date: Mon, 22 Jan 2018 18:55:16 +0100 Subject: Internationalised Computer Science Exercises In-Reply-To: References: Message-ID: <3f9e887d-fcfa-f99a-f3ec-a92ab342fd30@gmail.com> Le 22/01/2018 ? 17:39, Andre Schappo via Unicode a ?crit?: > > By way of example, one programming challenge I set to students a > couple of weeks ago involves diacritics. Please see > jsfiddle.net/coas/wda45gLp > > There is huge potential for some really interesting and challenging > Unicode exercises. If you have any suggestions for such exercises they > would be most welcome. Email me direct or share on this list. A simple challenge is to write a function which localize numbers in a script having decimal digits or parse them (i.e. which have characters with property Numeric_Type=Decimal, as explained in ?4.6 of the Unicode 10 standard). The list of these scripts is specified in table 22-3. There is usually a most one set of digits/script (with the exception of Arabic, Myanmar and Tai Tham). Then, of course, one can look at other numeral systems (CJK, Ethiopic, Roman, to name a few in contemporaneous use). The section 22.3 of the Unicode standard is an interesting starting point for these. A internationalised exercise which doesn?t (always) use unicode is the localization of separators in numbers: 2??+? = 1,027.14 in US and 1 027,14 in France. One also should not forget that half a million is 5,00,000 in India. These simple things can be very surprising the first time you meet them. ? Fr?d?ric From unicode at unicode.org Mon Jan 22 16:08:55 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Mon, 22 Jan 2018 22:08:55 +0000 Subject: Internationalised Computer Science Exercises In-Reply-To: References: Message-ID: <20180122220855.7b929272@JRWUBU2> On Mon, 22 Jan 2018 16:39:57 +0000 Andre Schappo via Unicode wrote: > By way of example, one programming challenge I set to students a > couple of weeks ago involves diacritics. 
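Relating to the number-localization exercise sketched above, a minimal illustration in Python (the choice of Devanagari and Arabic-Indic digits is arbitrary, it assumes the target digit set is encoded as a contiguous 0..9 run, which holds for the Numeric_Type=Decimal sets of Table 22-3, and it ignores grouping separators entirely):

    import unicodedata

    DEVANAGARI_ZERO = 0x0966      # U+0966 DEVANAGARI DIGIT ZERO
    ARABIC_INDIC_ZERO = 0x0660    # U+0660 ARABIC-INDIC DIGIT ZERO

    def localize_digits(s, zero):
        """Map ASCII digits onto a script's decimal digits, given its ZERO."""
        return ''.join(chr(zero + int(c)) if '0' <= c <= '9' else c for c in s)

    def parse_decimal_digits(s):
        """Parse a run of Numeric_Type=Decimal digits from any script."""
        return int(''.join(str(unicodedata.decimal(c)) for c in s))

    print(localize_digits('1027', DEVANAGARI_ZERO))     # १०२७
    print(localize_digits('1027', ARABIC_INDIC_ZERO))   # ١٠٢٧
    print(parse_decimal_digits('١٠٢٧'))                 # 1027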
Please see > jsfiddle.net/coas/wda45gLp Did any of them come up with the idea of using traces instead of strings? Richard. From unicode at unicode.org Mon Jan 22 17:02:42 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Mon, 22 Jan 2018 23:02:42 +0000 Subject: Internationalised Computer Science Exercises In-Reply-To: <3f9e887d-fcfa-f99a-f3ec-a92ab342fd30@gmail.com> References: <3f9e887d-fcfa-f99a-f3ec-a92ab342fd30@gmail.com> Message-ID: <20180122230242.1ddd9954@JRWUBU2> On Mon, 22 Jan 2018 18:55:16 +0100 Fr?d?ric Grosshans via Unicode wrote: > A simple challenge is to write a function which localize numbers in a > script having decimal digits or parse them (i.e. which have > characters with property Numeric_Type=Decimal, as explained in ?4.6 > of the Unicode 10 standard). The list of these scripts is specified > in table 22-3. There is usually a most one set of digits/script (with > the exception of Arabic, Myanmar and Tai Tham). Presumably you specify the task by defining the digit for zero. Would you expect them to successfully parse '10?2' (with diagonal in middle digit) as opposed to '102'? Do you expect them to get the New Tai Lue form for the number '1' correct - it's U+19DA rather than U+19D1! Richard. From unicode at unicode.org Mon Jan 22 17:26:38 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Mon, 22 Jan 2018 23:26:38 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: <20180122232638.66ef51b7@JRWUBU2> On Mon, 22 Jan 2018 11:35:16 +0800 Phake Nick via Unicode wrote: > There > are language-dependent keyboards for French or German with special > keys or deadkeys that help input these umlauts, but they are language > dependent and it is not possible for e.g. a regular American user > using Windows to simply type them out, at least not without prior > knowledge about these umlauts. I found the Windows 'US International' keyboard layout highly intuitive for accented Latin-1 characters. Richard. From unicode at unicode.org Mon Jan 22 18:55:45 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 22 Jan 2018 16:55:45 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: Phake Nick wrote, > ... and it is not possible for e.g. a regular American > user using Windows to simply type them out, at least not > without prior knowledge about these umlauts. Regular American users simply don't type umlauts, period. Eccentric American users needing umlauts, such as foreign language students or heavy metal enthusiasts, generally find an easy way. Practically everybody knows how to search the web. Earlier in this thread, Shriramana Sharma wrote, > Rejecting the digraph method (which is probably the > simplest) doesn't have much meaning because they have > different sounds in different languages all the time > like ch in English and German. 
Any Kazakh/Qazaq student ambitious enough to study a foreign language such as English is already sophisticated enough to easily distinguish differing digraph values between the two languages. English speakers face distinctions such as the difference between the "ch" in "chigger" versus "chiffon" daily without any apparent danger of confusion. With so much push-back, along with technical objections, hopefully the government will reconsider the apostrophe situation and go with digraphs or diacritics. From unicode at unicode.org Mon Jan 22 19:52:05 2018 From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode) Date: Tue, 23 Jan 2018 10:52:05 +0900 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: <6c1f3146-e912-7bf7-c6a1-841352a8fd5f@it.aoyama.ac.jp> On 2018/01/23 09:55, James Kass via Unicode wrote: > Any Kazakh/Qazaq student ambitious enough to study a foreign language > such as English is already sophisticated enough to easily distinguish > differing digraph values between the two languages. English speakers > face distinctions such as the difference between the "ch" in "chigger" > versus "chiffon" daily without any apparent danger of confusion. Well, there are many many easier orthographies than English, so I'd understand if the Kazakh don't want to take English as an example. > With > so much push-back, along with technical objections, hopefully the > government will reconsider the apostrophe situation and go with > digraphs or diacritics. I very much hope so too. One way to avoid confusion is to use one specific letter only as the second letter in digraphs. With the current orthography, they don't use w and x, so they could use one of these. But personally, I'd find accents more visually pleasing. Regards, Martin. From unicode at unicode.org Mon Jan 22 20:34:29 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Tue, 23 Jan 2018 02:34:29 +0000 Subject: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues In-Reply-To: References: <20171208220619.3eb2fcbe@JRWUBU2> <20171211101631.44155a27@JRWUBU2> Message-ID: <20180123023429.24130691@JRWUBU2> On Sun, 21 Jan 2018 22:34:12 -0800 Mark Davis ?? via Unicode wrote: > FYI, I'm thinking now that the change should be: > > GB9c: (Virama | ZWJ ) ? LinkingConsonant > => > GB9c: (Virama ViramaExtend* | ZWJ ) ? LinkingConsonant > > where ViramaExtend = [Extend - Virama - \p{ccc=0}] > (This is pre-partitioning.) > > That is close to your formulation, but for for canonical equivalence, > there shouldn't need to allow the ViramaExtend after ZWJ, because the > ZWJ has ccc=0, and thus nothing reorders around it. These look fine. > Cibu also pointed out on a different thread that for Malayalam we > need to consider a couple of other forms: > > ... Following contexts should be allowed for requesting reformed or > traditional conjuncts as per Unicode10.0.0/ch12 page 505. ... > > /$L ZWNJ $V $L/ > /$L ZWJ $V $L/ > > The ZWJ Virama sequence is already provided for by the combination of > GB9 & GB9c. But not the ZWNJ. If we want to handle that, it would > mean the addition of something like: > > GB9d: ? (ZWNJ ViramaExtend* Virama) This is OK by me for aksharas. 
It might make sense for Tai Tham as well, where various degrees of binding are attested in what you can think of as D.DH (as in 'buddha'). If the font formally ligates them but does not always ligate subscript 'DHA' (i.e. U+1A35 TAI THAM LETTER LOW THA), would provide the unligated form. Note than in Tai Tham, SAKOT primarily affects the C2 consonant. > > Cibu also wrote: > > > Also, when we disallow /$L $V ZWJ $D/, it is disallowing the sequences > involving legacy chillus. That is, for example, E> is a valid sequence (Examples in Unicode10.0.0/ch12 Table 12.36). > E> It's legacy > equivalent would be . It might be OK to > disallow this; but, we should be mindful of this side effect. I see no problem here. By GB9, we get NA ? VIRAMA ? ZWJ SIGN_E By GB9a, we then get NA ? VIRAMA ? ZWJ ? SIGN_E Have I missed something? Do you want me to try to formally submit my comments from this post? I will be going to bed as soon as I've finished extract comments from this thread. Richard. From unicode at unicode.org Mon Jan 22 20:41:00 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Tue, 23 Jan 2018 02:41:00 +0000 Subject: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues In-Reply-To: References: <20171208220619.3eb2fcbe@JRWUBU2> <20171211101631.44155a27@JRWUBU2> Message-ID: <20180123024100.4c38773a@JRWUBU2> On Sun, 21 Jan 2018 22:34:12 -0800 Mark Davis ?? via Unicode wrote: > The ZWJ Virama sequence is already provided for by the combination of > GB9 & GB9c. But not the ZWNJ. If we want to handle that, it would > mean the addition of something like: > > GB9d: ? (ZWNJ ViramaExtend* Virama) I don't think we need ViramaExtend* here. The seqeunce should be followed by a base consonant, so there's no way for another mark to sneak in. Incidentally, I think ViramaExtend would be better named as NSExtend, with 'NS' for 'non-starter'. Richard. From unicode at unicode.org Mon Jan 22 21:06:04 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Tue, 23 Jan 2018 03:06:04 +0000 Subject: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues In-Reply-To: References: <20171208220619.3eb2fcbe@JRWUBU2> <20171211101631.44155a27@JRWUBU2> Message-ID: <20180123030604.49dc6915@JRWUBU2> On Sun, 21 Jan 2018 22:34:12 -0800 Mark Davis ?? via Unicode wrote: > I was looking the feedback in http://www.unicode.org/review/pri355/, > and didn't see yours there. Could you please file your feedback > there? (Nothing on this list is tracked by the committee...) This is the submission I have just made: The major principled issue I have is that UAX#29 can no longer claim to have a sound definition of the concept of a 'user-perceived character'. Perhaps it never did. Some of the claims would be better if there were evidence to back them up. For example, this evening I did a quick bit of research and asked the Korean owner of the local Korean restaurant how many letters there were in the hangul spelling of 'Gangnam'. She traced out the spelling of the word (??) and came back with the answer '6'. UAX#29 claims it has 2 user-perceived characters. You might also argue that she has spent too long in England to be a useful informant. The following old paragraph causes grief for me: "As far as a user is concerned, the underlying representation of text is not important, but it is important that an editing interface present a uniform implementation of what the user thinks of as characters. 
Grapheme clusters commonly behave as units in terms of mouse selection, arrow key movement, backspacing, and so on. For example, when a grapheme cluster is represented internally by a character sequence consisting of base character + accents, then using the right arrow key would skip from the start of the base character to the end of the last accent." The problem is that many editors read this as saying that the arrow keys should move by whole characters. The result of this is that in many applications, to replace the first character of a grapheme cluster one must retype the entire grapheme cluster. With a grapheme cluster of three characters, as is common in Thai and Korean, this is irritating. With a grapheme cluster of four or five characters, as is common in Northern Thai, it is annoying. The prospect of the grapheme cluster being extended to include a whole akshara fills me with dismay. Consider the Northern Thai word ??????? /m??/ 'scrumptious'. At present, this 7 character word is split into three grapheme clusters, of lengths 2, 4 and 1. However, it is clearly a single akshara. To change the first character, I would have to also retype the other 6 characters. My first thought that changing software that way would breach the UK's Equality Act 2010, by further restricting the ability of Northern Thai users to do character by character editing. (My wife's protected characteristic extends to me for the purposes of the Act.) However, there may be a get-out in the form of Schedule 3 Section 30 (https://www.legislation.gov.uk/ukpga/2010/15/schedule/3/paragraph/30). The supplier of the service can claim that they only supply a character by character editing facility to the ethnic groups using simple scripts, and that they are under no obligation to supply the service to members of other ethnic groups. - "If a service is generally provided only for persons who share a protected characteristic, a person (A) who normally provides the service for persons who share that characteristic does not contravene section 29(1) or (2)? (a)by insisting on providing the service in the way A normally provides it, or (b)if A reasonably thinks it is impracticable to provide the service to persons who do not share that characteristic, by refusing to provide the service." But what an embarrassing defence to offer! However, there is another reason for rejecting the extension of grapheme clusters to whole aksharas. Currently, U+1A63 TAI THAM VOWEL SIGN AA starts a grapheme cluster. However, for non-defective text, it is part of the same akshara as the preceding grapheme cluster. Now, the decision to make U+1A63 start a new grapheme cluster is intrinsically reasonable. It can have its own stack with a subscript consonant and even a vowel, and it is not difficult to find manuscripts showing a line break before it, e.g. L2/07-007 Figure 9b Leaf 2 lines 2/3, ????????-?????. I believe that the akshara should be a level of text above the grapheme cluster. Ideally, it would be below the level of a word, but of course in Sanskrit, word boundaries readily occur within present day grapheme clusters. (I made this recommendation in L2/17-122.) Further comments apply to the definition of akshara boundaries, regardless of whether they are to coincide with the boundaries of grapheme clusters. These rules do not work well where virama may fall back to visible virama. This is particularly the case with Tamil, where conjuncts are restricted to K.SSA and SH.RII. Johny Cibu provided an example where the title ??????? 
is broken as [ta-u, ka-virama, lla, ka-virama]. However, as per the proposed algorithm it would be: [ta-u, ka-virama-lla, ka-virama] http://www.chennaispider.com/attachments/Resources/3486-7144-Thuglak-Tamil-Magazine-Chennai.jpg For native intuition, I would cite the Tamil letter-counting account at https://venkatarangan.com/blog/content/binary/Counting%20Letters%20in%20an%20Unicode%20String.pdf. What the author counts is not spacing glyphs, but vowel letters and consonant characters, with two significant modifications. Firstly, K.SSA counts as just one consonant, and SH.R.II is also counted as containing a single consonant. In other words, the Tamil virama character works as a pure killer except in those two environments. This is also the story the TUNE protagonists tell us. It will be an inelegant rule for UAX#29, but, unfortunately, reality is messy. To quote Johny Cibu further: "Malayalam could be a similar story. In case of Malayalam, it can be font specific because of the existence of traditional and reformed writing styles. A conjunct might be a ligature in traditional; and it might get displayed with explicit virama in the reformed style. For example see the poster with word ??????? broken as [u, sa-virama, ta-aa, da-virama] - as it is written in the reformed style. As per the proposed algorithm, it would be [u, sa-virama-ta-aa, da-virama]. These breaks would be used by the traditional style of writing. https://upload.wikimedia.org/wikipedia/en/6/64/Ustad_Hotel_%282012%29_-_Poster.jpg I believe there is a problem with the first two examples in Table 12-33. If one suffixed to the first two examples, yielding *??????? and *????????, one would have three Malayalam aksharas, not two extended grapheme clusters as the proposed rules would say. From unicode at unicode.org Mon Jan 22 21:34:38 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Mon, 22 Jan 2018 19:34:38 -0800 Subject: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues In-Reply-To: <20180123024100.4c38773a@JRWUBU2> References: <20171208220619.3eb2fcbe@JRWUBU2> <20171211101631.44155a27@JRWUBU2> <20180123024100.4c38773a@JRWUBU2> Message-ID: Good point, thanks Mark On Mon, Jan 22, 2018 at 6:41 PM, Richard Wordingham via Unicode < unicode at unicode.org> wrote: > On Sun, 21 Jan 2018 22:34:12 -0800 > Mark Davis ?? via Unicode wrote: > > > The ZWJ Virama sequence is already provided for by the combination of > > GB9 & GB9c. But not the ZWNJ. If we want to handle that, it would > > mean the addition of something like: > > > > GB9d: ? (ZWNJ ViramaExtend* Virama) > > I don't think we need ViramaExtend* here. The seqeunce should be > followed by a base consonant, so there's no way for another mark to > sneak in. > > Incidentally, I think ViramaExtend would be better named as NSExtend, > with 'NS' for 'non-starter'. > > Richard. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Jan 22 21:43:34 2018 From: unicode at unicode.org (David Melik via Unicode) Date: Mon, 22 Jan 2018 19:43:34 -0800 Subject: superscripts & subscripts for science/mathematics? Message-ID: On 01/21/2018 02:27 PM, Fr?d?ric Grosshans wrote: > Le 21/01/2018 ? 07:15, David Melik via Unicode a ?crit : >> I don't know if this was discussed, but it'd help >>scientists/mathematicians if all Greek and Hebrew were available as >>superscript & subscript. 
Mathematicians use certain such letters in >>standard notation of important expressions/formulae (superscript ? in >>Euler's Identity, subscript base ?, superscript ? in cardinality of >>real numbers, etc.)... actually we use all Greek letters, and since a >>few Hebrew (since 1800s) have standard mathematical meanings, more are >>used for variables. After any such alphabets' letters are used, the >>rest are considered normal/standard to use in standard script, >>superscript, and subscript, for any educational usage, and future >>standard notation. > >> Mathematics superscript and substript are supposed to be rich text, >not plain text. Furthermore, ?completing the set of mathematical >superscripts? is an impossible task, since one would need double >superscripts for e^(-x?) and even more exotic combinations for stuff as >common as e^x? On 01/22/2018 01:20 PM, Murray Sargent wrote: > Subscripts and superscripts are more complicated in mathematics than >in ordinary text in that they can be nested and can include arbitrary >operators, e.g., a superscripted superscript as in e^(-x^2). >Accordingly, encoding more Unicode subscripts and superscripts for >mathematics isn't general enough to be worthwhile and it can complicate >math input methods. In plain text, one can use a linear format such as >LaTeX or UnicodeMath. Ideally these formats can drive math display >engines that display elegant mathematical typography with arbitrary >combinations of subscripts, superscripts and other mathematical >constructs. ?The intended use was to allow chemical and algebra formulas to be written without markup?--https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts. Unless wrong, apart from disagreement, it's clear mathematics word processing software is useful, but not a reason to not finish almost-complete set of basic superscripts & subscripts ((super|sub)scripts) for relevant alphabets used (English, Greek, perhaps Hebrew, latter two which were in my original post subject line, but I likely accidentally used link I received to delete pre-moderated post.) Before rich-text, people used plain-text centuries, still do, such as plain-text files that may be about simpler topics, or informal/notes, and Internet areas predating all websites, such as standard (such as this non-HTML) email, NNTP/Usenet (still hundreds of mathematics posts/day,) Internet Relay Chat (IRC, still dozens of science & mathematics rooms, one math one with around 1000 people, busy all the time) etc., but the latter at least has Unicode (none are rich-text.) This shows how much of English has superscripts, and which letter doesn't: ????????????????q?????????. However, there are simple mathematics situations people use any/every letter, lowercase & uppercase, superscript & subscript (not sure about ?overscript.?) It's up to each science/math fan, student, writer, instructor what type of text they want (not just what you say is ?supposed to be,? ?can complicate.?) I never said make Unicode like super-complicated stuff math formatting software... only a small percentage of where people write math, which of course, writing isn't just advanced books, but also simple & informal/notes, and plain-text isn't just in text-editors, but also graphics editors. If not clearer now, all I was requesting was adding/completing Greek (super|sub)scripts, though had forgotten not all English ones exist, so those too, and I was suggesting Hebrew (super|sub)scripts... 
never mentioned supersuperscripts & subsubscripts, etc., which one of you showed then argued against (doesn't refute what I actually said.) I'm just talking about completing relevant alphabets for usage described ?chemical and algebra formulas,? which as I took algebra before high school, wasn't seeing super-complicated stuff that may or not be in college/university algebra texts, or are in derived fields with some algebra-type formulae. I'm only talking about simple, one-level (super|sub)scripts for largest variety of simple formulae, not ?completing the set? (in relation to all math) nor (super|sub)(super|sub)scripts as in replies with mixed style. The biggest problem for me is Euler's Formula & Identity, which through high school math of analysis/calculus (and on through several years to applied & abstract analysis) are usually considered the most important & beautiful formula & identity in mathematics (the formula modelling basis of all current physics, and the identity having the most important numbers, symbols/operations in math.) It's easy to write his formula plain-text, as below. e??=cos x+i sin x Almost every day in my plain-text notes/to-do-list, I read these, and discuss most weeks in math discussion areas (as mentioned) and ?in real life,? so thanks for i,x superscripts. However, writing his identity has a problem: must say it definitively has ? but am replacing with English letter that came from ? and is equivalent, p, as below. e??+1=0 So, I can only write that in standard from with a word processor, TeX, MathML, or (technical jargon, even ambiguous to many computer programmers/scientists, CS) graphing calculator notiation, ?e^(i?)+1=0.? :( It's not just for mathematics research (what one of you were talking about,) but (in)formal use by math fans, students, instructors when they use plain-text (which I and many I know use.) For example, For years I couldn't write Euler's Identity into graphics programs such as the Free/Libre Software (FL/S) one called GIMP (which doesn't have (super|sub)scripts (don't know about proprietary Adobe Photoshop graphics,) though it finally worked with Inkscape FL/S. So, that's another problem... people using the most widely-used graphics FL/S need these in plain-text, otherwise may learn a trick to make (super|sub)scripts in GIMP by moving (not resizing) the text (personally, I spent days reading about standard (super|sub)script sizes and more hours making three text layers) so if 100 people each make an image about Euler for educational uses, or for a t-shirt, in GIMP, all 100 are going to have varying, non-standard text appearance... unless the proper (super|sub)scripts are added. Of course, I've been aware of graphing calculator notation you used above like e^x, x_1, but punctuation there mean different things in much of CS (many CS whom didn't use graphing calculators, forgot, or started in school with newer math software) so it'd be helpful to have the proper superscripts for .TXT, email (and NNTP/Usenet?,) IRC. That's all I'm saying... not additional (super|sub)(super|sub)script levels... just, if you have some (super|sub)scripts in an alphabet, have a *basic* set of all (or all considered important)... particularly ?, maybe ?, ... Also, the problem isn't just these classic formats (.TXT, NNTP, IRC) not all you necessarily use (many scientists do)... still, some web-forums are plain-text-only (or have text forum formatting code, some which has major (super|sub)script bugs... 
others allow in HTML, which seems to have disadvantage of increasing line height.) Many have been ?dying? while sites like Facebook & Twitter are growing... which also have... plain-text. So, try to write about or discuss easy-to-read science/mathematics (in short posts) where most people are on the Internet, and you still run into the problem of Unicode currently being inadequate. On those biggest sites, they can still post various emojis about poop and ideological cults... maybe that's going to help people discuss ideas how to advance science for a better world? Not as much, I think... Sincerely, David From unicode at unicode.org Mon Jan 22 22:31:57 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 22 Jan 2018 20:31:57 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <6c1f3146-e912-7bf7-c6a1-841352a8fd5f@it.aoyama.ac.jp> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <6c1f3146-e912-7bf7-c6a1-841352a8fd5f@it.aoyama.ac.jp> Message-ID: Martin J. D?rst wrote, > ... One way to avoid confusion is to use one specific > letter only as the second letter in digraphs. With the current orthography, > they don't use w and x, so they could use one of these. But personally, I'd > find accents more visually pleasing. Me too: (bottle, east, skier, crucial, cherry) s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e sxixsxa, sxygxys, sxanxgxysxy, sxesxuxsxi, sxixixe s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e From unicode at unicode.org Tue Jan 23 00:45:29 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 22 Jan 2018 22:45:29 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <6c1f3146-e912-7bf7-c6a1-841352a8fd5f@it.aoyama.ac.jp> Message-ID: For me, having to go around justifying my whims would probably take some of the fun out of being an authoritarian ruler. Which suggests that the apostrophe decision can be revised with no explanation expected, even though a simple explanation exists. Changing from the apostrophe to the combining acute accent above is, after all, essentially turning the apostrophe at a slight angle and writing it above the letter it modifies. This would not represent a reversed decision, simply a change in the style in which the already selected modifier is written. > s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e > s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e From unicode at unicode.org Tue Jan 23 05:51:17 2018 From: unicode at unicode.org (Khaled Hosny via Unicode) Date: Tue, 23 Jan 2018 13:51:17 +0200 Subject: superscripts & subscripts for science/mathematics? In-Reply-To: References: Message-ID: <20180123115117.GF1155@macbook.localdomain> On Mon, Jan 22, 2018 at 07:43:34PM -0800, David Melik via Unicode wrote: > ?The intended use was to allow chemical and algebra formulas to be written > without > markup?--https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts. 
> Unless wrong, apart from disagreement, it's clear mathematics word > processing software is useful, but not a reason to not finish > almost-complete set of basic superscripts & subscripts ((super|sub)scripts) > for relevant alphabets used (English, Greek, perhaps Hebrew, latter two > which were in my original post subject line, but I likely accidentally used > link I received to delete pre-moderated post.) Mathematics written in Arabic notation use Arabic-Indic numbers and Arabic letters and they can occur in superscripts and subscripts as well. Regards, Khaled From unicode at unicode.org Tue Jan 23 08:23:54 2018 From: unicode at unicode.org (philip chastney via Unicode) Date: Tue, 23 Jan 2018 14:23:54 +0000 (UTC) Subject: superscripts & subscripts for science/mathematics? References: <648389554.3309279.1516717434103.ref@mail.yahoo.com> Message-ID: <648389554.3309279.1516717434103@mail.yahoo.com> . . . and do Russians still do mathematics? I guess not, since there is no Cyrillic counterpart to the AMS extensions also, chemists sometimes like to put a superscript over a subscript will that still have to be done using rich text? or maybe we need another extension . . . ? /phil -------------------------------------------- On Tue, 23/1/18, Khaled Hosny via Unicode wrote: Subject: Re: superscripts & subscripts for science/mathematics? To: "David Melik" Cc: unicode at unicode.org Date: Tuesday, 23 January, 2018, 11:51 AM On Mon, Jan 22, 2018 at 07:43:34PM -0800, David Melik via Unicode wrote: > ?The intended use was to allow chemical and algebra formulas to be written > without > markup?--https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts. > Unless wrong, apart from? disagreement, it's clear mathematics word > processing software is useful, but not a reason to not finish > almost-complete set of basic superscripts & subscripts ((super|sub)scripts) > for relevant alphabets used (English, Greek, perhaps Hebrew, latter two > which were in my original post subject line, but I likely accidentally used > link I received to delete pre-moderated post.) Mathematics written in Arabic notation use Arabic-Indic numbers and Arabic letters and they can occur in superscripts and subscripts as well. Regards, Khaled -----Inline Attachment Follows----- From unicode at unicode.org Tue Jan 23 09:18:11 2018 From: unicode at unicode.org (=?UTF-8?Q?Christoph_P=C3=A4per?= via Unicode) Date: Tue, 23 Jan 2018 16:18:11 +0100 (CET) Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <6c1f3146-e912-7bf7-c6a1-841352a8fd5f@it.aoyama.ac.jp> Message-ID: <194592430.121188.1516720691578@ox.hosteurope.de> James Kass: > > (bottle, east, skier, crucial, cherry) > s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e > sxixsxa, sxygxys, sxanxgxysxy, sxesxuxsxi, sxixixe > s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e > s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e [Esperanto orthography] provides the option to either choose x-digraphs, h-digraphs or caron diacritics (i.e. circumflex on consonants and breve on vowels) and there are some alternative proposals, e.g. substituting '?' by unused 'w'. 
No naturally evolved orthography, as far as I know, mixes consonant and vowel letters in digraphs, but 'j' and 'w' or 'v' can be both and 'h' is a special case. Using 'x' after consonants and 'w' after vowels would therefore make some sense, although it still looks strange to people used to natural graphotactics. Readability may be improved if not all diacritics are put above the base letter. s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e sxiwsxa, sxygxys, sxanxgxysxy, sxesxuwsxi, sxiwiwe shijsha, shyghys, shanhghyshy, sheshuwshi, shijije ?i??a, ?y?ys, ?a??y?y, ?e?u??i, ?i?i?e ?i??a, ?y?ys, ?a??y?y, ?e?u??i, ?i?i?e [Esperanto orthography]: https://en.wikipedia.org/wiki/Esperanto_orthography From unicode at unicode.org Tue Jan 23 11:58:12 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 23 Jan 2018 18:58:12 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <194592430.121188.1516720691578@ox.hosteurope.de> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <6c1f3146-e912-7bf7-c6a1-841352a8fd5f@it.aoyama.ac.jp> <194592430.121188.1516720691578@ox.hosteurope.de> Message-ID: Ukainian should follow the romanisation model used by Serbian which is clear for them and coherent with other uses in Eastern Europe: carons for modified consonnants, and acute accents (sometimes double acute in Hungarian) for vowels. Even if they want support with a legacy 8-bit charset, ISO 3166-2 or windows codepage 1250 would work for them without much complication. But there it's like if they wanted to do like German ignoring umlauts completely and using digrams with "e" instead, and ignore ess-tsetts and use digrams "ss" everywhere. This looks like a big return backward to the old age of ASCII-only "typography" of the 1960's (when also French or Italian were sometimes represented with their acute accents replaced by ugly digrams with apostrophes...) 2018-01-23 16:18 GMT+01:00 Christoph P?per via Unicode : > James Kass: > > > > (bottle, east, skier, crucial, cherry) > > s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e > > sxixsxa, sxygxys, sxanxgxysxy, sxesxuxsxi, sxixixe > > s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e > > s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e > > [Esperanto orthography] provides the option to either choose x-digraphs, > h-digraphs or caron diacritics (i.e. circumflex on consonants and breve on > vowels) and there are some alternative proposals, e.g. substituting '?' by > unused 'w'. No naturally evolved orthography, as far as I know, mixes > consonant and vowel letters in digraphs, but 'j' and 'w' or 'v' can be both > and 'h' is a special case. Using 'x' after consonants and 'w' after vowels > would therefore make some sense, although it still looks strange to people > used to natural graphotactics. Readability may be improved if not all > diacritics are put above the base letter. > > s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e > sxiwsxa, sxygxys, sxanxgxysxy, sxesxuwsxi, sxiwiwe > shijsha, shyghys, shanhghyshy, sheshuwshi, shijije > ?i??a, ?y?ys, ?a??y?y, ?e?u??i, ?i?i?e > ?i??a, ?y?ys, ?a??y?y, ?e?u??i, ?i?i?e > > [Esperanto orthography]: https://en.wikipedia.org/wiki/ > Esperanto_orthography > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Tue Jan 23 12:51:42 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Tue, 23 Jan 2018 11:51:42 -0700 Subject: 0027, 02BC, 2019, or a new =?UTF-8?Q?character=3F?= Message-ID: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> I think it's so cute that some of us think we can advise Nazarbayev on whether to use straight or curly apostrophes or accents or x's or whatever. Like he would listen to a bunch of Western technocrats. An explicitly stated goal of the new orthography was to enable typing Kazakh on a "standard keyboard," meaning an English-language one. Nazarbayev may ultimately be persuaded to embrace ASCII digraphs, which also meet this goal, but this talk about U+2019 and U+02BC will make exactly zero difference in Kazakh policy. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Tue Jan 23 13:22:37 2018 From: unicode at unicode.org (Phake Nick via Unicode) Date: Wed, 24 Jan 2018 03:22:37 +0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: >I found the Windows 'US International' keyboard layout highly intuitive >for accented Latin-1 characters. How common is the US International keyboard in real life..? Users would still need to manually add them in Windows, or in other computing tools vendors would need to add support for "US International" before they can be used > Regular American users simply don't type umlauts, period. Eccentric Which is exactly why they aren't using unlauts. > American users needing umlauts, such as foreign language students or > heavy metal enthusiasts, generally find an easy way. Practically > everybody knows how to search the web. How about, for example, a random tourist looking for info of random Kazakhstan city? Will they know how to type umlaut in a city's name? Most likely they'll simply type it without any umlaut and lost the distinction -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Jan 23 13:33:49 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Tue, 23 Jan 2018 11:33:49 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> References: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> Message-ID: Doug Ewell wrote, "I think it's so cute that some of us think we can advise Nazarbayev on whether to use straight or curly apostrophes or accents or x's or whatever. Like he would listen to a bunch of Western technocrats." Heh. We are offering sound advice. If people fail to heed it, that's too bad. From unicode at unicode.org Tue Jan 23 13:35:51 2018 From: unicode at unicode.org (David Starner via Unicode) Date: Tue, 23 Jan 2018 19:35:51 +0000 Subject: 0027, 02BC, 2019, or a new character? 
In-Reply-To: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com>
References: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com>
Message-ID: 

On Tue, Jan 23, 2018 at 10:55 AM Doug Ewell via Unicode wrote:

> I think it's so cute that some of us think we can advise Nazarbayev on
> whether to use straight or curly apostrophes or accents or x's or
> whatever. Like he would listen to a bunch of Western technocrats.

Kazakh has a perfectly serviceable alphabet right now, and they probably have plenty of keyboards that work for it. And I'm sure there's some Turkish firm that would be happy to deliver Turkish keyboards in bulk at quite reasonable prices. There are reasons why they're changing to an ASCII Latin script, and they're connected to the reasons he might listen to Western technocrats.

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From unicode at unicode.org Tue Jan 23 14:53:25 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Tue, 23 Jan 2018 21:53:25 +0100
Subject: 0027, 02BC, 2019, or a new character?
In-Reply-To: 
References: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com>
Message-ID: 

The best thing they could have done is to keep their existing keyboard layout, containing both the Cyrillic letters and the Latin QWERTY letters printed on the keys, but operating in two modes (depending on OS preferences) that swap the two layouts without changing the keystrokes. It would just have needed one Latin letter or modified Latin letter per Cyrillic letter, so that it was simply a 1-to-1 transliteration. No need to type an extra apostrophe.

No extra dead key was needed to remap to Latin the few Kazakh Cyrillic letters that are already typed with AltGr (the labels for them are on the bottom right of the key, with no character labeled above it, so it was also possible to use the upper position to indicate the associated Latin letter: when turning the keyboard to Latin mode instead of the current Cyrillic default, the positions of the letters do not change, the labels are still valid where they are, but AltGr produces the Latin letter labeled in the upper position). All existing keyboards would remain usable as is. Users would then choose the Latin or Cyrillic layout as they want and could still switch from one to the other.

Note that the placement of Cyrillic letters on the QWERTY layout of Latin letters is not a direct transliteration: the paired letters do not match, but that does not matter (mapping keys on keyboards is not necessarily a transliteration); the existing Basic Latin letters A-Z should remain where they are on the QWERTY layout. The other Cyrillic letters are on keys that won't move but that will carry the additional Latin letters needed for the language.

Note also that there are two Kazakh-Cyrillic layouts, including one where the most common punctuation (:,;.) is on two keys in the middle of the 1st row (digits are typed using AltGr or with the numeric keypad): this layout also should not change, and the same two keys will keep these punctuation marks. But here again there's a single keystroke for each Cyrillic letter on the other keys, which will also keep the QWERTY layout of the Latin letters in their alternate mode. Only the Cyrillic letters on other keys of the 2nd, 3rd and 4th rows will need to map the missing non-basic Latin letters, also typed with a single keystroke when the keyboard is turned to Latin mode. 
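[A quick aside to make the 1-to-1 transliteration idea above concrete. This is only a sketch: the Latin target letters below are made up for the demo, since which letters to pick is exactly what is being debated here, and this is not the actual Kazakh proposal.

    # Hypothetical 1:1 Cyrillic-to-Latin table; the Latin side is illustrative only.
    KK_CYR_TO_LAT = {
        "а": "a", "б": "b", "с": "s",
        "ш": "š",            # caron chosen arbitrarily for this demo
        "ғ": "ǵ", "ң": "ń", "ү": "ü",
    }
    # A mapping is reversible only if it is strictly 1:1.
    KK_LAT_TO_CYR = {lat: cyr for cyr, lat in KK_CYR_TO_LAT.items()}
    assert len(KK_LAT_TO_CYR) == len(KK_CYR_TO_LAT)

    def to_latin(text: str) -> str:
        # One code point in, one code point out: no digraphs, no apostrophes.
        return text.translate(str.maketrans(KK_CYR_TO_LAT))

    def to_cyrillic(text: str) -> str:
        return text.translate(str.maketrans(KK_LAT_TO_CYR))

Round-tripping any string built from the mapped letters gives back the original, which is what makes such a scheme lossless.]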
In all cases, the Latin keystrokes should generate only one character, not digraphs. And it's possible to use "correct" extended letters even if this requires some minor adaptation of existing transliterations (which use more complex rules with digraphs); it was perfectly possible to use Latin letters with acute accents for vowels, or with carons for consonants, or possibly the cedilla below s or c, for every Kazakh Cyrillic letter, to reach that goal without difficulty: a non-ambiguous, simple 1-to-1 transliteration, fully reversible, also allowing all historic texts in Cyrillic to be transliterated instantly without loss, while still allowing clear reading of the Latin text and easy composition. Unicode (but also the legacy ISO 8859 and Windows or MacOS codepages for Eastern European languages written in Latin script) already supports all the needed extended Latin characters.

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From unicode at unicode.org Tue Jan 23 15:52:46 2018
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 23 Jan 2018 21:52:46 +0000
Subject: 0027, 02BC, 2019, or a new character?
In-Reply-To: 
References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2>
Message-ID: <20180123215246.56e459f0@JRWUBU2>

On Wed, 24 Jan 2018 03:22:37 +0800 Phake Nick via Unicode wrote:

> >I found the Windows 'US International' keyboard layout highly
> >intuitive for accented Latin-1 characters.
> How common is the US International keyboard in real life..?

I thought it was two copies per new Windows PC - one for 32- and the other for 64-bit code. I was talking about the *layout*. The apostrophe, quote, grave and circumflex on the usual US keyboard are good enough labels for the acute, umlaut, grave and circumflex dead keys. (Now, '?' is a problem.)

> Users would still need to manually add them in Windows, or in other
> computing tools vendors would need to add support for "US
> International" before they can be used

Select them, you mean. It's only a problem if the computer's owner has stopped users from selecting keyboards. I thought Windows penetration was better than 50%.

> How about, for example, a random tourist looking for info of random
> Kazakhstan city? Will they know how to type umlaut in a city's name?
> Most likely they'll simply type it without any umlaut and lost the
> distinction

Possibly. From a US* keyboard on a PC in England, I enter "Munchen" in a Google search and get entries for München. I even get a reply panel headed "Things to do in Munich". The English Wikipedia redirects me from Munchen to Munich. Umlaut is simply not a problem.

Richard.

*Technically, it's a Thai keyboard, for when I type Tai Tham. I have trouble remembering where each digit key is. 
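[Richard's point that "Munchen" still finds München holds because search engines typically fold away accents before matching. A minimal sketch of that kind of folding, using Python's standard unicodedata module (an assumption about how an engine might do it, not a description of Google's actual pipeline):

    import unicodedata

    def strip_marks(text: str) -> str:
        # Decompose precomposed letters (ü -> u + combining diaeresis),
        # drop the combining marks (general category Mn), then recompose the rest.
        decomposed = unicodedata.normalize("NFD", text)
        kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
        return unicodedata.normalize("NFC", kept)

    assert strip_marks("München") == "Munchen"
    assert strip_marks("Almaty") == "Almaty"   # unaccented text passes through unchanged

So a query typed without the umlaut can still match the accented form on the indexing side.]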
From unicode at unicode.org Tue Jan 23 16:24:05 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Tue, 23 Jan 2018 15:24:05 -0700 Subject: 0027, 02BC, 2019, or a new =?UTF-8?Q?character=3F?= Message-ID: <20180123152405.665a7a7059d7ee80bb4d670165c8327d.e0f502fa27.wbe@email03.godaddy.com> Philippe Verdy wrote: > The best they should have done is instead keeping their existing > keyboard layout, continaing both the Cyrillic letters and Latin QWERTY > printed on them, but operating in two modes (depending on OS > preferences) to invert the two layouts but without changing the > keystrokes. It would just have needed one Latin letter or modified > Latin letter so that it was simply a 1 to 1 transliteration. The objective apparently was to be able use a U.S. English keyboard layout, AS IS, to type Kazakh-in-Latin. Adding new characters to the layout would defeat this purpose. Again, this may not be how you or I would solve the problem, and it may not be how the Kazakhs would solve the problem if there were no installed base (i.e. existing Latin-script keyboards with which compatibility was desired). As they say, the reason God was able to create the heavens and the earth in only 6 days was that there was no installed base to worry about. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Tue Jan 23 19:28:54 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Wed, 24 Jan 2018 01:28:54 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> References: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> Message-ID: <20180124012854.4d01b312@JRWUBU2> On Tue, 23 Jan 2018 11:51:42 -0700 Doug Ewell via Unicode wrote: > An explicitly stated goal of the new orthography was to enable typing > Kazakh on a "standard keyboard," meaning an English-language one. > Nazarbayev may ultimately be persuaded to embrace ASCII digraphs, > which also meet this goal, but this talk about U+2019 and U+02BC will > make exactly zero difference in Kazakh policy. Is it only in English then that typing an apostrophe key after a letter can't be relied UPON to yield U+0027 rather than U+2019? Richard. From unicode at unicode.org Wed Jan 24 03:45:03 2018 From: unicode at unicode.org (philip chastney via Unicode) Date: Wed, 24 Jan 2018 09:45:03 +0000 (UTC) Subject: 0027, 02BC, 2019, or a new character? References: <1514708770.3990073.1516787103300.ref@mail.yahoo.com> Message-ID: <1514708770.3990073.1516787103300@mail.yahoo.com> OK, he's no technocrat, but try googling "tony blair kazakhstan" and in case anybody's wondering what Nazarbayev got for his five million pounds, for a partial explanation, check out https://www.rt.com/uk/340035-blair-strike-kazakhstan-massacre/ it is not known if Blair profferred any advice on keyboard design, though, so this may be off-topic /phil -------------------------------------------- On Tue, 23/1/18, Doug Ewell via Unicode wrote: Subject: Re: 0027, 02BC, 2019, or a new character? To: "Unicode Mailing List" Date: Tuesday, 23 January, 2018, 6:51 PM I think it's so cute that some of us think we can advise Nazarbayev on whether to use straight or curly apostrophes or accents or x's or whatever. Like he would listen to a bunch of Western technocrats. 
From unicode at unicode.org Wed Jan 24 07:27:04 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Wed, 24 Jan 2018 13:27:04 +0000 Subject: Internationalization & Unicode Conference 2018 Message-ID: <2940A15B-0DF5-4643-855D-646A94BBE541@lboro.ac.uk> I am thinking that people at Internationalization & Unicode Conference 2018 may well be interested in my story and, at times difficult, journey. It has been a long journey. Title of my presentation would be "How I Internationalized my Computer Science Teaching". Would any organisation on this list be willing to fund my attendance: travel from England, accommodation ...etc... Alternatively, can you please point me to a funding body to which I can apply. Thank you Andr? Schappo From unicode at unicode.org Wed Jan 24 10:11:25 2018 From: unicode at unicode.org (Markus Scherer via Unicode) Date: Wed, 24 Jan 2018 08:11:25 -0800 Subject: Internationalization & Unicode Conference 2018 In-Reply-To: <2940A15B-0DF5-4643-855D-646A94BBE541@lboro.ac.uk> References: <2940A15B-0DF5-4643-855D-646A94BBE541@lboro.ac.uk> Message-ID: If your presentation is accepted for the conference, you should get a hotel discount. markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 24 16:19:07 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Wed, 24 Jan 2018 15:19:07 -0700 Subject: 0027, 02BC, 2019, or a new =?UTF-8?Q?character=3F?= Message-ID: <20180124151907.665a7a7059d7ee80bb4d670165c8327d.1f48e0f353.wbe@email03.godaddy.com> James Kass wrote: > Heh. We are offering sound advice. If people fail to heed it, that's > too bad. We're offering excellent advice, very well informed. But the leadership has made the decision that it has made. All the news stories say that linguistic experts in Kazakhstan offered similar good advice, and were disheartened to learn it was ignored completely. Richard Wordingham wrote: > Is it only in English then that typing an apostrophe key after a > letter can't be relied UPON to yield U+0027 rather than U+2019? Um, I always get U+0027 when I expect it. Oh wait, you must be talking about AutoCorrect on Microsoft Word. Just visit AutoCorrect Options and turn off that particular "replace as you type" option, and be done with it. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Wed Jan 24 17:55:34 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Thu, 25 Jan 2018 00:55:34 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <20180124151907.665a7a7059d7ee80bb4d670165c8327d.1f48e0f353.wbe@email03.godaddy.com> References: <20180124151907.665a7a7059d7ee80bb4d670165c8327d.1f48e0f353.wbe@email03.godaddy.com> Message-ID: So there will be a new administrative jargon in Kazakhstan that people won't like, and outside the government, they'll continue using their exiosting keyboards, and will only trnasliterate to Latin using a simple 1-t-to-1 mapping without the ugly apostrophes (most probably acute accents on vowels, or carons like in Serbian, notably on 'c' and 's' where acute accents are rarely found in many fonts : there's already a wide support Latin alphabets of Serbian, Hungarian, Slovakian, Polish ; and the special case for i can still avoid the computer nightmare of dotless vs. dotted versions used in Turkish, by using acute accents instead of these damned apostrophes...) 
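[To make the "accents instead of apostrophes" argument above concrete: an apostrophe inside a word looks like punctuation to most text processing, while a precomposed accented letter is just another letter. A small sketch, in which the accented spelling is hypothetical (one plausible acute-accent rendering of the word ja'ne from the apostrophe-orthography sample quoted later in this thread), not an official orthography:

    import re

    apostrophe_spelling = "ja'ne"   # from the apostrophe-laden sample text
    accented_spelling = "jáne"      # hypothetical accent-based spelling, for comparison only

    # A naive Unicode-aware tokenizer: the apostrophe splits the word, the accent does not.
    print(re.findall(r"\w+", apostrophe_spelling))   # ['ja', 'ne']
    print(re.findall(r"\w+", accented_spelling))     # ['jáne']

The same effect shows up in spell checkers, word counts, double-click selection and quoted string literals.]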
Newspapers and books will continue for a wihile being published in Cyrillic (unless the Kazakh autority requires them to ban Cyrillic, but it will likely occur first on TV). Soon they will realize that this is not sustainable and that their decision causes many more problems with international documents, and will finally adopt the accents that will really promote their language to the web instead of freezing it in the Dark Age of ambiguous ASCII used in the early 1960's (when even the Cyrillic alphabet was not supported)... 2018-01-24 23:19 GMT+01:00 Doug Ewell via Unicode : > James Kass wrote: > > > Heh. We are offering sound advice. If people fail to heed it, that's > > too bad. > > We're offering excellent advice, very well informed. But the leadership > has made the decision that it has made. All the news stories say that > linguistic experts in Kazakhstan offered similar good advice, and were > disheartened to learn it was ignored completely. > > Richard Wordingham wrote: > > > Is it only in English then that typing an apostrophe key after a > > letter can't be relied UPON to yield U+0027 rather than U+2019? > > Um, I always get U+0027 when I expect it. > > Oh wait, you must be talking about AutoCorrect on Microsoft Word. Just > visit AutoCorrect Options and turn off that particular "replace as you > type" option, and be done with it. > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 24 20:29:08 2018 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Thu, 25 Jan 2018 07:59:08 +0530 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <6c1f3146-e912-7bf7-c6a1-841352a8fd5f@it.aoyama.ac.jp> Message-ID: On 23-Jan-2018 10:03, "James Kass via Unicode" wrote: (bottle, east, skier, crucial, cherry) s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e sxixsxa, sxygxys, sxanxgxysxy, sxesxuxsxi, sxixixe s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e Last one most readable of the lot IMO and it's close enough to the apostrophe option. IIANM the apostrophe is used as a dead key for the acute accent in some common international keyboard layouts already? I retract my earlier statement about digraphs probably being the best option. It was made without looking at the actual requirement. For such heavy usage, it would simply make things horrible. Acute accent for the win! ?? -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 24 20:29:11 2018 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Thu, 25 Jan 2018 07:59:11 +0530 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> References: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> Message-ID: On 24-Jan-2018 00:25, "Doug Ewell via Unicode" wrote: I think it's so cute that some of us think we can advise Nazarbayev on whether to use straight or curly apostrophes or accents or x's or whatever. Like he would listen to a bunch of Western technocrats. 
Sir why this assumption that everyone here is "western"? I'm situated at an even more eastern longitude than Kazakhstan. An explicitly stated goal of the new orthography was to enable typing Kazakh on a "standard keyboard," meaning an English-language one. IMO it's hardly clear that that is or in fact *what* is meant by a standard keyboard. It meeely seems to me loose political speak to make it appear as if they are trying to make things simpler for the people. Nazarbayev may ultimately be persuaded to embrace ASCII digraphs, which also meet this goal, but this talk about U+2019 and U+02BC will make exactly zero difference in Kazakh policy. It shouldn't. At least the technical advisors should be monitoring this discussion if not participate in it. I know that Govt of India people do, at least on UnicoRe. -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 24 20:49:24 2018 From: unicode at unicode.org (David Starner via Unicode) Date: Thu, 25 Jan 2018 02:49:24 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <6c1f3146-e912-7bf7-c6a1-841352a8fd5f@it.aoyama.ac.jp> Message-ID: On Wed, Jan 24, 2018 at 6:31 PM Shriramana Sharma via Unicode < unicode at unicode.org> wrote: > > On 23-Jan-2018 10:03, "James Kass via Unicode" > wrote: > > (bottle, east, skier, crucial, cherry) > s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e > sxixsxa, sxygxys, sxanxgxysxy, sxesxuxsxi, sxixixe > s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e > s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e > > [...] > > I retract my earlier statement about digraphs probably being the best > option. It was made without looking at the actual requirement. For such > heavy usage, it would simply make things horrible. > I'd say that the words chosen for this discussion have been specifically chosen for their heavy usage. Wikipedia has a translation of "All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.", in what I believe in the new apostrophe-laden orthography: Barlyq adamdar tu'masynan azat ja'ne qadyr-qasi'eti men quqtary ten' bolyp du'ni'ege keledi. Adamdarg'a aqyl-parasat, ar-ojdan berilgen, sondyqtan olar bir-birimen tu'ystyq, bau'yrmaldyq qarym-qatynas jasau'lary ti'is. It's not that bad, though apostrophes still aren't a orthographic win. I'm voting for the Uniform Turkic Alphabet, for the grand total of zero my vote is worth. -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 24 21:27:34 2018 From: unicode at unicode.org (Mark E. Shoulson via Unicode) Date: Wed, 24 Jan 2018 22:27:34 -0500 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> Message-ID: On 01/24/2018 09:29 PM, Shriramana Sharma via Unicode wrote: > On 24-Jan-2018 00:25, "Doug Ewell via Unicode" > wrote: > > I think it's so cute that some of us think we can advise Nazarbayev on > whether to use straight or curly apostrophes or accents or x's or > whatever. Like he would listen to a bunch of Western technocrats. 
> > > Sir why this assumption that everyone here is "western"? I'm situated > at an even more eastern longitude than Kazakhstan. It hardly matters. As the intent here is to comment on Nazarbayev's putative view of these discussions, it's quite likely he would write the whole lot of us off as "Western technocrats" no matter what our longitudes. ~mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 24 21:30:52 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Thu, 25 Jan 2018 04:30:52 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <6c1f3146-e912-7bf7-c6a1-841352a8fd5f@it.aoyama.ac.jp> Message-ID: I agree, and still you won't necessarily have to press a dead key to have these characters, if you map one key where the Cyrillic letter was producing directly the character with its accent. No surprise for user, fast to type, easy to learn, typographically correct, preserves the etymologies and allows preservation of culture with a basic 1:1 transliterator between the two scripts. However, if you can type one key to produce one latin letter with its accent, I don't see why it could not use the caron instead of the acute above s and c, so that it is also immediately readable in other Eastern European languages. In addition they'll get better font support for x and c with caron than for s and c with acute and easy mappings from more softwares that handle only 8 bit charsets. The ISO 8859-2 subset (or Windows 1250) is the way to go if they don't want the complexity of the dotless i from other Turkic Latin alphabets. 2018-01-25 3:29 GMT+01:00 Shriramana Sharma via Unicode : > > > On 23-Jan-2018 10:03, "James Kass via Unicode" > wrote: > > (bottle, east, skier, crucial, cherry) > s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e > sxixsxa, sxygxys, sxanxgxysxy, sxesxuxsxi, sxixixe > s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e > s?i?s?a, s?yg?ys, s?an?g?ys?y, s?es?u?s?i, s?i?i?e > > Last one most readable of the lot IMO and it's close enough to the > apostrophe option. IIANM the apostrophe is used as a dead key for the acute > accent in some common international keyboard layouts already? > > I retract my earlier statement about digraphs probably being the best > option. It was made without looking at the actual requirement. For such > heavy usage, it would simply make things horrible. > > Acute accent for the win! ?? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 24 21:41:02 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Thu, 25 Jan 2018 04:41:02 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> Message-ID: Great but then why sticking on a pure western subset (ASCII is mostly for US only). If he wants to be eastern, so choose ISO 8859-2. 
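[For what it's worth, that coverage claim is easy to check against Python's built-in codecs for the two legacy encodings named above. The candidate letters here are an illustrative grab bag, not any official alphabet:

    # Which candidate accented letters survive the 8-bit encodings mentioned above?
    candidates = "áéíóúü śćń šžč ǵğı"
    for enc in ("iso8859-2", "cp1250"):
        fits = "".join(c for c in candidates if not c.isspace() and c.encode(enc, "ignore"))
        gaps = "".join(c for c in candidates if not c.isspace() and not c.encode(enc, "ignore"))
        print(enc, "has:", fits, "lacks:", gaps)

Both Latin-2 and Windows-1250 cover the acute vowels and the acute/caron consonants shown, but neither has ǵ, ğ or dotless ı, so the exact letter choices still matter for legacy-charset compatibility.]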
As a bonus, banning the apostrophe from the alphabet will have be security improvement (thing about the many cases where ASCII apostrophes are used as string delimiters in various programming and markup languages, and how frequently text variables get simply surrounded by ASCII quotes as if the text did not contain them: less frequent problems if the natural orthography avoids it. Less problems for processing texts internationally (think about technical documents, and air navigation, where local place names are inserted; even if these systems use UTF-8, the quotes will still need escaping and escaping mechanisms are not so universal...). 2018-01-25 4:27 GMT+01:00 Mark E. Shoulson via Unicode : > On 01/24/2018 09:29 PM, Shriramana Sharma via Unicode wrote: > > On 24-Jan-2018 00:25, "Doug Ewell via Unicode" > wrote: > > I think it's so cute that some of us think we can advise Nazarbayev on > whether to use straight or curly apostrophes or accents or x's or > whatever. Like he would listen to a bunch of Western technocrats. > > > Sir why this assumption that everyone here is "western"? I'm situated at > an even more eastern longitude than Kazakhstan. > > It hardly matters. As the intent here is to comment on Nazarbayev's > putative view of these discussions, it's quite likely he would write the > whole lot of us off as "Western technocrats" no matter what our longitudes. > > ~mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 25 00:51:33 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Thu, 25 Jan 2018 06:51:33 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <20180123115142.665a7a7059d7ee80bb4d670165c8327d.faa0083c34.wbe@email03.godaddy.com> Message-ID: <20180125065133.6c4a7a4e@JRWUBU2> On Thu, 25 Jan 2018 07:59:11 +0530 Shriramana Sharma via Unicode wrote: > IMO it's hardly clear that that is or in fact *what* is meant by a > standard keyboard. It meeely seems to me loose political speak to > make it appear as if they are trying to make things simpler for the > people. >From what I could find on the web, it seems that desktop keyboards in Kazakhstan are normally labelled with Kazakh Cyrillic and printable ASCII. The ASCII is arranged as US QWERTY. > It shouldn't. At least the technical advisors should be monitoring > this discussion if not participate in it. I know that Govt of India > people do, at least on UnicoRe. The Indian Government is an institutional member of Unicode, with a UTC vote when they attend regularly. Richard. From unicode at unicode.org Thu Jan 25 06:15:18 2018 From: unicode at unicode.org (Andrew West via Unicode) Date: Thu, 25 Jan 2018 12:15:18 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: On 23 January 2018 at 00:55, James Kass via Unicode wrote: > > Regular American users simply don't type umlauts, period. 
Not even the president of the Unicode Consortium when referring to Christoph P?per: http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf Andrew From unicode at unicode.org Thu Jan 25 08:14:46 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Thu, 25 Jan 2018 07:14:46 -0700 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <20180124151907.665a7a7059d7ee80bb4d670165c8327d.1f48e0f353.wbe@email03.godaddy.com> Message-ID: <4C8A3B8D15F147D98EF87C7CDDFE5F1D@DougEwell> Philippe Verdy wrote: > So there will be a new administrative jargon in Kazakhstan that people > won't like, and outside the government, they'll continue using their > exiosting keyboards [...] > > Newspapers and books will continue for a wihile being published in > Cyrillic [...] Yes, it will be a mess. I think we can agree on that. > Soon they will realize that this is not sustainable And that, only that, is what will cause them to change it. Shriramana Sharma wrote: > Sir why this assumption that everyone here is "western"? I'm situated > at an even more eastern longitude than Kazakhstan. Most of the participants in this "apostrophe" thread appeared to be from North America and Western Europe; I think you're the only one who expanded that. I wasn't referring to the geographical or cultural makeup of the list as a whole. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Thu Jan 25 09:40:44 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Thu, 25 Jan 2018 16:40:44 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: Such example shows that ignoring umlauts makes the document counterintuitive. Nobody is able to infer that "Paper" refers to a person here or if he actually meant a paper sheet/article... At least he should have written "Paeper" which would be more correct (if "Christoph P?per" is German, the umlaut is equivalent to a following "e"), or even "Christoph Paper". Apply that tot the Kazakh language, and attempt to drop the apostrophes (because they very commonly cause various technical issues in softwares), I'm sure you'll see problems of interpretation or too many synonyms, that the use of acute instead would have avoided All softwares today are "8-bit" clean and support at least ISO 8859-1 or windows 1252, if they don't support multibyte UTF-8; the time of 7-bit ASCII is ended now since long, except in very old systems, that were anyway not used at all for Kazakh in Cyrillic; so acute accents are more likely than ASCII apostrophes to survive the technical software constraints, notably if Latin letters with accents come from the ISO 8859-1 subset which is also 8-bit in Unicode. Even with UTF-8, these Latin letters with accents (from any ISO 8859-* subset) will be 2-byte wide, so exactly the same encoding size as basic letter+ASCII quote and the encoding size is definitely not an issue anywhere (all existing Kazakh Cyrillic letters are already using 2-byte encoding in UTF-8, as all their assigned code points values were higher than 0x7F but lower than 0x800) Choosing the ASCII quote for this "apostrophe" will not save anything ; but the regular Unicode apostrophe U+2019 would need... 
3 bytes after the 1-byte basic Latin letter from ASCII (so it is worse !). Choosing the acute accent above Latin letters from ISO 8859-* would avoid this issue, because they are precombined, and in UTF-8 the usual prefered representation is in NFC form using a single code points. Javascript, Java, or C/C++ "wide string" types will handle these characters also with a single code unit (so the measured string "length" matches the number of letters). You will avoid all problems of SQL code injection on web sites if you have to allow the ASCII quotes unfiltered in data input forms to represent the proposed Kazakh orthography: with the acute, you can still continue to reject all ASCII quotes from software input forms and people won't be forced to use the alternate U+2019, not found on their basic keyboards, or will not substitute it by an hyphen or space or will not drop it completely; they'll just type letters with acute accents with a single keystroke on their Latinized keyboard. 2018-01-25 13:15 GMT+01:00 Andrew West via Unicode : > On 23 January 2018 at 00:55, James Kass via Unicode > wrote: > > > > Regular American users simply don't type umlauts, period. > > Not even the president of the Unicode Consortium when referring to > Christoph P?per: > > http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf > > Andrew > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 25 09:48:42 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Thu, 25 Jan 2018 16:48:42 +0100 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: Just a remark for fun: - You'll also note that this talk is all about the apostrophe, and if Kazakhstan wants to introduce it in 2019, that year will match exactly the code point U+2019 [ ? ]... - This year 2018 is also the year to discuss and reverse the apostrophe decision, and it matches the codepoint U+2018 [ ? ] for the reversed apostrophe. Happy new years to ?Kazakhstan? ! But now we have a new way to memoize the code point value for these apostrophes ! 2018-01-25 16:40 GMT+01:00 Philippe Verdy : > Such example shows that ignoring umlauts makes the document > counterintuitive. Nobody is able to infer that "Paper" refers to a person > here or if he actually meant a paper sheet/article... > At least he should have written "Paeper" which would be more correct (if > "Christoph P?per" is German, the umlaut is equivalent to a following "e"), > or even "Christoph Paper". 
> > Apply that tot the Kazakh language, and attempt to drop the apostrophes > (because they very commonly cause various technical issues in softwares), > I'm sure you'll see problems of interpretation or too many synonyms, that > the use of acute instead would have avoided > > All softwares today are "8-bit" clean and support at least ISO 8859-1 or > windows 1252, if they don't support multibyte UTF-8; the time of 7-bit > ASCII is ended now since long, except in very old systems, that were anyway > not used at all for Kazakh in Cyrillic; so acute accents are more likely than > ASCII apostrophes to survive the technical software constraints, notably > if Latin letters with accents come from the ISO 8859-1 subset which is also > 8-bit in Unicode. Even with UTF-8, these Latin letters with accents (from > any ISO 8859-* subset) will be 2-byte wide, so exactly the same encoding > size as basic letter+ASCII quote and the encoding size is definitely not an > issue anywhere (all existing Kazakh Cyrillic letters are already using > 2-byte encoding in UTF-8, as all their assigned code points values were > higher than 0x7F but lower than 0x800) > > Choosing the ASCII quote for this "apostrophe" will not save anything ; > but the regular Unicode apostrophe U+2019 would need... 3 bytes after the > 1-byte basic Latin letter from ASCII (so it is worse !). > > Choosing the acute accent above Latin letters from ISO 8859-* would avoid > this issue, because they are precombined, and in UTF-8 the usual prefered > representation is in NFC form using a single code points. Javascript, Java, > or C/C++ "wide string" types will handle these characters also with a > single code unit (so the measured string "length" matches the number of > letters). You will avoid all problems of SQL code injection on web sites if > you have to allow the ASCII quotes unfiltered in data input forms to > represent the proposed Kazakh orthography: with the acute, you can still > continue to reject all ASCII quotes from software input forms and people > won't be forced to use the alternate U+2019, not found on their basic > keyboards, or will not substitute it by an hyphen or space or will not drop > it completely; they'll just type letters with acute accents with a single > keystroke on their Latinized keyboard. > > > 2018-01-25 13:15 GMT+01:00 Andrew West via Unicode : > >> On 23 January 2018 at 00:55, James Kass via Unicode >> wrote: >> > >> > Regular American users simply don't type umlauts, period. >> >> Not even the president of the Unicode Consortium when referring to >> Christoph P?per: >> >> http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf >> >> Andrew >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 25 12:48:59 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Thu, 25 Jan 2018 10:48:59 -0800 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: My apologies for the typo. There's no excuse for misspelling someone's name (especially since I live in Switzerland, and type German every day). Thanks for calling my attention to it: the doc has been updated. 
Mark Mark On Thu, Jan 25, 2018 at 4:15 AM, Andrew West via Unicode < unicode at unicode.org> wrote: > On 23 January 2018 at 00:55, James Kass via Unicode > wrote: > > > > Regular American users simply don't type umlauts, period. > > Not even the president of the Unicode Consortium when referring to > Christoph P?per: > > http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf > > Andrew > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Jan 25 13:34:16 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Thu, 25 Jan 2018 12:34:16 -0700 Subject: 0027, 02BC, 2019, or a new =?UTF-8?Q?character=3F?= Message-ID: <20180125123416.665a7a7059d7ee80bb4d670165c8327d.cb0cedf332.wbe@email03.godaddy.com> Philippe Verdy wrote: > I agree, and still you won't necessarily have to press a dead key to > have these characters, if you map one key where the Cyrillic letter > was > producing directly the character with its accent. [...] > > However, if you can type one key to produce one latin letter with its > accent, I don't see why it could not use the caron instead of the > acute above s and c, so that it is also immediately readable in other > Eastern European languages. [...] I think it is very likely the Kazakhs, like most people who are not experts on computers or Unicode, did not consider the distinction between the physical keyboard (hardware) and the driver that maps keystrokes to characters (software). And they might consider replacing software drivers nationwide to be as unfeasible as replacing physical keyboards. Remember the government of Kazakhstan is probably not composed of computer experts. > As a bonus, banning the apostrophe from the alphabet will have be > security improvement (thing about the many cases where ASCII > apostrophes are used as string delimiters in various programming and > markup languages Another fact that they really did not seem to take into account. The advisers and linguists might have considered this, but not the decision-maker(s). > the time of 7-bit ASCII is ended now since long, except in very old > systems, And on U.S. English keyboards. (It's true, as Sharma says, that they didn't specify exactly what they meant by a "standard keyboard," but they did banish all diacritical marks, so...) > Even with UTF-8, these Latin letters with accents (from any ISO 8859-* > subset) will be 2-byte wide, so exactly the same encoding size as > basic letter+ASCII quote and the encoding size is definitely not an > issue anywhere (all existing Kazakh Cyrillic letters are already using > 2-byte encoding in UTF-8, as all their assigned code points values > were higher than 0x7F but lower than 0x800) [...] > > Choosing the ASCII quote for this "apostrophe" will not save > anything ; but the regular Unicode apostrophe U+2019 would need... 3 > bytes after the 1-byte basic Latin letter from ASCII (so it is > worse !). I did not see any evidence that this was something they ever considered or cared about. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Fri Jan 26 02:25:10 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Fri, 26 Jan 2018 08:25:10 +0000 Subject: 0027, 02BC, 2019, or a new character? 
In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> Message-ID: <0CF721F4-95DF-4820-BDA8-A4E1CE0C49C4@lboro.ac.uk> Talking of typing names correctly. Few people bother to type the acute accent in Andr?. This academic year, for the first time ever, I gave the following challenges to my web programming class of 143 students. I gave these challenges in the first lecture. ? learn how to write my name correctly on your desktop computers and mobile phones ? whenever you email me, ensure you write my name correctly I am pleased to report that the majority of this class now do type my name correctly when emailing me ?? Andr? Schappo On 25 Jan 2018, at 18:48, Mark Davis ?? via Unicode > wrote: My apologies for the typo. There's no excuse for misspelling someone's name (especially since I live in Switzerland, and type German every day). Thanks for calling my attention to it: the doc has been updated. Mark Mark On Thu, Jan 25, 2018 at 4:15 AM, Andrew West via Unicode > wrote: On 23 January 2018 at 00:55, James Kass via Unicode > wrote: > > Regular American users simply don't type umlauts, period. Not even the president of the Unicode Consortium when referring to Christoph P?per: http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf Andrew ?? ?? ?? Andr? Schappo schappo.blogspot.co.uk twitter.com/andreschappo weibo.com/andreschappo groups.google.com/forum/#!forum/computer-science-curriculum-internationalization -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 26 02:49:55 2018 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Fri, 26 Jan 2018 14:19:55 +0530 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <0CF721F4-95DF-4820-BDA8-A4E1CE0C49C4@lboro.ac.uk> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <0CF721F4-95DF-4820-BDA8-A4E1CE0C49C4@lboro.ac.uk> Message-ID: But your outgoing "From" address doesn't seem to have an accent!? On 26-Jan-2018 13:58, "Andre Schappo via Unicode" wrote: > > Talking of typing names correctly. Few people bother to type the acute > accent in Andr?. > > This academic year, for the first time ever, I gave the following > challenges to my web programming class of 143 students. I gave these > challenges in the first lecture. > > ? learn how to write my name correctly on your desktop computers and > mobile phones > ? whenever you email me, ensure you write my name correctly > > I am pleased to report that the majority of this class now do type my name > correctly when emailing me ?? > > Andr? Schappo > > On 25 Jan 2018, at 18:48, Mark Davis ?? via Unicode > wrote: > > My apologies for the typo. There's no excuse for misspelling someone's > name (especially since I live in Switzerland, and type German every day). > > Thanks for calling my attention to it: the doc has been updated. 
> > Mark > > Mark > > On Thu, Jan 25, 2018 at 4:15 AM, Andrew West via Unicode < > unicode at unicode.org> wrote: > >> On 23 January 2018 at 00:55, James Kass via Unicode >> wrote: >> > >> > Regular American users simply don't type umlauts, period. >> >> Not even the president of the Unicode Consortium when referring to >> Christoph P?per: >> >> http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf >> >> Andrew >> >> > > ?? ?? ?? > Andr? Schappo > schappo.blogspot.co.uk > twitter.com/andreschappo > weibo.com/andreschappo > groups.google.com/forum/#!forum/computer-science-curriculum- > internationalization > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 26 03:08:51 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Fri, 26 Jan 2018 09:08:51 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <0CF721F4-95DF-4820-BDA8-A4E1CE0C49C4@lboro.ac.uk> Message-ID: <71A09394-76E6-40E0-BF04-856C19B8119C@lboro.ac.uk> Ah! Yes?? That is a battle I gave up a long time ago. The database here can only handle ASCII. I have stopped trying to get the systems people here to convert the database to UTF-8. A few days ago I asked the systems people if they were going upgrade their MS mail server to handle non ASCII email addresses such as my Chinese email address. I will not go into details but basically they have no plans to support non ASCII email addresses. Further to my challenge: Before I set the below challenges to the students I described a possible scenario. Imagine you are responsible for a website with a backend database. This website provides financial management for a number of extremely wealthy clients. These clients are from many different parts of the world. If you cannot be bothered to get their names correct you could easily offend and hence lose clients. Just losing one client will be a huge loss in revenue for your company. My advice is: Learn the correct forms of their names in both the Latin script and the native script. Store both forms in your backend database. Andr? Schappo On 26 Jan 2018, at 08:49, Shriramana Sharma > wrote: But your outgoing "From" address doesn't seem to have an accent!? On 26-Jan-2018 13:58, "Andre Schappo via Unicode" > wrote: Talking of typing names correctly. Few people bother to type the acute accent in Andr?. This academic year, for the first time ever, I gave the following challenges to my web programming class of 143 students. I gave these challenges in the first lecture. ? learn how to write my name correctly on your desktop computers and mobile phones ? whenever you email me, ensure you write my name correctly I am pleased to report that the majority of this class now do type my name correctly when emailing me ?? Andr? Schappo On 25 Jan 2018, at 18:48, Mark Davis ?? via Unicode > wrote: My apologies for the typo. There's no excuse for misspelling someone's name (especially since I live in Switzerland, and type German every day). Thanks for calling my attention to it: the doc has been updated. 
Mark Mark On Thu, Jan 25, 2018 at 4:15 AM, Andrew West via Unicode > wrote: On 23 January 2018 at 00:55, James Kass via Unicode > wrote: > > Regular American users simply don't type umlauts, period. Not even the president of the Unicode Consortium when referring to Christoph P?per: http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf Andrew ?? ?? ?? Andr? Schappo schappo.blogspot.co.uk twitter.com/andreschappo weibo.com/andreschappo groups.google.com/forum/#!forum/computer-science-curriculum-internationalization ?? ?? ?? Andr? Schappo schappo.blogspot.co.uk twitter.com/andreschappo weibo.com/andreschappo groups.google.com/forum/#!forum/computer-science-curriculum-internationalization -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Jan 26 10:47:49 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Fri, 26 Jan 2018 16:47:49 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <71A09394-76E6-40E0-BF04-856C19B8119C@lboro.ac.uk> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <0CF721F4-95DF-4820-BDA8-A4E1CE0C49C4@lboro.ac.uk> <71A09394-76E6-40E0-BF04-856C19B8119C@lboro.ac.uk> Message-ID: <20180126164749.3e9ab2e7@JRWUBU2> On Fri, 26 Jan 2018 09:08:51 +0000 Andre Schappo via Unicode wrote: > Ah! Yes?? That is a battle I gave up a long time ago. The database > here can only handle ASCII. I have stopped trying to get the systems > people here to convert the database to UTF-8. Some systems (or admins) have been totally defeated by even the ASCII version of ?O?Sullivan?. That bodes ill for Kazakhs. Richard. From unicode at unicode.org Fri Jan 26 15:14:22 2018 From: unicode at unicode.org (John W Kennedy via Unicode) Date: Fri, 26 Jan 2018 16:14:22 -0500 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <20180126164749.3e9ab2e7@JRWUBU2> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <0CF721F4-95DF-4820-BDA8-A4E1CE0C49C4@lboro.ac.uk> <71A09394-76E6-40E0-BF04-856C19B8119C@lboro.ac.uk> <20180126164749.3e9ab2e7@JRWUBU2> Message-ID: <2AC5B8FB-747F-4016-827C-9B0582CCE27A@gmail.com> In cold-metal days, many were driven to resort to ?M?Donald? for lack of a superscript ?c?. > On Jan 26, 2018, at 11:47 AM, Richard Wordingham via Unicode wrote: > > On Fri, 26 Jan 2018 09:08:51 +0000 > Andre Schappo via Unicode wrote: > >> Ah! Yes?? That is a battle I gave up a long time ago. The database >> here can only handle ASCII. I have stopped trying to get the systems >> people here to convert the database to UTF-8. > > Some systems (or admins) have been totally defeated by even the ASCII > version of ?O?Sullivan?. That bodes ill for Kazakhs. > > Richard. > From unicode at unicode.org Sat Jan 27 03:29:03 2018 From: unicode at unicode.org (Julian Bradfield via Unicode) Date: Sat, 27 Jan 2018 09:29:03 +0000 (GMT) Subject: 0027, 02BC, 2019, or a new character? 
References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <0CF721F4-95DF-4820-BDA8-A4E1CE0C49C4@lboro.ac.uk> <71A09394-76E6-40E0-BF04-856C19B8119C@lboro.ac.uk> <20180126164749.3e9ab2e7@JRWUBU2> Message-ID: On 2018-01-26, Richard Wordingham via Unicode wrote: > Some systems (or admins) have been totally defeated by even the ASCII > version of ?O?Sullivan?. That bodes ill for Kazakhs. The head (about to be ex-head) of my university is Sir Timothy O'Shea. On the student record system, it is impossible to search for students called O'Shea (I have one). I suppose it doesn't sanitize correctly - I haven't tried looking for little Bobby Tables yet. It hadn't occurred to me to check, but of course searching for O?Shea doesn't work either, as they usually enter their own names into the initial record, and use 0027. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From unicode at unicode.org Sat Jan 27 04:22:22 2018 From: unicode at unicode.org (Denis Jacquerye via Unicode) Date: Sat, 27 Jan 2018 10:22:22 +0000 Subject: In the mean time, in France (was Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <0CF721F4-95DF-4820-BDA8-A4E1CE0C49C4@lboro.ac.uk> <71A09394-76E6-40E0-BF04-856C19B8119C@lboro.ac.uk> <20180126164749.3e9ab2e7@JRWUBU2> Message-ID: In the mean time, in France, a municipality is refusing to let a baby be registered with an apostrophe in his Breton name while several babies have had apostrophes in their names in recent years : 2017 N'n?n? (F), 2017 Tu'iuvea (M), 2016 D'jessy (M), 2015 N'Guessan (F), 2015 Chem's (M), 2014 N'Khany (M) 2012 Manec'h (M). https://www.connexionfrance.com/French-news/Rennes-mayor-to-challenge-ban-on-Breton-first-names If only someone had told them it?s not necessarily an apostrophe but can be U+02BC or U+02BB in some of these. -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sat Jan 27 09:31:56 2018 From: unicode at unicode.org (Stephane Bortzmeyer via Unicode) Date: Sat, 27 Jan 2018 16:31:56 +0100 Subject: [HUMOR] Proof that emojis are useful Message-ID: <20180127153156.wmatbq6bkpzzp2ea@sources.org> Nice scientific info, and with emojis : https://twitter.com/biolojical/status/956953421130514432 From unicode at unicode.org Sat Jan 27 11:45:30 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Sat, 27 Jan 2018 09:45:30 -0800 Subject: [HUMOR] Proof that emojis are useful In-Reply-To: <20180127153156.wmatbq6bkpzzp2ea@sources.org> References: <20180127153156.wmatbq6bkpzzp2ea@sources.org> Message-ID: Nice, thanks! Mark On Sat, Jan 27, 2018 at 7:31 AM, Stephane Bortzmeyer via Unicode < unicode at unicode.org> wrote: > Nice scientific info, and with emojis : > > https://twitter.com/biolojical/status/956953421130514432 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sat Jan 27 13:40:49 2018 From: unicode at unicode.org (Janusz S. 
=?utf-8?Q?Bie=C5=84?= via Unicode) Date: Sat, 27 Jan 2018 20:40:49 +0100 Subject: TIRONIAN SIGN ET Message-ID: <86tvv7ozsu.fsf@mimuw.edu.pl> Hi! I try to find in UTC Document Register the proposals for characters which interest me for some reasons. I'm usually rather successful, but I'm unable to find the proposal for TIRONIAN SIGN ET. Any hints? Best regards Janusz -- , Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej) Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department) jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/ From unicode at unicode.org Sat Jan 27 13:53:08 2018 From: unicode at unicode.org (Rick McGowan via Unicode) Date: Sat, 27 Jan 2018 11:53:08 -0800 Subject: TIRONIAN SIGN ET In-Reply-To: <86tvv7ozsu.fsf@mimuw.edu.pl> References: <86tvv7ozsu.fsf@mimuw.edu.pl> Message-ID: <5A6CD8A4.2000909@unicode.org> Hello Janusz -- Try this: http://www.unicode.org/L2/L2017/17300-n4841-tironian-et.pdf Regards, On 1/27/2018 11:40 AM, Janusz S. Bie? via Unicode wrote: > Hi! > > I try to find in UTC Document Register the proposals for characters > which interest me for some reasons. I'm usually rather successful, but > I'm unable to find the proposal for TIRONIAN SIGN ET. > > Any hints? > > Best regards > > Janusz > From unicode at unicode.org Sat Jan 27 14:17:18 2018 From: unicode at unicode.org (Janusz S. =?utf-8?Q?Bie=C5=84?= via Unicode) Date: Sat, 27 Jan 2018 21:17:18 +0100 Subject: TIRONIAN SIGN ET In-Reply-To: <5A6CD8A4.2000909@unicode.org> (Rick McGowan's message of "Sat, 27 Jan 2018 11:53:08 -0800") References: <86tvv7ozsu.fsf@mimuw.edu.pl> <5A6CD8A4.2000909@unicode.org> Message-ID: <86po5voy41.fsf@mimuw.edu.pl> On Sat, Jan 27 2018 at 20:53 CET, rick at unicode.org writes: > Hello Janusz -- > > Try this: http://www.unicode.org/L2/L2017/17300-n4841-tironian-et.pdf > > Regards, > > On 1/27/2018 11:40 AM, Janusz S. Bie? via Unicode wrote: >> Hi! >> >> I try to find in UTC Document Register the proposals for characters >> which interest me for some reasons. I'm usually rather successful, but >> I'm unable to find the proposal for TIRONIAN SIGN ET. I've seen this document, but I'm looking for an earlier one. The character was introduced in Unicode 3.0 in 1999, cf. e.g. http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML015/0250.html Regards Janusz -- , Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej) Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department) jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/ From unicode at unicode.org Sat Jan 27 15:54:57 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Sat, 27 Jan 2018 22:54:57 +0100 (CET) Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <20180123215246.56e459f0@JRWUBU2> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <20180123215246.56e459f0@JRWUBU2> Message-ID: <2017846446.13404.1517090097895.JavaMail.www@wwinf1e23> On Tue, 23 Jan 2018 21:52:46 +0000, Richard Wordingham wrote: > > On Wed, 24 Jan 2018 03:22:37 +0800 > Phake Nick via Unicode wrote: > > > >I found the Windows 'US International' keyboard layout highly > > >intuitive for accented Latin-1 characters. 
> > How common is the US International keyboard in real life..? > > I thought it was two copies per new Windows PC - one for 32- and the > other for 64-bit code. I was talking about the *layout*. [...] The US-Intl is so weird “you can’t just leave it on all the time” as reported in: http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML017/0558.html Now that CLDR is sorting out how to improve keyboard layouts, hopefully something falls off to replace the *legacy* US-Intl. As for how common the new one will become, I guess it depends on whether it gets less weird than the old one, and to what extent. Regards, Marcel From unicode at unicode.org Sat Jan 27 16:13:40 2018 From: unicode at unicode.org (Shervin Afshar via Unicode) Date: Sat, 27 Jan 2018 14:13:40 -0800 Subject: Internationalised Computer Science Exercises In-Reply-To: <20180122220855.7b929272@JRWUBU2> References: <20180122220855.7b929272@JRWUBU2> Message-ID: On Mon, Jan 22, 2018 at 2:08 PM, Richard Wordingham via Unicode < unicode at unicode.org> wrote: > On Mon, 22 Jan 2018 at 16:39:57, Andre Schappo via Unicode < > unicode at unicode.org> wrote: > > By way of example, one programming challenge I set to students a > > couple of weeks ago involves diacritics. Please see > > jsfiddle.net/coas/wda45gLp > > Did any of them come up with the idea of using traces instead of > strings? > Care to elaborate? Are you referring to sequence alignment methods? -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sat Jan 27 22:12:30 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sun, 28 Jan 2018 04:12:30 +0000 Subject: Internationalised Computer Science Exercises In-Reply-To: References: <20180122220855.7b929272@JRWUBU2> Message-ID: <20180128041230.26b34022@JRWUBU2> On Sat, 27 Jan 2018 14:13:40 -0800 Shervin Afshar wrote: > On Mon, Jan 22, 2018 at 2:08 PM, Richard Wordingham via Unicode < > unicode at unicode.org> wrote: > > On Mon, 22 Jan 2018 at 16:39:57, Andre Schappo via Unicode < > > unicode at unicode.org> wrote: > > > By way of example, one programming challenge I set to students a > > > couple of weeks ago involves diacritics. Please see > > > jsfiddle.net/coas/wda45gLp > > Did any of them come up with the idea of using traces instead of > > strings? > Care to elaborate? Are you referring to sequence alignment methods? No, I'm thinking of the trace monoid (see e.g. https://en.wikipedia.org/wiki/Trace_monoid). One way of thinking of strings is as concatenations of the NFD decompositions of their constituent characters. Then the canonical equivalence classes of these strings form the trace monoid of indecomposable characters. The theory of regular expressions (though you may not think that mathematical regular expressions matter) extends to trace monoids, with the disturbing exception that the Kleene star of a regular language is not necessarily regular. (The prototypical example is sequences (xy)^n where x and y are distinct and commute, i.e. xy and yx are canonically equivalent in Unicode terms. A Unicode example is the set of strings composed only of U+0F73 TIBETAN VOWEL SIGN II - there is no FSM that will recognise canonically equivalent strings). One consequence of this view is that one has to think of U+1EAD LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW (ậ) being both composed of the Vietnamese vowel letter U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX (â)
and tone mark U+0323 COMBINING DOT BELOW and also composed of, in the spirit of Thai ISO 11940 transliteration, of the transliterated Thai vowel U+1EA1 LATIN SMALL LETTER A WITH DOT BELOW (?), corresponding to U+0E31 THAI CHARACTER MAI HAN-AKAT, and the tone mark U+0302 COMBINING CIRCUMFLEX ACCENT, corresponding to U+0E49 THAI CHARACTER MAI THO. (In ISO 11940 as specified, the tone mark is actually written on the immediately preceding consonant, not on the vowel.) Richard. From unicode at unicode.org Sat Jan 27 23:02:47 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sun, 28 Jan 2018 05:02:47 +0000 Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <2017846446.13404.1517090097895.JavaMail.www@wwinf1e23> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <20180123215246.56e459f0@JRWUBU2> <2017846446.13404.1517090097895.JavaMail.www@wwinf1e23> Message-ID: <20180128050247.51aa6c8d@JRWUBU2> On Sat, 27 Jan 2018 22:54:57 +0100 (CET) Marcel Schneider via Unicode wrote: > The US-Intl is so weird ?you can?t just leave it on all the time? as > reported in: > > http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML017/0558.html I did (except when I was using a totally different writing system). One just has to remember that those punctuation marks need two key strokes, the first being the space key. Mark Davis's problem seems to be that he was using an Apple half the time. Richard. From unicode at unicode.org Sun Jan 28 01:12:45 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Sun, 28 Jan 2018 08:12:45 +0100 (CET) Subject: 0027, 02BC, 2019, or a new character? In-Reply-To: <20180128050247.51aa6c8d@JRWUBU2> References: <175e07ea-9092-6c22-9bb4-3d817fa37dbe@efele.net> <20180116080019.2738554a@JRWUBU2> <57DC0C82-2C14-43B3-BED7-5C5C03F0FCAA@lboro.ac.uk> <713142F1-22AF-479B-9DD8-9A317EBD608B@lboro.ac.uk> <08e2fee8-d911-6063-e5fc-1bf1dca07ae6@smontagu.org> <20180121184945.2659a1ab@JRWUBU2> <20180123215246.56e459f0@JRWUBU2> <2017846446.13404.1517090097895.JavaMail.www@wwinf1e23> <20180128050247.51aa6c8d@JRWUBU2> Message-ID: <743636785.171.1517123565285.JavaMail.www@wwinf1h34> On Sun, 28 Jan 2018 05:02:47 +0000, Richard Wordingham via Unicode wrote: > > On Sat, 27 Jan 2018 22:54:57 +0100 (CET) > Marcel Schneider via Unicode wrote: > > > The US-Intl is so weird ?you can?t just leave it on all the time? as > > reported in: > > > > http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML017/0558.html > > I did (except when I was using a totally different writing system). > One just has to remember that those punctuation marks need two key > strokes, the first being the space key. Mark Davis's problem seems to > be that he was using an Apple half the time. Indeed, Apple?s US-extended has lots of dead keys on Option level, so that Base level ASCII symbols are left alone. Some of these are hijacked on Windows? US-international for five deadkeys only (likewise, French hijacks two), to disrupt UX wrt macOS, impacting those using both. And developers don?t like to remember hitting space before a vowel to get the (single/double/reverse) quote, or tilde or caret. On any layout, such a complication is inacceptable to most coders. But US-Intl isn?t the only case. 
The Canadian Standard layout too is cheered on Apple and disliked on Windows, obviously because beyond the first two levels, there are many many differences. That cannot really be a matter of conformance to the CAN specs, as the Windows implementation leaves out the '?' character, beside of messing up the group modifier. We can only hope that now, CLDR is thoroughly re-engineering the way international or otherwise extended keyboards are mapped. Regards, Marcel From unicode at unicode.org Sun Jan 28 01:23:15 2018 From: unicode at unicode.org (Janusz S. =?utf-8?Q?Bie=C5=84?= via Unicode) Date: Sun, 28 Jan 2018 08:23:15 +0100 Subject: TIRONIAN SIGN ET In-Reply-To: <99E5E583-F751-4B5C-BE3F-6596D813690B@yahoo.ca> (David Faulks's message of "Sat, 27 Jan 2018 15:59:12 -0500") References: <86tvv7ozsu.fsf@mimuw.edu.pl> <5A6CD8A4.2000909@unicode.org> <86po5voy41.fsf@mimuw.edu.pl> <99E5E583-F751-4B5C-BE3F-6596D813690B@yahoo.ca> Message-ID: <86lggiphuk.fsf@mimuw.edu.pl> On Sat, Jan 27 2018 at 21:59 CET, davidj_faulks at yahoo.ca writes: [...] > As far as I can tell, it was originally proposed in the document n1747 > 'Contraction mark characters for the UCS? by Everson. However, I > cannot find that document anywhere. Thank you very much for the reference. On the page http://www.evertype.com/formal.html there is the link http://unicode.org/wg2/docs/n1747.pdf but it does not work. However the page http://www.unicode.org/wg2/WG2-registry.html states The archival document directory for WG2 is accessible here: http://std.dkuug.dk/jtc1/sc2/wg2/ The archives contain all available documents through 2014 and the document is at ftp://std.dkuug.dk/ftp.anonymous/JTC1/SC2/WG2/docs/n1747.pdf Actually the character is "inherited" from ISO 5426-2:1996 Information and documentation -- Extension of the Latin alphabet coded character set for bibliographic information interchange -- Part 2: Latin characters used in minor European languages and obsolete typography Hence my curiosity is fully satisfied :-) Thanks again! Janusz -- , Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej) Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department) jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/ From unicode at unicode.org Sun Jan 28 13:29:28 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Sun, 28 Jan 2018 20:29:28 +0100 Subject: Internationalised Computer Science Exercises In-Reply-To: <20180128041230.26b34022@JRWUBU2> References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> Message-ID: 2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode < unicode at unicode.org>: > On Sat, 27 Jan 2018 14:13:40 -0800The theory > of regular expressions (though you may not think that mathematical > regular expressions matter) extends to trace monoids, with the > disturbing exception that the Kleene star of a regular language is not > necessarily regular. (The prototypical example is sequences (xy)^n > where x and y are distinct and commute, i.e. xy and yx are canonically > equivalent in Unicode terms. A Unicode example is the set of strings > composed only of U+0F73 TIBETAN VOWEL SIGN II - there is no FSM that > will recognise canonically equivalent strings). > I don't see why you can't write this as the regular expression: (x | y)* For the Unicode canonical equivalences, this applies to distinct characters that have distinct non-zero combining classes. 
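This is easy to check concretely; the following is a minimal sketch (assuming Python 3 and only its standard unicodedata module; it is purely illustrative and not part of any proposal in this thread):

    import unicodedata as ud

    a, dot_below, circumflex = "a", "\u0323", "\u0302"   # ccc = 0, 220, 230

    s1 = a + dot_below + circumflex   # <a, COMBINING DOT BELOW, COMBINING CIRCUMFLEX>
    s2 = a + circumflex + dot_below   # the same two marks, swapped

    # Marks with distinct non-zero combining classes commute: both orderings
    # have the same canonical form, so the strings are canonically equivalent.
    print(ud.combining(dot_below), ud.combining(circumflex))   # 220 230
    print(ud.normalize("NFD", s1) == ud.normalize("NFD", s2))  # True
    print(ud.normalize("NFC", s1) == "\u1EAD")                 # True (single code point)

    # Marks with the SAME combining class do not commute: the order is significant.
    t1 = a + "\u0301" + "\u0308"   # acute then diaeresis, both ccc = 230
    t2 = a + "\u0308" + "\u0301"
    print(ud.normalize("NFD", t1) == ud.normalize("NFD", t2))  # False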
But of course searching for or requires transforming it to NFD first as: so thet the regexp transforms to: [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * ( * [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * | * [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * < COMBINING CIRCUMFLEX> Note that the "complex" set of characters used three times above is finite, it contains all combining characters of Unicode that have a non-zero combining class except above and below, i.e. all Unicode characters whose combining class is not 0, 220 (below) or 230 (above). However, It is too simplified, because the allowed combining classes must occur at most once in each possible non-zero combining class and not arbitrary numbers of them: these allowed combining classs currently are in {1, 7..36, 84, 91, 103, 107, 118, 122, 129, 130, 132, 202, 214, 216, 218, 222, 224, 226, 228, 232..234, 240} whose most member elements are used for very few combining characters (the above and below combining classes are the most populated ones but we exclude them, all the others have 1 to 9 combining characters assigned to them, or 22 characters with cc=7 (nukta), or 32 characters with cc=1 (overlay), or 47 characters with cc=9 (virama). Once again we can refine them also as a regexp, but this is combinatorial because we have 52 combining classes (so we would need to consider the 52! (factorial) alternates). But given the maximum length of what this can match (0 to 52 combining characters: yes it is finite), this is best done by not rewriting this as a regexp but by replacing the final "*" by {1,52}, and then just check each returned match of [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]]{0,52} with a simple scan of these short strings to see that they all have distinct combining classes (this just requires 52 booleans, easily stored in a single 64 bit integer initialized to 0 prior to scan the scan of these small strings). But the theory does not prevent writing it as a regexp (even if it would be extremely long). So a Kleene Star closure is possible and can be efficiently implemented (all depends on the way you represent the "current state" in the FSM: a single integer representing a single node number in the traversal graph is not the best way to do that. This is a valid regexp, the finite state machine DOES have a finite lookahead (the full regexp above will match AT MOST 107 characters including the combining marks, where 107=3+2*52), but this is general to regexps that generally cannot be transformed directly into a FSM with finite lookahead, but a FSM is possible: the regexp first transforms into a simple graph of transitions with a finite number of node (this number is bound to the length of the regexp itself) where there can be multiple states active simultaneously; then a basic transform converts this small graph by transforming nodes into new nodes representing the finite set of the combinations of active states in the first graph : There will be many more nodes, and generally this explodes in size because the transform is combinatorial, and such size explosion has worst perfomance (explosion of the memory needed to representing the new graph with a single state active). 
So regexp engines use the alternative by representing the current state of traversal of the first simple graph using a stack of active states and transiting them all separately (this can be implemented with a "bitset" whose size in bits is the number of states in the first simple graph, or by using an associative array (dictionnary of boolean properties whose keys are state numbers in the first graph, which can be set or removed: this requires much less memory and it is relatively fast, even if the full current state is not just a single small integer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Jan 28 13:30:44 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Sun, 28 Jan 2018 20:30:44 +0100 Subject: Internationalised Computer Science Exercises In-Reply-To: References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> Message-ID: Typo, the full regexp has undesired asterisks: [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * ( [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * | [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * < COMBINING CIRCUMFLEX> 2018-01-28 20:29 GMT+01:00 Philippe Verdy : > > > 2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode < > unicode at unicode.org>: > >> On Sat, 27 Jan 2018 14:13:40 -0800The theory >> of regular expressions (though you may not think that mathematical >> regular expressions matter) extends to trace monoids, with the >> disturbing exception that the Kleene star of a regular language is not >> necessarily regular. (The prototypical example is sequences (xy)^n >> where x and y are distinct and commute, i.e. xy and yx are canonically >> equivalent in Unicode terms. A Unicode example is the set of strings >> composed only of U+0F73 TIBETAN VOWEL SIGN II - there is no FSM that >> will recognise canonically equivalent strings). >> > > I don't see why you can't write this as the regular expression: > (x | y)* > For the Unicode canonical equivalences, this applies to distinct > characters that have distinct non-zero combining classes. > > But of course searching for > > or > > > requires transforming it to NFD first as: > > > > so thet the regexp transforms to: > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * > ( * [[ [^[[:cc=0:]]] - > [[:cc=above:][:cc=below:]] ]] * > | * [[ [^[[:cc=0:]]] - > [[:cc=above:][:cc=below:]] ]] * < COMBINING CIRCUMFLEX> > > Note that the "complex" set of characters used three times above is > finite, it contains all combining characters of Unicode that have a > non-zero combining class except above and below, i.e. all Unicode > characters whose combining class is not 0, 220 (below) or 230 (above). > > However, It is too simplified, because the allowed combining classes must > occur at most once in each possible non-zero combining class and not > arbitrary numbers of them: these allowed combining classs currently are in > {1, 7..36, 84, 91, 103, 107, 118, 122, 129, 130, 132, 202, 214, 216, 218, > 222, 224, 226, 228, 232..234, 240} whose most member elements are used for > very few combining characters (the above and below combining classes are > the most populated ones but we exclude them, all the others have 1 to 9 > combining characters assigned to them, or 22 characters with cc=7 (nukta), > or 32 characters with cc=1 (overlay), or 47 characters with cc=9 (virama). 
> > Once again we can refine them also as a regexp, but this is combinatorial > because we have 52 combining classes (so we would need to consider the 52! > (factorial) alternates). But given the maximum length of what this can > match (0 to 52 combining characters: yes it is finite), this is best done > by not rewriting this as a regexp but by replacing the final "*" by {1,52}, > and then just check each returned match of > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]]{0,52} > > with a simple scan of these short strings to see that they all have > distinct combining classes (this just requires 52 booleans, easily stored > in a single 64 bit integer initialized to 0 prior to scan the scan of these > small strings). But the theory does not prevent writing it as a regexp > (even if it would be extremely long). So a Kleene Star closure is > possible and can be efficiently implemented (all depends on the way you > represent the "current state" in the FSM: a single integer representing a > single node number in the traversal graph is not the best way to do that. > > This is a valid regexp, the finite state machine DOES have a finite > lookahead (the full regexp above will match AT MOST 107 characters > including the combining marks, where 107=3+2*52), but this is general to > regexps that generally cannot be transformed directly into a FSM with > finite lookahead, but a FSM is possible: the regexp first transforms into a > simple graph of transitions with a finite number of node (this number is > bound to the length of the regexp itself) where there can be multiple > states active simultaneously; then a basic transform converts this small > graph by transforming nodes into new nodes representing the finite set of > the combinations of active states in the first graph : > > There will be many more nodes, and generally this explodes in size because > the transform is combinatorial, and such size explosion has worst > perfomance (explosion of the memory needed to representing the new graph > with a single state active). So regexp engines use the alternative by > representing the current state of traversal of the first simple graph using > a stack of active states and transiting them all separately (this can be > implemented with a "bitset" whose size in bits is the number of states in > the first simple graph, or by using an associative array (dictionnary of > boolean properties whose keys are state numbers in the first graph, which > can be set or removed: this requires much less memory and it is relatively > fast, even if the full current state is not just a single small integer. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Jan 28 13:45:36 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Sun, 28 Jan 2018 20:45:36 +0100 Subject: Internationalised Computer Science Exercises In-Reply-To: References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> Message-ID: Note that for finding occurence of simpler combining sequences such as finding the regexp is simpler: [[ [^[[:cc=0:]]] - [[:cc=above:]] ]] * The central character class allows 53 distinct combining classes, and the maximum match length is 2+53=55 characters. If Unicode assigns new combining classes for new combining characters, the maximum match length will increase by 1 character for this regexp, and by 2 characters in the previous example. 
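Which non-zero combining classes are actually assigned, and how many characters each one carries, can be recomputed for any UCD version; here is a small sketch (again assuming Python 3 with the standard unicodedata module, which reports whatever UCD version the interpreter was built with):

    import unicodedata as ud
    from collections import Counter

    # Tally assigned characters per non-zero canonical combining class.
    per_class = Counter()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue          # skip surrogate code points
        ccc = ud.combining(chr(cp))
        if ccc:
            per_class[ccc] += 1

    print("UCD version:", ud.unidata_version)
    print("distinct non-zero combining classes:", len(per_class))
    for ccc, n in sorted(per_class.items()):
        print(f"ccc={ccc:3}: {n} characters")

    # A bound in the style quoted above: at most one mark of each *other*
    # non-zero class between the base letter and the searched combining mark.
    print("bound for <base, mark> in NFD text:", 2 + (len(per_class) - 1))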
As there can be at most 255 non-zero combining classes (due to current stability rules), finding will match at most 1+253+1 = 255 characters in any future version of Unicode, and finding will match at most 1+252+1+252+1 = 507 characters. This is still finite, small enough to be implementable with a deterministic FSM, using no more than 1 codepoint of lookahead, without using any backtrailing. 2018-01-28 20:30 GMT+01:00 Philippe Verdy : > Typo, the full regexp has undesired asterisks: > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * > ( [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] > ]] * > | [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] > ]] * < COMBINING CIRCUMFLEX> > > > > 2018-01-28 20:29 GMT+01:00 Philippe Verdy : > >> >> >> 2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode < >> unicode at unicode.org>: >> >>> On Sat, 27 Jan 2018 14:13:40 -0800The theory >>> of regular expressions (though you may not think that mathematical >>> regular expressions matter) extends to trace monoids, with the >>> disturbing exception that the Kleene star of a regular language is not >>> necessarily regular. (The prototypical example is sequences (xy)^n >>> where x and y are distinct and commute, i.e. xy and yx are canonically >>> equivalent in Unicode terms. A Unicode example is the set of strings >>> composed only of U+0F73 TIBETAN VOWEL SIGN II - there is no FSM that >>> will recognise canonically equivalent strings). >>> >> >> I don't see why you can't write this as the regular expression: >> (x | y)* >> For the Unicode canonical equivalences, this applies to distinct >> characters that have distinct non-zero combining classes. >> >> But of course searching for >> >> or >> >> >> requires transforming it to NFD first as: >> >> >> >> so thet the regexp transforms to: >> >> [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] >> * >> ( * [[ [^[[:cc=0:]]] - >> [[:cc=above:][:cc=below:]] ]] * >> | * [[ [^[[:cc=0:]]] - >> [[:cc=above:][:cc=below:]] ]] * < COMBINING CIRCUMFLEX> >> >> Note that the "complex" set of characters used three times above is >> finite, it contains all combining characters of Unicode that have a >> non-zero combining class except above and below, i.e. all Unicode >> characters whose combining class is not 0, 220 (below) or 230 (above). >> >> However, It is too simplified, because the allowed combining classes must >> occur at most once in each possible non-zero combining class and not >> arbitrary numbers of them: these allowed combining classs currently are in >> {1, 7..36, 84, 91, 103, 107, 118, 122, 129, 130, 132, 202, 214, 216, 218, >> 222, 224, 226, 228, 232..234, 240} whose most member elements are used for >> very few combining characters (the above and below combining classes are >> the most populated ones but we exclude them, all the others have 1 to 9 >> combining characters assigned to them, or 22 characters with cc=7 (nukta), >> or 32 characters with cc=1 (overlay), or 47 characters with cc=9 (virama). >> >> Once again we can refine them also as a regexp, but this is combinatorial >> because we have 52 combining classes (so we would need to consider the 52! >> (factorial) alternates). 
But given the maximum length of what this can >> match (0 to 52 combining characters: yes it is finite), this is best done >> by not rewriting this as a regexp but by replacing the final "*" by {1,52}, >> and then just check each returned match of >> >> [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]]{0,52} >> >> with a simple scan of these short strings to see that they all have >> distinct combining classes (this just requires 52 booleans, easily stored >> in a single 64 bit integer initialized to 0 prior to scan the scan of these >> small strings). But the theory does not prevent writing it as a regexp >> (even if it would be extremely long). So a Kleene Star closure is >> possible and can be efficiently implemented (all depends on the way you >> represent the "current state" in the FSM: a single integer representing a >> single node number in the traversal graph is not the best way to do that. >> >> This is a valid regexp, the finite state machine DOES have a finite >> lookahead (the full regexp above will match AT MOST 107 characters >> including the combining marks, where 107=3+2*52), but this is general to >> regexps that generally cannot be transformed directly into a FSM with >> finite lookahead, but a FSM is possible: the regexp first transforms into a >> simple graph of transitions with a finite number of node (this number is >> bound to the length of the regexp itself) where there can be multiple >> states active simultaneously; then a basic transform converts this small >> graph by transforming nodes into new nodes representing the finite set of >> the combinations of active states in the first graph : >> >> There will be many more nodes, and generally this explodes in size >> because the transform is combinatorial, and such size explosion has worst >> perfomance (explosion of the memory needed to representing the new graph >> with a single state active). So regexp engines use the alternative by >> representing the current state of traversal of the first simple graph using >> a stack of active states and transiting them all separately (this can be >> implemented with a "bitset" whose size in bits is the number of states in >> the first simple graph, or by using an associative array (dictionnary of >> boolean properties whose keys are state numbers in the first graph, which >> can be set or removed: this requires much less memory and it is relatively >> fast, even if the full current state is not just a single small integer. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Jan 28 15:11:06 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Sun, 28 Jan 2018 14:11:06 -0700 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: References: Message-ID: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> Marcel Schneider wrote: > We can only hope that now, CLDR is thoroughly re-engineering the way > international or otherwise extended keyboards are mapped. I suspect you already know this and just misspoke, but CLDR doesn't prescribe any vendor's keyboard layouts. CLDR mappings reflect what vendors have released. 
-- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Sun Jan 28 16:44:56 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sun, 28 Jan 2018 22:44:56 +0000 Subject: Internationalised Computer Science Exercises In-Reply-To: References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> Message-ID: <20180128224456.2a93f2a1@JRWUBU2> On Sun, 28 Jan 2018 20:29:28 +0100 Philippe Verdy via Unicode wrote: > 2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode < > unicode at unicode.org>: > > > On Sat, 27 Jan 2018 14:13:40 -0800The theory > > of regular expressions (though you may not think that mathematical > > regular expressions matter) extends to trace monoids, with the > > disturbing exception that the Kleene star of a regular language is > > not necessarily regular. (The prototypical example is sequences > > (xy)^n where x and y are distinct and commute, i.e. xy and yx are > > canonically equivalent in Unicode terms. A Unicode example is the > > set of strings composed only of U+0F73 TIBETAN VOWEL SIGN II - > > there is no FSM that will recognise canonically equivalent strings). > > > > I don't see why you can't write this as the regular expression: > (x | y)* Because xx does not match. In principle, it can be done iteratively thus: 1) Look for sequences of x's and y's - your (x | y) * 2) Discard matches from (1) where the number of x's and y's are equal. However, the second step cannot be implemented by a *finite* state machine. > For the Unicode canonical equivalences, this applies to distinct > characters that have distinct non-zero combining classes. Those of us who've looked at optimising collation by reducing normalisation will recognise U+0F73 TIBETAN VOWEL SIGN II as, in theory, a source of many problems. > But of course searching for > > or > > > requires transforming it to NFD first as: That wasn't what I had in mind. What I had in mind was accepting the propositions that the string <LATIN SMALL LETTER A, COMBINING DOT BELOW, COMBINING CIRCUMFLEX> contains both LATIN SMALL LETTER A WITH CIRCUMFLEX and LATIN SMALL LETTER A WITH DOT BELOW. > > > so thet the regexp transforms to: > > [[ [^[[:cc=0:]]] - > [[:cc=above:][:cc=below:]] ]] * ( * > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * BELOW> | * [[ [^[[:cc=0:]]] - > BELOW> [[:cc=above:][:cc=below:]] > ]] * < COMBINING CIRCUMFLEX> If everything is converted to NFD, the regular expressions using traces can be converted to frequently unintelligible regexes on strings; the behaviour of the converted regex when faced with an unnormalised string is of course irrelevant. In the search you have in mind, the converted regex for use with NFD strings is actually intelligible and simple: <LATIN SMALL LETTER A> [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * <COMBINING DOT BELOW> [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * <COMBINING CIRCUMFLEX> Informal notation can simplify the regex still further. There is no upper bound to the length of a string matching that regex, though examples in correctly spelt natural languages are quite limited in length. Of course, what one is interested in is the input form of the match. That can be in three parts, and some of the parts may contain parts of composed characters. There isn't a widely used notation for such discontiguous, character-splitting substrings. What can get nasty is NFD regexes for things like [[:InPC=Top:]] [[:InPC=Bottom:]] You don't want to craft these by hand. You just want it to match and its canonical equivalents, including the NFD form: This is a bottom vowel followed by a top vowel. Richard.
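For anyone who wants to experiment with this kind of search under canonical equivalence without a trace-aware regex engine, a rough hand-rolled sketch follows (Python 3, standard unicodedata only; the function name and its simplifications, a single base character plus marks and greedy scanning, are only illustrative and are not taken from the thread):

    import unicodedata as ud

    def contains_equivalent(text, pattern):
        """Does NFD(pattern), a base character plus combining marks, occur in
        text under canonical equivalence?  Marks of other combining classes may
        intervene; a mark of a class used by the pattern blocks the match,
        since marks of equal class do not commute.  (Illustrative sketch only.)"""
        t = ud.normalize("NFD", text)
        p = ud.normalize("NFD", pattern)
        base, marks = p[0], list(p[1:])
        blocking = {ud.combining(m) for m in marks}
        i = t.find(base)
        while i != -1:
            wanted = marks.copy()
            j = i + 1
            while j < len(t) and ud.combining(t[j]) != 0:
                if wanted and t[j] == wanted[0]:
                    wanted.pop(0)          # next mark of the pattern, in NFD order
                elif ud.combining(t[j]) in blocking:
                    break                  # a competing mark of the same class
                j += 1
            if not wanted:
                return True
            i = t.find(base, i + 1)
        return False

    # <a, COMBINING DOT BELOW, COMBINING CIRCUMFLEX> contains both a-circumflex
    # (U+00E2) and a-dot-below (U+1EA1) under canonical equivalence:
    s = "a\u0323\u0302"
    print(contains_equivalent(s, "\u00E2"))   # True
    print(contains_equivalent(s, "\u1EA1"))   # True
    print(contains_equivalent(s, "\u00E4"))   # False (no diaeresis present)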
From unicode at unicode.org Sun Jan 28 17:04:37 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Sun, 28 Jan 2018 15:04:37 -0800 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> Message-ID: One addition: with the expansion of keyboards in http://blog.unicode.org/2018/01/unicode-ldml-keyboard-enhancements.html we are looking to expand the repository to not merely represent those, but to also serve as a resource that vendors can draw on. Mark On Sun, Jan 28, 2018 at 1:11 PM, Doug Ewell via Unicode wrote: > Marcel Schneider wrote: > > We can only hope that now, CLDR is thoroughly re-engineering the way >> international or otherwise extended keyboards are mapped. >> > > I suspect you already know this and just misspoke, but CLDR doesn't > prescribe any vendor's keyboard layouts. CLDR mappings reflect what vendors > have released. > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Jan 28 17:20:16 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Sun, 28 Jan 2018 16:20:16 -0700 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> Message-ID: Mark Davis wrote: > One addition: with the expansion of keyboards in > http://blog.unicode.org/2018/01/unicode-ldml-keyboard-enhancements.html > we are looking to expand the repository to not merely represent those, > but to also serve as a resource that vendors can draw on. Would you say, then, that Marcel's statements: "Now that CLDR is sorting out how to improve keyboard layouts, hopefully something falls off to replace the *legacy* US-Intl." and: "We can only hope that now, CLDR is thoroughly re-engineering the way international or otherwise extended keyboards are mapped." reflect the situation accurately? Nothing in the PRI #367 blog post or background document communicated to me that CLDR was going to try to influence vendors to retire these keyboard layouts and replace them with those. I thought it was just about providing a richer CLDR format and syntax to better "support keyboard layouts from all major providers." Please point me to the part I missed. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Sun Jan 28 20:17:34 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Mon, 29 Jan 2018 03:17:34 +0100 (CET) Subject: Keyboard layouts and CLDR In-Reply-To: References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> Message-ID: <160863352.2.1517192254836.JavaMail.www@wwinf1p18> On Sun, 28 Jan 2018 16:20:16 -0700, Doug Ewell wrote: > > Mark Davis wrote: > > > One addition: with the expansion of keyboards in > > http://blog.unicode.org/2018/01/unicode-ldml-keyboard-enhancements.html > > we are looking to expand the repository to not merely represent those, > > but to also serve as a resource that vendors can draw on. > > Would you say, then, that Marcel's statements: > > "Now that CLDR is sorting out how to improve keyboard layouts, hopefully > something falls off to replace the *legacy* US-Intl." > > and: > > "We can only hope that now, CLDR is thoroughly re-engineering the way > international or otherwise extended keyboards are mapped." > > reflect the situation accurately? 
> > Nothing in the PRI #367 blog post or background document communicated to > me that CLDR was going to try to influence vendors to retire these > keyboard layouts and replace them with those. I thought it was just > about providing a richer CLDR format and syntax to better "support > keyboard layouts from all major providers." Please point me to the part > I missed. A replacement candidate for US-International would only be a handy fall-off, and it is up to MIcrosoft to decide whether it has the potential to enhance UX. It all started up when Mark accepted to embrace the idea of adding a Numbers modifier and a Programmer toggle after submission of CLDR ticket #10851: unicode.org/cldr/trac/ticket/10851 I figure out that the working group is on a proof of concept. I?m trying to make up some additions by the deadline of PRI #367. Regards, Marcel From unicode at unicode.org Sun Jan 28 21:55:22 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Mon, 29 Jan 2018 04:55:22 +0100 (CET) Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> Message-ID: <1014996228.85.1517198122492.JavaMail.www@wwinf1p18> On Sun, 28 Jan 2018 14:11:06 -0700, Doug Ewell wrote: > > Marcel Schneider wrote: > > > We can only hope that now, CLDR is thoroughly re-engineering the way > > international or otherwise extended keyboards are mapped. > > I suspect you already know this and just misspoke, but CLDR doesn't > prescribe any vendor's keyboard layouts. CLDR mappings reflect what > vendors have released. Sorry I didn?t see the thread until I replied at the point where it is. But looking harder I can see that what I meant when trying to input my concern into the project, is already implied by the wording of the initial blog post (Mark has shared the link of: http://blog.unicode.org/2018/01/unicode-ldml-keyboard-enhancements.html ) when it comes to a detailed overview of the goals: ?As a part of this work, keyboards [?] provide better layouts overall.? E.g. a Numbers modifier is required for locales using U+202F NARROW NO-BREAK SPACE as a thousands separator (and is useful for all others), while a Programmer toggle is required on keyboards using the upper row for special letters lower-and uppercase, and is handy for all those that have dead keys in the righthand part. Windows Vietnamese is one example, and Michael Kaplan wrote a series of blog posts about it, that you know well: http://archives.miloush.net/michkap/archive/2005/08/27/457224.html http://archives.miloush.net/michkap/archive/2005/11/11/491349.html http://archives.miloush.net/michkap/archive/2007/01/31/1564299.html I was aware that CLDR is a repository, and now I?m amazed how things go on. Regards, Marcel From unicode at unicode.org Sun Jan 28 23:56:25 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Sun, 28 Jan 2018 21:56:25 -0800 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> Message-ID: On Sun, Jan 28, 2018 at 3:20 PM, Doug Ewell wrote: > Mark Davis wrote: > > One addition: with the expansion of keyboards in >> http://blog.unicode.org/2018/01/unicode-ldml-keyboard-enhancements.html >> we are looking to expand the repository to not merely represent those, >> but to also serve as a resource that vendors can draw on. 
>> > > Would you say, then, that Marcel's statements: > > "Now that CLDR is sorting out how to improve keyboard layouts, hopefully > something falls off to replace the *legacy* US-Intl." > > and: > > "We can only hope that now, CLDR is thoroughly re-engineering the way > international or otherwise extended keyboards are mapped." > > reflect the situation accurately? > > Nothing in the PRI #367 blog post or background document communicated to > me that CLDR was going to try to influence vendors to retire these keyboard > layouts and replace them with those. I thought it was just about providing > a richer CLDR format and syntax to better "support keyboard layouts from > all major providers." Please point me to the part I missed. Your message didn't quote ?the part about ?replace the *legacy* US-Intl."? The PRI blog post is talking about the technical changes, not process. The goal there is to be able to represent keyboard structures and data in a "lingua franca", and to expand the features needed to cover more languages and more vendor requirements. Of course, more extensions will be needed in the future, as well. As far as process goes, we foresee (a) continuing to reflect what is being used in practice, and (b) extending to a repository for keyboards for languages that are not represented by current vendors. That is to enable vendors to easily add keyboards for support of additional languages, if they want. It is not a goal to get "vendors to retire these keyboard layouts and replace them" ? that's not our role. (And I'm sure that a lot of people like and would continue to use the Windows Intl keyboard.) It's more about making it easier to have more choice available for users: more languages, and more choice of layouts within a language that meet people's needs. > > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Jan 29 00:16:04 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Mon, 29 Jan 2018 07:16:04 +0100 Subject: Internationalised Computer Science Exercises In-Reply-To: <20180128224456.2a93f2a1@JRWUBU2> References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> <20180128224456.2a93f2a1@JRWUBU2> Message-ID: 2018-01-28 23:44 GMT+01:00 Richard Wordingham via Unicode < unicode at unicode.org>: > On Sun, 28 Jan 2018 20:29:28 +0100 > Philippe Verdy via Unicode wrote: > > > 2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode < > > unicode at unicode.org>: > > > > > On Sat, 27 Jan 2018 14:13:40 -0800The theory > > > of regular expressions (though you may not think that mathematical > > > regular expressions matter) extends to trace monoids, with the > > > disturbing exception that the Kleene star of a regular language is > > > not necessarily regular. (The prototypical example is sequences > > > (xy)^n where x and y are distinct and commute, i.e. xy and yx are > > > canonically equivalent in Unicode terms. A Unicode example is the > > > set of strings composed only of U+0F73 TIBETAN VOWEL SIGN II - > > > there is no FSM that will recognise canonically equivalent strings). > > > > > > > I don't see why you can't write this as the regular expression: > > (x | y)* > > Because xx does not match. > > In principle, it can be done iteratively thus: > > 1) Look for sequences of x's and y's - your (x | y) * > 2) Discard matches from (1) where the number of x's and y's are equal. 
> > However, the second step cannot be implemented by a *finite* state > machine. > > > For the Unicode canonical equivalences, this applies to distinct > > characters that have distinct non-zero combining classes. > > Those of us who've looked at optimising collation by reducing > normalisation will recognise U+0F73 TIBETAN VOWEL SIGN II as, in > theory, a source of many problems. > > > But of course searching for > > > > or > > > > > > requires transforming it to NFD first as: > > That wasn't I had in mind. What I had in mind was accepting the > propositions that the string BELOW, COMBINING CIRCUMFLEX> contains both LATIN SMALL LETTER A WITH > CIRCUMFLEX and LATIN SMALL LETTER A WITH DOT BELOW. > > > > > > > so thet the regexp transforms to: > > > > [[ [^[[:cc=0:]]] - > > [[:cc=above:][:cc=below:]] ]] * ( * > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * > BELOW> | * [[ [^[[:cc=0:]]] - > > BELOW> [[:cc=above:][:cc=below:]] > > ]] * < COMBINING CIRCUMFLEX> > > If everything is converted to NFD, the regular expressions using traces > can be converted to frequently unintelligible regexes on strings; the > behaviour of the converted regex when faced with an unnormalised string > is of course irrelevant. > > In the search you have in mind, the converted regex for use with NFD > strings is actually intelligible and simple: > > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * > > > Informal notation can simplify the regex still further. > > There is no upper bound to the length of a string matching that regex, > Wrong, you've not read what followed immediately that commented it already: it IS bound exactly because you cannot duplicate the same combining class, and there's a known finite number of them for acceptable cases: if there's any repetition, it will always be within that bound. But it not necessay to expand all the combinations of combining classes to all their possible ordering of occurence (something that a classic regexp normally requires by expecting a specific order). One way to solve it would have to have (generic) regexp extension allowing to specify a combination of one or more of several items in a choice list in any order, but never more than one occurence of each of item. This is possible using a rule with boolean flags of presence, one boolean for each item in the choice list. Something like {?a|b|c|d} matching zero or more (or all of them) of a,b,c,d (these can be subregexps) in any order, and {?+a|b|c|d} matching one or more, and {?{m,n}a|b|c|d} matching betwen m and n of them (in any order in all cases) So that {?a|b|c|d}{1,1} is the same as (a|b|c|d) but without the capture, i.e. (?:a|b|c|d), and {?{m,n}a} is the same as a{m,n}, and {?+a} is the same as a, and {?*a} is the same as a? Which can also be written respectrively as {?*[abcd]}, {?+[abcd]} and {?{m,n}[abcd]) if the items of the choice list are characters that can be compacted with the classic "character class" notation [abcd]. In all these the "{?quantifier list}" notation is always bound by the number of items in the list (independantly of the quantifier, and if individual items in the list are bound in length, the whole will be bound by the sum of their lengths. 
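No existing regexp engine has such a construct, but the presence-flag idea is easy to prototype outside regexp syntax; below is a greedy, non-backtracking sketch in Python 3 (the function and its behaviour are only an illustration of the idea under those assumptions, not an implementation of the proposed syntax):

    def match_any_order(items, text, pos=0, min_items=1, max_items=None):
        """Sketch of the proposed "{quantifier item|item|...}" idea: starting at
        pos, match each item at most once, in any order, greedily and without
        backtracking.  Returns the end position if the number of matched items
        falls within [min_items, max_items], otherwise None."""
        if max_items is None:
            max_items = len(items)
        used = [False] * len(items)        # one presence flag per item
        matched, i = 0, pos
        progress = True
        while progress and matched < max_items:
            progress = False
            for k, item in enumerate(items):
                if not used[k] and text.startswith(item, i):
                    used[k] = True
                    matched += 1
                    i += len(item)
                    progress = True
                    break
        return i if min_items <= matched <= max_items else None

    print(match_any_order(["a", "b", "c"], "bca"))   # 3: all three items, any order
    print(match_any_order(["a", "b", "c"], "bb"))    # 1: the second "b" cannot reuse an item
    print(match_any_order(["a", "b", "c"], "xyz"))   # None: nothing matched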
So even if the quantifier is higher than the number of items in the list, it will be capped: "{?{1000}a}" will only match "a", and "{?{1000}}" will never match anything (because the list is empty: the specified upper bound 1000 is capped to 0, but the specified lower bound 1000 is capped to 1, which is impossible); it is also equivalent to "{?}", where the min-max bounds are 1 by default but are capped to 1,0.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From unicode at unicode.org  Mon Jan 29 00:37:46 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Mon, 29 Jan 2018 07:37:46 +0100
Subject: Internationalised Computer Science Exercises
In-Reply-To: 
References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> <20180128224456.2a93f2a1@JRWUBU2>
Message-ID: 

I made an error in the character class notation: "{?optionalquantifier[class]}" should be just "{optionalquantifier[class]}"...

So "{?[abc]}" contains one item, "[abc]", to choose from in any order; it is not quantified explicitly, so it matches by default 1 or more, but as there is only one item it will match just one "[abc]".
But "{[abc]}" contains three items from the class "[abc]" to choose from in any order, so it will match "a", "b", "c", "ab", "ba", "ac", "ca", "abc", "acb", "bac", "bca", "cab" or "cba".
And "{{1}[abc]}" is quantified to match one and only one item; it is equivalent to "[abc]" and matches only "a", "b", or "c".
And "{{0}[abc]}" is quantified to match zero items (the items are not relevant) and will never match anything, just like "{{0}a|b|c}" or "{{0}}".
And "{{2}[abc]}" or "{{2,2}[abc]}" is quantified to match exactly two items from the character class, and matches only "ab", "ba", "ac", "ca", "bc" or "cb"; it is equivalent to "{{2,2}a|b|c}" or "{{2}a|b|c}".

With that extension you can build the necessary regexps to match canonically equivalent strings with a finite regexp.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From unicode at unicode.org  Mon Jan 29 01:54:29 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Mon, 29 Jan 2018 08:54:29 +0100
Subject: Internationalised Computer Science Exercises
In-Reply-To: 
References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> <20180128224456.2a93f2a1@JRWUBU2>
Message-ID: 

You may also wonder why I describe a regexp that would never match anything but would itself be handled as a successful match: it is a useful extension that allows stopping the analysis early and generalizes the concept of negation (defined in character classes with the minus operator).

For example "(b{}|[a-z])*" would match any word made of letters [a-z], except that when it encounters any "b" it attempts to match "{}" (which doesn't match anything, as it means "match 1 or more items from a list having no item to choose from"); it succeeds early and invalidates all matches in the alternatives given. In summary it is mostly the same as "[[a-z]-[b]]*" or "[ac-z]*", but if there's a "b" it returns an empty match for it, located just after the "b" found.

Note: the optional quantifiers are the classic ones used in regexps:
- "{m,n}" for m to n occurrences,
- "{n}" for exactly n occurrences (for n>0), equivalent to "{n,n}" (the default quantifier is "{1}" or "{1,1}"), except "{0}" which is equivalent to "{1,0}",
- "{m,}" for at least m occurrences,
- "{,n}" or "{0,n}" for at most n occurrences,
- "?" for 0 or 1 occurrence, equivalent to "{,1}",
- "+" for 1 or more occurrences, equivalent to "{1,}",
- "*" for 0 or more occurrences, equivalent to "{0,}",
- I don't know if we need special greedy/non-greedy quantifiers here ("?*", "?+", "??", "*?", "*+", "+?", "+*", and so on...)

However the quantifiers at the start of this (unordered) "exclusive choice list" extension, with the form "{quantifier item1 | item2 | ...}" or "{quantifier [class]}", do not just count the items: they also disallow multiple occurrences of the same chosen item, unlike the quantifiers used as suffixes (after a character, character class or subregexp between parentheses).

The number of items in the exclusive choice list is never zero, but an item may be a matchable empty string (it cannot be an "empty class", as a character class must match exactly one character chosen from a set of one-character values; the "empty class" is however representable as "{{0}}" using a quantified exclusive choice list). If an item in the exclusive choice list is an empty string and the quantifier of the choice list is not "{0}", the exclusive choice list will always match successfully.

I insist on the term "exclusive", because this is the interesting property that allows bounding the occurrences, and on the term "unordered", which avoids having to combinatorially specify all the possible orders of occurrence of items in the choice list: if the choice list has n items, there are n! possible orders with classic regexps, and an item list with only 13 items would have more than 2^32 orders to specify; in Unicode, we have up to 255 possible non-zero canonical classes, so we cannot represent them all in classic regexps within any computer (253! ~ 5.17e499).

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From unicode at unicode.org Mon Jan 29 02:57:41 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Mon, 29 Jan 2018 08:57:41 +0000 Subject: Internationalised Computer Science Exercises In-Reply-To: References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> <20180128224456.2a93f2a1@JRWUBU2> Message-ID: <20180129085741.6fcf00f8@JRWUBU2> On Mon, 29 Jan 2018 07:16:04 +0100 Philippe Verdy via Unicode wrote: > 2018-01-28 23:44 GMT+01:00 Richard Wordingham via Unicode < > unicode at unicode.org>: > > In the search you have in mind, the converted regex for use with NFD > > strings is actually intelligible and simple: > > > > > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * > > > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * > > > > > > Informal notation can simplify the regex still further. > > > > There is no upper bound to the length of a string matching that > > regex, > > Wrong, you've not read what followed immediately that commented it > already: it IS bound exactly because you cannot duplicate the same > combining class, and there's a known finite number of them for > acceptable cases: if there's any repetition, it will always be within > that bound. Are you talking about regular expressions or strings that match them? Natural language text can very easily contain adjacent combining characters of the same combining class - look no further than the full decomposition of U+01D6 LATIN SMALL LETTER U WITH DIAERESIS AND MACRON. For a few combining characters, such as U+1A7F TAI THAM COMBINING CRYPTOGRAMMIC DOT, repetition is of their very essence. One can find pairs of combining circumflexes in plain text maths. Incidentally, I was talking about regular expressions, which imply *finite* state machines, albeit huge, rather then 'regexes', which are similar but may formally require unbounded memory. Richard. From unicode at unicode.org Mon Jan 29 05:26:01 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Mon, 29 Jan 2018 12:26:01 +0100 (CET) Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> Message-ID: <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> On Sun, 28 Jan 2018 21:56:25 -0800, Mark Davis replied to Doug Ewell: > > It is not a goal to get "vendors to retire these keyboard layouts and > replace them" ? that's not our role. (And I'm sure that a lot of people > like and would continue to use the Windows Intl keyboard.) Instead of ?replace? I should have written /provide an alternative to/. Discontinuing a major layout variant would be bad practice. Prior to this thread, I believed that the ratio of Windows users liking the US-International vs Mac users liking the US-Extended was like other ?Windows implementation? vs ?Apple implementation? ratios. So far we can tell that failing to be updated, the Windows US-Intl does not allow to write French in a usable manner, as the ?? is still missing, and does not allow to type German correctly neither due to the lack of single angle quotation marks (used in some French locales, too, and perhaps likely to become even more widespread). Of course these are all on the macOS US-Extended. If so many people like it, why was Windows Intl not updated, then? 
(Or has it been for Windows 10, and just not on https://docs.microsoft.com/fr-fr/globalization/keyboards/kbdusx.html while the Keyboard layouts index page has come into the benefit of a slight enhancement of user experience: https://docs.microsoft.com/en-us/globalization/windows-keyboard-layouts ) > > It's more about making it easier to have more choice available for users: > more languages, and more choice of layouts within a language that meet > people's needs. Covering more ? and ideally ALL ? languages is top priority. Marc Durdin of SIL Keyman teaming up with the CLDR enhancement project is very good news. Regards, Marcel (If you wonder why Mark Davis blacklisted me: That happened at the 2015 ?Apostrophe? thread when I was new to this and any other Mailing List.) From unicode at unicode.org Mon Jan 29 06:38:48 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Mon, 29 Jan 2018 13:38:48 +0100 (CET) Subject: Keyboard layouts and CLDR In-Reply-To: <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> Message-ID: <1485629394.6192.1517229528665.JavaMail.www@wwinf2230> BTW the 5 dead keys of Windows US Intl are already on Apple?s *normal* US layout, along with the letter o-with-e. US Extended adds 20 more deadkeys. On Sun, 28 Jan 2018 16:20:16 -0700, Doug Ewell wrote: [?] > Nothing in the PRI #367 blog post or background document communicated to > me that CLDR was going to try to influence vendors to retire these > keyboard layouts and replace them with those. [?] The ?replacement? would be on user side, not on vendors side. I was never thinking that Microsoft could put another layout *in the place of US Intl* instead of letting users choose. To like a particular layout does not mean to want to stick with it when anything better comes up. User?s choice is always respected. But users must also respect other people?s orthographies, as seen in the wake of the preceding Kazakh apostrophes thread. Hence we are expected to upgrade our tooling if it proves inappropriate. (Nevertheless the time I?m using an Apple instead of my Windows-7-driven netbook is less than 0.1?%.) Regards, Marcel From unicode at unicode.org Mon Jan 29 07:03:38 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Mon, 29 Jan 2018 13:03:38 +0000 Subject: Internationalised Computer Science Exercises In-Reply-To: <20180128041230.26b34022@JRWUBU2> References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> Message-ID: <97507BDF-7908-4BFC-A7F4-1CCE1B90E563@lboro.ac.uk> On 28 Jan 2018, at 04:12, Richard Wordingham via Unicode > wrote: On Sat, 27 Jan 2018 14:13:40 -0800 Shervin Afshar > wrote: On Mon, Jan 22, 2018 at 2:08 PM, Richard Wordingham via Unicode < unicode at unicode.org> wrote: On Mon, 22 Jan 2018 at 16:39:57, Andre Schappo via Unicode < unicode at unicode.org> wrote: By way of example, one programming challenge I set to students a couple of weeks ago involves diacritics. Please see jsfiddle.net/coas/wda45gLp Did any of them come up with the idea of using traces instead of strings? Cor Blimey?? I am really pleased if the students have even heard of Unicode let alone heard of trace monoid?? ...and I confess, I knew nothing of trace monoid until I read the below wikipedia article but then again my ignorance is profound?? BTW. 
these internationalised computer science exercises I have written and am writing are not part of any course or module and so are optional. In providing such exercises I am hoping to spark an interest in Unicode and internationalisation. I wrote a couple more yesterday: jsfiddle.net/coas/3c7y88ot & jsfiddle.net/coas/aau8cqaw

André Schappo

Care to elaborate? Are you referring to sequence alignment methods?

No, I'm thinking of the trace monoid (see e.g. https://en.wikipedia.org/wiki/Trace_monoid). One way of thinking of strings is as concatenations of the NFD decompositions of their constituent characters. Then the canonical equivalence classes of these strings form the trace monoid of indecomposable characters. The theory of regular expressions (though you may not think that mathematical regular expressions matter) extends to trace monoids, with the disturbing exception that the Kleene star of a regular language is not necessarily regular. (The prototypical example is sequences (xy)^n where x and y are distinct and commute, i.e. xy and yx are canonically equivalent in Unicode terms. A Unicode example is the set of strings composed only of U+0F73 TIBETAN VOWEL SIGN II - there is no FSM that will recognise canonically equivalent strings).

One consequence of this view is that one has to think of U+1EAD LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW (ậ) being both composed of the Vietnamese vowel letter U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX (â) and tone mark U+0323 COMBINING DOT BELOW, and also composed, in the spirit of Thai ISO 11940 transliteration, of the transliterated Thai vowel U+1EA1 LATIN SMALL LETTER A WITH DOT BELOW (ạ), corresponding to U+0E31 THAI CHARACTER MAI HAN-AKAT, and the tone mark U+0302 COMBINING CIRCUMFLEX ACCENT, corresponding to U+0E49 THAI CHARACTER MAI THO. (In ISO 11940 as specified, the tone mark is actually written on the immediately preceding consonant, not on the vowel.)

Richard.

André Schappo
schappo.blogspot.co.uk twitter.com/andreschappo weibo.com/andreschappo groups.google.com/forum/#!forum/computer-science-curriculum-internationalization

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From unicode at unicode.org  Mon Jan 29 07:15:04 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Mon, 29 Jan 2018 14:15:04 +0100
Subject: Internationalised Computer Science Exercises
In-Reply-To: <20180129085741.6fcf00f8@JRWUBU2>
References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> <20180128224456.2a93f2a1@JRWUBU2> <20180129085741.6fcf00f8@JRWUBU2>
Message-ID: 

No, since the beginning we were talking about matching strings that are canonically equivalent within regexps. So searching with a regexp containing precomposed or decomposed characters would find them independently of the encoded form (normalized or not) of the input, and independently of whether there are additional combining characters inserted between them.

The case of u with diaeresis and macron is simpler: it has two combining characters of the same combining class and they don't commute, still the regexp to match it is something like:

U [[:cc>0:]-[:cc=above:]]* <COMBINING DIAERESIS> [[:cc>0:]-[:cc=above:]]* <COMBINING MACRON> [[:cc>0:]-[:cc=above:]]*

The source is simply decomposed (it does not need to be normalized to NFD) and matched according to this transformed regexp, but it does not need the "{exclusive choice list}" notation here, because DIAERESIS and MACRON do not commute.
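(For readers who want to check the combining-class facts this exchange keeps returning to, the following short snippet does so with nothing but Python's standard unicodedata module; it is purely illustrative and independent of the regexp notations discussed above.)

    # Illustrative only: code points and combining classes come from UnicodeData.
    import unicodedata as ud

    # U+01D6 decomposes to u + COMBINING DIAERESIS + COMBINING MACRON, both marks ccc 230.
    print([f"U+{ord(c):04X}" for c in ud.normalize("NFD", "\u01D6")])
    # -> ['U+0075', 'U+0308', 'U+0304']
    print(ud.combining("\u0308"), ud.combining("\u0304"))        # -> 230 230

    # Same class: the marks do not commute, so the two orders are NOT canonically
    # equivalent (only one of them composes back to U+01D6).
    print(ud.normalize("NFC", "u\u0308\u0304") == ud.normalize("NFC", "u\u0304\u0308"))  # -> False

    # Different classes: dot below (220) and circumflex (230) do commute, so both
    # orders normalize to the same NFD sequence and are canonically equivalent.
    print(ud.normalize("NFD", "a\u0323\u0302") == ud.normalize("NFD", "a\u0302\u0323"))  # -> True

    # U+0F73 decomposes to U+0F71 (ccc 129) + U+0F72 (ccc 130); in a run of several
    # U+0F73 the parts interleave freely under canonical reordering.
    print([f"U+{ord(c):04X}" for c in ud.normalize("NFD", "\u0F73\u0F73")])
    # -> ['U+0F71', 'U+0F71', 'U+0F72', 'U+0F72']

The last result is the concrete form of the (xy)^n problem mentioned earlier: a run of n U+0F73 is canonically equivalent to every interleaving of n U+0F71 and n U+0F72, and recognising "equal numbers of two commuting symbols" is exactly what a finite-state machine cannot do.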
2018-01-29 9:57 GMT+01:00 Richard Wordingham via Unicode < unicode at unicode.org>: > On Mon, 29 Jan 2018 07:16:04 +0100 > Philippe Verdy via Unicode wrote: > > > 2018-01-28 23:44 GMT+01:00 Richard Wordingham via Unicode < > > unicode at unicode.org>: > > > > In the search you have in mind, the converted regex for use with NFD > > > strings is actually intelligible and simple: > > > > > > > > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * > > > > > > [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * > > > > > > > > > Informal notation can simplify the regex still further. > > > > > > There is no upper bound to the length of a string matching that > > > regex, > > > > Wrong, you've not read what followed immediately that commented it > > already: it IS bound exactly because you cannot duplicate the same > > combining class, and there's a known finite number of them for > > acceptable cases: if there's any repetition, it will always be within > > that bound. > > Are you talking about regular expressions or strings that match them? > Natural language text can very easily contain adjacent combining > characters of the same combining class - look no further than the > full decomposition of U+01D6 LATIN SMALL LETTER U WITH DIAERESIS AND > MACRON. For a few combining characters, such as U+1A7F TAI THAM > COMBINING CRYPTOGRAMMIC DOT, repetition is of their very essence. > One can find pairs of combining circumflexes in plain text maths. > > Incidentally, I was talking about regular expressions, which imply > *finite* state machines, albeit huge, rather then 'regexes', which are > similar but may formally require unbounded memory. > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Jan 29 12:13:21 2018 From: unicode at unicode.org (Tom Gewecke via Unicode) Date: Mon, 29 Jan 2018 11:13:21 -0700 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> Message-ID: > On Jan 29, 2018, at 4:26 AM, Marcel Schneider via Unicode wrote: > > > the Windows US-Intl > does not allow to write French in a usable manner, as the ?? is still > missing, and does not allow to type German correctly neither due to > the lack of single angle quotation marks (used in some French locales, > too, and perhaps likely to become even more widespread). Of course > these are all on the macOS US-Extended. They are also all on the MacOS "US International PC", provided since 2009 by Apple for Windows users who like US International. ? ? are on alt and alt-shift q ?? are on alt-shift 3/4 (US Extended has also been renamed ABC Extended back in 2015) From unicode at unicode.org Mon Jan 29 14:53:05 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Mon, 29 Jan 2018 20:53:05 +0000 Subject: Internationalised Computer Science Exercises In-Reply-To: References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> <20180128224456.2a93f2a1@JRWUBU2> <20180129085741.6fcf00f8@JRWUBU2> Message-ID: <20180129205305.5d5d202d@JRWUBU2> On Mon, 29 Jan 2018 14:15:04 +0100 Philippe Verdy via Unicode wrote: > No since the begining we were talking about matching strings that are > canonically equivalent within regexps. 
So that searching for a regexp > containing precombined characters or decomposed characters would find > them independantly of the encoded form (normalized or not) of the > input and independantly that there are addtional combining characters > inserted between them. OK, we're taking different approaches. Given finite automata for recognising NFD strings matching regular languages A and B of traces, I know how to construct a non-deterministic finite automaton for recognising any of AB, A and B, A or B, and, if itself a regular language, A*, where the sets denoted are traces. For AB, the states I keep track of are states of A, states of B, and A ? 255 ? B, where the 2nd coordinate of the latter is the ccc of the latest element of the searched string used to propagate a state of B. If I didn't normalise the searched string, I'd have to keep a list of ccc's of characters used in propagating states of B. That gets complicated with A*, for which in theory I need the simultaneous progression of 255 FSMs (OK, only about 52 at the moment). I actually treat A* as A(A*), where no capture is implied by the parentheses. Theory says that if NFD([A]) is a regular language (where [x] is the set of strings in the canonical equivalence class x), then A is a regular language. However, constructing a finite automaton to check for matches may not be straightforward. I prefer to sacrifice the purity of finiteness by buffering enough of the searched string to convert it to NFD on the fly. As an example of the complexity, consider checking whether a string was composed of pairs of identical combining characters. > The case of u with diaeresis and macron is simpler: it has two > combining characters of the same combining class and they don't > commute, still the regexp to match it is something like: > > U [[:cc>0:]-[:cc=above:]]* [[:cc>0:]-[:cc=above:]]* > [[:cc>0:]-[:cc=above:]]* was meant to be an example of a searched string. For example, contains, under canonical equivalence, the substring . Your regular expressions would not detect this relationship. Richard. From unicode at unicode.org Mon Jan 29 17:07:11 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Mon, 29 Jan 2018 16:07:11 -0700 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new =?UTF-8?Q?character=3F=29?= Message-ID: <20180129160711.665a7a7059d7ee80bb4d670165c8327d.7964f4dccc.wbe@email03.godaddy.com> Marcel Schneider wrote: > Prior to this thread, I believed that the ratio of Windows users > liking the US-International vs Mac users liking the US-Extended was > like other ?Windows implementation? vs ?Apple implementation? ratios. For many users, it may not be a question of what they like, but rather (a) what they are aware of, (b) what comes standard with their Windows installation, and (c) in the workplace, what their IT overlords have granted them permission to use. I use a modified version of John Cowan's "Moby Latin" layout on all my machines: http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html which allows me to type about 900 characters *in addition* to Basic Latin, with 100% backward compatibility with U.S. English (i.e. none of the apostrophe and quotation-mark shenanigans we are talking about). But (a) I happen to know about Moby Latin, (b) it doesn't ship with Windows, and (c) I am able to install it (and even modify it). Many users do not have all or even any of these luxuries. 
There is perhaps another factor: many Americans, who are probably the majority users of US-International though not the only ones, simply do not know or care about accents and other "foreign stuff." Even those who know a language other than English often write it in ASCII, and see it that way in marketing and other professionally created material. For example, menus in Mexican restaurants often list "albondigas" and "jalapenos." The non-phonetic spelling of English may further encourage English-only speakers to ignore the squiggles and dots that are necessary to indicate correct pronunciation of other languages. Given that, interest among potential users of US-International to find a better solution is probably very low. > If so many people like it, why was Windows Intl not updated, then? 1. I'd be surprised if there were "so many people," or much demand to update it. Microsoft might have a few other items on their backlogs. 2. I don't speak for Microsoft, but there is often fear of making changes to existing standards, even changes that fill in holes in the standard. Users who type a formerly invalid sequence and get a valid character, instead of the beep or question mark they once got, and complain about the change, might seem to be a low-priority constituency, but you'd be surprised. > To like a particular layout does not mean to want to stick with it > when anything better comes up. User?s choice is always respected. See above regarding what users might like if only they had a choice. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Mon Jan 29 20:08:55 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Mon, 29 Jan 2018 19:08:55 -0700 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) Message-ID: <6449615512BC4379B3028CE79217DB8B@DougEwell> > (b) it doesn't ship with Windows Of course that is not a "luxury." Knowing that third-party options are available, let alone free and easily installed ones, is the luxury. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Mon Jan 29 23:09:10 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Tue, 30 Jan 2018 06:09:10 +0100 (CET) Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) Message-ID: <964123988.94.1517288950265.JavaMail.www@wwinf1d20> On Mon, 29 Jan 2018 16:07:11 -0700, Doug Ewell wrote: > > Marcel Schneider wrote: > > > Prior to this thread, I believed that the ratio of Windows users > > liking the US-International vs Mac users liking the US-Extended was > > like other ?Windows implementation? vs ?Apple implementation? ratios. > > For many users, it may not be a question of what they like, but rather > (a) what they are aware of, (b) what comes standard with their Windows > installation, and (c) in the workplace, what their IT overlords have > granted them permission to use. c: Hierarchical relationships may be complicated in some places but generally there should be an open door, and suggestion boxes or pinwalls may also be available. As an ?overuser? the IT manager must think and evaluate in the place of his employees/coworkers, but professional stress and the desire to unplug during week-ends could be mainly responsible of his unawareness. Though I thought that for professionalism?s sake they should deploy an appropriate layout fork, and I can?t see any point in not using MSKLC at that level. 
b: There is often much reluctance to add only an extra font, and I hesitated a long time prior to installing Firefox or any other extraneous (third-party) software, but that is actually so common ? Chrome browser doesn?t ship with Windows neither but we?re often encouraged to download it ? that failing to update one?s keyboard is due to a lack of marketing and thus, resolves to point (a). > > I use a modified version of John Cowan's "Moby Latin" layout on all my > machines: It would be interesting to know more about your modifications. > > http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html Sadly the downloads are still unavailable (as formerly discussed). But I saved in time, too (June 2015). > > which allows me to type about 900 characters *in addition* to Basic > Latin, with 100% backward compatibility with U.S. English (i.e. none of > the apostrophe and quotation-mark shenanigans we are talking about). But > (a) I happen to know about Moby Latin, (b) it doesn't ship with Windows, [Addition on Mon, 29 Jan 2018 19:08:55 -0700: > Of course that is not a "luxury." Knowing that third-party options are > available, let alone free and easily installed ones, is the luxury. ] > and (c) I am able to install it (and even modify it). Many users do not > have all or even any of these luxuries. I agree. It is a blessing to be able to fine-tune one?s keyboard. Often those knowing about correct diacritics but not about keyboarding can?t help clicking in charmaps and symbol dialogs at document length. That is utter time waste. And that?s what is wrong about letting people waste their time while knowing better. > > There is perhaps another factor: many Americans, who are probably the > majority users of US-International though not the only ones, simply do > not know or care about accents and other "foreign stuff." Even those who > know a language other than English often write it in ASCII, and see it > that way in marketing and other professionally created material. For > example, menus in Mexican restaurants often list "albondigas" and > "jalapenos." French too, though being an accented language/script, is prone to omitting other locale?s diacritics as far as they are supposed not to be on the AZERTY. (It happened that a tilde was refused for lack of support, while we do have that dead key!) The more when accents merely indicate correct intonation, as in ?alb?ndigas.? When reading ?jalapenos? we reflexively add the tilde for spelling. Compare with locales writing consonants only. Writing in ASCII is also a sort of assimilation, as we all like to name things in our own language. > > The non-phonetic spelling of English may further encourage English-only > speakers to ignore the squiggles and dots that are necessary to indicate > correct pronunciation of other languages. That can be OK for common words, but gets dangerous when it comes to proper names. There?s a slippery path from easy to sloppy. It?s still about i18n. But I wasn?t aware that by not using diacritics and having to do much effort to remember correct spelling, English-speaking people may really hate those diacritics even when occurring on foreign stuff. > > Given that, interest among potential users of US-International to find a > better solution is probably very low. I was about to make a quick update of the US-Intl [or ABC-Intl], but if so, I could eventually save that for now. > > > If so many people like it, why was Windows Intl not updated, then? > > 1. 
I'd be surprised if there were "so many people," or much demand to > update it. Microsoft might have a few other items on their backlogs. However, resulting from the KLC files you kindly provided me, as of Latin keyboard layouts shipped with Windows, from Windows 7 to Windows 8, eight layouts were updated, including United Kingdom, Turkish, German and Inuktitut, both variants for each locale. Not US. For completeness? sake, we?ll also metion that five new layouts were added: Azerbaijani (Standard) English (India) Hausa Hawaiian Latvian (Standard) Soon, thanks to enriched CLDR, many many more should be released by the means of Windows Update. > > 2. I don't speak for Microsoft, but there is often fear of making > changes to existing standards, even changes that fill in holes in the > standard. Users who type a formerly invalid sequence and get a valid > character, instead of the beep or question mark they once got, and > complain about the change, might seem to be a low-priority constituency, > but you'd be surprised. Indeed, I am! > > > To like a particular layout does not mean to want to stick with it > > when anything better comes up. User?s choice is always respected. > > See above regarding what users might like if only they had a choice. We won?t blame end-users sticking with the choices of other people they?re paid by, and who are those customers who could request changes that vendors would be ready to accept but who presumably don?t for whatever reasons. Regards, Marcel From unicode at unicode.org Mon Jan 29 23:26:54 2018 From: unicode at unicode.org (via Unicode) Date: Tue, 30 Jan 2018 13:26:54 +0800 Subject: Support for Extension F In-Reply-To: <964123988.94.1517288950265.JavaMail.www@wwinf1d20> References: <964123988.94.1517288950265.JavaMail.www@wwinf1d20> Message-ID: <67684ce4dc9661d1d64c5e7aa730d8a9@koremail.com> Dear All, As many of you are aware getting characters encoded is only half the battle, enabling people to use them is the other half. CJK Extenion F was added last year in Unicode 10. I have come across a number of people saying they are having problems with Ext F. I was wondering what the current support is for Ext F at OS level and in terms of fonts. Regards John Knightley From unicode at unicode.org Mon Jan 29 23:31:34 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Tue, 30 Jan 2018 06:31:34 +0100 (CET) Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> Message-ID: <412077223.147.1517290294530.JavaMail.www@wwinf1d20> OnMon, 29 Jan 2018 11:13:21 -0700, Tom Gewecke wrote: > > > On Jan 29, 2018, at 4:26 AM, Marcel Schneider via Unicode wrote: > > > > > > the Windows US-Intl > > does not allow to write French in a usable manner, as the ?? is still > > missing, and does not allow to type German correctly neither due to > > the lack of single angle quotation marks (used in some French locales, > > too, and perhaps likely to become even more widespread). Of course > > these are all on the macOS US-Extended. > > They are also all on the MacOS "US International PC", provided since 2009 by Apple > for Windows users who like US International. I suppose that this layout ships with the Windows emulation that can be run on a Mac. It?s hard to find through especially when I can?t see the layout or find on the internet. Thanks anyway. 
They seem to be always first, and then, other wendors can?t copy nor invent something else people won?t like. > > ? ? are on alt and alt-shift q > > ?? are on alt-shift 3/4 Then this is ported from the Apple US layout, where these characters are in the same places. However that does not include correct spacing, as required for French. > > (US Extended has also been renamed ABC Extended back in 2015) Presumably because it is interesting for many locales worldwide accustomed to the US QWERTY layout. That tends to prove that Mac users accept changes, while Windows users refuse changes. However I fail to understand such a discrepancy. Regards, Marcel From unicode at unicode.org Tue Jan 30 01:18:49 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 30 Jan 2018 08:18:49 +0100 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: <412077223.147.1517290294530.JavaMail.www@wwinf1d20> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> Message-ID: I have always wondered why Microsoft did not push itself at least the five simple additions needed since long in French for the French AZERTY LAYOUT: - [AltGr]+[?] to produce the cedilla dead key (needed only before capital C in French) : this is frequently needed, the alternative would be [AltGr]+[C] to map "?" without the dead key; spell checkers forget the frequent words: ?a or ?'. - [AltGr]+[1&] to produce the acute accent dead key (similar to [AltGr+7?`] giving the grave accent deadkey) : this is the most frequent missing letter we need all the time. - [AltGr]+[O] to produce "?" (without ShiftLock or CapsLock mode enabled), or "?" (in ShiftLock or CapsLock mode), and [AltGr]+[Shift]+[O] to produce "?" (independantly of [ShiftLock] which is disabled by [Shift], but without [CapsLock]) or "?" (independantly of [CapsLock], but without [ShiftLock]) : this is needed occasionnaly for very few common words, the most frequent omission is "?uf" or its plural "?ufs". - [AltGr]+[A] to produce "?" (without ShiftLock or CapsLock mode enabled), or "?" (in ShiftLock or CapsLock mode), and [AltGr]+[Shift]+[O] to produce " ?" (independantly of [ShiftLock] which is disabled by [Shift], but without [CapsLock]) or "?" (independantly of [CapsLock], but without [ShiftLock]) : this is rarely needed, except for a few words borrowed from Latin used in biology or some legal/judiciary terminology. - Adding Y to the list of allowed letters after the dieresis deadkey to produce "?" : the most frequent case is L'HA?E-L?S-ROSES, the official name of a French municipality when written with full capitalisation, almost all spell checkers often forget to correct capitalized names such as this one. This would allow typing French completely including on initial capitals. All other French capital letters can be typed (????? with the circumflex dead key, ???? with the dieresis dead key which already allows ?? not needed for French but for Alsatian or some names borrowed from German). But we have mappings already in the AZERTY layout for: - the tilde as a dead key on [AltGr]+[2?~], even if it is not used for French but only for "?" or "?" in names from Spanish or Breton, " ??" not needed at all, /??/ needed only for standard French IPA phonetics where we still can't type /?????/ for French phonetics - the grave accent as a dead key on [AltGr]+[7?`], needed for "??" but allowing also "???" not used at all in French. 
There's not any good rationale in the French AZERTY layout to keep it incomplete on capitals while maintaining other capital letters with diacritics composed with dead keys but not needed at all in French, except the case of "???" missing from ISO 8859-1 but present in Windows-1252. ---- Using the Windows "Charmap" accessory with the "Unicode" charset and "Latin" subset is still too difficult to locate the missing letters, as it is only sorted by code point value but still does not cover all Latin letters; the Windows "Charmap" tool is usable for French only when selecting the Windows-1252 charset (aka "Windows : Occidental"). But I don't understand why this accessory cannot simply add some rows at top of the table for the current language selected on the "Languages Bar", or why it does not simply features the complete alphabet of the current language, sorted correctly according to CLDR rules for that language (not sorted randomly by code point value) to make it really usable. If we select another subset, it should also be sortable according to language rules (or CLDR default root otherwise) and not according to code point value: this could be a simple checkbox or a pair of radio buttons (binary sort, or alphabetic sort). Finally, the Charmap tool should be updated to add missing characters that are not covered in the "Unicode" charset selection, even if they are encoded in Unicode and really mapped in fonts: the coverage of proposed "subsets" is an extremely old version of Unicode. 2018-01-30 6:31 GMT+01:00 Marcel Schneider via Unicode : > OnMon, 29 Jan 2018 11:13:21 -0700, Tom Gewecke wrote: > > > > > On Jan 29, 2018, at 4:26 AM, Marcel Schneider via Unicode wrote: > > > > > > > > > the Windows US-Intl > > > does not allow to write French in a usable manner, as the ?? is still > > > missing, and does not allow to type German correctly neither due to > > > the lack of single angle quotation marks (used in some French locales, > > > too, and perhaps likely to become even more widespread). Of course > > > these are all on the macOS US-Extended. > > > > They are also all on the MacOS "US International PC", provided since > 2009 by Apple > > for Windows users who like US International. > > I suppose that this layout ships with the Windows emulation that can be > run on a Mac. > It?s hard to find through especially when I can?t see the layout or find > on the internet. > Thanks anyway. They seem to be always first, and then, other wendors can?t > copy nor > invent something else people won?t like. > > > > > ? ? are on alt and alt-shift q > > > > ?? are on alt-shift 3/4 > > Then this is ported from the Apple US layout, where these characters are > in the same > places. However that does not include correct spacing, as required for > French. > > > > > (US Extended has also been renamed ABC Extended back in 2015) > > Presumably because it is interesting for many locales worldwide accustomed > to the > US QWERTY layout. That tends to prove that Mac users accept changes, while > Windows users refuse changes. However I fail to understand such a > discrepancy. > > Regards, > > Marcel > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Tue Jan 30 02:06:52 2018 From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode) Date: Tue, 30 Jan 2018 17:06:52 +0900 Subject: Keyboard layouts and CLDR In-Reply-To: References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> Message-ID: <295ef0d0-c585-632c-09fb-05dc25d7a13c@it.aoyama.ac.jp> On 2018/01/30 16:18, Philippe Verdy via Unicode wrote: > - Adding Y to the list of allowed letters after the dieresis deadkey to > produce "?" : the most frequent case is L'HA?E-L?S-ROSES, the official name > of a French municipality when written with full capitalisation, almost all > spell checkers often forget to correct capitalized names such as this one. Wikipedia has this as L'Ha?-les-Roses (see https://fr.wikipedia.org/wiki/L'Ha?-les-Roses). It surely would be L'HA?-LES-ROSES, and not L'HA?E-L?S-ROSES, when capitalized. I of course know of the phenomenon that in French, sometimes the accents on upper-case letters are left out, but I haven't heard of a reverse phenomenon yet. Regards, Martin. From unicode at unicode.org Tue Jan 30 04:20:46 2018 From: unicode at unicode.org (Alastair Houghton via Unicode) Date: Tue, 30 Jan 2018 10:20:46 +0000 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: <412077223.147.1517290294530.JavaMail.www@wwinf1d20> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> Message-ID: <7A978859-4C9E-41B2-A291-713D7DE5E002@alastairs-place.net> On 30 Jan 2018, at 05:31, Marcel Schneider via Unicode wrote: > > OnMon, 29 Jan 2018 11:13:21 -0700, Tom Gewecke wrote: >> >>> On Jan 29, 2018, at 4:26 AM, Marcel Schneider via Unicode wrote: >>> >>> >>> the Windows US-Intl >>> does not allow to write French in a usable manner, as the ?? is still >>> missing, and does not allow to type German correctly neither due to >>> the lack of single angle quotation marks (used in some French locales, >>> too, and perhaps likely to become even more widespread). Of course >>> these are all on the macOS US-Extended. >> >> They are also all on the MacOS "US International PC", provided since 2009 by Apple >> for Windows users who like US International. > > I suppose that this layout ships with the Windows emulation that can be run on a Mac. No. It?s included as standard with the macOS itself. Go to System Preferences, choose ?Keyboard?, then ?Input Sources?. Click the ?+? button at the bottom left, then enter ?PC? in the search field and you?ll see there are a range of ?PC? layouts. >> ? ? are on alt and alt-shift q >> >> ?? are on alt-shift 3/4 More of a nitpick than anything, but Apple keyboards have *Option*, not ?alt?. Yes, some (but not all) keyboards? Option keys have an ?alt? annotation at the top, but that was added AFAIK for the benefit of people running PC emulation (or these days, Windows under e.g. VMWare Fusion). The ?alt? annotation isn?t on the latest keyboards (go look in an Apple Store if you don?t believe me :-)). > Then this is ported from the Apple US layout, where these characters are in the same > places. However that does not include correct spacing, as required for French. Not sure what you mean about spacing. That, surely, is a matter mainly for the software you?re using, rather than for a keyboard layout? 
>> (US Extended has also been renamed ABC Extended back in 2015) > > Presumably because it is interesting for many locales worldwide accustomed to the > US QWERTY layout. That tends to prove that Mac users accept changes, while > Windows users refuse changes. However I fail to understand such a discrepancy. I don?t think it?s the users. I think, rather, that Apple is (or has been) prepared to make radical changes, even at the expense of backwards compatibility and even where it knows there will be short term pain from users complaining about them, where Microsoft is more conservative. This pattern exists across the board at the two companies; the Windows API hasn?t changed all that much since Windows NT 4/95, whereas Apple has basically thrown away all the work it did up to Mac OS 9 and is a lot more aggressive about deprecating and removing functionality even in Mac OS X/macOS than Microsoft ever was. This is exemplified, actually, by the length of time Microsoft keeps backwards compatibility layers, versus the length of time Apple does so. The WoW subsystem is (I think) still part of the 32-bit builds of Windows, so they can still run Windows 3.1 software, DOS software and so on (i.e. software back to the 1980s). Apple, on the other hand, dropped support for ?Classic? Mac apps back in 10.4 and has never supported running PowerPC classic apps on any Intel machine. Indeed, six years ago now, in Mac OS X 10.7, Apple dropped support for running PowerPC apps built for Mac OS X, which basically means that software Mac users bought to run on their older PowerPC-based Macs is now not usable on new machines. Kind regards, Alastair. -- http://alastairs-place.net From unicode at unicode.org Tue Jan 30 07:15:16 2018 From: unicode at unicode.org (Eric Muller via Unicode) Date: Tue, 30 Jan 2018 05:15:16 -0800 Subject: Keyboard layouts and CLDR In-Reply-To: <295ef0d0-c585-632c-09fb-05dc25d7a13c@it.aoyama.ac.jp> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> <295ef0d0-c585-632c-09fb-05dc25d7a13c@it.aoyama.ac.jp> Message-ID: Indeed. But "Fa?-l?s-Nemours" / "FA?-L?S-NEMOURS". "l?s" in French place names means "near", typically followed by another city name or a river name. In the case of "L'Ha?-les-Roses", it's just that they have a famous rose garden, so "les". Eric. On 1/30/2018 12:06 AM, Martin J. D?rst via Unicode wrote: > On 2018/01/30 16:18, Philippe Verdy via Unicode wrote: > >> ? - Adding Y to the list of allowed letters after the dieresis >> deadkey to >> produce "?" : the most frequent case is L'HA?E-L?S-ROSES, the >> official name >> of a French municipality when written with full capitalisation, >> almost all >> spell checkers often forget to correct capitalized names such as this >> one. > > Wikipedia has this as L'Ha?-les-Roses (see > https://fr.wikipedia.org/wiki/L'Ha?-les-Roses). It surely would be > L'HA?-LES-ROSES, and not L'HA?E-L?S-ROSES, when capitalized. I of > course know of the phenomenon that in French, sometimes the accents on > upper-case letters are left out, but I haven't heard of a reverse > phenomenon yet. > > Regards,?? Martin. > From unicode at unicode.org Tue Jan 30 09:54:19 2018 From: unicode at unicode.org (Tom Gewecke via Unicode) Date: Tue, 30 Jan 2018 08:54:19 -0700 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) 
In-Reply-To: <7A978859-4C9E-41B2-A291-713D7DE5E002@alastairs-place.net>
References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell>
 <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23>
 <412077223.147.1517290294530.JavaMail.www@wwinf1d20>
 <7A978859-4C9E-41B2-A291-713D7DE5E002@alastairs-place.net>
Message-ID: 

> On Jan 30, 2018, at 3:20 AM, Alastair Houghton wrote:
>
> The “alt” annotation isn’t on the latest keyboards (go look in an Apple Store if you don’t believe me :-)).

Interesting! Apple’s documentation shows these keys mostly with “alt” and “⌥”.

https://support.apple.com/en-us/HT201794

From unicode at unicode.org  Tue Jan 30 11:55:46 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Tue, 30 Jan 2018 18:55:46 +0100 (CET)
Subject: Keyboard layouts and CLDR
In-Reply-To: 
References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell>
 <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23>
 <412077223.147.1517290294530.JavaMail.www@wwinf1d20>
 <7A978859-4C9E-41B2-A291-713D7DE5E002@alastairs-place.net>
Message-ID: <789250525.14033.1517334947091.JavaMail.www@wwinf1j02>

On Tue, 30 Jan 2018 08:54:19 -0700, Tom Gewecke wrote:
>
> > On Jan 30, 2018, at 3:20 AM, Alastair Houghton wrote:
> >
> > The “alt” annotation isn’t on the latest keyboards (go look in an Apple Store if you don’t believe me :-)).
>
> Interesting! Apple’s documentation shows these keys mostly with “alt” and “⌥”.
>
> https://support.apple.com/en-us/HT201794

While the “⌥” symbol is consistent across locales, the “alt” label is in some places replaced with “option”, and I believed that this was the macOS name, whereas “alt” was merely there for Boot Camp users. However, that is confusing, as the Windows “Alt” key does not have the “option” functionality but rather provides menu access, as reflected by its internal name “MENU” (“LMENU”, “RMENU”), while “option” corresponds to “AltGr”, since it likewise gives access to alternate graphics. But now that we need a “Numbers” modifier, neither scheme seems appropriate: Left Option should be Numbers, and Alt should become Numbers too, while its menu function could be mapped to Left Windows or so. See:

https://unicode.org/cldr/trac/ticket/10851#comment:2

Regards,

Marcel

From unicode at unicode.org  Tue Jan 30 12:34:40 2018
From: unicode at unicode.org (Doug Ewell via Unicode)
Date: Tue, 30 Jan 2018 11:34:40 -0700
Subject: Keyboard layouts and CLDR
Message-ID: <20180130113440.665a7a7059d7ee80bb4d670165c8327d.552d8d7f7d.wbe@email03.godaddy.com>

Marcel Schneider wrote:

> That tends to prove that Mac users accept changes, while Windows users
> refuse changes.

I was going to say that was a gross over-generalization, but that didn't
adequately express how gross it was. It's just plain wrong. Pardon my
bluntness.

How about: Windows is often used in the workplace, where users may not
have the freedom or motivation to make their own changes and be
different from other users, while Macs are often used by individuals who
do. That's an over-generalization too, but not quite at the level of
"Windows users refuse changes."

Alastair Houghton replied:

> I think, rather, that Apple is (or has been) prepared to make radical
> changes, even at the expense of backwards compatibility and even where
> it knows there will be short term pain from users complaining about
> them, where Microsoft is more conservative.

That too. Good point.
-- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Tue Jan 30 12:50:49 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Tue, 30 Jan 2018 11:50:49 -0700 Subject: Keyboard layouts and CLDR Message-ID: <20180130115049.665a7a7059d7ee80bb4d670165c8327d.6caabee144.wbe@email03.godaddy.com> Marcel Schneider wrote: >> http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html > > Sadly the downloads are still unavailable (as formerly discussed). But > I saved in time, too (June 2015). Sorry, try this: http://vrici.lojban.org/~cowan/MobyLatinKeyboard.zip -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Tue Jan 30 13:09:31 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Tue, 30 Jan 2018 20:09:31 +0100 (CET) Subject: Keyboard layouts and CLDR In-Reply-To: <7A978859-4C9E-41B2-A291-713D7DE5E002@alastairs-place.net> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> <7A978859-4C9E-41B2-A291-713D7DE5E002@alastairs-place.net> Message-ID: <1047695230.16055.1517339371756.JavaMail.www@wwinf1j02> On Tue, 30 Jan 2018 10:20:46 +0000, Alastair Houghton wrote: > > On 30 Jan 2018, at 05:31, Marcel Schneider via Unicode wrote: > > > > OnMon, 29 Jan 2018 11:13:21 -0700, Tom Gewecke wrote: > >> [?] > >> > >> They are also all on the MacOS "US International PC", provided since 2009 by Apple > >> for Windows users who like US International. > > > > I suppose that this layout ships with the Windows emulation that can be run on a Mac. > > No. It?s included as standard with the macOS itself. Go to System Preferences, choose ?Keyboard?, then ?Input Sources?. > Click the ?+? button at the bottom left, then enter ?PC? in the search field and you?ll see there are a range of ?PC? layouts. Indeed. It?s sort of a mix made of Apple?s common US and Windows? US-International. So it has the five Windows-style dead keys in Base and Shift, AND the five Mac-style dead keys (for the same diacritics) in Option. > > >> ? ? are on alt and alt-shift q > >> > >> ?? are on alt-shift 3/4 > > More of a nitpick than anything, but Apple keyboards have *Option*, not ?alt?. Yes, some (but not all) keyboards? Option keys have an ?alt? > annotation at the top, but that was added AFAIK for the benefit of people running PC emulation (or these days, Windows under e.g. VMWare > Fusion). The ?alt? annotation isn?t on the latest keyboards (go look in an Apple Store if you don?t believe me :-)). Then, ?alt? is obsoleted on Mac, and calling them ?Option? is correct? I?m relieved if so, as I used ?Option? when referring to macOS, or better, ?AltGr/Option? to be cross-platform? ?Option? is shorter, but ?AltGr? is already printed on most keyboards, though it isn?t a short form of an easily localizable term, while ?Option? is multi-locale (English, French, German, ?). > > > Then this is ported from the Apple US layout, where these characters are in the same > > places. However that does not include correct spacing, as required for French. > > Not sure what you mean about spacing. That, surely, is a matter mainly for the software you?re using, rather than for a keyboard layout? It may be handled by an input editing functionality as embedded in Word, like many things can be done by input editing, even ??? 
(for which Word has also a shortcut: Ctrl+&, o) because Microsoft and Bill Gates in person were eager to support the French locale and did a lot to help French efforts in keyboarding. But properly, correct spacing must be handled on keyboard level, otherwise we?ll always end up with a mass of wrong data amidst which a subset of correct documents having U+202F NARROW NO-BREAK SPACE before ??? and ?!? and ?;?, and even ??? and after ???, and currently also before ?:? as Philippe Verdy wrote to this List on Fri, 26 Jun 2015 22:16:48 +0200: http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0220.html ?Les ?diteurs de presse et de livres en France utilisent tous des fines de chasse fixe dans leurs moteurs de composition? (?Print media and book publishers in France all use fixed-width narrows in their typesetting engines?) Unicode did not support interoperable French typesetting until it encoded U+202F for Mongolian, in 1999, six years after v1.1 of the Standard. Making U+2008 PUNCTUATION SPACE a no-break space would have been well done. This was seemingly encoded for hot-metal typesetting of tables, like U+2012 FIGURE DASH and U+2007 FIGURE SPACE that is non-breakable. While U+2007 can be considered a fixed-width counterpart of U+00A0, and U+2012 could have been a longer variant of U+2212 to denote intervals *if only* it had been specified as such, U+2008 could have been the proper representation of French punctuation spacing, instead of ending up as a completely useless character, depriving the French locale of Unicode support. See the feedback items about these topics that have been posted so far: http://www.unicode.org/L2/L2018/18009-pubrev.html#Error_Reports > > >> (US Extended has also been renamed ABC Extended back in 2015) > > > > Presumably because it is interesting for many locales worldwide accustomed to the > > US QWERTY layout. That tends to prove that Mac users accept changes, while > > Windows users refuse changes. However I fail to understand such a discrepancy. > > I don?t think it?s the users. > > I think, rather, that Apple is (or has been) prepared to make radical changes, even at the expense of backwards compatibility and even where it > knows there will be short term pain from users complaining about them, where Microsoft is more conservative. This pattern exists across the > board at the two companies; the Windows API hasn?t changed all that much since Windows NT 4/95, whereas Apple has basically thrown away > all the work it did up to Mac OS 9 and is a lot more aggressive about deprecating and removing functionality even in Mac OS X/macOS than > Microsoft ever was. > > This is exemplified, actually, by the length of time Microsoft keeps backwards compatibility layers, versus the length of time Apple does so. > The WoW subsystem is (I think) still part of the 32-bit builds of Windows, so they can still run Windows 3.1 software, DOS software and so on > (i.e. software back to the 1980s). Apple, on the other hand, dropped support for ?Classic? Mac apps back in 10.4 and has never supported > running PowerPC classic apps on any Intel machine. Indeed, six years ago now, in Mac OS X 10.7, Apple dropped support for running PowerPC > apps built for Mac OS X, which basically means that software Mac users bought to run on their older PowerPC-based Macs is now not usable on > new machines. Ah, good to know. Apple?s (and some other companies) strategy is currently nicknamed ?programmed obsolescence.? 
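Coming back briefly to the punctuation spacing discussed a few paragraphs above: the substitution in question, an ordinary space or NO-BREAK SPACE before the high punctuation marks replaced by U+202F NARROW NO-BREAK SPACE, is mechanical enough to sketch. The rules below are deliberately simplified (French typography distinguishes more cases, and the treatment of the colon varies), they only touch spaces that are already present, and they are an illustration rather than a description of what any word processor actually does:

    import re

    NNBSP = "\u202F"  # NARROW NO-BREAK SPACE

    def narrow_french_spaces(text):
        # Replace an existing SPACE or NBSP before ; ! ? and the closing guillemet,
        # and after the opening guillemet, by U+202F. No handling of URLs or code.
        text = re.sub(r"[ \u00A0]+([;!?\u00BB])", NNBSP + r"\1", text)
        text = re.sub(r"(\u00AB)[ \u00A0]+", r"\1" + NNBSP, text)
        return text

    print(narrow_french_spaces("« Voulez-vous du café ? »"))
    # same sentence with U+202F in place of the ordinary spaces around « ? »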
Bing?s top search result for that keyword is this BBC article: http://www.bbc.com/future/story/20160612-heres-the-truth-about-the-planned-obsolescence-of-tech Based on your report, I think that Apple push wealthy people to use always the best of tech, whereas Microsoft help poor people alike not to discard well-functioning software. Regards, Marcel From unicode at unicode.org Tue Jan 30 13:42:02 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Tue, 30 Jan 2018 20:42:02 +0100 (CET) Subject: Keyboard layouts and CLDR In-Reply-To: <20180130113440.665a7a7059d7ee80bb4d670165c8327d.552d8d7f7d.wbe@email03.godaddy.com> References: <20180130113440.665a7a7059d7ee80bb4d670165c8327d.552d8d7f7d.wbe@email03.godaddy.com> Message-ID: <1315610464.16867.1517341322817.JavaMail.www@wwinf1j02> On Tue, 30 Jan 2018 11:34:40 -0700, Doug Ewell via Unicode wrote: > > Marcel Schneider wrote: > > > That tends to prove that Mac users accept changes, while Windows users > > refuse changes. > > I was going to say that was a gross over-generalization, but that didn't > adequately express how gross it was. It's just plain wrong. Pardon my > bluntness. > > How about: Windows is often used in the workplace, where users may not > have the freedom or motivation to make their own changes and be > different from other users, while Macs are often used by individuals who > do. That's an over-generalization too, but not quite at the level of > "Windows users refuse changes." I?m relieved to be wrong, and that ?such a discrepancy? that ?I fail[ed] to understand? doesn?t exist. I know a company that prescribes and delivers Apple hardware to all its affiliates. Second-hand retailers offer very few Apple machines while they have plenty of PC computers, whose turnover at business customers is two years. Apple computers are not replaced every two years in workplaces. That may be a reason why Apple Inc. takes steps to get them replaced nevertheless, as Alastair?s report you quoted [and I already answered] might suggest (but stop, no more over-interpretation!). > > Alastair Houghton replied: > > > I think, rather, that Apple is (or has been) prepared to make radical > > changes, even at the expense of backwards compatibility and even where > > it knows there will be short term pain from users complaining about > > them, where Microsoft is more conservative. > > That too. Good point. Very good point. Regards, Marcel From unicode at unicode.org Tue Jan 30 14:06:06 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Tue, 30 Jan 2018 21:06:06 +0100 (CET) Subject: Keyboard layouts and CLDR In-Reply-To: <20180130115049.665a7a7059d7ee80bb4d670165c8327d.6caabee144.wbe@email03.godaddy.com> References: <20180130115049.665a7a7059d7ee80bb4d670165c8327d.6caabee144.wbe@email03.godaddy.com> Message-ID: <1815406746.17350.1517342767472.JavaMail.www@wwinf1j02> On Tue, 30 Jan 2018 11:50:49 -0700, Doug Ewell wrote: > > Marcel Schneider wrote: > > > > http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html > > > > Sadly the downloads are still unavailable (as formerly discussed). But > > I saved in time, too (June 2015). > > Sorry, try this: > > http://vrici.lojban.org/~cowan/MobyLatinKeyboard.zip Thank you! 
I?ve gone through John Cowan?s Home Page, too: http://vrici.lojban.org/~cowan/ Regards, Marcel From unicode at unicode.org Tue Jan 30 15:24:23 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Tue, 30 Jan 2018 22:24:23 +0100 (CET) Subject: Keyboard layouts and CLDR In-Reply-To: References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> Message-ID: <397820388.18715.1517347464472.JavaMail.www@wwinf1j02> On Tue, 30 Jan 2018 08:18:49 +0100, Philippe Verdy wrote: > > I have always wondered why Microsoft did not push itself at least the five > simple additions needed since long in French for the French AZERTY LAYOUT: Many people in F?an?? are wondering, but it is primarily a matter of honoring a country?s policies and not interfering with official work. France is expected to fix itself its keyboarding problems and publish a standard, and that?s what is actually happening. See Shawn Steele?s blog post about Locale Data in Windows 10 & CLDR: https://blogs.msdn.microsoft.com/shawnste/2015/08/29/locale-data-in-windows-10-cldr/ > - [AltGr]+[?] to produce the cedilla dead key (needed only before capital C in French) : > this is frequently needed, the alternative would be [AltGr]+[C] to map "?" without the dead key; That would be easier than Alt+0199. But Alt+something should yield consistently either uppercase or lowercase. Then, especially in the United States, not having all uppercase letters accessed with Shift+lowercase is considered counter-intuitive. And when the lowercase letter is directly accessed, going through a dead key to get its uppercase is not something I would recommend. Therefore, all uppercase that are used as initials (not ?) should be Shift+lowercase, and digits in AltGr like on a few Latin layouts shipping with Windows, plus a Programmer toggle described in: https://unicode.org/cldr/trac/ticket/10851#comment:2 > spell checkers forget the frequent words:??a or ?'. I never use spell checkers, and when they show up with red wavy underline, I quickly try to disable them (outside of Gooogle Search). That is why I have typos. [This one has occurred unintentionally.] > > - [AltGr]+[1&] to produce the acute accent dead key (similar to [AltGr+7?`] giving the grave accent deadkey) : > this is the most frequent missing letter we need all the time. Therefore, the ? should be mapped to a live key. But the acute dead key is really the missing one. Belgium?s AZERTY has it. Getting ? at least by dead key would have divided our trouble by half. > > - [AltGr]+[O] to produce "?" (without ShiftLock or CapsLock mode enabled), > or "?" (in ShiftLock or CapsLock mode), and >?[AltGr]+[Shift]+[O] to produce "?" (independantly of [ShiftLock] which is disabled by [Shift], but without [CapsLock]) > or "?" (independantly of [CapsLock], but without [ShiftLock]) : > this is needed occasionnaly for very few common words, the most frequent omission?is "?uf" or?its plural "?ufs". To repay the ?? for its exclusion from Latin-1 (due to a Frenchman), it should be granted two key positions in the Base and Shift shift states, amidst the upper row letters. > > - [AltGr]+[A] to produce "?" (without ShiftLock or CapsLock mode enabled), > or "?" (in ShiftLock or CapsLock mode), and >?[AltGr]+[Shift]+[O] to produce "?" (independantly of [ShiftLock] which is disabled by [Shift], but without [CapsLock]) > or "?" 
(independantly of [CapsLock], but without [ShiftLock]) : > this is rarely needed, except for a few words borrowed from Latin used in biology or some legal/judiciary terminology. And one spelling of _L?titia_. > > - Adding Y to the list of allowed letters after the dieresis deadkey to produce "?" : > the most frequent case is L'HA?E-L?S-ROSES, the official name of a French municipality when written with full capitalisation, > almost all spell checkers often forget to correct capitalized names such as this one. That?s really something I never understood neither. Why that deadlist was not updated. Maybe like above: If Microsoft had updated our layout with '?', we could have wondered why they didn?t add the other missing stuff while they were on it. > > This would allow typing French completely including on initial capitals. > All other French capital letters can be typed (????? with the circumflex dead key, > ???? with the dieresis dead key which already allows ?? not needed for French but for Alsatian or some names borrowed from German). > > But we have mappings already in the AZERTY layout for: >?- the tilde as a dead key on [AltGr]+[2?~], even if it is not used for French but only for "?" or "?" in names from Spanish or Breton, That didn?t prevent Breton authorities from refusing it in a first name, Denis Jacquerye reported in the wake of the Kazakh apostrophe thread: http://unicode.org/mail-arch/unicode-ml/y2018-m01/0133.html > " ??"?not needed at all, /??/ needed only for standard French IPA?phonetics where we still can't type /?????/ for French phonetics >?- the grave accent as a dead key on [AltGr]+[7?`], needed for "??" but allowing also "???" not used at all in French. > > There's not any good rationale in the French AZERTY layout to keep it incomplete on capitals > while maintaining other capital letters with diacritics composed with dead keys but not needed at all in French, > except the case of?"???" missing from ISO 8859-1 but present in Windows-1252. There is even a way of putting all into the existing dead keys, if ??circumflex accent?? (that is our directly accessed dead key) followed by any diacriticized letter did yield its uppercase, and followed by b or q, yield ? or ?? But that isn?t what one would call a properly designed keyboard layout. > ---- > Using the Windows "Charmap" accessory with the "Unicode" charset and "Latin" subset is still too difficult to locate the missing letters, > as it is only sorted by code point value but still does not cover all Latin letters; > the?Windows "Charmap" tool?is usable for French only when selecting the Windows-1252 charset (aka "Windows : Occidental"). > > But I don't understand why this accessory cannot simply add some rows at top of the table for the current language selected > on the "Languages Bar", or why it does not simply features the complete alphabet of the current language, sorted correctly > according to CLDR rules for that language (not sorted randomly by code point value) to make it really usable. If we select > another subset, it should also be sortable according to language rules (or CLDR default root otherwise) and not according to code point value: > this could be a simple checkbox or a pair of radio buttons (binary sort, or alphabetic sort).? > > Finally, the Charmap tool should be updated to add missing characters that are not covered in the "Unicode" charset selection, > even if they are encoded in Unicode and really mapped in fonts: the coverage of proposed "subsets" is an extremely old version of Unicode. 
I see that as a very valuable feature request. And this one doesn?t need to wait for any national standard to get implemented. Regards, Marcel From unicode at unicode.org Tue Jan 30 17:30:59 2018 From: unicode at unicode.org (David Starner via Unicode) Date: Tue, 30 Jan 2018 23:30:59 +0000 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: <7A978859-4C9E-41B2-A291-713D7DE5E002@alastairs-place.net> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> <7A978859-4C9E-41B2-A291-713D7DE5E002@alastairs-place.net> Message-ID: On Tue, Jan 30, 2018 at 2:23 AM Alastair Houghton via Unicode < unicode at unicode.org> wrote: > This pattern exists across the board at the two companies; the Windows API > hasn?t changed all that much since Windows NT 4/95, whereas Apple has > basically thrown away all the work it did up to Mac OS 9 and is a lot more > aggressive about deprecating and removing functionality even in Mac OS > X/macOS than Microsoft ever was. > I'm not really clear on all the Windows details, as a long time Linux programmer, but Mac OS X (2001) was 16 years ago and Windows 95 (1995) is 22, so not much difference even taking your numbers. The .NET framework debuted in 2002, and the Universal Windows Platform debuted with Windows 8 in 2012, so Microsoft has made some pretty large changes since NT 4. They do seem to more focused on keeping backwards compatibility layers, but it's not that they've been not "prepared to make radical changes". -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Jan 30 22:39:00 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Wed, 31 Jan 2018 05:39:00 +0100 (CET) Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) Message-ID: <623361518.54.1517373540785.JavaMail.www@wwinf1g10> On Tue, 30 Jan 2018 23:30:59 +0000, David Starner via Unicode wrote: > > On Tue, Jan 30, 2018 at 2:23 AM Alastair Houghton via Unicode wrote: > > > This pattern exists across the board at the two companies; the Windows API hasn?t changed all that much > > since Windows NT 4/95, whereas Apple has basically thrown away all the work it did up to Mac OS 9 and is > > a lot more aggressive about deprecating and removing functionality even in Mac OS X/macOS than Microsoft > > ever was. > > I'm not really clear on all the Windows details, as a long time Linux programmer, but Mac OS X (2001) was > 16 years ago and Windows 95 (1995) is 22, so not much difference even taking your numbers. The .NET framework > debuted in 2002, and the Universal Windows Platform debuted with Windows 8 in 2012, so Microsoft has made some > pretty large changes since NT 4. They do seem to more focused on keeping backwards compatibility layers, but it's > not that they've been not "prepared to make radical changes". I don?t think that Alastair?s point was about Microsoft not being innovative. They simply allow old software to be used on new machines and Windows versions, something that Apple reportedly does not on macOS. However, as of Unicode support by keyboard layouts, the current advice is that adding new functionalities to Windows 10 would keep them out of reach for users of older versions of Windows, and new keyboard layouts relying on them would be truncated for a still huge part of the users. 
That is surely part of the ?significant, perhaps insurmountable headwinds? faced by ?making significant changes to user32.dll?, Andrew Glass warned in 2015: http://www.unicode.org/mail-arch/unicode-ml/y2015-m08/0042.html Perhaps there could be a way to update older frameworks via Windows Update, making an end with ?limitation[s] of the Windows USER keyboard architecture? that Michael Kaplan pointed in response to Karl Pentzlin, January 2010: http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0030.html Several discussions, in the past years, stated that we should be able to input combining sequences using dead keys, a feature supported by macOS and Linux natively, while Windows does not come along with that kind of support, although this is recommended by TUS: ?It is straightforward to adapt such a system to emit combining character sequences or precomposed characters as needed.? (5.12, p. 222) http://www.unicode.org/versions/Unicode10.0.0/ch05.pdf#G1076 In a 2015 discussion we/I also learned that Tavultesoft Keyman, now SIL, provides all these features and is cross-platform, and had a free offer since long, now even including the Developer tooling thanks to backing by SIL. I?m advertising this software to advocate Microsoft?s presumed position: For all that goes beyond legacy support, we can rely on Keyman. As a consequence, layouts that are to be shipped with Windows, such as the new French, must stick with Windows resources (input editors are excluded by spec), whereas minority languages needing extended functionalities can promote additional software for support, as far as they are not expected to be supported out-of-the-box. Hopefully we can now expect, by contrast, that Apple will add the missing toggle, internally VK_KANA on Windows (Linux is reported to have it, too), to make it available on macOS as well. Regards, Marcel From unicode at unicode.org Wed Jan 31 11:25:22 2018 From: unicode at unicode.org (John H. Jenkins via Unicode) Date: Wed, 31 Jan 2018 10:25:22 -0700 Subject: Support for Extension F In-Reply-To: <67684ce4dc9661d1d64c5e7aa730d8a9@koremail.com> References: <964123988.94.1517288950265.JavaMail.www@wwinf1d20> <67684ce4dc9661d1d64c5e7aa730d8a9@koremail.com> Message-ID: <8354C81D-0CFE-480D-9FC3-A22736040659@apple.com> macOS (and iOS, for that matter) fully support Extension F provided fonts are availble. I'm not aware of any work that Apple has done to its fonts for Extension F support. Indeed, I'm not aware of any publically available fonts for Extension F but would gladly install one myself if it's available. > On Jan 29, 2018, at 10:26 PM, via Unicode wrote: > > > Dear All, > > As many of you are aware getting characters encoded is only half the battle, enabling people to use them is the other half. > > CJK Extenion F was added last year in Unicode 10. I have come across a number of people saying they are having problems with Ext F. I was wondering what the current support is for Ext F at OS level and in terms of fonts. > > Regards > John Knightley From unicode at unicode.org Wed Jan 31 11:51:04 2018 From: unicode at unicode.org (Tom Gewecke via Unicode) Date: Wed, 31 Jan 2018 10:51:04 -0700 Subject: Support for Extension F In-Reply-To: <8354C81D-0CFE-480D-9FC3-A22736040659@apple.com> References: <964123988.94.1517288950265.JavaMail.www@wwinf1d20> <67684ce4dc9661d1d64c5e7aa730d8a9@koremail.com> <8354C81D-0CFE-480D-9FC3-A22736040659@apple.com> Message-ID: > On Jan 31, 2018, at 10:25 AM, John H. 
Jenkins via Unicode wrote: > > I'm not aware of any publically available fonts for Extension F but would gladly install one myself if it's available. > There may be something here: https://chinese.stackexchange.com/questions/24210/how-to-display-cjk-extension-f From unicode at unicode.org Wed Jan 31 12:05:17 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Wed, 31 Jan 2018 19:05:17 +0100 Subject: Keyboard layouts and CLDR In-Reply-To: <397820388.18715.1517347464472.JavaMail.www@wwinf1j02> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> <397820388.18715.1517347464472.JavaMail.www@wwinf1j02> Message-ID: Another idea: you can already have multiple layouts loaded for the same language : For French, nothing prohibits to have a "technical/programmer layout", favoring input of ASCII, a "bibliographic/typographical" one with improved characters (e.g. the correct curly apostrophe); the technical/programmer layout has the spell-checker disabled by default, while the other has a spell checker enabled by default: whever to activate the spell checker or not will depend on software where it is enable, but it will switch automatically it on or off according to the state defined by changing the layout for the same language. Switching from one layout to another is easy with the Language bar, this means that even if you keep the first layout unchanged to match the national standard, additional layouts can be tuned specifically. For French fortunately there are two ISO 639-2 codes "fra" and "fre" (technical and bibliographic) which allows also defining the code "FRA" or "FRE" to display in the language bar (users should be able to tune the abbreviation or icon or emoji displayed in the language bar when they switch languages or input layouts, even if there are defaults, and Windows can infer a non-conflicting visual identification by adding a digit to the ISO 639-2 or -3 code (which would be the same as the layout ordering number in the list of loaded layouts, and used in shortcuts like CTRL+ALT+F1...F9). 2018-01-30 22:24 GMT+01:00 Marcel Schneider : > On Tue, 30 Jan 2018 08:18:49 +0100, Philippe Verdy wrote: > > > > I have always wondered why Microsoft did not push itself at least the > five > > simple additions needed since long in French for the French AZERTY > LAYOUT: > > Many people in F?an?? are wondering, but it is primarily a matter of > honoring > a country?s policies and not interfering with official work. France is > expected to > fix itself its keyboarding problems and publish a standard, and that?s > what is > actually happening. See Shawn Steele?s blog post about Locale Data in > Windows 10 & CLDR: > > https://blogs.msdn.microsoft.com/shawnste/2015/08/29/ > locale-data-in-windows-10-cldr/ > > > - [AltGr]+[?] to produce the cedilla dead key (needed only before > capital C in French) : > > this is frequently needed, the alternative would be [AltGr]+[C] to map > "?" without the dead key; > > That would be easier than Alt+0199. But Alt+something should yield > consistently either uppercase > or lowercase. Then, especially in the United States, not having all > uppercase letters accessed with > Shift+lowercase is considered counter-intuitive. And when the lowercase > letter is directly accessed, > going through a dead key to get its uppercase is not something I would > recommend. Therefore, > all uppercase that are used as initials (not ?) 
should be Shift+lowercase, > and digits in AltGr like on > a few Latin layouts shipping with Windows, plus a Programmer toggle > described in: > > https://unicode.org/cldr/trac/ticket/10851#comment:2 > > > spell checkers forget the frequent words: ?a or ?'. > > I never use spell checkers, and when they show up with red wavy underline, > I quickly try to disable > them (outside of Gooogle Search). That is why I have typos. [This one has > occurred unintentionally.] > > > > > - [AltGr]+[1&] to produce the acute accent dead key (similar to > [AltGr+7?`] giving the grave accent deadkey) : > > this is the most frequent missing letter we need all the time. > > Therefore, the ? should be mapped to a live key. But the acute dead key is > really the missing one. > Belgium?s AZERTY has it. Getting ? at least by dead key would have divided > our trouble by half. > > > > > - [AltGr]+[O] to produce "?" (without ShiftLock or CapsLock mode > enabled), > > or "?" (in ShiftLock or CapsLock mode), and > > [AltGr]+[Shift]+[O] to produce "?" (independantly of [ShiftLock] which > is disabled by [Shift], but without [CapsLock]) > > or "?" (independantly of [CapsLock], but without [ShiftLock]) : > > this is needed occasionnaly for very few common words, the most frequent > omission is "?uf" or its plural "?ufs". > > To repay the ?? for its exclusion from Latin-1 (due to a Frenchman), it > should be granted > two key positions in the Base and Shift shift states, amidst the upper row > letters. > > > > > - [AltGr]+[A] to produce "?" (without ShiftLock or CapsLock mode > enabled), > > or "?" (in ShiftLock or CapsLock mode), and > > [AltGr]+[Shift]+[O] to produce "?" (independantly of [ShiftLock] which > is disabled by [Shift], but without [CapsLock]) > > or "?" (independantly of [CapsLock], but without [ShiftLock]) : > > this is rarely needed, except for a few words borrowed from Latin used > in biology or some legal/judiciary terminology. > > And one spelling of _L?titia_. > > > > > - Adding Y to the list of allowed letters after the dieresis deadkey to > produce "?" : > > the most frequent case is L'HA?E-L?S-ROSES, the official name of a > French municipality when written with full capitalisation, > > almost all spell checkers often forget to correct capitalized names such > as this one. > > That?s really something I never understood neither. Why that deadlist was > not updated. > Maybe like above: If Microsoft had updated our layout with '?', we could > have wondered > why they didn?t add the other missing stuff while they were on it. > > > > > This would allow typing French completely including on initial capitals. > > All other French capital letters can be typed (????? with the circumflex > dead key, > > ???? with the dieresis dead key which already allows ?? not needed for > French but for Alsatian or some names borrowed from German). > > > > But we have mappings already in the AZERTY layout for: > > - the tilde as a dead key on [AltGr]+[2?~], even if it is not used for > French but only for "?" or "?" in names from Spanish or Breton, > > That didn?t prevent Breton authorities from refusing it in a first name, > Denis Jacquerye reported in the wake of the Kazakh apostrophe thread: > > http://unicode.org/mail-arch/unicode-ml/y2018-m01/0133.html > > > " ??" not needed at all, /??/ needed only for standard French > IPA phonetics where we still can't type /?????/ for French phonetics > > - the grave accent as a dead key on [AltGr]+[7?`], needed for "??" but > allowing also "???" not used at all in French. 
> > > > There's not any good rationale in the French AZERTY layout to keep it > incomplete on capitals > > while maintaining other capital letters with diacritics composed with > dead keys but not needed at all in French, > > except the case of "???" missing from ISO 8859-1 but present in > Windows-1252. > > There is even a way of putting all into the existing dead keys, if > ??circumflex accent?? (that is our directly accessed > dead key) followed by any diacriticized letter did yield its uppercase, > and followed by b or q, yield ? or ?? > But that isn?t what one would call a properly designed keyboard layout. > > > > ---- > > Using the Windows "Charmap" accessory with the "Unicode" charset and > "Latin" subset is still too difficult to locate the missing letters, > > as it is only sorted by code point value but still does not cover all > Latin letters; > > the Windows "Charmap" tool is usable for French only when selecting the > Windows-1252 charset (aka "Windows : Occidental"). > > > > But I don't understand why this accessory cannot simply add some rows at > top of the table for the current language selected > > on the "Languages Bar", or why it does not simply features the complete > alphabet of the current language, sorted correctly > > according to CLDR rules for that language (not sorted randomly by code > point value) to make it really usable. If we select > > another subset, it should also be sortable according to language rules > (or CLDR default root otherwise) and not according to code point value: > > this could be a simple checkbox or a pair of radio buttons (binary sort, > or alphabetic sort). > > > > Finally, the Charmap tool should be updated to add missing characters > that are not covered in the "Unicode" charset selection, > > even if they are encoded in Unicode and really mapped in fonts: the > coverage of proposed "subsets" is an extremely old version of Unicode. > > I see that as a very valuable feature request. And this one doesn?t need > to wait for any national standard > to get implemented. > > Regards, > > Marcel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 31 12:26:51 2018 From: unicode at unicode.org (Paul Hackett via Unicode) Date: Wed, 31 Jan 2018 13:26:51 -0500 Subject: Support for Extension F In-Reply-To: References: <964123988.94.1517288950265.JavaMail.www@wwinf1d20> <67684ce4dc9661d1d64c5e7aa730d8a9@koremail.com> <8354C81D-0CFE-480D-9FC3-A22736040659@apple.com> Message-ID: > On Jan 31, 2018, at 12:51 PM, Tom Gewecke via Unicode wrote: > > There may be something here: > > https://chinese.stackexchange.com/questions/24210/how-to-display-cjk-extension-f That post claims "Hanazono hasn't been updated in a while and only supports up to Extension E" but if you visit the project page: http://fonts.jp/hanazono/ *it* claims full coverage of Ext. F ("U+2CEB0 .. U+2EBE0 Ext.F 7,473? ????") in ????B?HanaMinB.ttf?: ? CJK?????Ext.B?Ext.C?Ext.D?Ext.E?Ext.F? 
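For anyone who wants to check such a coverage claim directly, a font's character-to-glyph table can be inspected. The sketch below assumes the fontTools package is installed and that a copy of HanaMinB.ttf (the Hanazono font discussed above) has been downloaded locally; the assigned range U+2CEB0..U+2EBE0 contains exactly the 7,473 code points quoted:

    from fontTools.ttLib import TTFont

    EXT_F = range(0x2CEB0, 0x2EBE0 + 1)   # CJK Unified Ideographs Extension F (7,473 code points)

    font = TTFont("HanaMinB.ttf")         # path to a local copy of the font
    cmap = font["cmap"].getBestCmap()     # best available Unicode cmap subtable
    covered = sum(1 for cp in EXT_F if cp in cmap)
    print("Extension F code points mapped:", covered, "of", len(EXT_F))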
------------ From unicode at unicode.org Wed Jan 31 12:45:56 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Wed, 31 Jan 2018 19:45:56 +0100 Subject: Internationalised Computer Science Exercises In-Reply-To: <20180129205305.5d5d202d@JRWUBU2> References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> <20180128224456.2a93f2a1@JRWUBU2> <20180129085741.6fcf00f8@JRWUBU2> <20180129205305.5d5d202d@JRWUBU2> Message-ID: 2018-01-29 21:53 GMT+01:00 Richard Wordingham via Unicode < unicode at unicode.org>: > On Mon, 29 Jan 2018 14:15:04 +0100 > > The case of u with diaeresis and macron is simpler: it has two > > combining characters of the same combining class and they don't > > commute, still the regexp to match it is something like: > > > > U [[:cc>0:]-[:cc=above:]]* [[:cc>0:]-[:cc=above:]]* > > [[:cc>0:]-[:cc=above:]]* > > was meant to be an example of a searched > string. For example, > contains, under canonical equivalence, the substring COMBINING DOT BELOW>. Your regular expressions would not detect this > relationship. My regular expression WILL detect this: scanning the text implies first composing it to "full equivalent decomposition form" (without even reordering it, and possibly recompose it to NFD) while reading it and bufering it in forward direction (it just requires the decomposition pairs from the UCD, including those that are "excluded" from NFC/NFD). The regexp exgine will then only process the "fully decomposed" input text to find matches, using the regexp transformed as above (which has some initial "complex" setup to "fully decompose" the initial regexp), but only once when constructing it, but not while processing the input text which can be then done straightforward with its full decomposition easily performed on the fly without any additional buffering except the very small lookahead whose length is never longer than the longest "canonical" decompositions in the UCD, i.e. at most 2 code points of look ahead). The automata is of course using the classic NFA used by regexp engine (and not the DFA which explodes combinatorially in all regexps), but which is still fully deterministic (the current "state" in the automata is not a single integer for the node number in the traversal graph, but a set of node numbers, and all regexps have a finite number of nodes in the traversal graph, this number being proportional to the length of the regexp, it does not need lot of memory, and the size of the current "state" is also fully bounded, never larger than the length of the regexp). Optimizing some contextual parts of the NFA to DFA is possible (to speedup the matching process and reduce the maximum storage size of the "current state") but only if it does not cause a growth of the total number of nodes in the traversal graph, or as long as this growth of the total number does not exceed some threshold e.g. not more than 2 or 3 times the regexp size). In practice, most regexps never exceed several hundreds of characters (including meta-characters of the regexp syntax itself), and the maximum number of active nodes in the graph traversal rarely exceeds 2 or 3, so the "current state" is not several hundreds integers, but a handful of integers, and een if you optimize the NFA partly to DFA, you can double or triple the number of nodes to significantly speedup the engine (in order to reduce the number of node numbers to store in the "current state"). 
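As an aside, the decomposition step described above is easy to see in miniature even without the automaton machinery: if the pattern and the text are normalized to a decomposed form in the same way, canonically equivalent spellings match each other. The sketch below is a drastic simplification of the approach (it shows only the normalization idea; as the reply further down points out, marks of different combining classes can also be reordered, which a naive decomposed pattern does not account for):

    import re
    import unicodedata

    def nfd(s):
        return unicodedata.normalize("NFD", s)

    # Precomposed "é" and "e" + U+0301 are canonically equivalent; after NFD both
    # become e + U+0301, so the match no longer depends on the input form.
    text    = "caf\u00e9"        # precomposed
    pattern = nfd("cafe\u0301")  # decomposed the same way as the text
    print(bool(re.search(pattern, nfd(text))))   # True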
Some common examples of reduction of nodes in the traversal graph is to compute character classes, or the local expansion of "bounded non-empty repetitions" (like in the regexp /x{m,n}/ when m>=1 and n is small). -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 31 13:26:42 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Wed, 31 Jan 2018 20:26:42 +0100 Subject: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?) In-Reply-To: References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> Message-ID: > > Note the French "touch" keyboard layout is complete for French (provided > you select the one of the 3 new layouts with Emoji: it has the extra "key" > for selecting the input language in all 4 layouts) > > But the "full" (dockable) touch layout in French which emulates a physical > keyboard is still incomplete. > > This "full" layout is also incorrect because the top row has been > unexpectedly shifted to the right, in order to place the [Esc] key > (reducing the [Backspace] at end of row: this row should have been left > unchanged, placing the [Esc] key at end after the (reduced) Backspace key, > or just to the right of the [X] icon that closes the touch keyboard panel > (so that the [Backspace] keeps its size, or in order to place the [Del] key > at end of this top row) > > The placement of [Fn] to the bottom left corner is UI design error (and a > really bad decision taken by ISO): there should be only THREE keys to the > left of the [Space bar] (whose size is correct), using larger keys (1.5 x > 1.0 units) so that the left [Ctrl] remains in the bottom left corner. That > [Fn] key should better be to the right of the [Space bar]. > > The "Language" selector button should not be there in the layout (and > not in any one of the proposed layouts), it should be in the top bar, > beside the "layout/option" selector icon in the top-left corner opening a > popup menu. > > Removing the language selector key from the touch layout allows moving the > arrow keys (in the full layout) to the right, restoring the correct > position of the Right [Shift] key > > > The general appearance would then be as on this image at: > https://drive.google.com/file/d/12t_w7fZZ2RKJho_FW9CbVwgIS8B8WmzX/view?usp=sharing The [Fn] (virtual) key should also allow typing [PgUp], [PgDown], [Home], > [End] and [Insert] on existing cursor keys, as seen on the right part of > this image (I've cut most of the layout of the [Fn] key, where [Fn]+1 gives > [F1] for example)... > You'll note the keys resized more conventionnaly, [Fn] placed at right (after the too long Left Shift), the cedilla mapped in fact on [AltGr]+[,], and the language selector and [Esc] key moved to the title bar, [Del] added next to [Backspace] The second [Ctrl] is still there on the bottom right (before the cursor keys) but not really needed on the touch layout, and can safely be replaced by the [App/Menu] key. The top title bar should also be usable (in its current empty dark area) to place customizable characters or keystrokes, but it could also be prefilled with common characters used in the selected language... 
Also when that touch layout is displayed, pressing any key physical keyboard could reduce it to only this title bar which could remain on screen as an horizontal strip, where the prefered keystrokes are still clickable directly, instead of closing the panel completely (requiring then to click the small icon in the task bar to reopen it with the full layout only): this can apply to ALL touch layouts (not just the full layout). If the touch layout is displayed, it can be docked at the bottom of screen, or can be floatting, but when it is reduced to just its top title bar (because we are typing a key on the physical keyboard), it can combine with the reduced language bar (that you can place at top of the screen over the title bar of other applications...) -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 31 16:44:05 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Wed, 31 Jan 2018 23:44:05 +0100 (CET) Subject: Keyboard layouts and CLDR In-Reply-To: References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> <397820388.18715.1517347464472.JavaMail.www@wwinf1j02> Message-ID: <1021788982.31258.1517438645951.JavaMail.www@wwinf1p19> On Wed, 31 Jan 2018 19:05:17 +0100, Philippe Verdy wrote: > > Another idea: you can already have multiple layouts loaded for the same > language : For French, nothing prohibits to have a "technical/programmer > layout", favoring input of ASCII, a "bibliographic/typographical" one with > improved characters (e.g. the correct curly apostrophe); Czech, Polish and Romanian have Programmer layouts shipped with Windows 7. But more than one single and easy keypress to switch between is inefficient. And a French Programmer layout as I see it cannot be called French, and the Programmer mode is needed as an ALtGr Lock on the upper row digits for convenience. Nothing of all these requirements is met by proposing two distinct layouts. > the technical/programmer layout has the spell-checker disabled by default, > while the other has a spell checker enabled by default: whever to activate > the spell checker or not will depend on software where it is enable, but it > will switch automatically it on or off according to the state defined by > changing the layout for the same language. I don?t really see the point of spell-checking. Usually their libraries are so poor they don?t even make the equivalence between U+2019 and U+0027. For me it suffices to see wavy underlines in the Google search bar. (And fortunately I don?t do many searches a day with the Google search *bar.*) > > Switching from one layout to another is easy with the Language bar, this > means that even if you keep the first layout unchanged to match the > national standard, additional layouts can be tuned specifically. The Language bar is a good feature, but it has little to do with what I try to achieve with the Programmer toggle. It?s part of the layout like CapsLock on bicameral layouts. Imagine that you had to toggle between a lowercase layout and an uppercase layout, and you understand why switching back and forth between two layouts is unpractical, though many users must actually rely on it. Surely that impacts productivity, and therefore, all non-Latin scripts are sort of digitally disadvantaged. What we need is a real layout toggle on our keyboards. As most scripts are unicase, the CapsLock key is the best candidate. 
(The more as many Latin script users hate CapsLock.) And those locales that require typing in uppercase usually have also the ISO B00 key, where CapsLock can be mapped. (Too bad that US-QWERTY is lacking key B00.) > > For French fortunately there are two ISO 639-2 codes "fra" and "fre" > (technical and bibliographic) which allows also defining the code "FRA" or > "FRE" to display in the language bar (users should be able to tune the > abbreviation or icon or emoji displayed in the language bar when they > switch languages or input layouts, even if there are defaults, and Windows > can infer a non-conflicting visual identification by adding a digit to the > ISO 639-2 or -3 code (which would be the same as the layout ordering number > in the list of loaded layouts, and used in shortcuts like CTRL+ALT+F1...F9). Isn?t the fre/fra alternative linguistic only, like gre/ell? Otherwise, every non-ASCII language writing system should have two codes. And it isn?t as if tech writers shouldn?t use correct French orthography. Regards, Marcel From unicode at unicode.org Wed Jan 31 17:30:45 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Thu, 1 Feb 2018 00:30:45 +0100 Subject: Keyboard layouts and CLDR In-Reply-To: <1021788982.31258.1517438645951.JavaMail.www@wwinf1p19> References: <9EE03900F5F24F12855DEA697C3E2141@DougEwell> <1573635395.8306.1517225161705.JavaMail.www@wwinf1m23> <412077223.147.1517290294530.JavaMail.www@wwinf1d20> <397820388.18715.1517347464472.JavaMail.www@wwinf1j02> <1021788982.31258.1517438645951.JavaMail.www@wwinf1p19> Message-ID: The spell checker I was invoking was to allow fixing basic the typography (e.g. the ae and oe ligatures contextually, it does not have to be a full spell checker, but only concentrate on the typography, not the orthography, most transforms should be limited to one or two characters, so that it is a true "input mode", It could fix the leading capitals with accents : you type the normal accent and it gets capitalized for you; the apostrophe does not need a spell checker, the key produces directly the curly apostrophe U+2019 and not the vertical apostrophe from ASCII in the technical/programmer's mode) But multiple layouts available in one click allows setting a context for automatic transforms, or for typing more advanced character subsets. The two codes in ISO639-2 are just suggested for a visual distinction (in the language bar that displays "FRA" only for French, but does not display the layout currently used) instead of appending some digit (a layout number) to distinguish it. The language bar unfortunately does not clearly display the layout in current use, but a combination of a language+layout can have a visible code so that pressing a single key will make the change visible: the layout selector can really be used as an input mode selector which complements the existing mode keys (Shift, Ctrl, Alt, AltGr). Even the touch keyboard in Windows has several layouts builtin (including the Emoji selector, and some input modes where you can maintain a key pressed to have a choice of "related" characters: the layout is really different on screen, but I don't see why it cannot change also on the physical keyboard, and made visible correctly also on the full layout of the touch input panel, where key labels change dynamically according to the state of mode keys). 
Additionally each application has its own input mode builtin (own language and own layout), so the input mode in one app is not necessarily the same as another app on the same screen, depending on which one has the input focus. Even within the same application, you could have several input areas using different input modes, and depending on where you put the focus, the app can automatically memoize its input mode when we leave a text input field and restore it when we reenter it. The top bar of the touch panel is a precious area where we can give more visual info... 2018-01-31 23:44 GMT+01:00 Marcel Schneider : > On Wed, 31 Jan 2018 19:05:17 +0100, Philippe Verdy wrote: > > > > Another idea: you can already have multiple layouts loaded for the same > > language : For French, nothing prohibits to have a "technical/programmer > > layout", favoring input of ASCII, a "bibliographic/typographical" one > with > > improved characters (e.g. the correct curly apostrophe); > > Czech, Polish and Romanian have Programmer layouts shipped with Windows 7. > But more than one single and easy keypress to switch between is > inefficient. > And a French Programmer layout as I see it cannot be called French, and the > Programmer mode is needed as an ALtGr Lock on the upper row digits for > convenience. > Nothing of all these requirements is met by proposing two distinct layouts. > > > the technical/programmer layout has the spell-checker disabled by > default, > > while the other has a spell checker enabled by default: whever to > activate > > the spell checker or not will depend on software where it is enable, but > it > > will switch automatically it on or off according to the state defined by > > changing the layout for the same language. > > I don?t really see the point of spell-checking. Usually their libraries > are so poor > they don?t even make the equivalence between U+2019 and U+0027. For me it > suffices to see wavy underlines in the Google search bar. (And fortunately > I > don?t do many searches a day with the Google search *bar.*) > > > > > Switching from one layout to another is easy with the Language bar, this > > means that even if you keep the first layout unchanged to match the > > national standard, additional layouts can be tuned specifically. > > The Language bar is a good feature, but it has little to do with what I > try to achieve > with the Programmer toggle. It?s part of the layout like CapsLock on > bicameral layouts. > Imagine that you had to toggle between a lowercase layout and an uppercase > layout, > and you understand why switching back and forth between two layouts is > unpractical, > though many users must actually rely on it. Surely that impacts > productivity, and > therefore, all non-Latin scripts are sort of digitally disadvantaged. What > we need is a > real layout toggle on our keyboards. As most scripts are unicase, the > CapsLock key > is the best candidate. (The more as many Latin script users hate > CapsLock.) And > those locales that require typing in uppercase usually have also the ISO > B00 key, > where CapsLock can be mapped. (Too bad that US-QWERTY is lacking key B00.) 
> > > > > For French fortunately there are two ISO 639-2 codes "fra" and "fre" > > (technical and bibliographic) which allows also defining the code "FRA" > or > > "FRE" to display in the language bar (users should be able to tune the > > abbreviation or icon or emoji displayed in the language bar when they > > switch languages or input layouts, even if there are defaults, and > Windows > > can infer a non-conflicting visual identification by adding a digit to > the > > ISO 639-2 or -3 code (which would be the same as the layout ordering > number > > in the list of loaded layouts, and used in shortcuts like > CTRL+ALT+F1...F9). > > Isn?t the fre/fra alternative linguistic only, like gre/ell? > Otherwise, every non-ASCII language writing system should have two codes. > And it isn?t as if tech writers shouldn?t use correct French orthography. > > Regards, > > Marcel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Jan 31 17:50:51 2018 From: unicode at unicode.org (Sarasvati via Unicode) Date: Wed, 31 Jan 2018 17:50:51 -0600 Subject: CLDR Keyboard and Layout discussion Message-ID: <201801312350.w0VNopt6026613@sarasvati.unicode.org> Greetings and Happy New Year, The discussion of CLDR Keyboards and layout is getting lengthy and it should probably be moved to the CLDR-Users mail list where it is more appropriate. Especially because it is so technically detailed. Please see this page for instructions about how to subscribe: http://www.unicode.org/consortium/distlist-cldr-users.html Thank you for your attention, -- Sarasvati From unicode at unicode.org Wed Jan 31 19:38:58 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Thu, 1 Feb 2018 01:38:58 +0000 Subject: Internationalised Computer Science Exercises In-Reply-To: References: <20180122220855.7b929272@JRWUBU2> <20180128041230.26b34022@JRWUBU2> <20180128224456.2a93f2a1@JRWUBU2> <20180129085741.6fcf00f8@JRWUBU2> <20180129205305.5d5d202d@JRWUBU2> Message-ID: <20180201013858.383c7313@JRWUBU2> On Wed, 31 Jan 2018 19:45:56 +0100 Philippe Verdy via Unicode wrote: > 2018-01-29 21:53 GMT+01:00 Richard Wordingham via Unicode < > unicode at unicode.org>: > > On Mon, 29 Jan 2018 14:15:04 +0100 > > was meant to be an example of a > > searched string. For example, > COMBINING DOT BELOW> contains, under canonical equivalence, the > > substring . Your regular > > expressions would not detect this relationship. > My regular expression WILL detect this: scanning the text implies > first composing it to "full equivalent decomposition form" (without > even reordering it, and possibly recompose it to NFD) while reading > it and bufering it in forward direction (it just requires the > decomposition pairs from the UCD, including those that are "excluded" > from NFC/NFD). No. To find , you constructed, on "Sun, 28 Jan 2018 20:30:44 +0100": [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * ( [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * | [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * < COMBINING CIRCUMFLEX> To be consistent, to find you would construct [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]]]] * ( [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]]* | [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]]* ) (A final ')' got lost between brain and text; I have restored it.) However, decomposes to . It doesn't match your regular expression, for between COMBINING DIAERESIS and COMBINING DOT BELOW there is COMBINING MACRON, for which ccc = above! 
> The regexp engine will then only process the "fully decomposed" input
> text to find matches, using the regexp transformed as above (which
> has some initial "complex" setup to "fully decompose" the initial
> regexp, but only once when constructing it, not while processing
> the input text, which can then be done straightforwardly, with its full
> decomposition easily performed on the fly without any additional
> buffering except the very small lookahead whose length is never
> longer than the longest "canonical" decompositions in the UCD, i.e.
> at most 2 code points of lookahead).

Nitpick: U+1F84 GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND
YPOGEGRAMMENI decomposes to <U+03B1, U+0313, U+0301, U+0345>. Conversion to
NFD on input only requires a small buffer for natural orthographies. I
suspect the worst in natural language will come from either narrow IPA
transcriptions or Classical Greek.

> The automaton is of course the classic NFA used by regexp engines
> (and not the DFA, which explodes combinatorially for some regexps), but
> it is still fully deterministic (the current "state" in the
> automaton is not a single integer for the node number in the traversal
> graph, but a set of node numbers, and all regexps have a finite
> number of nodes in the traversal graph, this number being
> proportional to the length of the regexp, so it does not need a lot of
> memory, and the size of the current "state" is also fully bounded,
> never larger than the length of the regexp). Optimizing some
> contextual parts of the NFA to a DFA is possible (to speed up the
> matching process and reduce the maximum storage size of the "current
> state"), but only if it does not cause a growth of the total number of
> nodes in the traversal graph, or as long as this growth does not exceed
> some threshold (e.g. not more than 2 or 3 times the regexp size).

In your claim, what is the length of the regexp for searching for
<U+0061, COMBINING DOT BELOW, COMBINING DIAERESIS> in a trace? Is it 3, or
is it about 14? If the former, I am very interested in how you do it. If
the latter, I would say you already have a form of blow-up in the way you
cater for canonical equivalence.

Even with the dirty trick of normalising the searched trace for input (I
wanted the NFA propagation to be defined by the trace - I also didn't want
to have to worry about the well-formedness of DFAs or NFAs), I found that
the number of states for a concatenation of regular languages of traces was
bounded above by the product of the numbers of states. This doesn't strike
me as inherently unreasonable, for I get the same form of bound for the
intersection of regular languages even for strings. In both cases, a lot of
the nodes for the concatenation or intersection are unreachable.

Kleene star is a problem for size. I think there is a polynomial bound for
the case when A* is a regular language. If I substitute 'concurrent star'
for Kleene star, which has the nice property that the concurrent star of a
regular trace language is itself regular, then the bound I have on the
number of states of the concurrent star is proportional to the third power
of the number of states for the original trace language. The states are
fairly simply derived from the states for recognising the regular language
A. (My size bounds are for the traces of fully decomposed Unicode character
strings under canonical equivalence. I am not sure that they hold for
arbitrary traces.)

I believe the concurrent star of a language A is (|A|)*, where |A| = {x ∈
A : {x}* is a regular language}. (The definition works for the traces of
fully decomposed Unicode character strings under canonical equivalence.)

Concurrent star is not a perfect generalisation. If ab = ba, then
X = {aa, ab, b} has the annoying property that X* is a regular trace
language, but |X|* is a proper subset of X*. For Unicode, X would be a
rather unusual regular language.

Richard.
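(A side note for readers who want to experiment with the last example: when
the alphabet is just {a, b} with the single commutation ab = ba, a trace is
determined by its letter counts, so X* and |X|* can be enumerated as
submonoids of N^2. The sketch below assumes |X| = {aa, b}, on the grounds
that the string language of (ab)* under this commutation - all words with
equally many a's and b's - is not regular; the code only exhibits the proper
containment by brute force up to a bound, and says nothing about the
regularity claims themselves.)

    # Brute-force sketch of the example X = {aa, ab, b} with ab = ba.
    # With a two-letter alphabet and full commutation, a trace is just a
    # pair of letter counts, so generated trace languages are submonoids
    # of N^2 and can be enumerated up to a total-length bound.

    def generated(generators, bound=10):
        """Count-vectors reachable from the generators, total count <= bound."""
        seen = {(0, 0)}
        frontier = {(0, 0)}
        while frontier:
            step = set()
            for (na, nb) in frontier:
                for (ga, gb) in generators:
                    t = (na + ga, nb + gb)
                    if t not in seen and t[0] + t[1] <= bound:
                        seen.add(t)
                        step.add(t)
            frontier = step
        return seen

    X = {(2, 0), (1, 1), (0, 1)}        # aa, ab, b as count-vectors
    X_bar = {(2, 0), (0, 1)}            # assumed |X| = {aa, b}

    print((1, 1) in generated(X))       # True:  the trace ab lies in X*
    print((1, 1) in generated(X_bar))   # False: but not in |X|*, so |X|*
                                        #        is a proper subset of X*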