From tfujiwar at redhat.com Fri Jul 1 09:10:07 2016
From: tfujiwar at redhat.com (Takao Fujiwara)
Date: Fri, 1 Jul 2016 23:10:07 +0900
Subject: Emoji and Annotation data
In-Reply-To: <412dcbbd-f803-df98-7172-489bec525335@redhat.com>
References: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
 <2e621021-44bd-12f0-932a-a1d6b50c361b@redhat.com>
 <412dcbbd-f803-df98-7172-489bec525335@redhat.com>
Message-ID: <113458b8-a537-45c9-7712-1dd4caa193d2@redhat.com>

I tested emoji.json, but unfortunately it is less useful than emoji-list.html.

1. The "name" element is too long for the dictionary, e.g. "grinning face with smiling eyes", but I need both single words and multi-word terms, e.g. "tower" and "united states".
2. Some keywords are adjectives, but I need nouns, e.g. "smile" rather than "smiley", "kiss" rather than "kissing".

Now I will try to get the annotation list from unicode.org.

Thanks,
Fujiwara

On 06/27/16 17:00, Takao Fujiwara-san wrote:
> On 06/27/16 15:58, Ori Avtalion-san wrote:
>> On Mon, Jun 27, 2016 at 7:13 AM, Takao Fujiwara wrote:
>>> Why don't you use only annotations? E.g. "us" hits too many Emoji.
>>
>> It's for all kinds of Unicode symbols, not just those that have emoji
>> representation.
>> Sometimes I find myself searching by the "real" Unicode name, and
>> sometimes by keyword, if I don't know what I'm looking for.
>
> It's a bit strange to me that typing "us" hits "bus" and "muscle".
> The following is the current implementation in IBus core:
> https://github.com/ibus/ibus/commit/160d3c975a
>
> Fujiwara
>
>>
>> I keep tweaking it to provide better results, and I'm pretty pleased
>> with its current state.
>> It currently has a ranking algorithm based on what it matched on
>> (name, annotation/emojione keyword), and how successfully.
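
The "us" example in this exchange comes down to substring matching versus whole-word matching. A minimal Python sketch of the difference follows. It assumes a small hypothetical keyword table; the entries and the function names are illustrative only and are not taken from emoji.json, the Unicode annotation data, or the IBus commit linked above.

```python
import re

# Hypothetical keyword table for illustration; not real annotation data.
ANNOTATIONS = {
    "\U0001F68C": ["bus", "vehicle"],                         # bus
    "\U0001F4AA": ["muscle", "biceps", "flex"],               # flexed biceps
    "\U0001F1FA\U0001F1F8": ["us", "united states", "flag"],  # US flag
    "\U0001F5FC": ["tower", "tokyo"],                         # Tokyo tower
}


def substring_hits(query):
    """Return emoji having a keyword that contains the query anywhere."""
    return [emoji for emoji, keywords in ANNOTATIONS.items()
            if any(query in keyword for keyword in keywords)]


def word_hits(query):
    """Return emoji having a keyword in which the query is a whole word."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b")
    return [emoji for emoji, keywords in ANNOTATIONS.items()
            if any(pattern.search(keyword) for keyword in keywords)]


if __name__ == "__main__":
    print("substring:", substring_hits("us"))
    print("whole word:", word_hits("us"))
```

With this table, the substring search for "us" also returns the bus and muscle entries, while the whole-word search returns only the flag. A multi-word keyword such as "united states" is still reachable through either of its words.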

From charupdate at orange.fr Fri Jul 15 23:14:45 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sat, 16 Jul 2016 06:14:45 +0200 (CEST)
Subject: Public review of draft repertoire for ISO/IEC 10646
In-Reply-To: <57621022.1070209@unicode.org>
References: <57621022.1070209@unicode.org>
Message-ID: <502559172.247.1468642485728.JavaMail.www@wwinf1p24>

On Wed, 15 Jun 2016 19:34:10 -0700, Rick McGowan wrote:

> The UTC would appreciate feedback on new repertoire that is currently
> under ballot for future additions to ISO/IEC 10646. This includes
> repertoire that has already been reviewed and approved by the UTC, but
> which will not be published until next year, as part of Version 10.0 of
> the Unicode Standard.
>
> This is your opportunity to review the planned new repertoire for
> possible problems, and to make any suggestions you might have about
> improvements for glyphs or character names.

Thanks. Indeed this matches the "alpha review" that I've asked for. Sadly I missed the previous ones.

I note that now that the Unicode repertoire is being built at cruise speed, few to no feedback items are reported.[1][2]

Best regards,

Marcel

[1] http://www.unicode.org/review/pri327/feedback.html
[2] http://www.unicode.org/review/pri328/feedback.html

From ken.shirriff at gmail.com Sat Jul 23 11:26:41 2016
From: ken.shirriff at gmail.com (Ken Shirriff)
Date: Sat, 23 Jul 2016 09:26:41 -0700
Subject: Running text requirement?
Message-ID:

Someone asked me about the requirement for evidence that proposed new characters are used in running text. I thought it was in the Symbol Guidelines (http://www.unicode.org/pending/symbol-guidelines.html) or the Character Proposals document (http://unicode.org/pending/proposals.html) but it's not there. Is there a written requirement for running text somewhere or is it "tradition"?

Ken

From roozbeh at unicode.org Sat Jul 23 19:09:15 2016
From: roozbeh at unicode.org (Roozbeh Pournader)
Date: Sat, 23 Jul 2016 17:09:15 -0700
Subject: Running text requirement?
In-Reply-To:
References:
Message-ID:

In my experience, no such requirement is a binary yes/no. If you have a good character candidate, run it by the list, or just write a proposal. UTC tends to look at all the merits together, instead of a list of things that should all be there or else there won't be a character.

On Sat, Jul 23, 2016 at 9:26 AM, Ken Shirriff wrote:

> Someone asked me about the requirement for evidence that proposed new
> characters are used in running text. I thought it was in the Symbol
> Guidelines (http://www.unicode.org/pending/symbol-guidelines.html) or the
> Character Proposals document (http://unicode.org/pending/proposals.html)
> but it's not there. Is there a written requirement for running text
> somewhere or is it "tradition"?
>
> Ken
>

From petercon at microsoft.com Sun Jul 24 01:27:19 2016
From: petercon at microsoft.com (Peter Constable)
Date: Sun, 24 Jul 2016 06:27:19 +0000
Subject: Running text requirement?
In-Reply-To:
References:
Message-ID:

If it's a symbol / pictograph, then UTC will want to be convinced that it's needed/appropriate for use in running text. There are lots of symbols that get used in different kinds of presentation but that are not necessarily used in text. Depending on the symbol, it may or may not be obvious. It doesn't hurt to include samples of attested usage in running text. But as Roozbeh says, you can float it first to get feedback on whether additional evidence is needed.

Peter

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Roozbeh Pournader
Sent: Saturday, July 23, 2016 5:09 PM
To: Ken Shirriff
Cc: Unicode Public
Subject: Re: Running text requirement?

In my experience, no such requirement is a binary yes/no.
If you have a good character candidate, run it by the list, or just write a proposal. UTC tends to look at all the merits together, instead of a list of things that should all be there or else there won't be a character.

On Sat, Jul 23, 2016 at 9:26 AM, Ken Shirriff wrote:

Someone asked me about the requirement for evidence that proposed new characters are used in running text. I thought it was in the Symbol Guidelines (http://www.unicode.org/pending/symbol-guidelines.html) or the Character Proposals document (http://unicode.org/pending/proposals.html) but it's not there. Is there a written requirement for running text somewhere or is it "tradition"?

Ken

From rwhlk142 at gmail.com Tue Jul 26 20:12:38 2016
From: rwhlk142 at gmail.com (Robert Wheelock)
Date: Tue, 26 Jul 2016 21:12:38 -0400
Subject: Numerical fractions written in Arabic script
Message-ID:

Hello again, y'all!

How do Arabs, Iranians, Afghans, Pakistanis, Urdu ... all write their equivalents of common numerical fractions (consisting of a numerator, a separator character, and a denominator)?!?!

Considering that Arabic written script reads from right to left (like in Hebrew, Syro-Aramaic, and the fantasy language of Tsolyáni), would they use a normal right-facing foreslash (1/2), a left-facing backslash (1\2), or do they align numerator above|denominator below a horizontal fraction bar?!?!

Notice that these people would use the native Arabic-based digits in them; notwithstanding, the forms for |4 5 6| (and, sometimes, those for |2 7|) do look quite different from the canonical Arabic forms.

Just something to think about...

Thank You!

From frederic.grosshans at gmail.com Wed Jul 27 07:29:39 2016
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Wed, 27 Jul 2016 14:29:39 +0200
Subject: Numerical fractions written in Arabic script
In-Reply-To:
References:
Message-ID:

Le 27/07/2016 à 03:12, Robert Wheelock a écrit :
> How do Arabs, Iranians, Afghans, Pakistanis, Urdu ... all write their
> equivalents of common numerical fractions (consisting of a numerator,
> a separator character, and a denominator)?!?!
> Considering that Arabic written script reads from right to left (like
> in Hebrew, Syro-Aramaic, and the fantasy language of Tsolyáni), would
> they use a normal right-facing foreslash (1/2), a left-facing
> backslash (1\2), or do they align numerator above|denominator below a
> horizontal fraction bar?!?!
> Notice that these people would use the native Arabic-based digits in
> them; notwithstanding, the forms for |4 5 6| (and, sometimes, those for
> |2 7|) do look quite different from the canonical Arabic forms.

The subject of modern Arabic notation is quite complex, mixing RTL and LTR considerations, as well as Latin/Arabic/Greek/math mixing, with several different approaches. A W3C document on this (https://www.w3.org/TR/arabic-math/) enumerates 4 styles (Moroccan/Maghreb/Machrek/Persian). It also contains the following paragraph, which answers your question:

    Finally, although stacked fractions are rendered the same way in
    both European and Arabic, bevelled fractions in RTL Arabic will
    appear, as one would expect, with the terms in RTL order, i.e. A
    divided by B would appear as "B/A". In some locales, the preference
    is for the slash to also be mirrored, as "B\A".
    For these cases, we suggest that authors employ explicit markup
    using the REVERSE SOLIDUS \

Frédéric

From frederic.grosshans at gmail.com Wed Jul 27 08:03:54 2016
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Wed, 27 Jul 2016 15:03:54 +0200
Subject: Numerical fractions written in Arabic script
In-Reply-To:
References:
Message-ID: <1091492d-a536-1d0f-5bd0-d5c74a34cc47@gmail.com>

Le 27/07/2016 à 14:29, Frédéric Grosshans a écrit :
> Le 27/07/2016 à 03:12, Robert Wheelock a écrit :
>> How do Arabs, Iranians, Afghans, Pakistanis, Urdu ... all write their
>> equivalents of common numerical fractions (consisting of a numerator,
>> a separator character, and a denominator)?!?!
>> Considering that Arabic written script reads from right to left (like
>> in Hebrew, Syro-Aramaic, and the fantasy language of Tsolyáni), would
>> they use a normal right-facing foreslash (1/2), a left-facing
>> backslash (1\2), or do they align numerator above|denominator below a
>> horizontal fraction bar?!?!
>> Notice that these people would use the native Arabic-based digits in
>> them; notwithstanding, the forms for |4 5 6| (and, sometimes, those
>> for |2 7|) do look quite different from the canonical Arabic forms.
>
> The subject of modern Arabic notation is quite complex, mixing RTL and
> LTR considerations, as well as Latin/Arabic/Greek/math mixing, with
> several different approaches. A W3C document on this
> (https://www.w3.org/TR/arabic-math/) enumerates 4 styles
> (Moroccan/Maghreb/Machrek/Persian). It also contains the following
> paragraph, which answers your question:
>
>     Finally, although stacked fractions are rendered the same way in
>     both European and Arabic, bevelled fractions in RTL Arabic will
>     appear, as one would expect, with the terms in RTL order, i.e. A
>     divided by B would appear as "B/A". In some locales, the preference
>     is for the slash to also be mirrored, as "B\A".
>     For these cases, we suggest that authors employ explicit markup
>     using the REVERSE SOLIDUS \

Looking at Wikipedia (plus some Google Translate) gives you some examples:

If you look at the Arabic Wikipedia page on fractions, https://ar.wikipedia.org/wiki/%D9%83%D8%B3%D8%B1_(%D8%B1%D9%8A%D8%A7%D8%B6%D9%8A%D8%A7%D8%AA), you will see the following sentence:

.??? ???? (????): ?? ????? ???? ??? ????? ???? ?? ??????? ????? 10/6 ? 3/2 ? 5/4

According to Google Translate, all the numerators are smaller than the denominators. A bit below, 2 4/5 is written :5/4 2, which is an interesting mixture of RTL and LTR, as is often the case for numbers in Arabic script.

On the equivalent Persian Wikipedia page, https://fa.wikipedia.org/wiki/%DA%A9%D8%B3%D8%B1, 3/4 is written ۳/۴, that is, LTR 3/4 in Persian digits, even though the text is RTL. The opposite convention is used.

The Hebrew ( https://he.wikipedia.org/wiki/%D7%A9%D7%91%D7%A8_(%D7%9E%D7%AA%D7%9E%D7%98%D7%99%D7%A7%D7%94) ) and Yiddish ( https://he.wikipedia.org/wiki/%D7%A9%D7%91%D7%A8_(%D7%9E%D7%AA%D7%9E%D7%98%D7%99%D7%A7%D7%94) ) equivalent pages seem to avoid the ambiguity by using exclusively vertically stacked fractions (with the exception of ?/4 on the Hebrew page).

From jr at qsm.co.il Wed Jul 27 10:40:28 2016
From: jr at qsm.co.il (Jonathan Rosenne)
Date: Wed, 27 Jul 2016 15:40:28 +0000
Subject: Numerical fractions written in Arabic script
In-Reply-To: <1091492d-a536-1d0f-5bd0-d5c74a34cc47@gmail.com>
References: <1091492d-a536-1d0f-5bd0-d5c74a34cc47@gmail.com>
Message-ID:

Regarding Hebrew, please note in the Wikipedia page referred to:

???? ???? ????? ,m/n

i.e. LTR with a slash. This is the standard usage.

Best Regards,

Jonathan Rosenne

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Frédéric Grosshans
Sent: Wednesday, July 27, 2016 4:04 PM
To: unicode
Subject: Re: Numerical fractions written in Arabic script

Le 27/07/2016 à 14:29, Frédéric Grosshans a écrit :
> Le 27/07/2016 à
03:12, Robert Wheelock a écrit :
>> How do Arabs, Iranians, Afghans, Pakistanis, Urdu ... all write their
>> equivalents of common numerical fractions (consisting of a numerator,
>> a separator character, and a denominator)?!?!
>> Considering that Arabic written script reads from right to left (like
>> in Hebrew, Syro-Aramaic, and the fantasy language of Tsolyáni), would
>> they use a normal right-facing foreslash (1/2), a left-facing
>> backslash (1\2), or do they align numerator above|denominator below a
>> horizontal fraction bar?!?!
>> Notice that these people would use the native Arabic-based digits in
>> them; notwithstanding, the forms for |4 5 6| (and, sometimes, those
>> for |2 7|) do look quite different from the canonical Arabic forms.
>
> The subject of modern Arabic notation is quite complex, mixing RTL and
> LTR considerations, as well as Latin/Arabic/Greek/math mixing, with
> several different approaches. A W3C document on this
> (https://www.w3.org/TR/arabic-math/) enumerates 4 styles
> (Moroccan/Maghreb/Machrek/Persian). It also contains the following
> paragraph, which answers your question:
>
>     Finally, although stacked fractions are rendered the same way in
>     both European and Arabic, bevelled fractions in RTL Arabic will
>     appear, as one would expect, with the terms in RTL order, i.e. A
>     divided by B would appear as "B/A". In some locales, the preference
>     is for the slash to also be mirrored, as "B\A". For these cases, we
>     suggest that authors employ explicit markup using the REVERSE
>     SOLIDUS \

Looking at Wikipedia (plus some Google Translate) gives you some examples:

If you look at the Arabic Wikipedia page on fractions, https://ar.wikipedia.org/wiki/%D9%83%D8%B3%D8%B1_(%D8%B1%D9%8A%D8%A7%D8%B6%D9%8A%D8%A7%D8%AA), you will see the following sentence:

.??? ???? (????): ?? ????? ???? ??? ????? ???? ?? ??????? ????? 10/6 ? 3/2 ? 5/4

According to Google Translate, all the numerators are smaller than the denominators.
A bit below, 2 4/5 is written :5/4 2, which is an interesting mixture of RTL and LTR, as is often the case for numbers in Arabic script.

On the equivalent Persian Wikipedia page, https://fa.wikipedia.org/wiki/%DA%A9%D8%B3%D8%B1, 3/4 is written ۳/۴, that is, LTR 3/4 in Persian digits, even though the text is RTL. The opposite convention is used.

The Hebrew ( https://he.wikipedia.org/wiki/%D7%A9%D7%91%D7%A8_(%D7%9E%D7%AA%D7%9E%D7%98%D7%99%D7%A7%D7%94) ) and Yiddish ( https://he.wikipedia.org/wiki/%D7%A9%D7%91%D7%A8_(%D7%9E%D7%AA%D7%9E%D7%98%D7%99%D7%A7%D7%94) ) equivalent pages seem to avoid the ambiguity by using exclusively vertically stacked fractions (with the exception of ?/4 on the Hebrew page).

From khaledhosny at eglug.org Wed Jul 27 13:31:30 2016
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Wed, 27 Jul 2016 20:31:30 +0200
Subject: Numerical fractions written in Arabic script
In-Reply-To:
References:
Message-ID: <20160727183130.GB22592@macbook>

On Tue, Jul 26, 2016 at 09:12:38PM -0400, Robert Wheelock wrote:
> Hello again, y'all!
>
> How do Arabs, Iranians, Afghans, Pakistanis, Urdu ... all write their
> equivalents of common numerical fractions (consisting of a numerator, a
> separator character, and a denominator)?!?!
> Considering that Arabic written script reads from right to left (like in
> Hebrew, Syro-Aramaic, and the fantasy language of Tsolyáni), would they use
> a normal right-facing foreslash (1/2), a left-facing backslash (1\2), or do
> they align numerator above|denominator below a horizontal fraction bar?!?!

In Arabic, beveled fractions are written from left to right with a right-facing slash. Also, the integer is written to the left of the fraction (whether it is a nut or beveled fraction).

Regards,
Khaled
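
The conventions described in this thread can be related to the bidirectional classes that Unicode assigns to the characters involved. The following is a small Python sketch using only the standard unicodedata module. It inspects character properties and does not run the bidirectional algorithm itself; the selection of sample characters and the function name are assumptions made for illustration.

```python
import unicodedata

# Sample characters chosen for illustration: one digit from each digit set
# mentioned in the thread, plus the candidate fraction separators.
SAMPLES = ["3", "\u0663", "\u06F3", "/", "\\", "\u2044"]


def bidi_classes(chars=SAMPLES):
    """Map each character's Unicode name to its Bidi_Class value."""
    return {unicodedata.name(ch): unicodedata.bidirectional(ch) for ch in chars}


if __name__ == "__main__":
    for name, bidi_class in bidi_classes().items():
        print(f"{bidi_class:>3}  {name}")
```

Persian (extended Arabic-Indic) digits carry class EN like ASCII digits, Arabic-Indic digits carry AN, SOLIDUS and FRACTION SLASH are common separators (CS), and REVERSE SOLIDUS is an other neutral (ON). In UAX #9, a single common separator between two numbers of the same type is treated as part of the number, which is consistent with a digit/digit sequence keeping its left-to-right order inside RTL text, while a neutral separator such as the backslash takes its direction from the surrounding text.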