From unicode at unicode.org Wed Oct 4 04:14:18 2017
From: unicode at unicode.org (Mathias Bynens via Unicode)
Date: Wed, 4 Oct 2017 05:14:18 -0400
Subject: HTTPS
In-Reply-To:
References:
Message-ID:

unicode.org and www.unicode.org are now available over HTTPS. E.g.
https://unicode.org/Public/10.0.0/

On Thu, Mar 6, 2014 at 3:54 PM, Robbert wrote:

> Hi,
>
> For tools that rely on the Unicode database it would be great if the
> databases were available over HTTPS as well:
> https://www.unicode.org/Public/6.3.0/
>
> In addition to this it would be helpful if the archive also contained
> SHA512 checksum files for each Unicode version to verify the integrity of
> databases that have already been downloaded (over HTTP), e.g.:
>
> https://www.unicode.org/Public/6.3.0/SHA512SUMS
>
> Mozilla already offers such checksums, although unfortunately not over
> HTTPS, but they can serve as an example.
>
> http://releases.mozilla.org/pub/mozilla.org/firefox/releases/27.0/SHA512SUMS
>
> I think this would improve the security of many libraries that directly
> and indirectly depend on Unicode.
>
> Kind regards,
> Robbert Broersma
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode
>

From unicode at unicode.org Wed Oct 4 08:08:26 2017
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 4 Oct 2017 15:08:26 +0200
Subject: HTTPS
In-Reply-To:
References:
Message-ID:

At last! It was important. Thanks for promoting HTTPS everywhere.

2017-10-04 11:14 GMT+02:00 Mathias Bynens via Unicode:

> unicode.org and www.unicode.org are now available over HTTPS. E.g.
> https://unicode.org/Public/10.0.0/

From unicode at unicode.org Wed Oct 4 02:39:26 2017
From: unicode at unicode.org (via Unicode)
Date: Wed, 4 Oct 2017 07:39:26 +0000
Subject: Question about Karabakh Characters
Message-ID: <5b7f17b80a304c59bff4bf4a5c7f2c09@bethel.jw.org>

Hi there,

The Karabakh language uses Armenian characters, but the following characters do not have Unicode code points assigned. (image1.JPG attached)
They are pronounced "Yi", "Ini" and "Eh" and are used in several combinations. (Image2.JPG attached)

Is there any reason these characters are not supported by Unicode?
I would appreciate any related information.

Thank you!

Kazunari Tsuboi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image1.jpg
Type: image/jpeg
Size: 10500 bytes
Desc: image1.jpg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image2.jpg
Type: image/jpeg
Size: 20792 bytes
Desc: image2.jpg

From unicode at unicode.org Wed Oct 4 09:30:38 2017
From: unicode at unicode.org (Michael Everson via Unicode)
Date: Wed, 4 Oct 2017 15:30:38 +0100
Subject: Question about Karabakh Characters
In-Reply-To: <5b7f17b80a304c59bff4bf4a5c7f2c09@bethel.jw.org>
References: <5b7f17b80a304c59bff4bf4a5c7f2c09@bethel.jw.org>
Message-ID:

They are not encoded, but that example is not sufficient. If you'd like to contact me offline we can discuss this further.

Michael Everson

> On 4 Oct 2017, at 08:39, via Unicode wrote:
>
> Hi there,
>
> The Karabakh language uses Armenian characters, but the following characters do not have Unicode code points assigned. (image1.JPG attached)
> They are pronounced "Yi", "Ini" and "Eh" and are used in several combinations. (Image2.JPG attached)
>
> Is there any reason these characters are not supported by Unicode?
> I would appreciate any related information.
>
> Thank you!
>
> Kazunari Tsuboi
>

From unicode at unicode.org Wed Oct 4 14:59:13 2017
From: unicode at unicode.org (Asmus Freytag via Unicode)
Date: Wed, 4 Oct 2017 12:59:13 -0700
Subject: HTTPS
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...

From unicode at unicode.org Wed Oct 4 15:43:25 2017
From: unicode at unicode.org (Steven R. Loomis via Unicode)
Date: Wed, 4 Oct 2017 13:43:25 -0700
Subject: HTTPS
In-Reply-To:
References:
Message-ID:

Also just a public note. Please do NOT fetch from unicode.org/Public as
part of continuous builds (Jenkins, Travis, etc.). That's too much load for
files that change *yearly*. Fetch one copy of the data and use your own
copy until it is time to update.

Yes, shasums and signatures are great. ICU (now part of Unicode) has been
doing this for years. I just signed up this morning to provide such for
CLDR data.
So let's see about UCD data also.

-s

On Wed, Oct 4, 2017 at 2:14 AM, Mathias Bynens via Unicode <
unicode at unicode.org> wrote:

> unicode.org and www.unicode.org are now available over HTTPS. E.g.
> https://unicode.org/Public/10.0.0/

From unicode at unicode.org Wed Oct 4 15:55:20 2017
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 4 Oct 2017 22:55:20 +0200
Subject: HTTPS
In-Reply-To:
References:
Message-ID:

Continuous builds may just check the status of the short shasums files to
know when one has changed; this would not use a lot of bandwidth. Anyway, if
your website supports HTTP conditional requests for downloads, or if clients
use HEAD rather than GET requests to get metadata, this saves a lot, without
having to download the same copy of large files again.

2017-10-04 22:43 GMT+02:00 Steven R. Loomis via Unicode:

> Also just a public note.
> Please do NOT fetch from unicode.org/Public as
> part of continuous builds (Jenkins, Travis, etc.). That's too much load for
> files that change *yearly*. Fetch one copy of the data and use your own
> copy until it is time to update.
>
> Yes, shasums and signatures are great. ICU (now part of Unicode) has been
> doing this for years. I just signed up this morning to provide such for
> CLDR data. So let's see about UCD data also.
>
> -s
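The practice suggested in this thread (keep your own copy, re-download only when the server reports a change, and check the result against published SHA-512 digests) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not something unicode.org provides: the SHA512SUMS listing is assumed to use the `sha512sum`-style `<hex digest>  <file name>` layout, entries are assumed to be keyed by bare file name, the helper names are hypothetical, and a given server may or may not send ETag or Last-Modified validators.

```python
import hashlib
import tempfile
import urllib.error
import urllib.request
from pathlib import Path


def sha512_of(path: Path) -> str:
    """Hex SHA-512 digest of a local file, read in 64 KiB chunks."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_sha512sums(text: str) -> dict:
    """Map file name -> digest from lines of the assumed form '<hex>  <name>'."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        sums[name.strip().lstrip("*")] = digest.lower()  # '*' = binary-mode marker
    return sums


def verify(path: Path, sums: dict) -> bool:
    """True only if the listing has an entry for this file name and it matches."""
    expected = sums.get(path.name)  # assumption: listing keyed by bare file name
    return expected is not None and sha512_of(path) == expected


def conditional_request(url, etag=None, last_modified=None, head=False):
    """Build a request that the server may answer with 304 Not Modified."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers,
                                  method="HEAD" if head else "GET")


def refresh(url, dest: Path, etag=None, last_modified=None):
    """Re-download url into dest only if it changed; return the new validators."""
    try:
        with urllib.request.urlopen(conditional_request(url, etag, last_modified)) as resp:
            dest.write_bytes(resp.read())
            return resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged: keep using the cached copy
            return etag, last_modified
        raise


if __name__ == "__main__":
    # Offline demo: checksum a local file against a one-line listing.
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / "UnicodeData.txt"
        local.write_bytes(b"0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n")
        listing = f"{sha512_of(local)}  UnicodeData.txt\n"
        print("verified:", verify(local, parse_sha512sums(listing)))
```

A build job would store the returned validators next to its cached copy and call `refresh` only when it is time to update, which keeps load off the server in the way Steven asks; `conditional_request(..., head=True)` builds the metadata-only HEAD request Philippe mentions.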
From unicode at unicode.org Thu Oct 5 00:09:28 2017
From: unicode at unicode.org (via Unicode)
Date: Thu, 5 Oct 2017 05:09:28 +0000
Subject: Question about Karabakh Characters
In-Reply-To:
References: <5b7f17b80a304c59bff4bf4a5c7f2c09@bethel.jw.org>
Message-ID:

Thank you for your reply.
I am currently handling technical support for publishing in multiple languages.

This was found when we were handling a project on the Karabakh language.
I was informed that a Karabakh dictionary containing over 40,000 words, produced in 2013, employs the three characters.
I personally have not seen this dictionary, but it seems that there are people who need these characters.
So I decided to make a post.

Kazunari Tsuboi

-----Original Message-----
From: Michael Everson [mailto:everson at evertype.com]
Sent: Wednesday, October 4, 2017 11:31 PM
To: Tsuboi, Kazunari
Cc: unicode Unicode Discussion
Subject: Re: Question about Karabakh Characters

They are not encoded, but that example is not sufficient. If you'd like to contact me offline we can discuss this further.

Michael Everson

> On 4 Oct 2017, at 08:39, via Unicode wrote:
>
> Hi there,
>
> The Karabakh language uses Armenian characters, but the following
> characters do not have Unicode code points assigned. (image1.JPG attached) They
> are pronounced "Yi", "Ini" and "Eh" and are used in several
> combinations. (Image2.JPG attached)
>
> Is there any reason these characters are not supported by Unicode?
> I would appreciate any related information.
>
> Thank you!
>
> Kazunari Tsuboi
>

From unicode at unicode.org Thu Oct 5 03:10:09 2017
From: unicode at unicode.org (Michael Everson via Unicode)
Date: Thu, 5 Oct 2017 09:10:09 +0100
Subject: Question about Karabakh Characters
In-Reply-To:
References: <5b7f17b80a304c59bff4bf4a5c7f2c09@bethel.jw.org>
Message-ID: <306C3F57-7962-42FA-BEA6-C547564A1C98@evertype.com>

It is legitimate to add characters for Armenian dialectology, and if you can provide additional evidence of usage in lexicography and (if possible) in other literature, we can see if a proposal can be made. We may do this offline so as to save the list from too many files. I look forward to hearing from you. Nothing will happen, though, without further information.

Michael

> On 5 Oct 2017, at 06:09, via Unicode wrote:
>
> Thank you for your reply.
> I am currently handling technical support for publishing in multiple languages.
>
> This was found when we were handling a project on the Karabakh language.
> I was informed that a Karabakh dictionary containing over 40,000 words, produced in 2013, employs the three characters.
> I personally have not seen this dictionary, but it seems that there are people who need these characters.
> So I decided to make a post.
>
> Kazunari Tsuboi
>
> -----Original Message-----
> From: Michael Everson [mailto:everson at evertype.com]
> Sent: Wednesday, October 4, 2017 11:31 PM
> To: Tsuboi, Kazunari
> Cc: unicode Unicode Discussion
> Subject: Re: Question about Karabakh Characters
>
> They are not encoded, but that example is not sufficient. If you'd like to contact me offline we can discuss this further.
>
> Michael Everson
>
>> On 4 Oct 2017, at 08:39, via Unicode wrote:
>>
>> Hi there,
>>
>> The Karabakh language uses Armenian characters, but the following
>> characters do not have Unicode code points assigned. (image1.JPG attached) They
>> are pronounced "Yi", "Ini" and "Eh" and are used in several
>> combinations.
>> (Image2.JPG attached)
>>
>> Is there any reason these characters are not supported by Unicode?
>> I would appreciate any related information.
>>
>> Thank you!
>>
>> Kazunari Tsuboi
>>
>

From unicode at unicode.org Mon Oct 9 03:37:39 2017
From: unicode at unicode.org (Martin J. Dürst via Unicode)
Date: Mon, 9 Oct 2017 17:37:39 +0900
Subject: Interesting UTF-8 decoder
Message-ID: <8e326913-353b-2c76-fcab-1a0a2b3947e7@it.aoyama.ac.jp>

A friend of mine sent me a pointer to
http://nullprogram.com/blog/2017/10/06/, a branchless UTF-8 decoder.

Regards, Martin.

From unicode at unicode.org Mon Oct 9 03:57:50 2017
From: unicode at unicode.org (J Decker via Unicode)
Date: Mon, 9 Oct 2017 01:57:50 -0700
Subject: Interesting UTF-8 decoder
In-Reply-To: <8e326913-353b-2c76-fcab-1a0a2b3947e7@it.aoyama.ac.jp>
References: <8e326913-353b-2c76-fcab-1a0a2b3947e7@it.aoyama.ac.jp>
Message-ID:

That's interesting; however, it will segfault if the string ends on a
memory allocation boundary. We will have to make sure strings are always
allocated with 3 extra bytes.

2017-10-09 1:37 GMT-07:00 Martin J. Dürst via Unicode:

> A friend of mine sent me a pointer to
> http://nullprogram.com/blog/2017/10/06/, a branchless UTF-8 decoder.
>
> Regards, Martin.
>

From unicode at unicode.org Mon Oct 9 06:16:03 2017
From: unicode at unicode.org (Mark Davis ☕️ via Unicode)
Date: Mon, 9 Oct 2017 13:16:03 +0200
Subject: Interesting UTF-8 decoder
In-Reply-To:
References: <8e326913-353b-2c76-fcab-1a0a2b3947e7@it.aoyama.ac.jp>
Message-ID:

The paper points out that the input buffer needs to be padded with 3 null
bytes as a precondition.

Mark

On Mon, Oct 9, 2017 at 10:57 AM, J Decker via Unicode wrote:

> That's interesting; however, it will segfault if the string ends on a
> memory allocation boundary. We will have to make sure strings are always
> allocated with 3 extra bytes.
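To see why the three bytes of padding matter, here is a minimal Python sketch of a table-driven decoder in the same spirit (written for illustration, not a port of the linked article's code). Each step unconditionally reads four bytes and shifts away the bits that do not belong to the current sequence, so decoding the last character can read up to three bytes past the end of the data: in C that is the out-of-bounds read J Decker describes, and in Python it would surface as an IndexError, which is why `decode` appends three zero bytes first. Python gains nothing from avoiding branches; the sketch only mirrors the structure. It also does no validation (continuation bytes, overlong forms and surrogates are not checked), and the table and function names are hypothetical.

```python
# Sequence length by the top 5 bits of the lead byte; 0 marks bytes that
# cannot start a sequence (10xxxxxx continuation bytes and 11111xxx).
LENGTHS = [1] * 16 + [0] * 8 + [2] * 4 + [3] * 2 + [4] + [0]
MASKS = [0x00, 0x7F, 0x1F, 0x0F, 0x07]  # payload bits kept from the lead byte
SHIFTS = [0, 18, 12, 6, 0]              # surplus low bits dropped afterwards
PADDING = b"\x00\x00\x00"               # decode_at may read 3 bytes past the end


def decode_at(buf: bytes, i: int):
    """Decode the sequence at buf[i]; always reads buf[i] through buf[i + 3]."""
    n = LENGTHS[buf[i] >> 3]
    # Assemble as if this were a 4-byte sequence (lead bits, then 3 x 6 bits).
    c = (buf[i] & MASKS[n]) << 18
    c |= (buf[i + 1] & 0x3F) << 12
    c |= (buf[i + 2] & 0x3F) << 6
    c |= buf[i + 3] & 0x3F
    # Shift out the bytes that did not belong to this sequence; advance by
    # n, or by 1 when the lead byte was invalid (n == 0).
    return c >> SHIFTS[n], i + n + (n == 0)


def decode(data: bytes) -> list:
    """Decode well-formed UTF-8 to a list of code points (no validation)."""
    buf = data + PADDING
    out, i = [], 0
    while i < len(data):
        cp, i = decode_at(buf, i)
        out.append(cp)
    return out


print([hex(cp) for cp in decode("A\u00e9\u20ac\U0001F600".encode("utf-8"))])
# ['0x41', '0xe9', '0x20ac', '0x1f600']
```

Calling `decode_at` on the unpadded three bytes of "€" would fail on the read of `buf[i + 3]`, which is the Python analogue of the precondition Mark points out.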
From unicode at unicode.org Tue Oct 10 14:00:12 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 10 Oct 2017 20:00:12 +0100
Subject: Normalise Tai Tham or not?
Message-ID: <20171010200012.05ed4230@JRWUBU2>

I'm preparing to share a spell-checker for Northern Thai in the Tai Tham
script, and I'm having difficulty deciding whether to offer corrections in
NFC/NFD or unnormalised.

The problem arises in closed syllables with tone marks. For example,
????? /kin/ 'smell' has two canonically equivalent encodings respecting
the principle of phonetic ordering: the unnormalised , which matches the
glyph structure of four glyphs: , , and , and the NFC and NFD form .

The issues I see are:

1) The unnormalised form is a natural and easy form to type. Typing the
normalised form character by character does not come naturally, and an
input method for it would be more complex.

2) The unnormalised form is easier for a rendering engine. HarfBuzz
actually presents the font with a non-standard canonical form so that the
invisible stacker, SAKOT, is reordered to before the subscript consonant.
Microsoft's Universal Shaping Engine (USE) would more naturally accommodate
the unnormalised form, which would have a natural unit of '' as an
alternative to an indivisible final consonant. The USE is not designed to
respect canonical equivalence.

3) The normalised form is the form preferred for the Web, but the pressure
to use it has decreased.

4) The pressure on search tools to respect canonical equivalence is now
relatively low. Some editors do (e.g. LibreOffice); others don't (e.g.
Emacs, so far as I am aware).

Therefore, the dictionary suggestions should match what the input method
produces.
So, should I offer normalised corrections or unnormalised corrections?
Should the spell-checker accept spellings in the dispreferred state
(normalised v. unnormalised)?

Richard.

From unicode at unicode.org Tue Oct 10 14:46:20 2017
From: unicode at unicode.org (Eli Zaretskii via Unicode)
Date: Tue, 10 Oct 2017 22:46:20 +0300
Subject: Normalise Tai Tham or not?
In-Reply-To: <20171010200012.05ed4230@JRWUBU2> (message from Richard Wordingham via Unicode on Tue, 10 Oct 2017 20:00:12 +0100)
References: <20171010200012.05ed4230@JRWUBU2>
Message-ID: <83d15upz03.fsf@gnu.org>

> Date: Tue, 10 Oct 2017 20:00:12 +0100
> From: Richard Wordingham via Unicode
>
> 4) The pressure on search tools to respect canonical equivalence is now
> relatively low. Some editors do (e.g. LibreOffice); others don't (e.g.
> Emacs, so far as I am aware).

Emacs lately introduced character-folding in searches, but it's turned
off by default, as many users objected.

From unicode at unicode.org Tue Oct 10 15:51:55 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Tue, 10 Oct 2017 21:51:55 +0100
Subject: Normalise Tai Tham or not?
In-Reply-To: <83d15upz03.fsf@gnu.org>
References: <20171010200012.05ed4230@JRWUBU2> <83d15upz03.fsf@gnu.org>
Message-ID: <20171010215155.0d8dd887@JRWUBU2>

On Tue, 10 Oct 2017 22:46:20 +0300
Eli Zaretskii via Unicode wrote:

> > Date: Tue, 10 Oct 2017 20:00:12 +0100
> > From: Richard Wordingham via Unicode
> >
> > 4) The pressure on search tools to respect canonical equivalence is
> > now relatively low. Some editors do (e.g. LibreOffice); others
> > don't (e.g. Emacs, so far as I am aware).
>
> Emacs lately introduced character-folding in searches, but it's turned
> off by default, as many users objected.

I don't see how that helps with this problem.
If I search for the Northern Thai word /kin/ with the low tone, which means
'smell', I want to find it whichever way round SAKOT and TONE-1 are, and I
don't want to find /kin/ with the rising tone, which is implied by having no
tone mark and means 'to eat'.

Richard.

From unicode at unicode.org Wed Oct 11 05:10:26 2017
From: unicode at unicode.org (Eli Zaretskii via Unicode)
Date: Wed, 11 Oct 2017 13:10:26 +0300
Subject: Normalise Tai Tham or not?
In-Reply-To: <20171010215155.0d8dd887@JRWUBU2> (message from Richard Wordingham via Unicode on Tue, 10 Oct 2017 21:51:55 +0100)
References: <20171010200012.05ed4230@JRWUBU2> <83d15upz03.fsf@gnu.org> <20171010215155.0d8dd887@JRWUBU2>
Message-ID: <834lr6ouzx.fsf@gnu.org>

> Date: Tue, 10 Oct 2017 21:51:55 +0100
> From: Richard Wordingham via Unicode
>
> > Emacs lately introduced character-folding in searches, but it's turned
> > off by default, as many users objected.
>
> I don't see how that helps with this problem. If I search for the
> Northern Thai word /kin/ with the low tone, which means 'smell', I want
> to find it whichever way round SAKOT and TONE-1 are, and I don't want to
> find /kin/ with the rising tone, which is implied by having no tone
> mark and means 'to eat'.

That's what this feature is supposed to allow; see char-fold.el in the
Emacs sources.

From unicode at unicode.org Wed Oct 11 16:01:32 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Wed, 11 Oct 2017 22:01:32 +0100
Subject: Normalise Tai Tham or not?
In-Reply-To: <834lr6ouzx.fsf@gnu.org>
References: <20171010200012.05ed4230@JRWUBU2> <83d15upz03.fsf@gnu.org> <20171010215155.0d8dd887@JRWUBU2> <834lr6ouzx.fsf@gnu.org>
Message-ID: <20171011220132.4228ec61@JRWUBU2>

On Wed, 11 Oct 2017 13:10:26 +0300
Eli Zaretskii via Unicode wrote:

> > Date: Tue, 10 Oct 2017 21:51:55 +0100
> > From: Richard Wordingham via Unicode
> >
> > > Emacs lately introduced character-folding in searches, but it's
> > > turned off by default, as many users objected.
> >
> > I don't see how that helps with this problem. If I search for the
> > Northern Thai word /kin/ with the low tone, which means 'smell', I
> > want to find it whichever way round SAKOT and TONE-1 are, and I
> > don't want to find /kin/ with the rising tone, which is implied by
> > having no tone mark and means 'to eat'.
>
> That's what this feature is supposed to allow; see char-fold.el in the
> Emacs sources.

I downloaded Emacs 25.3.1 and set the variable search-default-mode to
"char-fold search". Then, as intended, an incremental search for the
one-character string found the string .

The description I had found undersold the noble intention. Having looked
at the code, I can see why it should handle the problem of the search
string and the text being normalised differently - in my example, an NFC
search string being used on NFD text.

Unfortunately, it doesn't work in general with unnormalised text. The NFC
and NFD sequence ??? is canonically equivalent to , but the pair provides
an example of the failure to match, in both directions.

Thai computing originally dealt with the problem by setting up input rules
which prevent one from entering what is now the unnormalised form. The
email client I use won't let me type in the unnormalised form - text is
converted to NFC on input, both as email text and in search strings. (Latin
text and Tai Tham also get normalised on input - this is not special
treatment for Thai.)
Emacs seems to deal with the issue for Thai by misrendering the
unnormalised form.

Compulsive normalisers do strengthen the argument for the spell checker
standardising on the normalised form.

Incidentally, another example of an editor that won't match canonically
equivalent strings is Word in Microsoft Office Standard 2010 - I tried it
with the Tai Tham pair.

Richard.

From unicode at unicode.org Thu Oct 12 01:31:34 2017
From: unicode at unicode.org (Eli Zaretskii via Unicode)
Date: Thu, 12 Oct 2017 09:31:34 +0300
Subject: Normalise Tai Tham or not?
In-Reply-To: <20171011220132.4228ec61@JRWUBU2> (message from Richard Wordingham via Unicode on Wed, 11 Oct 2017 22:01:32 +0100)
References: <20171010200012.05ed4230@JRWUBU2> <83d15upz03.fsf@gnu.org> <20171010215155.0d8dd887@JRWUBU2> <834lr6ouzx.fsf@gnu.org> <20171011220132.4228ec61@JRWUBU2>
Message-ID: <83bmlcop15.fsf@gnu.org>

> Date: Wed, 11 Oct 2017 22:01:32 +0100
> From: Richard Wordingham via Unicode
>
> The description I had found undersold the noble intention.

If you mean that the documentation doesn't describe the feature well
enough, I'd welcome a documentation bug report.

> Unfortunately, it doesn't work in general with unnormalised text.

Indeed, it wasn't supposed to, on the assumption that those are relatively
rare cases and, for an initial implementation, could be left unsupported.

From unicode at unicode.org Sat Oct 28 06:11:30 2017
From: unicode at unicode.org (Andre Schappo via Unicode)
Date: Sat, 28 Oct 2017 11:11:30 +0000
Subject: Emoji anomaly
Message-ID: <43499DD0-A67E-47F2-9782-05D666378997@lboro.ac.uk>

I am working on a Blog Article
( https://schappo.blogspot.co.uk/2017/10/computer-science-internationalization.html )
and do not currently have access to OSX High Sierra; I am using OSX Sierra.
I would appreciate some help from someone using OSX High Sierra.

Using Sierra's Chinese Simplified Input Method, the Emoji 🌧️ and 🌦️ have an
unnecessary U+FE0F variation selector appended.
The other Emoji I have tested with Sierra's Chinese Simplified Input Method
do not have the variation selector appended. Could someone please check
whether the same happens with High Sierra?

Thank you

André
André Schappo
https://schappo.blogspot.co.uk
https://twitter.com/andreschappo
https://weibo.com/andreschappo
https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization

From unicode at unicode.org Sat Oct 28 18:20:08 2017
From: unicode at unicode.org (Peter Edberg via Unicode)
Date: Sat, 28 Oct 2017 16:20:08 -0700
Subject: Emoji anomaly
In-Reply-To: <43499DD0-A67E-47F2-9782-05D666378997@lboro.ac.uk>
References: <43499DD0-A67E-47F2-9782-05D666378997@lboro.ac.uk>
Message-ID: <5F968E8E-E165-412D-BF2E-78BFFD04E4CD@unicode.org>

This is about characters U+1F327, U+1F326.

The variation selector FE0F is *not* unnecessary with these. Looking at
https://www.unicode.org/Public/emoji/5.0/emoji-data.txt
those characters do *not* have the Emoji_Presentation property set, and
they do have variation sequences defined.

From https://www.unicode.org/reports/tr51/#Emoji_Variation_Selector_Notes,
such singleton emoji characters "should have emoji presentation selectors
on base characters with Emoji_Presentation=No whenever an emoji
presentation is desired".

- Peter E

> On Oct 28, 2017, at 4:11 AM, Andre Schappo via Unicode wrote:
>
>
> I am working on a Blog Article ( https://schappo.blogspot.co.uk/2017/10/computer-science-internationalization.html ) and do not currently have access to OSX High Sierra; I am using OSX Sierra. I would appreciate some help from someone using OSX High Sierra.
>
> Using Sierra's Chinese Simplified Input Method, the Emoji 🌧️ and 🌦️ have an unnecessary U+FE0F variation selector appended.
Could someone please check if the same happens with High Sierra > > Thank you > > Andr? > ?? ?? ?? > Andr? Schappo > https://schappo.blogspot.co.uk > https://twitter.com/andreschappo > https://weibo.com/andreschappo > https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Oct 29 08:47:51 2017 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Sun, 29 Oct 2017 13:47:51 +0000 Subject: Emoji anomaly In-Reply-To: <5F968E8E-E165-412D-BF2E-78BFFD04E4CD@unicode.org> References: <43499DD0-A67E-47F2-9782-05D666378997@lboro.ac.uk> <5F968E8E-E165-412D-BF2E-78BFFD04E4CD@unicode.org> Message-ID: Peter Thank you very much for your informative response. I see that U+1F321 ? U+1F32C do not have Emoji_Presentation property set. Time for me to do some reading to determine why. Andr? On 29 Oct 2017, at 00:20, Peter Edberg > wrote: This is about characters U+1F327,U+1F326 The variation selector FE0F is *not* unnecessary in with these. Looking at https://www.unicode.org/Public/emoji/5.0/emoji-data.txt those characters do *not* have the Emoji-Presentation property set, and they do have variation sequences defined. From https://www.unicode.org/reports/tr51/#Emoji_Variation_Selector_Notes, such singleton emoji characters ?should have emoji presentation selectors on base characters with Emoji_Presentation=No whenever an emoji presentation is desired? - Peter E On Oct 28, 2017, at 4:11 AM, Andre Schappo via Unicode > wrote: I am working on a Blog Article ( https://schappo.blogspot.co.uk/2017/10/computer-science-internationalization.html ) and do not currently have access to OSX High Sierra, I am using OSX Sierra. I would appreciate some help from someone using OSX High Sierra. Using Sierra's Chinese Simplified Input Method the Emoji ??? and ??? have an unnecessary U+FE0F variation selector appended. 
From unicode at unicode.org Sun Oct 29 12:12:44 2017
From: unicode at unicode.org (Peter Edberg via Unicode)
Date: Sun, 29 Oct 2017 10:12:44 -0700
Subject: Emoji anomaly
In-Reply-To:
References: <43499DD0-A67E-47F2-9782-05D666378997@lboro.ac.uk> <5F968E8E-E165-412D-BF2E-78BFFD04E4CD@unicode.org>
Message-ID:

Hi André,

> U+1F321–U+1F32C do not have Emoji_Presentation property set. Time for me to do some reading to determine why.

From https://www.unicode.org/emoji/charts-5.0/emoji-versions-sources.html
you can see that these characters came into Unicode as a result of their
being in the Webdings/Wingdings set, where they had a prior history of
being non-emoji text characters. That is why they have
Emoji_Presentation=No by default.

- Peter E

> On Oct 29, 2017, at 6:47 AM, Andre Schappo via Unicode wrote:
>
> Peter
>
> Thank you very much for your informative response. I see that U+1F321–U+1F32C do not have Emoji_Presentation property set. Time for me to do some reading to determine why.
>
> André
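Peter's explanation amounts to a small rule: append U+FE0F only to characters that default to text presentation (Emoji_Presentation=No) and for which an emoji variation sequence is defined. The Python sketch below is an illustration under stated assumptions: it expects the `code point(s) ; property # comment` line layout used by emoji-data.txt, the sample lines are hand-written in that layout rather than copied from the file, and the set of characters with defined variation sequences is passed in by hand (in practice it would come from the Unicode variation-sequence data). The function names are hypothetical.

```python
VS16 = "\uFE0F"  # VARIATION SELECTOR-16, the emoji presentation selector


def parse_property(text: str, prop: str) -> set:
    """Code points listed with `prop` in emoji-data.txt-style lines such as
    '1F326..1F327 ; Emoji # comment' or '1F600 ; Emoji_Presentation'."""
    cps = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        field, name = (part.strip() for part in data.split(";", 1))
        if name != prop:
            continue
        first, _, last = field.partition("..")
        cps.update(range(int(first, 16), int(last or first, 16) + 1))
    return cps


def emoji_style(cp: int, emoji_presentation: set, has_emoji_vs: set) -> str:
    """Return the character as it should be sent when emoji style is wanted."""
    ch = chr(cp)
    if cp not in emoji_presentation and cp in has_emoji_vs:
        return ch + VS16  # Emoji_Presentation=No: the selector is needed
    return ch  # emoji by default already, or no emoji variation sequence


sample = """
1F326..1F327  ; Emoji               # hand-written sample line
1F600         ; Emoji_Presentation  # hand-written sample line
"""
ep = parse_property(sample, "Emoji_Presentation")
print(emoji_style(0x1F327, ep, {0x1F326, 0x1F327}) == "\U0001F327\uFE0F")  # True
print(emoji_style(0x1F600, ep, set()) == "\U0001F600")                      # True
```

Under these assumptions the rule appends U+FE0F to U+1F326 and U+1F327, which matches what Andre observed from the Sierra input method.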