From richard.wordingham at ntlworld.com Sat Nov 1 10:10:50 2014
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 1 Nov 2014 15:10:50 +0000
Subject: Looking for a standard on historical countries
Message-ID: <20141101151050.48c4a871@JRWUBU2>

On Fri, 31 Oct 2014 20:43:19 +0100 Philippe Verdy wrote:

> How is this related to Unicode?

One possibility is through the Regional Indicators, but they are defined by the unstable ISO 3166-1 alpha-2 codes.

> Maybe it's associated with CLDR for former regional classification of
> languages, but I doubt this will ever create any standardization for
> historic data that should remain as is without changes in their old
> sources, for which there are no longer any active maintainers, just
> interested people (basically historians who may comment about them
> the way they want or could invent their new terminology for analysts
> and archivists).

A lot of useful historic information is missing from CLDR. For example, I believe line-breaking and word-boundary rules are completely missing for 'Sumero-Akkadian' Cuneiform writing systems. The rules were not uniform. On the other hand, an entry for the Assyrian for 'English' as used in the Assyrian homeland would be meaningless.

The precise territory covered by a country is not useful within the Unicode domains, nor are debates about independence, nor whether tribute was paid regularly. In general, a more useful division may be by date, but that is barely covered by a system designed for present-day languages.

If this thread is to be of any immediate use, what is the intended use of the information?

Richard.
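Editorial note: the link between Regional Indicators and ISO 3166-1 mentioned above is purely mechanical. Each of the 26 Regional Indicator Symbols (U+1F1E6..U+1F1FF) stands for a letter A..Z, and a pair of them spells an alpha-2 code. The following is a minimal Python sketch of that mapping; the function name is made up for illustration and does not come from any library, and the sketch does not check whether the code is actually assigned in ISO 3166-1.

```python
# Code point of REGIONAL INDICATOR SYMBOL LETTER A.
REGIONAL_INDICATOR_A = 0x1F1E6


def regional_indicator_pair(alpha2: str) -> str:
    """Spell a two-letter code with Regional Indicator Symbols.

    Hypothetical helper: it only checks the shape of the input (two
    ASCII letters), not whether ISO 3166-1 assigns the code.
    """
    code = alpha2.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"not a two-letter ASCII code: {alpha2!r}")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code
    )


if __name__ == "__main__":
    pair = regional_indicator_pair("FR")
    print([f"U+{ord(c):04X}" for c in pair])
```

Because the pair carries no meaning beyond the two letters, whatever a rendered flag represents depends entirely on what ISO 3166-1 assigns to that code at the time, which is the stability concern raised in the message.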
From doug at ewellic.org Sat Nov 1 12:28:23 2014
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 1 Nov 2014 11:28:23 -0600
Subject: [OT] Re: Looking for a standard on historical countries

Jörg Knappen wrote:

There was a French experimental standard, AFNOR XP Z 44-002, "Code for the representation of names of historical countries" (August 1997), that seems like it might be what you are looking for:

http://www.freestd.us/soft2/638771.htm

I had heard that this standard was withdrawn, but I can't be sure about that.

Richard Wordingham replied to a reply:

>> How is this related to Unicode?
>
> One possibility is through the Regional Indicators, but they are
> defined by the unstable ISO 3166-1 alpha-2 codes.

ISO 3166-1 alpha-2 code elements, once withdrawn, are not reused for 50 years. That seems relatively stable to me.

--
Doug Ewell | Thornton, CO, USA | http://ewellic.org

From verdy_p at wanadoo.fr Sat Nov 1 14:35:46 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 1 Nov 2014 20:35:46 +0100
Subject: [OT] Re: Looking for a standard on historical countries

Attention! You've only listed the catalog entry, but not the actual reference:

http://www.boutique.afnor.org/norme/xp-z44-002/code-pour-la-representation-des-noms-de-pays-historiques/article/748631/fa046190

That page gives information about the context of use (mainly for bibliographic purposes, not for linguistic or terminological purposes) and the limited timeframe (starting from 1815, up to but excluding current countries encoded in ISO 3166-1 and their divisions).
The standard was never even considered for international use. Each national library may have its own classification system, notably for important legislative texts that are still applicable: international treaties, treaties of union, and the ratification instruments used to create or modify the territory of the current country, as well as texts about countries for which a third-party country maintains an official copy in its official archives. Such a copy can be enforced before existing international courts, and it is kept indefinitely unless all ratifying parties have agreed to make these texts obsolete.

There are also a lot of very old treaties and bilateral agreements around the world which are still enforceable even though the countries have changed their political regime and a successor was designated. For example, there are old treaties from the Kingdom of France, ratified and deposited in other European countries, with specific clauses that run against the standard national law but are still applicable in their area. It was not in the interest of the Republic after the Revolution to cancel these treaties and risk splitting the territory. Under national rules, the Constitution protects all international treaties ratified by France for which the French Republic is recognized as a successor, sometimes with a shared succession in some regions.

These old enforceable texts are very complex to classify, and it is normal for a country to organize this with a national standard for its official libraries. Besides this, those countries also have their own teams of historians in public research departments and universities, and genealogists have needs as well for today's private successions, so it is important to be able to locate and retrieve these old documents.
From doug at ewellic.org Sat Nov 1 15:42:41 2014
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 1 Nov 2014 14:42:41 -0600
Subject: [OT] Re: Looking for a standard on historical countries

Philippe Verdy wrote:

> Attention! You've only listed the catalog entry, but not the actual
> reference
> http://www.boutique.afnor.org/norme/xp-z44-002/code-pour-la-representation-des-noms-de-pays-historiques/article/748631/fa046190

It was a start. Your reference is clearly better.

> Which gives information about context of use (mainly for bibliographic
> purpose, not for linguistic/terminologic purpose)

I don't think Jörg specified the exact purpose to which he wanted to apply this information.

> and with a limited timeframe (starting from 1815 up to but excluding
> current countries encoded in ISO 3166-1 and their divisions).

Jörg said, "anything going beyond 1974 (ISO 3166-3) will be better than nothing." 1815 qualifies.
--
Doug Ewell | Thornton, CO, USA | http://ewellic.org

From srl at icu-project.org Sat Nov 1 23:57:06 2014
From: srl at icu-project.org (Steven R. Loomis)
Date: Sat, 01 Nov 2014 21:57:06 -0700
Subject: Looking for a standard on historical countries
In-Reply-To: <20141101151050.48c4a871@JRWUBU2>
References: <20141101151050.48c4a871@JRWUBU2>
Message-ID: <5455B9A2.9030903@icu-project.org>

On 11/1/2014 8:10 AM, Richard Wordingham wrote:
> On Fri, 31 Oct 2014 20:43:19 +0100
> Philippe Verdy wrote:
>
>> How is this related to Unicode?
> One possibility is through the Regional Indicators, but they are defined
> by the unstable ISO 3166-1 alpha-2 codes.

It was noted as "off topic". It's relevant because CLDR is relevant.

>> Maybe it's associated with CLDR for former regional classification of
>> languages, but I doubt this will ever create any standardization for
>> historic data that should remain as is without changes in their old
>> sources, for which there are no longer any active maintainers, just
>> interested people (basically historians who may comment about them
>> the way they want or could invent their new terminology for analysts
>> and archivists).
> A lot of useful historic information is missing from CLDR. For example,
> I believe line-breaking and word-boundary rules are completely missing
> for 'Sumero-Akkadian' Cuneiform writing systems. The rules were not
> uniform. On the other hand, an entry for the Assyrian for 'English' as
> used in the Assyrian homeland would be meaningless.

A lot of speculation happened some time back on the assumption that CLDR would a priori reject contributions for historic languages such as Latin (it wouldn't). No bugs were even filed, let alone any data submitted for Latin. Besides Sumero-Akkadian, we could probably add break rules for, say, Oromo, Slovak, Spanish, and Dutch ( http://unicode.org/cldr/trac/ticket/2992 ).
> The precise territory covered by a country is not useful within the
> Unicode domains, nor are debates about independence, nor whether tribute
> was paid regularly. In general, a more useful division may be by date,
> but that is barely covered by a system designed for present-day
> languages.

Sure. It would need to be a different namespace from ISO 3166 and probably IETF BCP 47. I wonder if you could use Linked Open Data sets (come hear about it Monday at IUC38!) to look for ontology/Country that doesn't have a 3166 code, something like the following. You could extract start/end date, successor country, etc.

> If this thread is to be of any immediate use, what is the intended
> use of the information?

The original post made it sound like it was related to book publishing: "all countries where there was a printing press would be optimal coverage".

-s

--
IBMer but all opinions are mine. // GPG: 9731166CD8E23A83BEE7C6D3ACA5DBE1FD8FABF1
https://www.ohloh.net/accounts/srl295 // https://ssl.icu-project.org/trac/wiki/Srl

From public at khwilliamson.com Wed Nov 5 11:03:20 2014
From: public at khwilliamson.com (Karl Williamson)
Date: Wed, 05 Nov 2014 10:03:20 -0700
Subject: New Unicode Emoji draft, available for review
In-Reply-To: <54584553.2030808@unicode.org>
References: <54584553.2030808@unicode.org>
Message-ID: <545A5858.7050705@khwilliamson.com>

On 11/03/2014 08:17 PM, announcements at unicode.org wrote:
> The Unicode Consortium has released the draft “Unicode Emoji”
> document, whose main goal is to help improve the interoperability of
> emoji characters across implementations by providing guidelines and data.
>
> This draft document also includes a section on Diversity, with a
> mechanism using 5 new proposed characters to provide a variety of skin
> tones for existing emoji characters.
>
> The document is in “Proposed Draft” state, and made available for public
> review and comment.
>
> http://unicode-inc.blogspot.com/2014/11/new-unicode-emoji-draft-available-for.html

I forwarded this on, and the only response I got was a question about whether the Fitzpatrick modifiers apply to U+1F4A9. I answered that they only apply to specified emoji. But I wonder if the question was in fact a commentary on what they think of the proposal.

From rick at unicode.org Wed Nov 5 15:48:01 2014
From: rick at unicode.org (Rick McGowan)
Date: Wed, 05 Nov 2014 13:48:01 -0800
Subject: New Unicode Emoji draft, available for review
Message-ID: <545A9B11.4000003@unicode.org>

FYI, posting this on behalf of Mark Davis... Something in his original reply message is apparently so toxic to our mail gateway that it can't get through. (Investigating.) It may be the literal U+1F4A9, which I have (I'm sorry) redacted below.

Rick

------------

> Could be either one [U+1F4A9]
>
> The exact contents of minimal and optional characters is something that we
> want to get feedback on. But I don't think [U+1F4A9] is in the running!
>
> BTW, I'm seeing about 250 new news articles on this, per hour (in English).
> https://www.google.com/search?q=emoji+unicode&tbm=nws&tbs=qdr:h
>
> Plus a scattering of others, such as
> http://www.spiegel.de/netzwelt/web/unicode-consortium-emojis-demnaechst-fuer-alle-hautfarben-a-1001125.html

From shervinafshar at gmail.com Wed Nov 5 18:11:40 2014
From: shervinafshar at gmail.com (Shervin Afshar)
Date: Wed, 5 Nov 2014 16:11:40 -0800
Subject: New Unicode Emoji draft, available for review
In-Reply-To: <545A5858.7050705@khwilliamson.com>
References: <54584553.2030808@unicode.org> <545A5858.7050705@khwilliamson.com>

> I forwarded this on, and the only response I got was a question regarding
> if the Fitzpatrick modifiers applied to U+1F4A9? I answered that they only
> apply to specified emoji. But I wonder if the question was in fact a
> commentary on what they think of the proposal.

Oh...I think that question is already answered by another emoji which - unlike 1F4A9 - is actually fitz-optional[1].

[1]: http://www.unicode.org/Public/emoji/1.0/emoji-annotations.html#fitz-optional

Shervin
From maiku.fabian at gmail.com Thu Nov 6 02:32:01 2014
From: maiku.fabian at gmail.com (Mike FABIAN)
Date: Thu, 06 Nov 2014 09:32:01 +0100
Subject: Question about "Uppercase" in DerivedCoreProperties.txt

I have a question about “Uppercase” in DerivedCoreProperties.txt:

U+1F80 GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI is listed as “Lowercase” in http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt :

1F80..1F87 ; Lowercase # L& [8] GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI..GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI

But U+1F88 GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI is *not* listed as “Uppercase” in http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt .

Yet U+1F88 seems to be uppercase according to http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt , because it has a tolower mapping to U+1F80:

1F80;GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI;Ll;0;L;1F00 0345;;;;N;;;1F88;;1F88
1F88;GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI;Lt;0;L;1F08 0345;;;;N;;;;1F80;

Is the information in DerivedCoreProperties.txt correct, or could this be a bug in DerivedCoreProperties.txt?

This is the case not only for U+1F88, but for several more characters. All the characters listed below have a tolower mapping in http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt but are not listed in DerivedCoreProperties.txt as “Uppercase”:

U+1F88 has a tolower mapping to U+1F80
U+1F89 has a tolower mapping to U+1F81
U+1F8A has a tolower mapping to U+1F82
U+1F8B has a tolower mapping to U+1F83
U+1F8C has a tolower mapping to U+1F84
U+1F8D has a tolower mapping to U+1F85
U+1F8E has a tolower mapping to U+1F86
U+1F8F has a tolower mapping to U+1F87
U+1F98 has a tolower mapping to U+1F90
U+1F99 has a tolower mapping to U+1F91
U+1F9A has a tolower mapping to U+1F92
U+1F9B has a tolower mapping to U+1F93
U+1F9C has a tolower mapping to U+1F94
U+1F9D has a tolower mapping to U+1F95
U+1F9E has a tolower mapping to U+1F96
U+1F9F has a tolower mapping to U+1F97
U+1FA8 has a tolower mapping to U+1FA0
U+1FA9 has a tolower mapping to U+1FA1
U+1FAA has a tolower mapping to U+1FA2
U+1FAB has a tolower mapping to U+1FA3
U+1FAC has a tolower mapping to U+1FA4
U+1FAD has a tolower mapping to U+1FA5
U+1FAE has a tolower mapping to U+1FA6
U+1FAF has a tolower mapping to U+1FA7
U+1FBC has a tolower mapping to U+1FB3
U+1FCC has a tolower mapping to U+1FC3
U+1FFC has a tolower mapping to U+1FF3

Is that correct or a bug?

--
Mike FABIAN
Office: +49-69-365051027, internal 8875027

From maiku.fabian at gmail.com Thu Nov 6 06:12:32 2014
From: maiku.fabian at gmail.com (Mike FABIAN)
Date: Thu, 06 Nov 2014 13:12:32 +0100
Subject: Conflicts between UnicodeData.txt and EastAsianWidth.txt?

http://www.unicode.org/Public/7.0.0/ucd/EastAsianWidth.txt contains:

302A..302D;W # Mn [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK

which gives us a width of 2 for these 4 characters (because of “W”).

But http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt contains:

302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;
302B;IDEOGRAPHIC RISING TONE MARK;Mn;228;NSM;;;;;N;;;;;
302C;IDEOGRAPHIC DEPARTING TONE MARK;Mn;232;NSM;;;;;N;;;;;
302D;IDEOGRAPHIC ENTERING TONE MARK;Mn;222;NSM;;;;;N;;;;;

Doesn't “NSM” (non-spacing mark) imply a width of 0?

Is that a contradiction, or is this on purpose?

--
Mike FABIAN
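Editorial note: both questions above depend on which field of a UnicodeData.txt record is being read. The sketch below splits three of the records quoted in these messages into named fields, assuming the 15-field, semicolon-separated layout documented in UAX #44. The short field labels are informal shorthand chosen for this example, not official property names.

```python
# Informal labels for the 15 semicolon-separated UnicodeData.txt fields
# (layout per UAX #44; the label spellings here are this example's own).
FIELDS = (
    "code", "name", "gc", "ccc", "bidi", "decomp",
    "decimal", "digit", "numeric", "mirrored",
    "old_name", "comment", "upper", "lower", "title",
)


def parse_record(line: str) -> dict:
    """Split one UnicodeData.txt record into a field-name -> value dict."""
    values = line.rstrip("\n").split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Records copied from the messages above.
SMALL = ("1F80;GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI;"
         "Ll;0;L;1F00 0345;;;;N;;;1F88;;1F88")
TITLE = ("1F88;GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI;"
         "Lt;0;L;1F08 0345;;;;N;;;;1F80;")
TONE = "302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;"

if __name__ == "__main__":
    for record in map(parse_record, (SMALL, TITLE, TONE)):
        print(record["code"], record["gc"], record["bidi"],
              repr(record["lower"]))
```

Read this way, U+1F88 has General_Category Lt (titlecase letter) together with a simple lowercase mapping, and "NSM" in the U+302A record sits in the Bidi_Class field rather than in anything that describes width.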
From verdy_p at wanadoo.fr Thu Nov 6 13:46:11 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 6 Nov 2014 20:46:11 +0100
Subject: Re: Question about “Uppercase” in DerivedCoreProperties.txt

This is a "feature" of the Greek alphabet: the lowercase iota subscript can be capitalized in two different ways, either as a subscript below the uppercase main letter or as a standard capital iota. The subscript form is a combining character; the non-subscript form is not.

There should be a special contextual rule for language-specific casing. One already exists for the final sigma, but not for the iota. It is not easy to handle, and in fact the choice of case mapping is not specifically a linguistic rule but a rendering style rule. Carved inscriptions, which generally use only capitals, generally avoid the combining forms and use a reduced alphabet. Handwritten and cursive styles use the extended alphabet, which enables contextual forms including the small iota subscript, the final small sigma, and many combining signs (this also allows other placement rules for accents). For printing or display there is no rule: the document author enables or disables the extended alphabet (it is generally disabled for rendering at small resolutions).

The simple case mappings, however, should preserve the distinctions present in the extended alphabet: simple uppercasing should not convert a lowercase letter to an all-uppercase letter with an appended uppercase iota, even if this means mapping a lowercase letter to a titlecase one (the former would be lossy, and simple casing rules should be lossless). The case mappings in the main UCD, however, ignore contextual, language-specific, and style-specific rules. Even if they are wrong, this cannot be changed. The simple mappings in the main UCD file should not be assumed to be lossless.
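Editorial note: the loss described above can be observed with any library that implements the full (SpecialCasing.txt) case mappings. The Python sketch below relies on the behavior of CPython's built-in `str.upper()` and `str.lower()`, which apply full case mappings and the final-sigma rule; the helper names are invented for this illustration, and other runtimes may behave differently.

```python
def code_points(text: str) -> list:
    """Return the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]


def upper_then_lower(text: str) -> str:
    """Uppercase and then lowercase again, to expose lossy round trips."""
    return text.upper().lower()


if __name__ == "__main__":
    small = "\u1F80"  # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI

    # Full uppercasing expands the letter to capital alpha with psili
    # followed by a separate capital iota.
    print(code_points(small.upper()))

    # Lowercasing that result does not restore the original character.
    print(code_points(upper_then_lower(small)))

    # The German sharp s shows the same kind of loss.
    print("\u00DF".upper(), upper_then_lower("\u00DF"))

    # Final sigma: the contextual rule that does exist for Greek.
    print(code_points("\u039F\u0394\u039F\u03A3".lower()))
```

The round trip through full uppercasing turns one precomposed letter into a two-letter sequence, which is the kind of loss the message attributes to case conversion in general.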
Actual case mappers do not use just these basic rules, which are only the most frequently assumed mappings. In any case, any kind of case conversion introduces a loss; the degree of loss varies when a mapping involves more than a single pair of simple letters. See also the old difficulties with the German eszett (sharp s), and with many ligatures that became plain letters in some contexts, including the ampersand "&", which originates from the "et" ligature, or the German umlaut, which inherits some old behavior of the superscript small Latin letter "e", behaving like the Greek iota subscript in Fraktur font styles.

From liancu at microsoft.com Thu Nov 6 15:00:30 2014
From: liancu at microsoft.com (Laurentiu Iancu)
Date: Thu, 6 Nov 2014 21:00:30 +0000
Subject: Conflicts between UnicodeData.txt and EastAsianWidth.txt?

Hello,

It is not a contradiction.
The East_Asian_Width property values assigned to combining marks are described in Section 6.2 of UAX #11, at http://www.unicode.org/reports/tr11/#Combining:

“In particular, nonspacing marks do not possess actual advance width. Therefore, even when displaying combining marks, the East_Asian_Width property cannot be related to the advance width of these characters. However, it can be useful in determining the encoding length in a legacy encoding, or the choice of font for the range of characters including that nonspacing mark. The width of the glyph image of a nonspacing mark should always be chosen as the appropriate one for the width of the base character.”

The nonspacing kana voicing marks, U+3099 and U+309A, have the same classification: gc=Mn and ea=W.

Regards,
L.
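Editorial note: the independence of the two properties discussed above can be checked with Python's standard `unicodedata` module, which exposes General_Category, Canonical_Combining_Class, Bidi_Class, and East_Asian_Width separately. The `column_width` function below is only a common terminal-style heuristic written for this illustration (an assumption, not something defined by UAX #11): it lets the nonspacing category override East_Asian_Width.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect the properties discussed in this thread for one character."""
    return {
        "gc": unicodedata.category(ch),
        "ccc": unicodedata.combining(ch),
        "bidi": unicodedata.bidirectional(ch),
        "ea": unicodedata.east_asian_width(ch),
    }


def column_width(ch: str) -> int:
    """Heuristic cell width (assumption): marks take no column of their own."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


if __name__ == "__main__":
    # U+302A IDEOGRAPHIC LEVEL TONE MARK and the kana voicing mark U+3099.
    for ch in ("\u302A", "\u3099"):
        print(f"U+{ord(ch):04X}", describe(ch), column_width(ch))
```

Both marks report ea=W while still being nonspacing, so a width calculation has to consult General_Category before East_Asian_Width rather than treat "W" as an advance width.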
From liancu at microsoft.com Thu Nov 6 16:31:37 2014
From: liancu at microsoft.com (Laurentiu Iancu)
Date: Thu, 6 Nov 2014 22:31:37 +0000
Subject: Question about "Uppercase" in DerivedCoreProperties.txt

Hello,

The property Uppercase is a binary, informative property derived from General_Category (gc=Lu) and Other_Uppercase (OUpper=Y), as documented in Section 5.3 of UAX #44, at http://www.unicode.org/reports/tr44/#Uppercase.

All of the characters you enumerated are titlecase letters (gc=Lt) rather than uppercase letters (gc=Lu), and they are not specifically assigned Other_Uppercase (which would otherwise contradict their General_Category). Following the derivation, they do not have the Uppercase binary property.

For a visualization of the set of characters assigned the binary property Uppercase in relation to the set of Uppercase_Letter characters (gc=Lu), you can use the UnicodeSet comparison tool at http://www.unicode.org/cldr/utility/unicodeset.jsp. Enter “[:gc=Lu:]” in one input field and “[:Uppercase:]” in the other field, then click on Compare.

Regards,
L.
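Editorial note: the derivation described above (Uppercase = gc=Lu plus Other_Uppercase) can be sketched in Python. The standard `unicodedata` module does not expose Other_Uppercase, so the ranges below are hard-coded from memory of PropList.txt for Unicode 7.0 and should be treated as an assumption to verify against the actual file; later Unicode versions add more ranges. The function name is invented for this example.

```python
import unicodedata

# Other_Uppercase ranges as recalled from PropList.txt (Unicode 7.0).
# Assumption: verify against the real file before relying on this list.
OTHER_UPPERCASE_RANGES = (
    (0x2160, 0x216F),    # Roman numerals ONE..ONE THOUSAND
    (0x24B6, 0x24CF),    # circled Latin capital letters
    (0x1F130, 0x1F149),  # squared Latin capital letters
    (0x1F150, 0x1F169),  # negative circled Latin capital letters
    (0x1F170, 0x1F189),  # negative squared Latin capital letters
)


def has_uppercase_property(ch: str) -> bool:
    """Derived Uppercase: General_Category Lu, or listed Other_Uppercase."""
    cp = ord(ch)
    return unicodedata.category(ch) == "Lu" or any(
        low <= cp <= high for low, high in OTHER_UPPERCASE_RANGES
    )


if __name__ == "__main__":
    for ch in ("A", "\u2160", "\u1F88"):
        print(f"U+{ord(ch):04X}", unicodedata.category(ch),
              has_uppercase_property(ch))
```

Under this derivation U+1F88 is excluded simply because its General_Category is Lt, even though it has a lowercase mapping: having a tolower mapping and having the Uppercase property are separate questions.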
_______________________________________________ Unicode mailing list Unicode at unicode.org http://unicode.org/mailman/listinfo/unicode -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrea.giammarchi at gmail.com Thu Nov 6 17:27:53 2014 From: andrea.giammarchi at gmail.com (Andrea Giammarchi) Date: Thu, 6 Nov 2014 23:27:53 +0000 Subject: Open Source Emoji for the Web Message-ID: I'd like to thank those that helped me a while ago figuring out variants and emoji behavior. Today we are open sourcing a relatively small JS library and 800+ CDN based assets able to bring unified emoji in every WebView capable device and browser. We are also planning to implement the recently introduced "diversity" for the Unicode 8 draft as soon as we'll figure out a good approach for it ( and btw, the default fallback is great! ) This effort and collaboration is between Twitter [1], MaxCDN [2], and Wordpress [3]. Any comment or suggestion will be more than welcome and appreciated. Thanks again and Best Regards [1] https://blog.twitter.com/2014/open-sourcing-twitter-emoji-for-everyone [2] https://www.maxcdn.com/blog/emojis-ftw/ [3] http://en.blog.wordpress.com/2014/11/06/emoji-everywhere/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From alolita.sharma at gmail.com Thu Nov 6 17:35:33 2014 From: alolita.sharma at gmail.com (Alolita Sharma) Date: Thu, 6 Nov 2014 15:35:33 -0800 Subject: Open Source Emoji for the Web In-Reply-To: References: Message-ID: What a great idea! Thanks for open sourcing this project. Will spread the call for contribution! Please share the URL for open source code + asset repo :-) Best, Alolita On Thu, Nov 6, 2014 at 3:27 PM, Andrea Giammarchi < andrea.giammarchi at gmail.com> wrote: > I'd like to thank those that helped me a while ago figuring out variants > and emoji behavior. 
> > Today we are open sourcing a relatively small JS library and 800+ CDN > based assets able to bring unified emoji in every WebView capable device > and browser. > > We are also planning to implement the recently introduced "diversity" for > the Unicode 8 draft as soon as we'll figure out a good approach for it ( > and btw, the default fallback is great! ) > > This effort and collaboration is between Twitter [1], MaxCDN [2], and > Wordpress [3]. > > Any comment or suggestion will be more than welcome and appreciated. > > Thanks again and Best Regards > > [1] https://blog.twitter.com/2014/open-sourcing-twitter-emoji-for-everyone > [2] https://www.maxcdn.com/blog/emojis-ftw/ > [3] http://en.blog.wordpress.com/2014/11/06/emoji-everywhere/ > > > _______________________________________________ > Unicode mailing list > Unicode at unicode.org > http://unicode.org/mailman/listinfo/unicode > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at macchiato.com Thu Nov 6 18:06:47 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Thu, 6 Nov 2014 16:06:47 -0800 Subject: keynote Message-ID: As an experiment, we recorded the keynote at the Unicode Conference. I posted them at http://macchiati.blogspot.com/2014/11/unicode-emoji.html Mark *? Il meglio ? l?inimico del bene ?* -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrea.giammarchi at gmail.com Thu Nov 6 18:16:08 2014 From: andrea.giammarchi at gmail.com (Andrea Giammarchi) Date: Fri, 7 Nov 2014 00:16:08 +0000 Subject: Open Source Emoji for the Web In-Reply-To: References: Message-ID: I can't believe I forgot the most important one: https://github.com/twitter/twemoji Apologies for the confusion and glad you appreciated the initiative. Regards On Thu, Nov 6, 2014 at 11:35 PM, Alolita Sharma wrote: > What a great idea! Thanks for open sourcing this project. Will spread the > call for contribution! 
> > Please share the URL for open source code + asset repo :-) > > Best, > Alolita > > > > On Thu, Nov 6, 2014 at 3:27 PM, Andrea Giammarchi < > andrea.giammarchi at gmail.com> wrote: > >> I'd like to thank those that helped me a while ago figuring out variants >> and emoji behavior. >> >> Today we are open sourcing a relatively small JS library and 800+ CDN >> based assets able to bring unified emoji in every WebView capable device >> and browser. >> >> We are also planning to implement the recently introduced "diversity" for >> the Unicode 8 draft as soon as we'll figure out a good approach for it ( >> and btw, the default fallback is great! ) >> >> This effort and collaboration is between Twitter [1], MaxCDN [2], and >> Wordpress [3]. >> >> Any comment or suggestion will be more than welcome and appreciated. >> >> Thanks again and Best Regards >> >> [1] >> https://blog.twitter.com/2014/open-sourcing-twitter-emoji-for-everyone >> [2] https://www.maxcdn.com/blog/emojis-ftw/ >> [3] http://en.blog.wordpress.com/2014/11/06/emoji-everywhere/ >> >> >> _______________________________________________ >> Unicode mailing list >> Unicode at unicode.org >> http://unicode.org/mailman/listinfo/unicode >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at macchiato.com Thu Nov 6 18:18:03 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Thu, 6 Nov 2014 16:18:03 -0800 Subject: Open Source Emoji for the Web In-Reply-To: References: Message-ID: Very nice. I'd have one suggestion. People appear to be converging on similar file names for the emoji. - Lowercase hex numbers, - at least 4 digits, - otherwise no leading zeros, - multiple code points separated by _, - with optional prefix/suffix. Like "dcm_0030_20e3.png". I'd suggest using that convention. Not a big thing, but makes it more consistent in tooling. Mark *? Il meglio ? 
l?inimico del bene ?* On Thu, Nov 6, 2014 at 3:27 PM, Andrea Giammarchi < andrea.giammarchi at gmail.com> wrote: > I'd like to thank those that helped me a while ago figuring out variants > and emoji behavior. > > Today we are open sourcing a relatively small JS library and 800+ CDN > based assets able to bring unified emoji in every WebView capable device > and browser. > > We are also planning to implement the recently introduced "diversity" for > the Unicode 8 draft as soon as we'll figure out a good approach for it ( > and btw, the default fallback is great! ) > > This effort and collaboration is between Twitter [1], MaxCDN [2], and > Wordpress [3]. > > Any comment or suggestion will be more than welcome and appreciated. > > Thanks again and Best Regards > > [1] https://blog.twitter.com/2014/open-sourcing-twitter-emoji-for-everyone > [2] https://www.maxcdn.com/blog/emojis-ftw/ > [3] http://en.blog.wordpress.com/2014/11/06/emoji-everywhere/ > > > _______________________________________________ > Unicode mailing list > Unicode at unicode.org > http://unicode.org/mailman/listinfo/unicode > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alolita.sharma at gmail.com Thu Nov 6 18:20:51 2014 From: alolita.sharma at gmail.com (Alolita Sharma) Date: Thu, 6 Nov 2014 16:20:51 -0800 Subject: Open Source Emoji for the Web In-Reply-To: References: Message-ID: Thanks :-) Best, Alolita On Thu, Nov 6, 2014 at 4:16 PM, Andrea Giammarchi < andrea.giammarchi at gmail.com> wrote: > I can't believe I forgot the most important one: > https://github.com/twitter/twemoji > > Apologies for the confusion and glad you appreciated the initiative. > > Regards > > On Thu, Nov 6, 2014 at 11:35 PM, Alolita Sharma > wrote: > >> What a great idea! Thanks for open sourcing this project. Will spread the >> call for contribution! 
>> >> Please share the URL for open source code + asset repo :-) >> >> Best, >> Alolita >> >> >> >> On Thu, Nov 6, 2014 at 3:27 PM, Andrea Giammarchi < >> andrea.giammarchi at gmail.com> wrote: >> >>> I'd like to thank those that helped me a while ago figuring out variants >>> and emoji behavior. >>> >>> Today we are open sourcing a relatively small JS library and 800+ CDN >>> based assets able to bring unified emoji in every WebView capable device >>> and browser. >>> >>> We are also planning to implement the recently introduced "diversity" >>> for the Unicode 8 draft as soon as we'll figure out a good approach for it >>> ( and btw, the default fallback is great! ) >>> >>> This effort and collaboration is between Twitter [1], MaxCDN [2], and >>> Wordpress [3]. >>> >>> Any comment or suggestion will be more than welcome and appreciated. >>> >>> Thanks again and Best Regards >>> >>> [1] >>> https://blog.twitter.com/2014/open-sourcing-twitter-emoji-for-everyone >>> [2] https://www.maxcdn.com/blog/emojis-ftw/ >>> [3] http://en.blog.wordpress.com/2014/11/06/emoji-everywhere/ >>> >>> >>> _______________________________________________ >>> Unicode mailing list >>> Unicode at unicode.org >>> http://unicode.org/mailman/listinfo/unicode >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrea.giammarchi at gmail.com Fri Nov 7 01:30:08 2014 From: andrea.giammarchi at gmail.com (Andrea Giammarchi) Date: Fri, 7 Nov 2014 07:30:08 +0000 Subject: Open Source Emoji for the Web In-Reply-To: References: Message-ID: Thanks Mark, I will consider this change with CDN chaps too since that would invalidate already a lot of cached content at the time it'll ship :-/ We should have paid more attention, on the other side if you need assets locally instead of via CDN a script capable of renaming assets from current form to your suggested one seems straight forward to me. Would that (sort of) work? 
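The file naming convention suggested in this thread (lowercase hex, at least 4 digits, code points joined with "_", optional prefix/suffix) could be sketched roughly as below. This is only an illustration: the function name `emoji_filename` and its `prefix`/`ext` parameters are made up here, and nothing in it reflects how the twemoji assets are actually named.

```python
def emoji_filename(text, prefix="", ext=".png"):
    """Build a file name from the code points of `text`:
    lowercase hex, zero-padded to at least 4 digits, joined with "_"."""
    parts = [format(ord(ch), "04x") for ch in text]
    return prefix + "_".join(parts) + ext

# DIGIT ZERO + COMBINING ENCLOSING KEYCAP, with a hypothetical vendor prefix
print(emoji_filename("0\u20E3", prefix="dcm_"))
# A single supplementary-plane code point keeps its 5 hex digits
print(emoji_filename("\U0001F600"))
```

The first call reproduces the "dcm_0030_20e3.png" example given in the suggestion; a renaming script would only need a mapping from the existing names back to code point sequences.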
Thanks On Fri, Nov 7, 2014 at 12:18 AM, Mark Davis ?? wrote: > Very nice. > > I'd have one suggestion. People appear to be converging on similar file > names for the emoji. > > - Lowercase hex numbers, > - at least 4 digits, > - otherwise no leading zeros, > - multiple code points separated by _, > - with optional prefix/suffix. > > Like "dcm_0030_20e3.png". I'd suggest using that convention. > > Not a big thing, but makes it more consistent in tooling. > > > Mark > > *? Il meglio ? l?inimico del bene ?* > > On Thu, Nov 6, 2014 at 3:27 PM, Andrea Giammarchi < > andrea.giammarchi at gmail.com> wrote: > >> I'd like to thank those that helped me a while ago figuring out variants >> and emoji behavior. >> >> Today we are open sourcing a relatively small JS library and 800+ CDN >> based assets able to bring unified emoji in every WebView capable device >> and browser. >> >> We are also planning to implement the recently introduced "diversity" for >> the Unicode 8 draft as soon as we'll figure out a good approach for it ( >> and btw, the default fallback is great! ) >> >> This effort and collaboration is between Twitter [1], MaxCDN [2], and >> Wordpress [3]. >> >> Any comment or suggestion will be more than welcome and appreciated. >> >> Thanks again and Best Regards >> >> [1] >> https://blog.twitter.com/2014/open-sourcing-twitter-emoji-for-everyone >> [2] https://www.maxcdn.com/blog/emojis-ftw/ >> [3] http://en.blog.wordpress.com/2014/11/06/emoji-everywhere/ >> >> >> _______________________________________________ >> Unicode mailing list >> Unicode at unicode.org >> http://unicode.org/mailman/listinfo/unicode >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johannes at bergerhausen.com Fri Nov 7 03:07:43 2014 From: johannes at bergerhausen.com (Johannes Bergerhausen) Date: Fri, 7 Nov 2014 10:07:43 +0100 Subject: keynote In-Reply-To: References: Message-ID: Great! Thank you! 
Johannes

+ + + + + + + + + + + + + + + + + + +
Bergerhausen
Konzeption und Entwurf
Gladbacher Straße 40, D-50672 Köln, Germany
+ + + + + + + + + + + + + + + + + + +
Prof. Bergerhausen
Hochschule Mainz, School of Design
University of Applied Sciences
Holzstraße 36, D-55116 Mainz, Germany
T ++49 (0) 6131 - 628 - 2233
www.designinmainz.de
www.decodeunicode.org
www.gutenberg-intermedia.de
www.hs-mainz.de/gestaltung

From verdy_p at wanadoo.fr Fri Nov 7 07:57:37 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 7 Nov 2014 14:57:37 +0100
Subject: Re: Question about “Uppercase” in DerivedCoreProperties.txt
In-Reply-To:
References:
Message-ID:

Note that tolower() and toupper() can only work at the level of a single character; they are not recommended for changing the case of plain text. Their purpose should be limited to use cases where letters can be safely isolated from their context, for example when handling letters as numbers (e.g. section numbering).

For correct handling of locales, tolower and toupper should be replaced by strtolower and strtoupper (or their aliases), which are able to process character clusters and the contextual casing rules needed for a language or orthographic style (such as monotonic and polytonic Greek, or specific locales intended for medieval texts or old classic scriptures).

strupper and strlower can then perform MORE mappings that tolower and toupper cannot perform using only simple mappings. So precombined Greek letters with iota subscripts can only be converted by preserving the iota subscript (for which islower() and isupper() are BOTH false when it is encoded separately and not precombined).
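To illustrate why a one-character-to-one-character function cannot cover every case, here is a small sketch in Python, whose `str.upper()` works on whole strings and applies Unicode full case mappings. Python is only a stand-in here for the string-level functions mentioned above; it is not the C library interface under discussion.

```python
# Sharp s, small alpha with psili and ypogegrammeni, and a whole word
samples = ["\u00DF", "\u1F80", "stra\u00DFe"]

for s in samples:
    upper = s.upper()
    # A change in length means no char-to-char table could have produced this result.
    print(repr(s), "->", repr(upper), "| length", len(s), "->", len(upper))
```

Whenever the length changes, a function with the signature of toupper(), which returns exactly one character, has no way to express the result.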
When a Greek letter precombined with an iota subscript is found, the letter case of this iota subscript should be ignored, and only the letter case of the base letter will be considered. This means that tolower() and toupper() can only map one orthographic style: the style that preserves the subscript, but not the classic Greek or modern monotonic style, which doesn't "know" anything about this "medieval" extension of the Greek alphabet that was still in use at the beginning of the 1970s (handling polytonic Greek with tolower() and toupper(), or with islower() and isupper(), will not produce the correct result).

For modern Greek, there's no use of this iota subscript, so we are in the same situation as classic Greek (before the Christian era), except that modern Greek still uses a few accents (notably the "tonos", equivalent in Unicode to the acute accent, even if its placement on Greek capitals is preferably before the letter rather than above it, as its assigned combining class could suggest).

2014-11-07 12:32 GMT+01:00 Mike FABIAN:

> Philippe Verdy wrote:
>
> > this is a "feature" of the Greek alphabet that the lowercase iota subscript
> > can be capitalized in two different ways : either as a subscript below the
> > uppercase main letter, or as a standard iota capitalized. The subscript
> > form is a combining character, but not the non-subscript form.
>
> Laurentiu> All of the characters you enumerated are titlecase letters
> Laurentiu> (gc=Lt) rather than uppercase letters (gc=Lu),
>
> U+1F80 ᾀ is something like ἀι and could be capitalized as ἈΙ or as ᾈ.
> ᾈ is something like Ἀι so I understand now that ᾈ can be considered as
> titlecase (gc=Lt).

Note that for modern Greek there's still a difficulty about the special final form of lowercase sigma: it is effectively lowercase (islower should return true), not titlecase, and toupper will map it to a standard capital Sigma.
But the reverse conversion can only convert the uppercase sigma to a standard lowercase sigma, ignoring the final form. To handle the final form correctly, don't use tolower() character by character; use strtolower() with a decent library that supports contextual rules. The same is true for the German eszett, which is capitalized as a double S, a mapping that is not reversible. An "uppercase" variant of the eszett was recently added to Unicode, but it is still extremely rarely used. It is extremely difficult to determine how to convert a double capital S, and most libraries will only convert it to a double lowercase s. Some locales deliberately decide not to alter the lowercase eszett with toupper or strtoupper; this is still correct if those libraries have not been updated to use the capital eszett, which is supported in more recent versions of Unicode but not found in any legacy encoding.

We still have a difficulty with the ampersand "&" because it has been encoded only as a symbol, on the assumption that in most locales it is just used in isolation as an abbreviated form of a word. But in some locales it was still considered a letter and used everywhere "et" could be used, including in abbreviations like "etc." == "&c.", or in the middle of words like "caret" == "car&" or "comm&tre" == "commettre". The modern use of the ampersand implies that there's a word break before and after the symbol, so we should have a separate encoding for "&" as a lowercase ligature. We should even have an uppercase variant, as for the German eszett, because there are glyphic variants of the ligature for uppercased titles where the modern "&" ampersand does not fit very well, or where it should be mapped to a non-ligatured "ET" letter pair, distinct from the mapping (with spaces around) to " ET " in French or to " AND " in English, as implied by the modern meaning of the current symbol as a separate word by itself.
With a distinct encoding of the ligature, the common abbreviation "etc." ligatured as "&c." would correctly map to uppercase "&C." with the uppercase ligature, or "ETC." without adding any space.

Note that "&" was even considered an extra letter in some classic European alphabets (with letter forms showing more clearly its origin as a ligature of "et"/"ET"), just like the German eszett "ß", the French "œ"/"Œ" (distinguished semantically from the "oe"/"OE" letter pairs, which allow a syllable break in the middle and allow titlecasing as "Oe": in French the titlecased common term "Oeuf" is semantically and graphically incorrect; it should be "Œuf", where "Œ" is fully uppercase in the ligature and not mixed-case), the Latin "æ"/"Æ" ligature (also used in other classic European languages), or the Dutch ligature "ĳ"/"Ĳ".

From mfabian at redhat.com Fri Nov 7 05:32:05 2014
From: mfabian at redhat.com (Mike FABIAN)
Date: Fri, 07 Nov 2014 12:32:05 +0100
Subject: Question about “Uppercase” in DerivedCoreProperties.txt
In-Reply-To: (Philippe Verdy's message of "Thu, 6 Nov 2014 20:46:11 +0100")
References:
Message-ID:

Philippe Verdy wrote:

> this is a "feature" of the Greek alphabet that the lowercase iota subscript
> can be capitalized in two different ways : either as a subscript below the
> uppercase main letter, or as a standard iota capitalized. The subscript
> form is a combining character, but not the non-subscript form.

Now I understand why these are titlecase letters, as Laurentiu explained:

Laurentiu> All of the characters you enumerated are titlecase letters
Laurentiu> (gc=Lt) rather than uppercase letters (gc=Lu),

U+1F80 ᾀ is something like ἀι and could be capitalized as ἈΙ or as ᾈ. ᾈ is something like Ἀι so I understand now that ᾈ can be considered as titlecase (gc=Lt).

Thank you very much, Philippe and Laurentiu, for explaining!
I stumbled on this question because I am trying to update the character class data for glibc for Unicode 7.0.0. glibc has character classes “upper” and “lower” but not “title”. Bruno Haible’s program to generate the character class data from UnicodeData.txt tries to enforce that every character which has a “toupper” mapping *must* be in either “upper” or “lower”.

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/gen-unicode-ctype.c;h=0c001b299d4601a375a1e814fd2ab06b0536b337;hb=HEAD#l660

I think Bruno’s program does this because ISO C 99 (ISO/IEC 9899 - Programming languages - C) http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf contains:

> 7.4.2.2 The toupper function
>
> [...]
>
> If the argument is a character for which islower is true and there are
> one or more corresponding characters, as specified by the current
> locale, for which isupper is true, the toupper function returns one of
> the corresponding characters (always the same one for any given locale);
> otherwise, the argument is returned unchanged.

which seems to require that toupper should only do something for characters where islower is true.

Therefore, Bruno’s program puts title case characters like U+1F88 ᾈ or U+01C5 ǅ into *both* “upper” and “lower”. That does not look so unreasonable, given the limitations of C99.

So it looks like we have to continue using this approach, because ISO C 99 requires it; we cannot use the “Uppercase” property from DerivedCoreProperties.txt for this.

But the “Alphabetic” property from DerivedCoreProperties.txt can probably be used to generate the “alpha” character class for glibc.

I hope this is correct.

--
Mike FABIAN
Office: +49-69-365051027, internal 8875027
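The idea described above, where titlecase letters end up in both classes because there is no “title” class, can be sketched as follows. This is a deliberately simplified illustration based on General_Category alone; it is not Bruno Haible's generator, which works from the case mappings in UnicodeData.txt, and the function name `ctype_classes` is made up.

```python
import unicodedata

def ctype_classes(ch):
    """Assign a character to hypothetical "upper"/"lower" classes by General_Category.
    Titlecase letters (Lt) are placed in both, since no "title" class is available."""
    gc = unicodedata.category(ch)
    classes = set()
    if gc in ("Lu", "Lt"):
        classes.add("upper")
    if gc in ("Ll", "Lt"):
        classes.add("lower")
    return classes

for ch in ("A", "a", "\u01C5", "\u1F88"):
    print(f"U+{ord(ch):04X}", sorted(ctype_classes(ch)))
```

With this rule, toupper() is only ever asked to act on a character for which the "lower" class is true, which is the constraint read out of the C99 wording quoted above.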
From mark at macchiato.com Fri Nov 7 10:55:16 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Fri, 07 Nov 2014 16:55:16 +0000 Subject: Open Source Emoji for the Web References: Message-ID: One can definitely script it; if you hadn't had compat issues it would be convenient to have the same convention. On Thu Nov 06 2014 at 11:30:09 PM Andrea Giammarchi < andrea.giammarchi at gmail.com> wrote: > Thanks Mark, > I will consider this change with CDN chaps too since that would > invalidate already a lot of cached content at the time it'll ship :-/ > > We should have paid more attention, on the other side if you need assets > locally instead of via CDN a script capable of renaming assets from current > form to your suggested one seems straight forward to me. > > Would that (sort of) work? > > Thanks > > > > On Fri, Nov 7, 2014 at 12:18 AM, Mark Davis ?? wrote: > >> Very nice. >> >> I'd have one suggestion. People appear to be converging on similar file >> names for the emoji. >> >> - Lowercase hex numbers, >> - at least 4 digits, >> - otherwise no leading zeros, >> - multiple code points separated by _, >> - with optional prefix/suffix. >> >> Like "dcm_0030_20e3.png". I'd suggest using that convention. >> >> Not a big thing, but makes it more consistent in tooling. >> >> >> Mark >> >> *? Il meglio ? l?inimico del bene ?* >> >> On Thu, Nov 6, 2014 at 3:27 PM, Andrea Giammarchi < >> andrea.giammarchi at gmail.com> wrote: >> >>> I'd like to thank those that helped me a while ago figuring out variants >>> and emoji behavior. >>> >>> Today we are open sourcing a relatively small JS library and 800+ CDN >>> based assets able to bring unified emoji in every WebView capable device >>> and browser. >>> >>> We are also planning to implement the recently introduced "diversity" >>> for the Unicode 8 draft as soon as we'll figure out a good approach for it >>> ( and btw, the default fallback is great! 
)
>>>
>>> This effort and collaboration is between Twitter [1], MaxCDN [2], and
>>> Wordpress [3].
>>>
>>> Any comment or suggestion will be more than welcome and appreciated.
>>>
>>> Thanks again and Best Regards
>>>
>>> [1] https://blog.twitter.com/2014/open-sourcing-twitter-emoji-for-everyone
>>> [2] https://www.maxcdn.com/blog/emojis-ftw/
>>> [3] http://en.blog.wordpress.com/2014/11/06/emoji-everywhere/

From karl-pentzlin at acssoft.de Fri Nov 7 16:52:58 2014
From: karl-pentzlin at acssoft.de (Karl Pentzlin)
Date: Fri, 7 Nov 2014 23:52:58 +0100
Subject: Emoji skin tone modifiers on the website of a leading German daily newspaper
Message-ID: <135798969.20141107235258@acssoft.de>

FYI: On 2014-11-05, a report on Emoji skin tone modifiers was published on the website of the "Frankfurter Allgemeine", a leading German daily newspaper:

http://www.faz.net/aktuell/gesellschaft/emoticons-smileys-bald-in-fuenf-hautfarben-13249783.html

- Karl Pentzlin

From gwalla at gmail.com Fri Nov 7 16:39:58 2014
From: gwalla at gmail.com (Garth Wallace)
Date: Fri, 7 Nov 2014 14:39:58 -0800
Subject: Terms for rotations
Message-ID:

Hello,

I'm currently working towards a proposal to encode a set of symbols used in fairy chess and chess variants, and I have a question about naming conventions. Several of the symbols are rotations of already encoded symbols.

According to the FAQ, "turned" is preferred for 180° rotations, and "rotated" for 90° rotations, but "rotated" is ambiguous as to whether the rotation is clockwise or counterclockwise, and in this set the two directions are treated as semantically distinctive. "Clockwise" and "anticlockwise" appear in the code charts to be used only for curved arrows.
Arrows seem to make use of "upwards", "leftwards", "rightwards", and "downwards", but these symbols are not arrows and do not really have a *direction*, just an orientation.

It's even more unclear when it comes to intermediate rotations in 45° increments (I'm not sure if I will include these in any proposal; I'm still doing research, gathering evidence of use and determining my scope). Arrows seem to use ordinal compass directions (e.g. NORTH EAST ARROW), but again these are not arrows. The names FAQ is silent on this.

I'm leaning towards "turned", "left rotated", and "right rotated" for the cardinal orientations, and have no idea what (if anything) to do about intermediate ones. Are there any more or less official preferences?

From ken.whistler at sap.com Fri Nov 7 17:26:25 2014
From: ken.whistler at sap.com (Whistler, Ken)
Date: Fri, 7 Nov 2014 23:26:25 +0000
Subject: Terms for rotations
In-Reply-To:
References:
Message-ID:

Garth Wallace asked:

> I'm currently working towards a proposal to encode a set of symbols
> used in fairy chess and chess variants, and I have a question about
> naming conventions. Several of the symbols are rotations of already
> encoded symbols.
...
>
> It's even more unclear when it comes to intermediate rotations in 45°
> increments (I'm not sure if I will include these in any proposal; I'm
> still doing research, gathering evidence of use and determining my
> scope). Arrows seem to use ordinal compass directions (e.g. NORTH EAST
> ARROW), but again these are not arrows. The names FAQ is silent on
> this.
>
> I'm leaning towards "turned", "left rotated", and "right rotated" for
> the cardinal orientations, and have no idea what (if anything) to do
> about intermediate ones. Are there any more or less official
> preferences?
When you start talking about sets of symbols rotated into 8 orientations each, doubled again by chirality, then you really are into the realm of a notational system -- perhaps not best handled for encoding by simply separately encoding each symbolic unit in each possible visual orientation.

I suggest first taking a look at what was done for analyzing a similar problem of rotation of symbols for the SignWriting notation system. See:

http://www.unicode.org/L2/L2012/12321-n4342-signwriting.pdf

That should give you some ideas about possible alternative approaches for the material you are dealing with.

--Ken

From prosfilaes at gmail.com Fri Nov 7 18:14:26 2014
From: prosfilaes at gmail.com (David Starner)
Date: Fri, 7 Nov 2014 16:14:26 -0800
Subject: Terms for rotations
In-Reply-To:
References:
Message-ID:

I don't think sign writing is the best analogy. Fairy chess starts with the basic set of six chess symbols, like a lot of linguists start with the 26 basic Latin characters. Likewise, because fairy chess has a smaller printing budget than even linguistics, instead of creating new characters, old ones are rotated and flipped. It's not systematic from what I've seen; a pawn rotated left means that we have a pawn-like piece and we've already used the pawn flipped. A fairy symbol would be awesome, but that would take hiring someone capable of drawing an abstract fairy consistent with the rest of the notation. So they rotate and turn even when it's not helping clarity.

From public at khwilliamson.com Sat Nov 8 00:10:26 2014
From: public at khwilliamson.com (Karl Williamson)
Date: Fri, 07 Nov 2014 23:10:26 -0700
Subject: New Unicode Emoji draft, available for review
In-Reply-To: <545A9B11.4000003@unicode.org>
References: <545A9B11.4000003@unicode.org>
Message-ID: <545DB3D2.6040704@khwilliamson.com>

On 11/05/2014 02:48 PM, Rick McGowan wrote:
> FYI, Posting this on behalf of Mark Davis...
Something in his original > reply message is apparently toxic to our mail gateway that it can't get > through. (Investigating.) > > May be the literal U+1F4A9, which I have (I'm sorry) redacted below. > > Rick The first icon was not U+1F4A9, but U+1F60F SMIRKING FACE. Remarkably, Rick's message seems to me to indicate that some emoji encoded in Unicode are considered by some servers to be obscene! I never considered the possibility of an obscene code point before. FWIW, my respondent, hopefully satirically, mentioned this as a basis for encoding further modifier characters, suitable for 1F4A9: https://en.wikipedia.org/wiki/Bristol_stool_scale > ------------ > > > Could be either one [U+1F4A9] > > > > The exact contents of minimal and optional characters is something > that we > > want to get feedback on. But I don't think [U+1F4A9] is in the running! > > > > BTW, I'm seeing about 250 new news articles on this, per hour (in > English). > > https://www.google.com/search?q=emoji+unicode&tbm=nws&tbs=qdr:h > > > > Plus a scattering of others, s.a. > > > http://www.spiegel.de/netzwelt/web/unicode-consortium-emojis-demnaechst-fuer-alle-hautfarben-a-1001125.html > > > > > > > > > > > _______________________________________________ > Unicode mailing list > Unicode at unicode.org > http://unicode.org/mailman/listinfo/unicode > From mfabian at redhat.com Sat Nov 8 03:22:10 2014 From: mfabian at redhat.com (Mike FABIAN) Date: Sat, 08 Nov 2014 10:22:10 +0100 Subject: Question about =?utf-8?B?4oCcVXBwZXJjYXNl4oCd?= in DerivedCoreProperties.txt In-Reply-To: (Philippe Verdy's message of "Fri, 7 Nov 2014 14:57:37 +0100") References: Message-ID: Philippe Verdy ????????: > note that tolower() and toupper() can only work one 1-character level, it > is not recommended for use for changing case of plain text. 
>
> For correct handling of locales, tolower and toupper should be replaced by
> strtolower and strtoupper (or their aliases) which will be able to process
> character clusters and contextual casing rules needed for a language or
> orthographic style

Yes, thank you for explaining this.

But these details of upper and lower casing cannot be expressed in the “i18n” file of glibc:

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/i18n

For toupper and tolower, this file just has character -> character mapping tables; for example, the “tolower” table contains only

(<U03A3>,<U03C3>)

(i.e. mapping Σ U+03A3 -> σ U+03C3, never to the final sigma ς U+03C2).

More correct, detailed information about upper and lower case must come from elsewhere, not from this “i18n” file in glibc. Using only the information from this “i18n” file, not even the Greek sigma can be handled correctly.

Pravin and I want to update this “i18n” file to the latest data from Unicode 7.0.0, doing it as correctly as possible within the limitations imposed by this file and the ISO C standard.

--
Mike FABIAN
Office: +49-69-365051027, internal 8875027

From mark at macchiato.com Sat Nov 8 14:14:16 2014
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sat, 8 Nov 2014 12:14:16 -0800
Subject: Emoji skin tone modifiers on the website of a leading German daily newspaper
In-Reply-To: <135798969.20141107235258@acssoft.de>
References: <135798969.20141107235258@acssoft.de>
Message-ID:

As far as I can tell it is garnering interest all over: from several German publications, including Spiegel, to French and Italian regional papers, to Indonesian, Vietnamese....
http://www.spiegel.de/netzwelt/web/unicode-consortium-emojis-demnaechst-fuer-alle-hautfarben-a-1001125.html

http://m.baohay.vn/chuyen-de/cong-nghe/961227/Bieu-tuong-Emoji-se-co-mau-da-thay-doi.html

{phone}

On Nov 8, 2014 12:04 AM, "Karl Pentzlin" wrote:

> FYI: On 2014-11-05, a report on Emoji skin tone modifiers was published on
> the website of the "Frankfurter Allgemeine", a leading German daily
> newspaper:
>
> http://www.faz.net/aktuell/gesellschaft/emoticons-smileys-bald-in-fuenf-hautfarben-13249783.html
> - Karl Pentzlin

From verdy_p at wanadoo.fr Sat Nov 8 17:50:07 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 9 Nov 2014 00:50:07 +0100
Subject: Re: Question about “Uppercase” in DerivedCoreProperties.txt

Do not try to get consistent results with only a character-to-character mapping. It does not work with all letters, because sometimes you need 1->2 or 2->1 mappings (not all composable characters exist in precombined forms, and sometimes the combination must be split into its canonically decomposed equivalent before mapping the base character) or other mappings.

toupper() and tolower() should not be used for anything other than mapping number-like sequences (e.g. to convert hexadecimal numbers).

Use strupper() and strlower() (or equivalent functions that do not allocate memory but write to a given buffer or stream, and similar functions in languages other than C/C++) to perform mappings on full strings, so that the string length can safely change.
- This is needed, for example, to convert city names or people's names to capitals in a postal address, or to style a book title or chapter heading.
- It is needed as well to perform case-insensitive searches (using "case folding", which is different from converting to lowercase or to uppercase) to match input, or to implement an input-completion UI that locates possible matches within a known dictionary or input history.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cjsvance at gmail.com Sat Nov 8 18:45:38 2014
From: cjsvance at gmail.com (Christopher Vance)
Date: Sun, 9 Nov 2014 11:45:38 +1100
Subject: Re: Question about “Uppercase” in DerivedCoreProperties.txt

So glibc is broken. This doesn't make it a Unicode problem.

--
Christopher Vance

From verdy_p at wanadoo.fr Sun Nov 9 00:19:24 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 9 Nov 2014 07:19:24 +0100
Subject: Re: Question about “Uppercase” in DerivedCoreProperties.txt

glibc is no more broken than any other C library implementing toupper and tolower from the legacy "ctype" standard library. These are old APIs that are widely used and still have valid contexts where they are simple and safe to use. But they are not meant to convert text.

The i18n data just shows the mappings used for tolower, toupper (and totitle), but that is clearly not enough to implement strtolower and strtoupper, which require more rules: notably 1-to-2 or 2-to-1 mappings, plus support for normalisation/composition/decomposition and recognizing canonical equivalents in all possible reorderings, and more data for contextual rules such as the final form of sigma.

Such data may be easily expressible in some cases with such a tabular format, and could be implemented by locale-specific code, for example to handle some dictionary lookups (as required with some Asian scripts for word breaking, and implicitly needed for the Korean script, whose normalisation is not handled by table lookups but algorithmically, by code only, within the normalizer).

I don't see anything wrong with the existing glibc "i18n" data. Glibc would be wrong, however, if it *only* used tolower/toupper to implement strtolower/strtoupper (but this was what was still done in the past since the creation of the "standard" C library on Unix and even later on DOS, MacOS, Windows and most other systems...
before the creation of Unicode and its development to support more languages, scripts, and orthographic systems.)

Modern i18n libraries (for various programming languages) contain more advanced APIs for correct case mappings on full strings (including M-to-N mappings, contextual rules and support of canonical equivalences). These APIs no longer assume that the output string will be the same length as the input or that only 1:1 mappings will be performed on each character, even if this is still what is done when using the "C" root locale, which works only for a few languages and only with simple texts using restricted alphabets, without all the possible Unicode extensions now needed to support not just the native language but also many proper names and "foreign" toponyms, texts containing small citations in another language, or any multilingual document.

2014-11-09 1:45 GMT+01:00 Christopher Vance :

> So glibc is broken. This doesn't make it a Unicode problem.

From verdy_p at wanadoo.fr Sun Nov 9 10:08:37 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 9 Nov 2014 17:08:37 +0100
Subject: Emoji skin tone modifiers on the website of a leading German daily newspaper
References: <135798969.20141107235258@acssoft.de>

In French:
- Huffington Post (online news): http://www.huffingtonpost.fr/2014/11/05/emoji-couleur-peau-smiley-telephone-apple-unicode_n_6105394.html
- RFI (Radio France Internationale): http://www.rfi.fr/technologies/20141106-emoji-diversite-messenger-facebook-sms-emoticone-sticker/
- Le Figaro (news magazine): http://www.lefigaro.fr/secteur/high-tech/2014/11/06/01007-20141106ARTFIG00007-de-nouvelles-couleurs-de-peau-pour-les-emojis-de-vos-smartphones.php
- i>Télé
(news TV channel): http://www.itele.fr/culture/video/pour-favoriser-la-diversite-de-nouvelles-emoticones-arriveraient-en-2015-99688
- Metro News (free daily newspaper): http://www.metronews.fr/high-tech/astuce-geek-activez-les-emoticones-sur-votre-iphone-ou-ipad/mnjn!eYmG96FYNz08g/
...

From jf at colson.eu Sun Nov 9 13:28:13 2014
From: jf at colson.eu (Jean-François Colson)
Date: Sun, 09 Nov 2014 20:28:13 +0100
Subject: Terms for rotations
Message-ID: <545FC04D.3030605@colson.eu>

On 08/11/14 00:26, Whistler, Ken wrote:
> Garth Wallace asked:
>
>> I'm currently working towards a proposal to encode a set of symbols
>> used in fairy chess and chess variants, and I have a question about
>> naming conventions. Several of the symbols are rotations of already
>> encoded symbols. ...
>>
>> It's even more unclear when it comes to intermediate rotations in 45°
>> increments (I'm not sure if I will include these in any proposal; I'm
>> still doing research, gathering evidence of use and determining my
>> scope). Arrows seem to use ordinal compass directions (e.g. NORTH EAST
>> ARROW), but again these are not arrows. The names FAQ is silent on
>> this.
>>
>> I'm leaning towards "turned", "left rotated", and "right rotated" for
>> the cardinal orientations, and have no idea what (if anything) to do
>> about intermediate ones. Are there any more or less official
>> preferences?
> When you start talking about sets of symbols rotated into 8
> orientations each, doubled again by chirality, then you really
> are into the realm of a notational system -- perhaps not best
> handled for encoding by simply separately encoding each symbolic
> unit in each possible visual orientation.
>
> I suggest first taking a look at what was done for analyzing a similar
> problem of rotation of symbols for the SignWriting notation
> system. See:
>
> http://www.unicode.org/L2/L2012/12321-n4342-signwriting.pdf
>
> That should give you some ideas about possible alternative approaches
> for the material you are dealing with.
>
> --Ken

Could the characters SWR2 to SWR8 be applied to chess symbols or should new rotation modifiers be created for them?

From sdaoden at yandex.com Mon Nov 10 06:41:30 2014
From: sdaoden at yandex.com (Steffen Nurpmeso)
Date: Mon, 10 Nov 2014 13:41:30 +0100
Subject: Question about “Uppercase” in DerivedCoreProperties.txt
Message-ID: <20141110124130.JxGJfu38%sdaoden@yandex.com>

Philippe Verdy wrote:
 |glibc is not more borken and any other C library implementing toupper and
 |tolower from the legacy "ctype" standard library. These are old APIs that
 |are just widely used and still have valid contexts were they are simple and
 |safe to use. But they are not meant to convert text.

Hah! Legacy is good..
I wish a usable successor were already standardized by ISO C.

--steffen

From doug at ewellic.org Mon Nov 10 10:07:50 2014
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 10 Nov 2014 09:07:50 -0700
Subject: Question about "Uppercase" in DerivedCoreProperties.txt
Message-ID: <20141110090750.665a7a7059d7ee80bb4d670165c8327d.3cf158c8db.wbe@email03.secureserver.net>

Philippe Verdy wrote:

> glibc is not more borken and any other C library implementing toupper
> and tolower from the legacy "ctype" standard library. These are old
> APIs that are just widely used and still have valid contexts were they
> are simple and safe to use. But they are not meant to convert text.

Well, of course they are *meant* to convert text. They're just not very good at it.

--
Doug Ewell | Thornton, CO, USA | http://ewellic.org

From verdy_p at wanadoo.fr Mon Nov 10 12:10:08 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 10 Nov 2014 19:10:08 +0100
Subject: Re: Question about “Uppercase” in DerivedCoreProperties.txt
In-Reply-To: <20141110124130.JxGJfu38%sdaoden@yandex.com>
References: <20141110124130.JxGJfu38%sdaoden@yandex.com>

Successors that convert strings instead of just isolated "characters" already exist in all C libraries (including glibc), unfortunately under different names. (Sorry, but isolated "characters" are NOT what we need to handle "texts": they are not even equivalent to Unicode characters, they are just code units, most often 8-bit with "char" or only 16-bit with "wchar_t"!) The differing names are the main reason why there are complex header files that try to find the appropriate name and otherwise provide a "default" basic implementation that just scans individual characters and filters them through tolower and toupper. This is bad practice. Good libraries should all contain a safe implementation of case conversion of strings, and software should use it (and not reinvent this old bad trick just because it works with basic English).
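
To make the difference between the two approaches concrete, here is a minimal sketch. It is written in Python rather than C, and it is only an illustration under stated assumptions: `per_char_upper` and `per_char_lower` are hypothetical helpers that imitate a loop over toupper()/tolower() with a strict 1:1 table, while Python's built-in `str.upper()` and `str.lower()` stand in for a string-level conversion that may change the length and look at context.

```python
# Hypothetical helpers that mimic a toupper()/tolower() loop: each character
# is mapped on its own, and only 1:1 mappings are accepted.
def per_char_upper(text):
    out = []
    for ch in text:
        mapped = ch.upper()
        # A 1:1 table has no room for a two-character result, so such
        # characters are left unchanged.
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out)


def per_char_lower(text):
    out = []
    for ch in text:
        mapped = ch.lower()
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out)


german = "Stra\u00dfe"  # "Strasse" spelled with U+00DF SHARP S
greek = "\u039f\u0394\u03a5\u03a3\u03a3\u0395\u03a5\u03a3"  # all-capitals word ending in SIGMA

print(per_char_upper(german))  # the sharp s is left untouched
print(german.upper())          # string-level mapping expands it, so the length changes
print(per_char_lower(greek))   # every capital sigma becomes U+03C3
print(greek.lower())           # the word-final sigma becomes U+03C2 instead
```

The per-character helpers cannot produce either correct result, which is the limitation described for the 1:1 tables; how a C library should expose the string-level operation is left open here.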
From ken.whistler at sap.com Mon Nov 10 12:30:37 2014
From: ken.whistler at sap.com (Whistler, Ken)
Date: Mon, 10 Nov 2014 18:30:37 +0000
Subject: Terms for rotations
In-Reply-To: <545FC04D.3030605@colson.eu>
References: <545FC04D.3030605@colson.eu>

> > http://www.unicode.org/L2/L2012/12321-n4342-signwriting.pdf
> >
> > That should give you some ideas about possible alternative approaches
> > for the material you are dealing with.
> >
> > --Ken
>
> Could the characters SWR2 to SWR8 be applied to chess symbols or should
> new rotation modifiers be created for them?

They aren't currently defined to do so -- and there is certainly a danger in opening up the applicability to other sets of symbols, as that would start down the road of trying to turn them into generic rotation mechanisms.

I'm inclined here to trust Garth's assessment that for a relatively small set of rotated chess symbols, the simpler solution is just to enumerate the set of rotated forms needed for these odd chess notations as unitary symbols. I just wanted to make sure that the precedent for SignWriting was part of the consideration, given the fact that a notation involving rotation of symbols was the topic.
--Ken

From sdaoden at yandex.com Mon Nov 10 12:55:14 2014
From: sdaoden at yandex.com (Steffen Nurpmeso)
Date: Mon, 10 Nov 2014 19:55:14 +0100
Subject: Question about “Uppercase” in DerivedCoreProperties.txt
References: <20141110124130.JxGJfu38%sdaoden@yandex.com>
Message-ID: <20141110185514.ZLVV9a5A%sdaoden@yandex.com>

Philippe Verdy wrote:
 |Successors to convert strings instead of just isolated "characters" (sorry,
 |they are NOT what we need to handle "texts", they are not even equivalent
 |to Unicode characters, they are just code units, most often 8-bit with
 |"char" or 16-bit only with "wchar_t" !) already exist in all C libraries
 |(including glibc), under different names unfortunately (this is the main
 |cause why there are complex header files trying to find the appropriate
 |name, and providing a "default" basic implementation that just scans
 |individual characters to filter them with tolower and toupper: this is a
 |bad practice,

glibc is the _only_ standard C library I know of that supports its own homebrew functionality regarding this issue (and in a way that I personally don't want to, and never will, work with).
Even the newest ISO C doesn't lend any hand, so no ISO C programmer can expect to use any standard facility before 2020, if that is the time; and then operating systems have to adhere to that standard, and then programmers have to be convinced to use those functions.
Until then different solutions will have to be used.
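
The remark quoted above, that "char" and "wchar_t" hold code units rather than Unicode characters, can be checked with a small sketch. It is in Python, not C, and `code_unit_counts` is a hypothetical helper introduced only for this illustration; the two sample characters are U+00E9 and U+1F60F SMIRKING FACE, the emoji named earlier in this digest.

```python
def code_unit_counts(text):
    """Count code points and UTF-8 / UTF-16 code units for a string."""
    return {
        "code_points": len(text),
        "utf8_units": len(text.encode("utf-8")),
        # "utf-16-le" writes no byte-order mark; each code unit is 2 bytes.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


for sample in ("\u00e9", "\U0001F60F"):
    print(f"U+{ord(sample):04X}", code_unit_counts(sample))
```

A function that is handed one 8-bit or 16-bit unit at a time sees only a fragment of such a character, which is one reason a string-level interface is being asked for in this thread.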
--steffen

From verdy_p at wanadoo.fr Mon Nov 10 13:08:40 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 10 Nov 2014 20:08:40 +0100
Subject: Re: Question about “Uppercase” in DerivedCoreProperties.txt
In-Reply-To: <20141110185514.ZLVV9a5A%sdaoden@yandex.com>
References: <20141110124130.JxGJfu38%sdaoden@yandex.com> <20141110185514.ZLVV9a5A%sdaoden@yandex.com>

The equivalent of strtolower() and strtoupper() is implemented in all the C libraries I know (yes, including glibc) and have worked with on various OSes (and for a very long time!), even if their names change (because of the unfortunate lack of standardization of their interaction with C locales).

These two functions should have been standardized long ago, even if locale support had been limited to the legacy basic C locale with limited functionality, where these functions would just scan the characters of a string and convert them with toupper() and tolower(). glibc and other libraries would then have implemented this standard. For now, we still need complex "config" scripts to detect the correct headers to include, or to provide a basic implementation via various macros.

The standard C++ "string" package could then have used this standard internally in the methods exposed in its API. I cannot understand why this simple effort was never made for such basic functionality, needed and used in almost all software and OSes.

From sdaoden at yandex.com Mon Nov 10 14:19:14 2014
From: sdaoden at yandex.com (Steffen Nurpmeso)
Date: Mon, 10 Nov 2014 21:19:14 +0100
Subject: Question about “Uppercase” in DerivedCoreProperties.txt
References: <20141110124130.JxGJfu38%sdaoden@yandex.com> <20141110185514.ZLVV9a5A%sdaoden@yandex.com>
Message-ID: <20141110201914.gXuCfm8i%sdaoden@yandex.com>

Philippe Verdy wrote:
 |The standard C++ "string" package could have then used this standard
 |internally in the methods exposed in its API. I cannot understand this
 |simple effort was never done on such basic functionality needed and used in
 |almost all softwares and OSes.

There are plenty of other things one can bang his head on as necessary, _that_ is for sure. Even overwhelmingly, the pessimistic may say.
--steffen

From nospam-abuse at ilyaz.org Mon Nov 10 15:36:54 2014
From: nospam-abuse at ilyaz.org (Ilya Zakharevich)
Date: Mon, 10 Nov 2014 13:36:54 -0800
Subject: Terms for rotations
Message-ID: <20141110213654.GA20741@math.berkeley.edu>

On Fri, Nov 07, 2014 at 02:39:58PM -0800, Garth Wallace wrote:
> I'm leaning towards "turned", "left rotated", and "right rotated" for
> the cardinal orientations, ...

Please keep in mind that left/right are especially bad terms to describe rotations. When you rotate the character cell about its center, some parts move to the right and some parts move to the left, both when the rotation is clockwise and when it is counterclockwise.

Which of the words left/right LOOKS better suited to describe a particular rotation depends on whether the top or the bottom OF WHAT YOU ROTATE is more “visually important”. (We saw it many times when discussing the math of the rotations with small kids.) Try to rotate ? left ;-].

(I believe that people associate left with counterclockwise etc. only because for many shapes, visually, the bottom is just a pedestal for the top. So you “grab” the shape “on top”.)

Hope this helps,
Ilya

From nospam-abuse at ilyaz.org Mon Nov 10 16:01:09 2014
From: nospam-abuse at ilyaz.org (Ilya Zakharevich)
Date: Mon, 10 Nov 2014 14:01:09 -0800
Subject: Terms for rotations
References: <545FC04D.3030605@colson.eu>
Message-ID: <20141110220109.GB20741@math.berkeley.edu>

On Mon, Nov 10, 2014 at 06:30:37PM +0000, Whistler, Ken wrote:
> > Could the characters SWR2 to SWR8 be applied to chess symbols or should
> > new rotation modifiers be created for them?
>
> They aren't currently defined to do so -- and there is certainly a danger in
> opening up the applicability to other sets of symbols, as that would start
> down the road of trying to turn them into generic rotation mechanisms.
>
> I'm inclined here to trust Garth's assessment that for a relatively small
> set of rotated chess symbols, the simpler solution is just to enumerate
> the set of rotated forms needed for these odd chess notations as
> unitary symbols. I just wanted to make sure that the precedent for
> SignWriting was part of the consideration, given the fact that
> a notation involving rotation of symbols was the topic.

Sorry, I had no time (and no clear way to express things) when what I'm going to write could have been more relevant; anyway, this is about SignWriting.

I consider the precedent of SignWriting an especially bad model on which to base other encodings of extensive collections, and not because it uses “many mechanisms”, but because it uses FEW mechanisms. I think that the same functionality could have been implemented using a tiny handful of new characters, while making the encoded SignWriting text readable EVEN WITHOUT SPECIAL FONTS and/or shaping engines.

See, for example, the Mr Potato Head font
  http://www.unicode.org/mail-arch/unicode-ml/y2014-m09/0003.html
; using the same principles, one could encode most (all?) of the hand symbols as
  SignWriting FACE STARTER CHARACTER + upper/lower-script modifiers
For example, hand with fingers 1,2 extended, 3,4 crossed, and 5 bent halfway could have been encoded as
  ????????

A specialized font would show the needed glyph. Without a specialized font, one could see a representation which allows one to visualize the shape and, at least, see a certain distinctive rendition.

As far as I checked (about 60% into the SignWriting proposal), this approach would enable all of SignWriting functionality with about 10 base characters needed.

As for rotation modifiers, we already have 24 (?) clock face symbols, and they allow granularity of 15° when specifying the rotation.
Yours,
Ilya

From nospam-abuse at ilyaz.org Mon Nov 10 16:16:43 2014
From: nospam-abuse at ilyaz.org (Ilya Zakharevich)
Date: Mon, 10 Nov 2014 14:16:43 -0800
Subject: Rotations, SignWriting, and Mr Potato Head
In-Reply-To: <20141110220109.GB20741@math.berkeley.edu>
References: <545FC04D.3030605@colson.eu> <20141110220109.GB20741@math.berkeley.edu>
Message-ID: <20141110221643.GA21344@math.berkeley.edu>

Oops, I forgot to update the subject, AND made a misprint.

On Mon, Nov 10, 2014 at 02:01:09PM -0800, I wrote:
> See, for example, the Mr Potato Head font
>   http://www.unicode.org/mail-arch/unicode-ml/y2014-m09/0003.html
> ; using the same principles, one could encode most (all?) of the hand
> symbols as
>   SignWriting FACE STARTER CHARACTER + upper/lower-script modifiers
> For example, hand with fingers 1,2 extended, 3,4 crossed, and 5 bent
> halfway could have been encoded as
>   ????????

Of course, it should have been
  SignWriting HAND STARTER CHARACTER + upper/lower-script modifiers
              ^^^^
(but the same holds for SignWriting faces etc). And ? is supposed to be just a placeholder glyph for the HAND STARTER.

Sorry,
Ilya

From jf at colson.eu Mon Nov 10 17:43:05 2014
From: jf at colson.eu (Jean-François Colson)
Date: Tue, 11 Nov 2014 00:43:05 +0100
Subject: Terms for rotations
In-Reply-To: <20141110213654.GA20741@math.berkeley.edu>
References: <20141110213654.GA20741@math.berkeley.edu>
Message-ID: <54614D89.9060300@colson.eu>

On 10/11/14 22:36, Ilya Zakharevich wrote:
> On Fri, Nov 07, 2014 at 02:39:58PM -0800, Garth Wallace wrote:
>> I'm leaning towards "turned", "left rotated", and "right rotated" for
>> the cardinal orientations,
> ...
>
> Please keep in mind that left/right are especially bad terms to
> describe rotations. When you rotate the character cell about its
> center, some parts move to the right, some parts move to the
> left, both when the rotation is clockwise and counterclockwise.
>
> Which of the words left/right LOOKS better suited to describe a
> particular rotation depends on whether the top or the bottom OF WHAT
> YOU ROTATE is more “visually important”. (We saw it many times when
> discussing the math of the rotations with small kids.) Try to rotate
> ? left ;-].
>
> (I believe that people associate left with counterclockwise etc. only
> because for many shapes, visually, the bottom is just a pedestal
> for the top. So you “grab” the shape “on top”.)

Look at this picture:
http://www.permisecole.com/code-route/priorites/faux-carrefour-a-sens-giratoire.jpg
Imagine you sit in this car and you want to turn RIGHT. What will you do? Will you turn the driving wheel clockwise or counterclockwise?

From jf at colson.eu Mon Nov 10 17:53:35 2014
From: jf at colson.eu (Jean-François Colson)
Date: Tue, 11 Nov 2014 00:53:35 +0100
Subject: Terms for rotations
In-Reply-To: <54614D89.9060300@colson.eu>
References: <20141110213654.GA20741@math.berkeley.edu> <54614D89.9060300@colson.eu>
Message-ID: <54614FFF.1030505@colson.eu>

On 11/11/14 00:43, Jean-François Colson wrote:
> Imagine you sit in this car and you want to turn RIGHT. What will you
> do? Will you turn the driving wheel

I meant “steering wheel”.

> clockwise or counterclockwise?

From ken.whistler at sap.com Mon Nov 10 18:12:26 2014
From: ken.whistler at sap.com (Whistler, Ken)
Date: Tue, 11 Nov 2014 00:12:26 +0000
Subject: Terms for rotations
In-Reply-To: <54614D89.9060300@colson.eu>
References: <20141110213654.GA20741@math.berkeley.edu> <54614D89.9060300@colson.eu>

> Look at this picture:
> http://www.permisecole.com/code-route/priorites/faux-carrefour-a-sens-giratoire.jpg
> Imagine you sit in this car and you want to turn RIGHT. What will you
> do? Will you turn the driving wheel clockwise or counterclockwise?

And now imagine that you are motoring in a 1904 Cyklonette. Which way would you move the tiller? ;-)

Seriously, I think that Ilya's point is well-taken. Although in English there is a strong association of the phrase "turn to the right" with clockwise motion for control devices which rotate, if you take the phrase out of that mechanical context and just talk about the orientation of pictures on paper, there can be some ambiguity based on the conceptual confusion with the concept of "turning to[wards] facing the right", which can mean something very different for symbols which seem to have built-in directions, like arrows.
--Ken

From nospam-abuse at ilyaz.org Mon Nov 10 18:14:01 2014
From: nospam-abuse at ilyaz.org (Ilya Zakharevich)
Date: Mon, 10 Nov 2014 16:14:01 -0800
Subject: Terms for rotations
In-Reply-To: <54614D89.9060300@colson.eu>
References: <20141110213654.GA20741@math.berkeley.edu> <54614D89.9060300@colson.eu>
Message-ID: <20141111001401.GA22161@math.berkeley.edu>

On Tue, Nov 11, 2014 at 12:43:05AM +0100, Jean-François Colson wrote:
> > (I believe that people associate left with counterclockwise etc. only
> > because for many shapes, visually, the bottom is just a pedestal
> > for the top. So you “grab” the shape “on top”.)
>
> Look at this picture: http://www.permisecole.com/code-route/priorites/faux-carrefour-a-sens-giratoire.jpg
> Imagine you sit in this car and you want to turn RIGHT. What will
> you do? Will you turn the driving wheel clockwise or
> counterclockwise?

It is not clear what you mean here. Should I take into account that the car is parked (judging by the hands not being on the steering wheel in the 1:51 position)? (And parked where parking is more or less clearly illegal?) Should I take into account that the previous stretch of the road is curving right, but the current short segment is straight?

[You see: currently, I teach very small kids, and try to make my problems as unambiguous as possible. ;-]

Ilya

From prosfilaes at gmail.com Mon Nov 10 19:17:00 2014
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 10 Nov 2014 17:17:00 -0800
Subject: Terms for rotations
References: <20141110213654.GA20741@math.berkeley.edu> <54614D89.9060300@colson.eu>

On Mon, Nov 10, 2014 at 4:12 PM, Whistler, Ken wrote:
> Seriously, I think that Ilya's point is well-taken.
> Although in English there is a strong association of the phrase "turn to the right" with clockwise motion for control devices which rotate, if you take the phrase out of that mechanical context and just talk about the orientation of pictures on paper, there can be some ambiguity based on the conceptual confusion with the concept of "turning to[wards] facing the right", which can mean something very different for symbols which seem to have built-in directions, like arrows.

So is there anything wrong with CLOCKWISE and COUNTERCLOCKWISE? TURNED COUNTERCLOCKWISE seems a little verbose. WIDDERSHINS is shorter than COUNTERCLOCKWISE, but is not exactly a common term, especially in technical English.

--
Kie ekzistas vivo, ekzistas espero.

From ken.whistler at sap.com Mon Nov 10 19:32:49 2014
From: ken.whistler at sap.com (Whistler, Ken)
Date: Tue, 11 Nov 2014 01:32:49 +0000
Subject: Terms for rotations
In-Reply-To:
References: <20141110213654.GA20741@math.berkeley.edu> <54614D89.9060300@colson.eu>
Message-ID:

> WIDDERSHINS is shorter than COUNTERCLOCKWISE, but is not exactly a common term, especially in technical English.

Aye, but laddie, then we'd have to use DEASIL for CLOCKWISE!

And we'd have wiccans after us to spell it "DEOSIL" instead. ;-)

--Ken

From petercon at microsoft.com Mon Nov 10 19:48:36 2014
From: petercon at microsoft.com (Peter Constable)
Date: Tue, 11 Nov 2014 01:48:36 +0000
Subject: Terms for rotations
In-Reply-To:
References: <20141110213654.GA20741@math.berkeley.edu> <54614D89.9060300@colson.eu>
Message-ID: <2be8145b43e24319970e492a3f4efd13@BLUPR03MB120.namprd03.prod.outlook.com>

It might also be useful to note that the primary purpose of the character names is to provide unique reference identifiers that should be reasonably reflective of the character identity. But they don't need to guarantee unambiguous understanding of the character identity absent any additional information.
In particular, two things that can be assumed when interpreting a character name to understand the character identity are (1) access to the representative glyph for the character from the code charts, and (2) access to the name and representative glyph from the code charts for related characters. So, for example, the identity of 026F LATIN SMALL LETTER TURNED M and 1D1F LATIN SMALL LETTER SIDEWAYS TURNED M can only be clearly understood in reference to the representative glyphs for these characters and to 006D LATIN SMALL LETTER M.

Peter

From A.Schappo at lboro.ac.uk Tue Nov 11 05:20:50 2014
From: A.Schappo at lboro.ac.uk (Andre Schappo)
Date: Tue, 11 Nov 2014 11:20:50 +0000
Subject: Emoji skin tone modifiers on the website of a leading German daily newspaper
In-Reply-To:
References: <135798969.20141107235258@acssoft.de>
Message-ID: <031C03BF-3951-4178-9F71-047121738438@lboro.ac.uk>

2014-11-08 21:14 GMT+01:00 Mark Davis ☕️:
As far as I can tell it is garnering interest all over.. Several German publications, including Spiegel, to French and Italian regional papers, to Indonesian, Vietnamese....

Some Chinese web articles on emoji skin tone modifiers:
http://t.w.cn/jingdian/jingdian/1378387.html
http://qdaily.com.cn/display/articles/3306
http://www.designboom.cn/??/??????-??-????????/
http://www.chinadaily.com.cn/language_tips/news/2014-11/06/content_18876358.htm

André

From lists at curtisclark.org Thu Nov 13 00:40:42 2014
From: lists at curtisclark.org (Curtis Clark)
Date: Wed, 12 Nov 2014 22:40:42 -0800
Subject: Terms for rotations
In-Reply-To:
References: <20141110213654.GA20741@math.berkeley.edu> <54614D89.9060300@colson.eu>
Message-ID: <5464526A.9040106@curtisclark.org>

On 2014-11-10 5:32 PM, Whistler, Ken wrote:
>> WIDDERSHINS is shorter than
> Aye, but laddie, then we'd have to use DEASIL for CLOCKWISE!
>
> And we'd have wiccans after us to spell it "DEOSIL" instead. ;-)

And the Irish would no doubt insist on DEISEAL.
--
Curtis Clark, PhD http://www.cpp.edu/~jcclark
Professor Emeritus Biological Sciences +1 909 869 4140
Cal Poly Pomona, Pomona CA 91768
Please note new email address: jcclark at cpp.edu

From andrewcwest at gmail.com Thu Nov 13 02:41:49 2014
From: andrewcwest at gmail.com (Andrew West)
Date: Thu, 13 Nov 2014 08:41:49 +0000
Subject: Terms for rotations
In-Reply-To:
References: <20141110213654.GA20741@math.berkeley.edu> <54614D89.9060300@colson.eu>
Message-ID:

On 11 November 2014 01:17, David Starner wrote:
> On Mon, Nov 10, 2014 at 4:12 PM, Whistler, Ken wrote:
>> Seriously, I think that Ilya's point is well-taken. Although in English there is a strong association of the phrase "turn to the right" with clockwise motion for control devices which rotate, if you take the phrase out of that mechanical context and just talk about the orientation of pictures on paper, there can be some ambiguity based on the conceptual confusion with the concept of "turning to[wards] facing the right", which can mean something very different for symbols which seem to have built-in directions, like arrows.
>
> So is there anything wrong with CLOCKWISE and COUNTERCLOCKWISE? TURNED COUNTERCLOCKWISE seems a little verbose. WIDDERSHINS is shorter than COUNTERCLOCKWISE, but is not exactly a common term, especially in technical English.

ANTICLOCKWISE is the term used in the UCS (see names for 20D4, 20DA, 21B6, 21BA, 2233, 27F2, 2939, 293A, 293B, 293D, 293F, 2940, 29BC, 2A11, 2B6F, 2B8C, 2B8D, 2B8E, 2B8F, 2B94, 1F504).

Andrew

From ishida at w3.org Thu Nov 13 04:00:20 2014
From: ishida at w3.org (Richard Ishida)
Date: Thu, 13 Nov 2014 10:00:20 +0000
Subject: MONGOLIAN LETTER YA medial second form, incorrect image?
Message-ID: <54648134.7000900@w3.org>

Before reporting this I want to check I have understood it correctly. If you know something about Mongolian variant selectors, please let me know if my conclusion is correct.

I think the image for medial MONGOLIAN LETTER YA second form, 1836 180B, at http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html is incorrect.

I think it should have no upturn on the left. The Mongolian Baiti, Mongolian White, Mongolian Writing, and Noto Sans Mongolian fonts all produce a glyph with no upturn in medial position. The glyph with upturn is the default medial glyph (except before i in Mongolian Baiti).

Bottom line: I believe that the chart should show the same image for medial as it shows for initial.

ri

From andrewcwest at gmail.com Thu Nov 13 04:30:19 2014
From: andrewcwest at gmail.com (Andrew West)
Date: Thu, 13 Nov 2014 10:30:19 +0000
Subject: MONGOLIAN LETTER YA medial second form, incorrect image?
In-Reply-To: <54648134.7000900@w3.org>
References: <54648134.7000900@w3.org>
Message-ID:

On 13 November 2014 10:00, Richard Ishida wrote:
> Before reporting this I want to check I have understood it correctly. If you know something about Mongolian variant selectors, please let me know if my conclusion is correct.
>
> I think the image for medial MONGOLIAN LETTER YA second form, 1836 180B, at http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html is incorrect.
>
> I think it should have no upturn on the left.

Yes, you are correct. It makes no sense to have an upturn as that would be the same glyph as the first medial form. You can see that the second initial form and the second medial form both have the same glyph with no upturn (ignore the dot, that is a printing artefact) in Prof.
Choijinzhab's "Mongolian Encoding":
http://www.babelstone.co.uk/Mongolian/MGWBM/MGWBM_C034-C035.jpg

Andrew

From verdy_p at wanadoo.fr Sat Nov 15 18:56:05 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 16 Nov 2014 01:56:05 +0100
Subject: New Emoji Candidates for Unicode 8.0
In-Reply-To:
References: <54667BF6.1070300@unicode.org>
Message-ID:

Note that this missing "index.html" page should probably just link to http://www.unicode.org/reports/tr51 (the last version?) or to http://www.unicode.org/reports/tr51/tr51-1.html (version 1), but still contain an index list for the files in that directory.

2014-11-16 0:56 GMT+01:00 Philippe Verdy:
> One page has disappeared in emoji data:
>
> http://www.unicode.org/Public/emoji/1.0/index.html
>
> (it is referenced in all other pages listed in http://www.unicode.org/Public/emoji/1.0/ and supposed to explain the format or explain sources...)

From mark at macchiato.com Mon Nov 17 01:35:57 2014
From: mark at macchiato.com (Mark Davis ☕️)
Date: Mon, 17 Nov 2014 07:35:57 +0000
Subject: The rapid evolution of a wordless tongue
Message-ID:

http://nymag.com/daily/intelligencer/2014/11/emojis-rapid-evolution.html

A more extended article from NY Magazine about the growing usage of emoji, and the ways in which that usage is developing. It has a quote from Peter Constable and an (indirect) reference to +Steven R. Loomis.

“IT’S EASY TO DISMISS EMOJI. They are, at first glance, ridiculous. They are a small invasive cartoon army of faces and vehicles and flags and food and symbols trying to topple the millennia-long reign of words. Emoji are intended to illustrate, or in some cases replace altogether, the words we send each other digitally, whether in a text message, email, or tweet. Taken together, emoji look like the electronic equivalent of those puffy stickers tweens used to ornament their Trapper Keepers. And yet...”

From as at signographie.de Mon Nov 17 04:09:00 2014
From: as at signographie.de (Andreas Stötzner)
Date: Mon, 17 Nov 2014 11:09:00 +0100
Subject: Re: The rapid … erosion of definition ability
In-Reply-To:
References:
Message-ID: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>

On 17.11.2014 at 08:35, Mark Davis ☕️ wrote:
> IT’S EASY TO DISMISS EMOJI. They are, at first glance, ridiculous

The only ridiculous thing is to name them “Emoji” outside Japan. They’re just signs and that’s it.

Regards,
Andreas Stötzner.

_______________________________________________________________________________
Andreas Stötzner Gestaltung Signographie Fontentwicklung
Haus des Buches
Gerichtsweg 28, Raum 434
04103 Leipzig
0176-86823396
http://stoetzner-gestaltung.prosite.com

From leoboiko at namakajiri.net Mon Nov 17 04:46:56 2014
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Mon, 17 Nov 2014 08:46:56 -0200
Subject: Re: The rapid … erosion of definition ability
In-Reply-To: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

"Sign" is too general. The word has no less than 12 meanings, and can refer e.g. to many Unicode characters that are not emojis ("the sharp sign", "the less-than sign").[1]

It's useful to have a specialized word referring specifically to the new pictograms used to color electronic messages with emotional inflection. Borrowing is a perfectly adequate and natural strategy to get such a word into a language,
as indeed English did with the word "sign", from Old French *signe* < Latin *signum*; and as Japanese did with the English word *emotion*, from which the *emo-* in *emoji*, and with Chinese, from which *-ji* "written character".

If borrowing words when they're useful is ridiculous, then all languages are ridiculous, and when everything is ridiculous nothing is.

[1] http://en.wiktionary.org/wiki/sign

From as at signographie.de Mon Nov 17 05:10:06 2014
From: as at signographie.de (Andreas Stötzner)
Date: Mon, 17 Nov 2014 12:10:06 +0100
Subject: Re: The rapid … erosion of definition ability
In-Reply-To:
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

On 17.11.2014 at 11:46, Leonardo Boiko wrote:
> "Sign" is too general

in its generality it is just perfect. The sets of signs in question are most general, covering much more matters, objects and topics than the actual emoticons.

The UCS defines the 1F600 set properly as Emoticons. At least, we should (in English) speak of Emoticons and not Emoji. Other “symbols”
(another misnomer i.m.h.o., but that’s another story) of this kind are termed “Miscellaneous Symbols and Pictographs”. This is not bad but imprecise as well, since many of these signs are not pictographs but ideographs.

Yeah what the heck ;)

We have a long tradition of naming these things rather lousy (“Dingbats”). I am a traditionalist as a matter of fact but if precise terming is tricky I find it better to generalize than to blur.

From budelberger.richard at wanadoo.fr Mon Nov 17 05:12:01 2014
From: budelberger.richard at wanadoo.fr (Richard BUDELBERGER)
Date: Mon, 17 Nov 2014 12:12:01 +0100 (CET)
Subject: The rapid evolution of a wordless tongue
In-Reply-To:
References:
Message-ID: <1762247221.5067.1416222721906.JavaMail.www@wwinf1n12>

> Message of 17/11/14 08:55
> From: "Mark Davis ☕️"
> To: "Unicode Public", "UTC", "emoji-utc"
> Subject: The rapid evolution of a wordless tongue
>
> http://nymag.com/daily/intelligencer/2014/11/emojis-rapid-evolution.html

I love the nickname (“emoji don’t have official names, just nicknames created by their users”) of the oldest “emoji”: “Invisible Man With Twirled Mustache”, that is “The 3,000-year-old tilde”, “~”.

From prosfilaes at gmail.com Mon Nov 17 05:36:58 2014
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 17 Nov 2014 03:36:58 -0800
Subject: Re: The rapid … erosion of definition ability
In-Reply-To:
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

On Mon, Nov 17, 2014 at 3:10 AM, Andreas Stötzner wrote:
> On 17.11.2014 at 11:46, Leonardo Boiko wrote:
>> "Sign" is too general
>
> in its generality it is just perfect.
> The sets of signs in question are most general, covering much more matters, objects and topics than the actual emoticons.

They aren't signs. I can't say that that is true for all dialects of English, but it's certainly true for my idiolect.

> The UCS defines the 1F600 set properly as Emoticons. At least, we should (in English) speak of Emoticons and not Emoji.

Why? Why is one better than the other?

> Other “symbols” (another misnomer i.m.h.o., but that’s another story)

A word that dates back to at least the 18th century; e.g. http://books.google.com/books?id=LgJQAAAAYAAJ&pg=PR22 .

--
Kie ekzistas vivo, ekzistas espero.

From leoboiko at namakajiri.net Mon Nov 17 05:44:36 2014
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Mon, 17 Nov 2014 09:44:36 -0200
Subject: Re: The rapid … erosion of definition ability
In-Reply-To:
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

2014-11-17 9:08 GMT-02:00 Magnus Bodin ☀:
> Just to clarify. The transcribed form "ji" in the Japanese emoji word 絵文字 is probably not from Mandarin, since 字 is pronounced "zi" in Mandarin. Is it pronounced "ji" in another Chinese language?

Japanese doesn't usually borrow from Mandarin. Rather, a large amount of its vocabulary (about 60%) was borrowed from classical and medieval Chinese (much like the way that 58% of English words were borrowed from Latin and French). These words of Chinese origin are called *kango* in Japanese, and *ji* is one of them, quite naturally, as the concept of “written character” itself was acquired from China.

There are three main layers of Chinese loans into Japanese: a stratum they call *go-on*, which came from Late Old Chinese and Early Middle Chinese (with a Korean flavor); the *kan-on* stratum, from the Chang'an dialect of Late Middle Chinese; and a bit of Song/Yuan Late Middle Chinese as *tōsō-on* [1].

The Japanese word *ji* “character”
is from *go-on* Chinese, likely developing from Old Chinese *tsəʔ/*dzəh [2] or *dzə [3]. 字 may also be pronounced *shi*, which is from the *kan-on* layer.

Notice that the Mandarin sound written as “z” (in 字 *zì*) doesn’t denote the [z] consonant but rather [ts] (Mandarin has no voiced consonants like [z] or [d]); and also that the Jap. “j” isn't English “j” but the same phoneme as a voiced /ti/ → /di/ → [(d)ʑi]. But this similarity isn't because Japanese borrowed from Mandarin; rather, they're cousins to the same ancestor.

[1] Miyake, *Old Japanese: A Phonetic Reconstruction*.
[2] Schuessler, *ABC Etymological Dictionary of Old Chinese*.
[3] Baxter-Sagart Old Chinese reconstruction.

From magnus at bodin.org Mon Nov 17 05:54:16 2014
From: magnus at bodin.org (Magnus Bodin ☀)
Date: Mon, 17 Nov 2014 12:54:16 +0100
Subject: Re: The rapid … erosion of definition ability
In-Reply-To:
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

Thanks for a very good clarification.

From leoboiko at namakajiri.net Mon Nov 17 06:08:59 2014
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Mon, 17 Nov 2014 10:08:59 -0200
Subject: Re: The rapid … erosion of definition ability
In-Reply-To:
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

2014-11-17 9:10 GMT-02:00 Andreas Stötzner:
> [sign] in its generality it is just perfect. […] At least, we should (in English) speak of Emoticons and not Emoji. […] if precise terming is tricky I find it better to generalize

These are your opinions. I find them to be perfectly valid (exactly as valid as anyone else’s, mine included).
However, no single individual's opinion has any special power over what goes into the vocabulary of a language; rather, the lexicon is determined collectively by whatever the community of speakers finds to be useful. Clearly English speakers found "sign" to be too imprecise, and as of now, they seem to prefer "emoji" to "emoticon" (probably because "emoticon" was already in use to denote multi-character pictographs built from non-pictographs, such as ":-)", the original use of the coinage). If speakers want a word referring specifically to these new modal pictograms, they will have one and that's it.

You're entitled to find linguistic borrowing to be "ridiculous"; but I'm equally entitled to find your moral judgment to be condescending and historically uninformed (unless you want to restrict yourself to Anglo-Saxon words, in which case say goodbye to "generality" (< Lat. *generalis*), "emotion" (< Fr. *émotion*), "icon" (< Greek *eikon*) etc.); and at any rate neither of our opinions will have any effect on which words the speakers adopt.

From mark at macchiato.com Mon Nov 17 06:14:34 2014
From: mark at macchiato.com (Mark Davis ☕️)
Date: Mon, 17 Nov 2014 12:14:34 +0000
Subject: The rapid ... erosion of definition ability
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

On Mon Nov 17 2014 at 12:15:08 PM Andreas Stötzner wrote:
> On 17.11.2014 at 11:46, Leonardo Boiko wrote:
>> "Sign" is too general
>
> in its generality it is just perfect. The sets of signs in question are most general, covering much more matters, objects and topics than the actual emoticons.

>> They’re just signs and that’s it.

The term 'emoji' is certainly a useful term for people to use, denoting a certain kind of symbol. Saying that one should never use it is like saying that one should never say "dog" or "cat", only the generic "animal"...

> The UCS defines the 1F600 set properly as Emoticons. At least, we should (in English) speak of Emoticons and not Emoji.

Not really (and we don't really "define" them as emoticons; that's just the block name, and arguably it should have been different).

> Other “symbols” (another misnomer i.m.h.o., but that’s another story)

Not, at least, in English.

> of this kind are termed “Miscellaneous Symbols and Pictographs”. This is not bad but imprecise as well since many of these signs are not pictographs but ideographs.

We warn people in multiple places that the names of blocks are *not* reliable guides to the kinds of characters in the block.

> Yeah what the heck ;)
>
> We have a long tradition of naming these things rather lousy (“Dingbats”). I am a traditionalist as a matter of fact but if precise terming is tricky I find it better to generalize than to blur.

I generally agree about the utility of having generic terms in a language. Listening to Swiss newscasts, I find it bizarre to hear pretty clumsy phrasing that is the equivalent of the following (because there is a different form for male and female of many nouns).

The politicians(m) and politicians(f) met with the directors(m) and directors(f), writers(m) and writers(f), and actors(m) and actresses.

We suffer from it much less in English, mostly with "he" and "she", although clearly the use of "they" as a gender-neutral singular is on the upswing (although it's been around for centuries).

However, what is most useful is when there are generic terms, *plus* specific ones.

From mark at macchiato.com Mon Nov 17 06:15:30 2014
From: mark at macchiato.com (Mark Davis ☕️)
Date: Mon, 17 Nov 2014 12:15:30 +0000
Subject: The rapid ... erosion of definition ability
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

I agree (except for the derivation of "emoji").

From leoboiko at namakajiri.net Mon Nov 17 06:23:52 2014
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Mon, 17 Nov 2014 10:23:52 -0200
Subject: The rapid ... erosion of definition ability
In-Reply-To:
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

2014-11-17 10:15 GMT-02:00 Mark Davis ☕️:
> I agree (except for the derivation of "emoji").

Oh, you're totally right: *e-* “drawing” plus *-moji* “character”. My mistake!

From petercon at microsoft.com Mon Nov 17 23:39:39 2014
From: petercon at microsoft.com (Peter Constable)
Date: Tue, 18 Nov 2014 05:39:39 +0000
Subject: RE: The rapid … erosion of definition ability
In-Reply-To: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
References: <9AED365C-32DC-4AEE-A732-CDBE3B80482E@signographie.de>
Message-ID:

That would be a bit like our forebears having said, “It’s ridiculous to call them ‘tomatoes’ outside Mexico. They’re just big berries, and that’s it.” That observation may have been true, but also beside the point if, in practice, the Europeans found it convenient and chose to call them “tomatoes”.

Peter

From ishida at w3.org Tue Nov 18 12:12:02 2014
From: ishida at w3.org (Richard Ishida)
Date: Tue, 18 Nov 2014 18:12:02 +0000
Subject: MONGOLIAN LETTER YA medial second form, incorrect image?
In-Reply-To: <317da4c3b9db4780a39e6dd4e892e72e@BN1PR03MB139.namprd03.prod.outlook.com>
References: <54648134.7000900@w3.org> <317da4c3b9db4780a39e6dd4e892e72e@BN1PR03MB139.namprd03.prod.outlook.com>
Message-ID: <546B8BF2.4080406@w3.org>

Thanks for this, Andrew.

I'm not quite sure how to read your handwritten notes, but if you consider the drawing to the far right to be the expected outcome for your corrected naming, then you appear to agree with what I was suggesting (i.e. that ya+fvs1 should have no upturn), *if* I read them correctly at the far right. That would be different from what you have in the current draft specification, though.

First, here is the text of the bug report I sent to Unicode:

===============================
http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html (Standardized Variants) shows a glyph for MONGOLIAN LETTER YA in the second medial form with an upturn to the left. I believe this image should show a straight downward line.

Reasons:
1. the first initial form has an upturn, and the second initial form is straight
2. Professor Quejingzhabu's chart at http://www.babelstone.co.uk/Mongolian/MGWBM/MGWBM_C034-C035.jpg shows the upturn for the first medial form and the straight line for the second medial form.
3. the Mongolian Baiti, Mongolian White, Mongolian Writing, and Noto Sans Mongolian fonts all produce the upturn for the standard medial form and the straight line when followed by FVS#1 (To test these fonts you can go to http://rishida.net/scripts/block/mongolian.html#char1836 and change the font by opening the blue control at the bottom right of the window. See the top table in that section.)
===============================

The Mongolian White and Mongolian Writing fonts were developed by people trying to make Unicode stick for traditional Mongolian text, and they seem pretty good, on the whole, though not perfect. You can download them for free from http://www.mongolfont.com/en/font/index.html (I would have added them as webfonts to the page I'm about to mention, but I couldn't find the licence information to ascertain whether that's allowed.)

For font comparisons, the following page may help: http://rishida.net/scripts/block/mongolian#char1836

Near the top of the section you have a table for what I understand to be the currently specified shapes (as images) and the shape for whatever font you currently have loaded to view Mongolian text. You can change the font by clicking on the vertical blue bar at the bottom right of the window and selecting the font you want from the selection offered there. 'Form tables' means the table at the top of the YA entry.

Below the top table for the YA entry is a table of syllables, for which you can also change the font in the same way (use the 'Mongolian text' control). This allows you to see the shape in combination with any (Mongolian) vowel.

Note that per this table I noted that "The initial form and the equivalent medial form (i.e.
no upturn to the left) are used by the Mongolian Baiti font (but not Mongolian White or Noto Sans Mongolian) when followed by the i vowel." ie. there seems to be some vowel-dependent shaping in the Baiti font that's not in the others. That may be a factor in this.

You'll see that, at the bottom of the YA entry, I include "Font rendering notes" about font divergences. In fact you'll find similar notes for all the characters where I noticed a difference in behaviour. This should save you some time for comparing actual implementations.

(I think it may help if i allow for any font to be used, rather than just provide a selection in the pull-down. I may be able to implement that change to the page this weekend, if it helps.)

I still have some detailed notes from Andrew West to read through, but for now, I think that that is all I can offer in the way of information.

I hope it's helpful.

cheers,
ri

On 14/11/2014 18:44, Andrew Glass (WINDOWS) wrote:
> Dear Richard and Andrew,
>
> Microsoft has been working with Michel Suignard to update and correct
> the Mongolian specification. Here is the response to this issue from our
> Mongolian expert:
>
> 1.) In the early days of development people took Professor
> Que's instructions to heart and implemented them exactly - even if there
> were errors in the "documentation".
>
> 2.) There is such an error in the spec for U+1836 - see
> attached (img375.jpg). The handwriting is my own from some 6-7 years
> back. I made the correction (on the attached img375.jpg) as the name did
> not match the far-right column specification of the ZWJ-FVS sequence. I
> take the ZWJ sequence to be more correct. BUT, this is not the way that
> BAITI is implemented.
>
> 3.) And that is why I write the current Unicode 8.0 DRAFT
> Mongolian Specification as attached (see img376.jpg)
>
> Given a specification that was inconsistent from the beginning AND given
> that some years of development have gone on since the inconsistent spec
> was "given out"
AND therefore there should probably be variant
> implementations "out there", what do we say is the correct
> specification? We cannot go back to the specifier (Professor Que) as he
> is not a font man. My thought is that the best thing to do is take a
> look at the fonts out on the market, do an analysis and comparison and
> make a decision from there. If indeed there are similar implementation
> in the use of the variant selector at the medial position of 1836, even
> if it does go against what the specification "said" in the beginning,
> and that becomes the de facto standard for the 1836 medial variation
> selector.
>
> Can we get a copy of the fonts that Richard references - Mongolian White
> and Mongolian Writing for comparison?
>
> The NOTO Sans Mongolian seems to be doing the right thing:
>
> But I expect it is too early for this font to be widely used.
>
> Cheers,
>
> Andrew G
>
> -----Original Message-----
> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Andrew West
> Sent: Thursday, November 13, 2014 2:30 AM
> To: Richard Ishida
> Cc: UnicoDe List
> Subject: Re: MONGOLIAN LETTER YA medial second form, incorrect image?
>
> On 13 November 2014 10:00, Richard Ishida wrote:
>
> > > Before reporting this I want to check I have understood it correctly.
> > > If you know something about Mongolian variant selectors, please let me
> > > know if my conclusion is correct.
> > >
> > > I think the image for medial MONGOLIAN LETTER YA second form, 1836
> > > 180B, at
> > > http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html is incorrect.
> > >
> > > I think it should have no upturn on the left.
>
> Yes, you are correct. It makes no sense to have an upturn as that would
> be the same glyph as the first medial form. You can see that the second
> initial form and the second medial form both have the same glyph with no
> upturn (ignore the dot, that is a printing artefact) in Prof.
> Choijinzhab's "Mongolian Encoding":
>
> http://www.babelstone.co.uk/Mongolian/MGWBM/MGWBM_C034-C035.jpg
>
> Andrew
>
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode
>

From bevcorwin at gmail.com Wed Nov 19 17:08:17 2014
From: bevcorwin at gmail.com (Bev Corwin)
Date: Wed, 19 Nov 2014 18:08:17 -0500
Subject: subscribe
Message-ID:

subscribe

From chris.fynn at gmail.com Mon Nov 24 08:53:58 2014
From: chris.fynn at gmail.com (Christopher Fynn)
Date: Mon, 24 Nov 2014 20:53:58 +0600
Subject: subscribe
In-Reply-To:
References:
Message-ID:

Ben

You can subscribe to the Unicode mailing list on line at:
http://unicode.org/mailman/listinfo/unicode

(Not by sending a SUBSCRIBE message to the list)

On 20/11/2014, Bev Corwin wrote:
> subscribe