From unicode at unicode.org Thu Mar 1 04:56:02 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Thu, 1 Mar 2018 02:56:02 -0800 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <91680448.22170.1519824152519@ox.hosteurope.de> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> Message-ID: Christoph Päper wrote, >> There are approximately 7,000 living human languages, >> but fewer than 100 of these languages are well-supported on computers, >> ... > > Why is the announcement mentioning those numbers of languages at all? > The script coverage of written living human languages, except > for constructed ones, is almost complete in Unicode and rendering > for most of them is reasonably well supported by all modern > operating systems ... This page ... https://www.unicode.org/standard/unsupported.html ... lists several modern scripts which are not yet encoded. (Hanifi Rohingya, Gunjala Gondi, Loma, Medefaidrin, Naxi Dongba (Moso), and Nyiakeng Puachue Hmong.) It's noted that there are additional unencoded "minor modern scripts" shown on the Roadmap, which implies that those listed are also "minor". -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Mar 1 05:11:58 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Thu, 1 Mar 2018 03:11:58 -0800 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> Message-ID: Here's a good opening line: "The Unicode Standard encodes scripts rather than languages." https://www.unicode.org/standard/supported.html But, quoting from this page: http://www.unicode.org/consortium/aboutdonations.html " ... and provide universal access for the world's languages - past, present, and future. 
The Consortium lays the groundwork to enable universal access by encoding the characters for the world's languages, ..." That's inaccurate. Languages don't use characters, technically. It's more about providing universal access for the world's communication, data, and history. You know, the sum of mankind's knowledge that's been digitized so far. Unicode encodes the characters used for the world's computer data interchange and storage systems. Salesmen and techies have different requirements for accuracy, however. From unicode at unicode.org Thu Mar 1 11:04:05 2018 From: unicode at unicode.org (Tim Partridge via Unicode) Date: Thu, 1 Mar 2018 17:04:05 +0000 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> , Message-ID: Perhaps the CLDR work the Consortium does is being referenced. That is by language on this list http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ee By the time it gets to the 100th entry the Modern percentage has "room for improvement". Regards, Tim ________________________________________ From: Unicode [unicode-bounces at unicode.org] on behalf of James Kass via Unicode [unicode at unicode.org] Sent: 01 March 2018 11:11 To: Unicode Public Subject: Re: Unicode Emoji 11.0 characters now ready for adoption! Here's a good opening line: "The Unicode Standard encodes scripts rather than languages." https://www.unicode.org/standard/supported.html But, quoting from this page: http://www.unicode.org/consortium/aboutdonations.html " ... and provide universal access for the world's languages - past, present, and future. The Consortium lays the groundwork to enable universal access by encoding the characters for the world's languages, ..." That's inaccurate. Languages don't use characters, technically. It's more about providing universal access for the world's communication, data, and history. 
You know, the sum of mankind's knowledge that's been digitized so far. Unicode encodes the characters used for the world's computer data interchange and storage systems. Salesmen and techies have different requirements for accuracy, however. From unicode at unicode.org Thu Mar 1 14:10:07 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Thu, 01 Mar 2018 13:10:07 -0700 Subject: Unicode Emoji 11.0 characters now ready for adoption! Message-ID: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> Tim Partridge wrote: > Perhaps the CLDR work the Consortium does is being referenced. That is > by language on this list > http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ee > By the time it gets to the 100th entry the Modern percentage has "room > for improvement". I think that is a measurement of locale coverage -- whether the collation tables and translations of "a.m." and "p.m." and "a week ago Thursday" are correct and verified -- not character coverage. -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Fri Mar 2 07:29:46 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Fri, 2 Mar 2018 14:29:46 +0100 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> Message-ID: Right, Doug. I'll say a few more words. In terms of language support, encoding of new characters in Unicode benefits mostly digital heritage languages (via representation of historic languages in Unicode, enabling preservation and scholarly work), although there are some modern-use cases like Hanifi Rohingya. We do include digital heritage under the umbrella of "digitally disadvantaged languages", but we are not consistent in our terminology sometimes. 
But encoding is just a first step. A vital first step, but just one step. People tend to forget that adding new characters is just a part of what Unicode does. For script support, it is just as important to have correct Unicode algorithms and properties, such as correct values for the Indic_Positional_Category property (which, together with the related work on the Universal Shaping Engine, allows for proper rendering of many languages). Behind the scenes we have people like Ken and Laurentiu who have to dig through the encoding proposals and fill in the many, many gaps to come up with reasonable properties for such basic behavior as line-break. As important as the work is on encoding, properties, and algorithms, when we go up a level we get CLDR and ICU. Those have more impact on language support for far more people in the world than the addition of new scripts does. After all, approaching half of the population of the globe owns smartphones: ICU provides programmatic access to the Unicode encoding, properties, and algorithms, and CLDR + ICU together provide the core language support on essentially every one of those smartphones. But in terms of language coverage, the chart you reference (and the corresponding graph) show how very far CLDR still has to go. So we are gearing up for ways to extend that graph: to move at least the basic coverage (the lower plateau in that graph) to more languages, and to move basic-coverage languages up to more in-depth coverage. We are focusing on ways to improve the CLDR survey tool backend and frontend, since we know it currently cannot handle the number of people that want to contribute, and has glitches in the UI that make it clumsier to use than it should be. Well, this turned out to be more than just a few words... sorry for going on! Mark On Thu, Mar 1, 2018 at 9:10 PM, Doug Ewell via Unicode wrote: > Tim Partridge wrote: > > > Perhaps the CLDR work the Consortium does is being referenced. 
That is > > by language on this list > > http://www.unicode.org/cldr/charts/32/supplemental/locale_ > coverage.html#ee > > By the time it gets to the 100th entry the Modern percentage has "room > > for improvement". > > I think that is a measurement of locale coverage -- whether the > collation tables and translations of "a.m." and "p.m." and "a week ago > Thursday" are correct and verified -- not character coverage. > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Mar 2 08:22:36 2018 From: unicode at unicode.org (=?UTF-8?Q?Christoph_P=C3=A4per?= via Unicode) Date: Fri, 2 Mar 2018 15:22:36 +0100 (CET) Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> Message-ID: <1399756717.44805.1520000556843@ox.hosteurope.de> F'up2: cldr-users at unicode.org Doug Ewell via unicode at unicode.org: > > I think that is a measurement of locale coverage -- whether the > collation tables and translations of "a.m." and "p.m." and "a week ago > Thursday" are correct and verified -- not character coverage. By the way, the binary `am` vs. `pm` distinction common in English and labelled `a` as a placeholder in CLDR formats is too simplistic for some languages when using the 12-hour clock (which they usually don't in written language). In German, for instance, you would always use a format with `B` instead (i.e. "morgens", "mittags", "abends", "nachts" or no identifier during daylight). How and where can I best suggest to change this in CLDR? The B formats have their own code, e.g. `Bhms` = `h:mm:ss B`. Should I just propose to set `hms` etc. to the same value next time the Survey Tool is open? 
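For readers unfamiliar with the `B` pattern Christoph mentions, here is a rough Python sketch of the kind of output it yields for German. The period names are the ones listed above; the hour boundaries are assumptions made for this illustration, not CLDR's actual dayPeriods data.

```python
# Illustrative sketch only: the hour ranges below are assumed for the example;
# the authoritative boundaries live in CLDR's dayPeriods data.
def day_period_de(hour: int) -> str:
    """German day-period name of the kind a CLDR "B" pattern produces."""
    if 5 <= hour < 12:
        return "morgens"
    if hour == 12:
        return "mittags"
    if 18 <= hour < 24:
        return "abends"
    if hour < 5:
        return "nachts"
    return ""  # daylight hours with no identifier, per the description above


def format_time_de(hour: int, minute: int) -> str:
    """Rough analogue of an "h:mm B" pattern: 12-hour clock plus day period."""
    h12 = hour % 12 or 12
    return f"{h12}:{minute:02d} {day_period_de(hour)}".rstrip()
```

With these assumed boundaries, `format_time_de(20, 30)` yields "8:30 abends" rather than an am/pm marker.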
In my experience, there are too few people reviewing even the "largest" languages (like German). I participated in v32 and v33, but other than me there were only contributions from (seemingly) a single employee from each of Apple, Google and Microsoft. Most improvements or corrections I suggested just got lost, i.e. nobody discussed or voted on them, so the old values remained. From unicode at unicode.org Fri Mar 2 09:26:18 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Fri, 2 Mar 2018 16:26:18 +0100 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <1399756717.44805.1520000556843@ox.hosteurope.de> References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> <1399756717.44805.1520000556843@ox.hosteurope.de> Message-ID: No, the patterns should always have the right format. However, in the supplemental data there is information as to the preferred data for each language. This data isn't collected through the ST, so a ticket needs to be filed. In your particular case, the data has: If DE just doesn't use hB, then you can file a ticket to say that it shouldn't be in @allowed. Note that the format permits either regions or locales, as in: As to involvement, we try to encourage interaction on the forum. In some languages those are quite active; in others not so much. (BTW, a number of your suggestions made sense to me, but not being a native German speaker, I don't weigh in on de.xml except for structural issues or where people seem to miss the intent.) So people may look at the forum, disagree with the proposal, but not respond to say why they disagree. Mark On Fri, Mar 2, 2018 at 3:22 PM, Christoph Päper via Unicode < unicode at unicode.org> wrote: > F'up2: cldr-users at unicode.org > > Doug Ewell via unicode at unicode.org: > > > > I think that is a measurement of locale coverage -- whether the > > collation tables and translations of "a.m." and "p.m." 
and "a week ago > > Thursday" are correct and verified -- not character coverage. > > By the way, the binary `am` vs. `pm` distinction common in English and > labelled `a` as a placeholder in CLDR formats is too simplistic for some > languages when using the 12-hour clock (which they usually don't in written > language). In German, for instance, you would always use a format with `B` > instead (i.e. "morgens", "mittags", "abends", "nachts" or no identifier > during daylight). > > How and where can I best suggest to change this in CLDR? The B formats > have their own code, e.g. `Bhms` = `h:mm:ss B`. Should I just propose to > set `hms` etc. to the same value next time the Survey Tool is open? > > In my experience, there are too few people reviewing even the "largest" > languages (like German). I participated in v32 and v33, but other than me > there were only contributions from (seemingly) a single employee from each > of Apple, Google and Microsoft. Most improvements or corrections I > suggested just got lost, i.e. nobody discussed or voted on them, so the old > values remained. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Mar 2 09:51:04 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 2 Mar 2018 16:51:04 +0100 Subject: Unicode Emoji 11.0 characters now ready for adoption! 
In-Reply-To: <1399756717.44805.1520000556843@ox.hosteurope.de> References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> <1399756717.44805.1520000556843@ox.hosteurope.de> Message-ID: Day periods (from 00:00 to 24:00: sometimes "nuit", but generally subsumed under "matin", then "midi", "après-midi", "soir") are also used in French, and are much more useful than the ambiguous am/pm Latin abbreviations, which fell completely out of use there a few centuries ago. (Side note: I am not sure "ante/post meridiem" was ever commonly abbreviated in French; it was probably abbreviated only in writing, with the full Latin words read aloud, back before French finally replaced the judiciary and liturgical "Late Vulgar Latin" that no one really understood correctly and that was constantly creolized with the many regional vernacular oïl languages. At that time, "ante/post meridiem" was heard only in Christian masses or judiciary documents, both full of corporative jargon and different even from the approximate Latin of the administration. Latin then collapsed under the regional oïl languages, which differentiated considerably from one another, before French was finally created: it abandoned Latin as the sole source, reinvented words borrowed from Greek, and was adapted to the Anjou oïl variant used by the ruling nobility, the King's circle, and some church figures who also wanted to incorporate the several oc languages and other European languages used in diplomacy. French then took about two centuries to develop before it displaced most regional oïl variants and nearly displaced the oc variants as well. Some Latin expressions remain in French, but only for specific technical usages, especially in judiciary language, as in English; English, however, kept "ante/post meridiem" only through its abbreviations, and today most native English speakers don't really know what "am" and "pm" mean.) So yes, day periods should have their own format codes. But the number of day periods varies across languages (not really between distinct scripts of the same language) and, more importantly, across geographic regions/countries/territories (more than by language). CLDR would then need more regional variants than those supported for now (ISO 3166-1 codes may not be sufficient as BCP 47 language subtags). 2018-03-02 15:22 GMT+01:00 Christoph Päper via Unicode : > F'up2: cldr-users at unicode.org > > Doug Ewell via unicode at unicode.org: > > > > I think that is a measurement of locale coverage -- whether the > > collation tables and translations of "a.m." and "p.m." and "a week ago > > Thursday" are correct and verified -- not character coverage. > > By the way, the binary `am` vs. `pm` distinction common in English and > labelled `a` as a placeholder in CLDR formats is too simplistic for some > languages when using the 12-hour clock (which they usually don't in written > language). In German, for instance, you would always use a format with `B` > instead (i.e. "morgens", "mittags", "abends", "nachts" or no identifier > during daylight). > > How and where can I best suggest to change this in CLDR? The B formats > have their own code, e.g. `Bhms` = `h:mm:ss B`. Should I just propose to > set `hms` etc. to the same value next time the Survey Tool is open? > > In my experience, there are too few people reviewing even the "largest" > languages (like German). I participated in v32 and v33, but other than me > there were only contributions from (seemingly) a single employee from each > of Apple, Google and Microsoft. Most improvements or corrections I > suggested just got lost, i.e. nobody discussed or voted on them, so the old > values remained. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Sun Mar 4 08:10:35 2018 From: unicode at unicode.org (Helena Miton via Unicode) Date: Sun, 4 Mar 2018 15:10:35 +0100 Subject: Fonts and font sizes used in the Unicode Message-ID: Greetings. Is there a way to know which font and font size have been used in the Unicode charts (for various writing systems)? Many thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Mar 4 11:12:34 2018 From: unicode at unicode.org (Markus Scherer via Unicode) Date: Sun, 4 Mar 2018 09:12:34 -0800 Subject: Fonts and font sizes used in the Unicode In-Reply-To: References: Message-ID: On Sun, Mar 4, 2018 at 6:10 AM, Helena Miton via Unicode < unicode at unicode.org> wrote: > Greetings. Is there a way to know which font and font size have been used > in the Unicode charts (for various writing systems)? Many thanks! > What are you trying to do? Many of the fonts are unique to the Unicode chart production, and are not licensed for other uses. Some are not even generally usable. markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Mar 4 12:52:26 2018 From: unicode at unicode.org (William_J_G Overington via Unicode) Date: Sun, 4 Mar 2018 18:52:26 +0000 (GMT) Subject: Fonts and font sizes used in the Unicode In-Reply-To: References: Message-ID: <26364438.35351.1520189546089.JavaMail.defaultUser@defaultHost> Helena Miton asks: > Greetings. Is there a way to know which font and font size have been used in the Unicode charts (for various writing systems)? Many thanks! Yes, download the PDF (Portable Document Format) code chart document to local storage. Open the file in Adobe Reader. Right click on the page. On the panel that is displayed, click on Document Properties... and then on the panel that is then displayed, choose the Fonts tab. The list of fonts used in the document is then displayed. 
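The Reader steps above can also be approximated programmatically. Below is a naive, hedged sketch that scans raw PDF bytes for `/BaseFont` entries; it only works where font dictionaries are not inside compressed object streams (in real chart PDFs they often are, so a dedicated tool such as poppler's `pdffonts` is the robust route). The sample bytes are a hypothetical fragment in the shape a PDF font dictionary takes, not data from an actual chart.

```python
import re

def basefont_names(pdf_bytes: bytes) -> list:
    """Naive scan of raw PDF data for /BaseFont name objects."""
    names = re.findall(rb'/BaseFont\s*/([^\s/<>\[\]()]+)', pdf_bytes)
    return sorted({n.decode('ascii', 'replace') for n in names})

# Hypothetical fragment; real charts embed subsetted fonts with tagged names.
sample = b"<< /Type /Font /Subtype /Type1 /BaseFont /ABCDEF+ChartFont-Regular >>"
```

Running `basefont_names(sample)` recovers the single hypothetical font name from that fragment.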
Copying a character from the PDF document and pasting it into WordPad may well give the point size of the font that is being used, even if the character glyph is not displayed and what appears instead is just a box with a question mark in it, or some other design of the .notdef glyph from whatever font WordPad is using. William From unicode at unicode.org Sun Mar 4 13:49:33 2018 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Sun, 4 Mar 2018 11:49:33 -0800 Subject: Fonts and font sizes used in the Unicode In-Reply-To: References: Message-ID: <40df19c6-8287-a740-91f3-f00bc827b5e7@ix.netcom.com> An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Mar 4 21:54:10 2018 From: unicode at unicode.org (fantasai via Unicode) Date: Mon, 5 Mar 2018 12:54:10 +0900 Subject: Emoji as East Asian Width = Wide Message-ID: Why are the new emoji like U+1F600 Grinning Face EAW=Wide when other dingbats like U+263A Smiling Face are EAW=Neutral? This is making it difficult to have consistent formatting across emoticons. Also, emoji aren't really CJK-context-only now, are they? https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show ~fantasai From unicode at unicode.org Sat Mar 3 19:32:45 2018 From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode) Date: Sun, 4 Mar 2018 10:32:45 +0900 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <83722fa3ed05a8b0989a963b3f26833a@koremail.com> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> Message-ID: <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> Hello John, On 2018/03/01 12:31, via Unicode wrote: > Pen, or brush and paper is much more flexible. 
With thousands of names > of people and places still not encoded I am not sure if I would describe > hans (simplified Chinese characters) as well supported. nor with current > policy which limits China with over one billion people to submitting > less than 500 Chinese characters a year on average, and names not being > all to be added, it is hard to say which decade hans will be well > supported. I think this contains several misunderstandings. First, of course pen/brush and paper are more flexible than character encoding, but that's true for the Latin script, too. Second, while I have heard that people create new characters for naming a baby in a traditional Han context, I haven't heard about this in a simplified Han context. And it's not frequent at all, the same way naming a baby John in the US is way more frequent than let's say Qvtwzx. I'd also assume that China has regulations on what characters can be used to name a baby, and that the parents in this age of smartphone communication will think at least twice before giving their baby a name that they cannot send to their relatives via some chat app. Third, I cannot confirm or deny the "500 characters a year" limit, but I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need to encode more characters, everybody would find a way to handle these. Due to the nature of your claims, it's difficult to falsify many of them. It would be easier to prove them (assuming they were true), so if you have any supporting evidence, please provide it. Regards, Martin. > John Knightley From unicode at unicode.org Mon Mar 5 01:58:33 2018 From: unicode at unicode.org (Oren Watson via Unicode) Date: Mon, 5 Mar 2018 02:58:33 -0500 Subject: Fwd: Emoji as East Asian Width = Wide In-Reply-To: References: Message-ID: EAW is used in fixed-width settings to distinguish characters that should take up one space versus two. 
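The distinction Oren describes is directly queryable; here is a quick check in Python of the two code points fantasai compared (values per the Unicode data shipped with current Python versions):

```python
import unicodedata

# East_Asian_Width of the two code points discussed in this thread:
# U+1F600 GRINNING FACE is Wide; U+263A WHITE SMILING FACE is Neutral.
eaw_grinning = unicodedata.east_asian_width('\U0001F600')  # 'W'
eaw_smiling = unicodedata.east_asian_width('\u263A')       # 'N'

def cell_width(ch: str) -> int:
    """Cells one code point occupies in a fixed-width terminal under a
    simple rule: Wide/Fullwidth take two cells, everything else one.
    (Ambiguous 'A' is context-dependent and treated as narrow here.)"""
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
```

The two-valued `cell_width` rule is a simplification of what real terminal emulators do, but it is enough to show why mixing 'W' and 'N' emoticons breaks column alignment.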
I would also prefer that all these be considered wide, since otherwise it causes format problems in these settings. (Unfortunately, fixed-width contexts appear to be largely ignored by Unicode...) On Sun, Mar 4, 2018 at 10:54 PM, fantasai via Unicode wrote: > Why are the new emoji like U+1F600 Grinning Face EAW=Wide > when other dingbats like U+263A Smiling Face are EAW=Neutral? > This is making it difficult to have consistent formatting > across emoticons. Also, emoji aren't really CJK context only > now, are they. > > https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show > https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show > > ~fantasai > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Mar 5 02:57:11 2018 From: unicode at unicode.org (Phake Nick via Unicode) Date: Mon, 05 Mar 2018 08:57:11 +0000 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> Message-ID: On Mon, Mar 5, 2018 at 13:25, Martin J. Dürst via Unicode wrote: > Hello John, > > On 2018/03/01 12:31, via Unicode wrote: > > > Pen, or brush and paper is much more flexible. With thousands of names > > of people and places still not encoded I am not sure if I would describe > > hans (simplified Chinese characters) as well supported. nor with current > > policy which limits China with over one billion people to submitting > > less than 500 Chinese characters a year on average, and names not being > > all to be added, it is hard to say which decade hans will be well > > supported. > > I think this contains several misunderstandings. First, of course > pen/brush and paper are more flexible than character encoding, but > that's true for the Latin script, too. 
In Latin script, as an example, I can simply name myself "Phake", but in Chinese with the current Unicode-based environment, it would not be possible for me to randomly name myself using a character ??? as I would like to. > Second, while I have heard that people create new characters for naming > a baby in a traditional Han context, I haven't heard about this in a > simplified Han context. And it's not frequent at all, the same way > naming a baby John in the US is way more frequent than let's say Qvtwzx. > I'd also assume that China has regulations on what characters can be > used to name a baby, and that the parents in this age of smartphone > communication will think at least twice before giving their baby a name > that they cannot send to their relatives via some chat app. > Traditional characters versus simplified characters in this context are just like Fraktur vs Antiqua. The way some components are written has changed, and there are also orthographical changes that mean some characters are no longer composed of the same components, but they are still Chinese characters and their usage is still unchanged. I believe there are regulations on naming, but those regulations would have been made to adapt to the limitations of the current computational system. Plus, once in a while I still hear news that people are having difficulties using e.g. train booking systems or banking systems due to the characters they are using. (Although in many cases those are encoded characters not supported by the system.) > Third, I cannot confirm or deny the "500 characters a year" limit, but > I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need > to encode more characters, everybody would find a way to handle these. > Due to the nature of your claims, it's difficult to falsify many of > them. It would be easier to prove them (assuming they were true), so if > you have any supporting evidence, please provide it. > > Regards, Martin. 
> > John Knightley > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Mar 5 05:25:17 2018 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 5 Mar 2018 03:25:17 -0800 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> Message-ID: Phake Nick wrote, > In latin script, as an example, I can simply name myself > "Phake", but in Chinese with current Unicode-based environment, > it would not be possible for me to randomly name myself using > a character ??? Isn't that U+246E8? "??" From unicode at unicode.org Mon Mar 5 05:49:47 2018 From: unicode at unicode.org (Phake Nick via Unicode) Date: Mon, 05 Mar 2018 11:49:47 +0000 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> Message-ID: Ah, right, that's it. On Mar 5, 2018, 19:25, "James Kass" wrote: Phake Nick wrote, > In latin script, as an example, I can simply name myself > "Phake", but in Chinese with current Unicode-based environment, > it would not be possible for me to randomly name myself using > a character ??? Isn't that U+246E8? "??" -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Mar 5 06:00:45 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Mon, 5 Mar 2018 13:00:45 +0100 Subject: Emoji as East Asian Width = Wide In-Reply-To: References: Message-ID: I think that the fixed-width rendering properties for East Asian characters were meant only for rendering letters or symbols as plain text, not for the new rendering with emoji styles. 
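Philippe's point, that emoji presentation should trump the plain-text width property, can be sketched as a width routine. The sketch below is an assumption-laden illustration, not any published algorithm: it treats a character followed by U+FE0F (the emoji variation selector) as wide regardless of its East_Asian_Width, while U+FE0E requests text presentation; a production implementation would also consult the emoji-presentation data from UTS #51.

```python
import unicodedata

VS15, VS16 = '\uFE0E', '\uFE0F'  # text vs. emoji presentation selectors

def display_width(s: str) -> int:
    """Count terminal cells, forcing a wide cell for any base character
    followed by VS16 (emoji presentation), even if its EAW is Neutral."""
    width = 0
    i = 0
    while i < len(s):
        ch = s[i]
        if ch in (VS15, VS16):  # a selector occupies no cell of its own
            i += 1
            continue
        if i + 1 < len(s) and s[i + 1] == VS16:
            width += 2          # emoji presentation forces two cells
            i += 2
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
        i += 1
    return width
```

Under this rule U+263A alone counts one cell, but U+263A followed by VS16 counts two.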
If the symbols are rendered as emoji, these properties don't apply at all; the emoji style overrides them completely. Note that when characters have both styles (notably the oldest dingbats), there's a variation selector available to select the emoji style (EAW ignored) vs. the plain-text style (where EAW is suitable). Characters that have only an emoji style and no selectors should not have any EAW property (only the default one applicable to all emoji). 2018-03-05 8:58 GMT+01:00 Oren Watson via Unicode : > EAW is used in fixed-width settings to distinguish characters that should > take up one space versus two. I would also prefer that all these be > considered wide, since otherwise it causes format problems in these > settings. > (Unfortunately, fixed-width contexts appear to be largely ignored by unicode...) > > On Sun, Mar 4, 2018 at 10:54 PM, fantasai via Unicode > wrote: > >> Why are the new emoji like U+1F600 Grinning Face EAW=Wide >> when other dingbats like U+263A Smiling Face are EAW=Neutral? >> This is making it difficult to have consistent formatting >> across emoticons. Also, emoji aren't really CJK context only >> now, are they. >> >> https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show >> https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show >> >> ~fantasai >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Mar 5 09:13:00 2018 From: unicode at unicode.org (via Unicode) Date: Mon, 05 Mar 2018 23:13:00 +0800 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> Message-ID: <4f203cff3a031ab9846afe48b34f87e6@koremail.com> Dear All, to simplify discussion I have split the points. On 05.03.2018 16:57, Phake Nick via Unicode wrote: > On Mon, Mar 5, 2018 at 13:25, Martin J. Dürst via Unicode wrote: > >> Hello John, >> >> On 2018/03/01 12:31, via Unicode wrote: >> >>> Third, I cannot confirm or deny the "500 characters a year" limit, but >>> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need >>> to encode more characters, everybody would find a way to handle these. >>> Due to the nature of your claims, it's difficult to falsify many of >>> them. It would be easier to prove them (assuming they were true), so if >>> you have any supporting evidence, please provide it. Chinese characters proposed for Unicode first go through the IRG (ISO/IEC JTC1/SC2/WG2/IRG), whose documents are on the IRG website. The limit of 500 a year for China is an average based on the IRG #48 document regarding working set 2017, http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf , which explicitly states "each submission shall not exceed 1,000 characters". The People's Republic of China, which hopefully we can all agree has a population of over 1,000,000,000, is one member of the IRG and was therefore limited to submitting at most 1,000 characters. The earliest possible date for the next working set is two or three years later, that is 2019 or 2020, so that's an average limit of either 500 or 333 characters a year. Regards John >>> Regards, Martin. From unicode at unicode.org Mon Mar 5 09:42:15 2018 From: unicode at unicode.org (via Unicode) Date: Mon, 05 Mar 2018 23:42:15 +0800 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> Message-ID: <447c571bad4174b493e4bd42ee7a41f2@koremail.com> Dear All, here is a reply to points one and two. On 05.03.2018 16:57, Phake Nick via Unicode wrote: > On Mon, Mar 5, 2018 at 13:25, Martin J. Dürst via Unicode wrote: 
> >> Hello John, >> >> On 2018/03/01 12:31, via Unicode wrote: >> >> > Pen, or brush and paper is much more flexible. With thousands of >> names >> > of people and places still not encoded I am not sure if I would >> describe >> > hans (simplified Chinese characters) as well supported. nor with >> current >> > policy which limits China with over one billion people to >> submitting >> > less than 500 Chinese characters a year on average, and names not >> being >> > all to be added, it is hard to say which decade hans will be well >> > supported. >> >> I think this contains several misunderstandings. First, of course >> pen/brush and paper are more flexible than character encoding, but >> that's true for the Latin script, too. > > In Latin script, as an example, I can simply name myself "Phake", but > in Chinese with the current Unicode-based environment, it would not be > possible for me to randomly name myself using a character? ??? > as I would like to. > >> Second, while I have heard that people create new characters for >> naming >> a baby in a traditional Han context, I haven't heard about this in a >> simplified Han context. And it's not frequent at all, the same way >> naming a baby John in the US is way more frequent than let's say >> Qvtwzx. >> I'd also assume that China has regulations on what characters can be >> used to name a baby, and that the parents in this age of smartphone >> communication will think at least twice before giving their baby a >> name >> that they cannot send to their relatives via some chat app. > In most cases the answer to the above may well be the same: the unencoded names of people and places are not new names, but rather names of places and people in use from before Unicode and often before computers.
In IRG #48, the People's Republic of China's activity report http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2187ChinaActivityReport.pdf states that over 3,000 names of people and places are under consideration for IRG working set 2017 and at least half require encoding. The document also lists other categories of CJK ideographs under consideration for submission to Unicode. Regards John > > > Links: > ------ > [1] mailto:unicode at unicode.org From unicode at unicode.org Mon Mar 5 11:03:27 2018 From: unicode at unicode.org (suzuki toshiya via Unicode) Date: Tue, 6 Mar 2018 02:03:27 +0900 Subject: [Unicode] Re: Fonts and font sizes used in the Unicode In-Reply-To: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> Message-ID: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> Hi, I remember the front page of the code charts by Unicode has the following note: Fonts The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See http://www.unicode.org/charts/fonts.html for a list. -- I have a question; if some people try to make a translated version of Unicode, should they contact all font contributors and ask for a license? Can the Unicode Consortium not give any sublicense? If I understand correctly, ISO/IEC JTC1 holds the copyright of the materials used in the published documents of JTC1 standards, because they have to permit the production of translated versions of their standards, the reuse of the content of one spec by another spec, etc. Thus, I guess, it would not be so irrelevant to ask JTC1 for permission regarding the fonts used in ISO/IEC 10646 - although it does not mean that JTC1 would permit anything.
If I'm misunderstanding, please correct me. Regards, mpsuzuki On 3/5/2018 4:49 AM, Asmus Freytag via Unicode wrote: > On 3/4/2018 9:12 AM, Markus Scherer via Unicode wrote: > On Sun, Mar 4, 2018 at 6:10 AM, Helena Miton via Unicode > wrote: > Greetings. Is there a way to know which font and font size have been used in the Unicode charts (for various writing systems)? Many thanks! > > What are you trying to do? > > Many of the fonts are unique to the Unicode chart production, and are not licensed for other uses. Some are not even generally usable. > > markus > > The editors of the Unicode charts will use any font resource that gets the job done (that is, results in a chart that correctly displays the characters in the standard). These fonts are often not production fonts, and may lack any of the many tables needed to actually display running text. They may also, as has been mentioned, be licensed solely for the purpose of publishing the standard. In some cases, they are custom built. > > For most scripts, the font size is nominally set to 22pt in the main code charts, but the tool that the editors use allows a different size to be selected for any range of code points, or individual characters. There are some cases where a character is so wide or tall that it had to be scaled down individually to fit the cell. > > The purpose of the code charts is *exclusively* that of helping users of the standard identify which character is encoded at what code position. They are not intended as a font resource or normative description of the glyphs. Any usage scenario outside that very narrow scope is unsupported, and reverse engineering / extracting font resources is explicitly in violation of the terms of use.
> > A./ > From unicode at unicode.org Mon Mar 5 11:39:41 2018 From: unicode at unicode.org (Markus Scherer via Unicode) Date: Mon, 5 Mar 2018 09:39:41 -0800 Subject: [Unicode] Re: Fonts and font sizes used in the Unicode In-Reply-To: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> Message-ID: On Mon, Mar 5, 2018 at 9:03 AM, suzuki toshiya via Unicode < unicode at unicode.org> wrote: > I have a question; if some people try to make a > translated version of Unicode, they should contact > all font contributors and ask for the license? > Unicode Consortium cannot give any sublicense? > If you want to translate the Unicode Standard or its companion standards (UAX, UTS, ...), then please contact the Unicode Consortium. Thus, I guess, it would not be so irrelevant to ask > the permission to JTC1, about the fonts used in > ISO/IEC 10646 - although it does not mean that > JTC1 would permit anything. If I'm misunderstanding, > please correct me. > The production of the ISO 10646 standard is done by the Unicode Consortium. I am fuzzy on what exactly that means for copyright. If you need to find out, then please contact the consortium. markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Mar 5 11:40:46 2018 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Mon, 5 Mar 2018 09:40:46 -0800 Subject: CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0 characters now ready for adoption!) 
In-Reply-To: <4f203cff3a031ab9846afe48b34f87e6@koremail.com> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <4f203cff3a031ab9846afe48b34f87e6@koremail.com> Message-ID: John, I think this may be giving the list a somewhat misleading picture of the actual statistics for encoding of CJK unified ideographs. The "500 characters a year" or "1000 characters a year" limits are administrative limits set by the IRG for national bodies (and others) submitting repertoire to the "working set" that the IRG then segments into chunks for processing to prepare new increments for actual encoding. In point of fact, if we take 1991 as the base year, the *average* rate of encoding new CJK unified ideographs now stands at 3379 per annum (87,860 as of Unicode 10.0). By "encoding" here, I mean, final, finished publication of the encoded characters -- not the larger number of potentially unifiable submissions that eventually go into a publication increment. There is a gradual downward drift in that number over time, because of the impact on the stats of the "big bang" encoding of 42,711 ideographs for Extension B back in 2001, but recently, the numbers have been quite consistent with an average incremental rate of about 3000 new ideographs per year: 5762 added for Extension E in 2015 7463 added for Extension F in 2017 ~ 4934 to be added for Extension G, probably to be published in 2020 If you run the average calculation including Extension G, assuming 2020, you end up with a cumulative per annum rate of 3200, not much different than the calculation done as of today. And as for the implication that China, in particular, is somehow limited by these numbers, one should note that the vast majority of Extension G is associated with Chinese sources. 
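Incidentally, the per-annum figures quoted in this message can be reproduced with simple division. A quick sketch (the totals come from the message itself; the year spans 1991 to 2017 for Unicode 10.0 and 1991 to 2020 for Extension G are my inference, not stated in the message):

```python
# Checking the quoted CJK encoding rates: totals are from the message,
# year spans are assumptions (Unicode 10.0 in 2017, Extension G in 2020).

def per_annum(total: int, base_year: int, as_of_year: int) -> float:
    """Average characters encoded per year since base_year."""
    return total / (as_of_year - base_year)

rate_2017 = per_annum(87_860, 1991, 2017)           # all CJK unified ideographs as of Unicode 10.0
rate_2020 = per_annum(87_860 + 4_934, 1991, 2020)   # including Extension G

print(round(rate_2017))  # 3379, matching the figure quoted above
print(round(rate_2020))  # 3200
```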
Although a substantial chunk is formally labeled with a "UK" source this time around, almost all of those characters represent a roll-in of systematic simplifications, of various sorts, associated with PRC usage. (People who want to check can take a look at L2/17-366R in the UTC document registry.) --Ken On 3/5/2018 7:13 AM, via Unicode wrote: > Dear All, > > to simplify discussion I have split the points. >> >>> >>> >>> On 2018/03/01 12:31, via Unicode wrote: >>> >>>> Third, I cannot confirm or deny the "500 characters a year" limit, but >>>> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real >>>> need >>>> to encode more characters, everybody would find a way to handle these. > > > Chinese characters for Unicode first go to IRG (or ISO/IEC > JTC1/SC2/WG2/IRG) website. The limit of 500 a year for China is an > average based on IRG #48 document regarding working set 2017 > http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf > which explicitly states "each submission shall not exceed 1,000 > characters". The People's Republic of China as one member of IRG is > limited to 1,000 characters, which hopefully we can all agree has a > population of over 1,000,000,000 , therefore was limited to submitting > at most 1,000 characters. The earliest possible date for the next > working set is two or three years later, that is 2019 or 2020, so > that's an average limit of either 500 or 333 characters a year. > > Regards > John > > > > From unicode at unicode.org Mon Mar 5 11:49:17 2018 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Mon, 5 Mar 2018 09:49:17 -0800 Subject: [Unicode] Re: Fonts and font sizes used in the Unicode In-Reply-To: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> Message-ID: An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Mon Mar 5 12:21:23 2018 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Mon, 5 Mar 2018 10:21:23 -0800 Subject: Translating the standard (was: Re: Fonts and font sizes used in the Unicode) In-Reply-To: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> Message-ID: <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote: > I have a question; if some people try to make a > translated version of Unicode And to add to Asmus' response, folks on the list should understand that even with the best of effort, the concept of a "translated version of Unicode" is a near impossibility. In fairly recent times, two serious efforts to translate *just* the core specification -- one in Japanese, and a somewhat later attempt for Chinese -- crashed and burned, for a variety of reasons. The core specification is huge, contains a lot of very specific technical terminology that is difficult to translate, along with a large collection of script- and language-specific detail, also hard to translate. Worse, it keeps changing, with updates now coming out once every year. Some large parts are stable, but it is impossible to predict what sections might be impacted by the next year's encoding decisions. That is not including the fact that "the Unicode Standard" now also includes 14 separate HTML (or XHTML) annexes, all of which are also moving targets, along with the UCD data files, which often contain important information in their headers that would also require translation. And then, of course, there are the 2000+ pages of the formatted code charts, which require highly specific and very complicated custom tooling and font usage to produce.
It would require a dedicated (and expensive) small army of translators, terminologists, editors, programmers, font designers, and project managers to replicate all of this into another language publication -- and then they would have to do it again the next year, and again the next year, in perpetuity. Basically, given the current situation, it would be a fool's errand, more likely to introduce errors and inconsistencies than to help anybody with actual implementation. People who want accessibility to the Unicode Standard in other languages need to scale down their expectations considerably, and focus on preparing reasonably short and succinct introductions to the terminology and complexity involved in the full standard. Such projects are feasible. But a full translation of "the Unicode Standard" simply is not. --Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Mar 5 13:19:47 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Mon, 5 Mar 2018 20:19:47 +0100 Subject: Translating the standard (was: Re: Fonts and font sizes used in the Unicode) In-Reply-To: <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> Message-ID: There have been significant efforts to "translate", or more precisely "adapt", substantial parts of the standard with good presentations in Wikipedia and various sites for scoped topics. So there are alternate charts, and instead of translating everything, the concepts are summarized, reexplained, but still give links to the original version in English every time more info is needed. The UCD files don't need to be translated at all; they can instead be automatically processed to generate alternate presentations or datatables in other formats.
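As a sketch of that kind of mechanical processing (the sample record below follows the real semicolon-delimited UnicodeData.txt layout; the French general-category labels are invented placeholders, not CLDR data):

```python
# Sketch: rather than translating UnicodeData.txt itself, process it
# mechanically into a localized presentation.

# Assumed placeholder translations for two General_Category values:
GC_LABELS_FR = {"Lu": "lettre majuscule", "Ll": "lettre minuscule"}

def describe(ucd_line: str) -> str:
    """Render one UnicodeData.txt record as a localized one-line summary."""
    fields = ucd_line.split(";")
    code, name, category = fields[0], fields[1], fields[2]
    label = GC_LABELS_FR.get(category, category)
    return f"U+{code} {name} ({label})"

# The record below is the actual UnicodeData.txt entry for U+0041.
print(describe("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"))
# U+0041 LATIN CAPITAL LETTER A (lettre majuscule)
```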
There's no value in taking efforts to translate them manually; it's better to develop a tool that will process them into the format users can read. So remove the UCD files and the tables from the count, as well as sample code (which is just demonstrative and uses a simplified, non-optimal implementation to keep the code clear). We can now have separate tools or websites presenting them and proposing commented code which is also better performing. We have large collections of i18n libraries that were developed for various development platforms and usage documentation in various languages. The only effort is in: * naming characters (Wikipedia is great to distribute the effort and have articles showing relevant collections of characters and document alternate names or disambiguate synonyms). * the core text of the standard (section 3 about conformance and requirements is the first thing to adapt). There's absolutely no need however to do that as a pure translation, it can be rewritten and presented with the goals wanted by users. Here again Wikipedia has done significant efforts there, in various languages * keeping the tools developed in the previous paragraph in sync and conformity with the standard (sync the UCD files they use). 2018-03-05 19:21 GMT+01:00 Ken Whistler via Unicode : > > On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote: > > I have a question; if some people try to make a > translated version of Unicode > > > And to add to Asmus' response, folks on the list should understand that > even with the best of effort, the concept of a "translated version of > Unicode" is a near impossibility. In fairly recent times, two serious > efforts to translate *just *the core specification -- one in Japanese, > and a somewhat later attempt for Chinese -- crashed and burned, for a > variety of reasons.
The core specification is huge, contains a lot of very > specific technical terminology that is difficult to translate, along with a > large collection of script- and language-specific detail, also hard to > translate. Worse, it keeps changing, with updates now coming out once every > year. Some large parts are stable, but it is impossible to predict what > sections might be impacted by the next year's encoding decisions. > > That is not including that fact that "the Unicode Standard" now also > includes 14 separate HTML (or XHTML) annexes, all of which are also moving > targets, along with the UCD data files, which often contain important > information in their headers that would also require translation. And then, > of course, there are the 2000+ pages of the formatted code charts, which > require highly specific and very complicated custom tooling and font usage > to produce. > > It would require a dedicated (and expensive) small army of translators, > terminologists, editors, programmers, font designers, and project managers > to replicate all of this into another language publication -- and then they > would have to do it again the next year, and again the next year, in > perpetuity. Basically, given the current situation, it would be a fool's > errand, more likely to introduce errors and inconsistencies than to help > anybody with actual implementation. > > People who want accessibility to the Unicode Standard in other languages > need to scale down their expectations considerably, and focus on preparing > reasonably short and succinct introductions to the terminology and > complexity involved in the full standard. Such projects are feasible. But a > full translation of "the Unicode Standard" simply is not. > > --Ken > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Tue Mar 6 02:09:49 2018 From: unicode at unicode.org (via Unicode) Date: Tue, 06 Mar 2018 16:09:49 +0800 Subject: CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0 characters now ready for adoption!) In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <4f203cff3a031ab9846afe48b34f87e6@koremail.com> Message-ID: <506f0e2407f87dfe5159a7a66da44e3d@koremail.com> Dear Ken, the context of the question was how many characters in modern use are being encoded. Part of the answer is that there are several thousand Chinese characters that are names of people or places still to be encoded. The limit of 1,000 characters per working set per member was for working set 2017; this is a new thing. If the same member limit is applied to future working sets, then the result will be that the characters identified in 2017 are spread across several working sets. Around 500 have been included in working set 2017. Some will be included in the following working set, which will most likely be in 2020, and if there is then also a limit of 1,000 characters per member, then not all would be included. That would mean some would have to wait until 2022 before they can be submitted to IRG, which means at least 2027 before they are encoded. Names of people and places are not the only CJK unified ideographs that need to be encoded, but they illustrate the problem: if future working sets have a 1,000-character limit per member, with submissions every 2 or 3 years, then it delays the encoding of CJK unified ideographs by years. On 06.03.2018 01:40, Ken Whistler via Unicode wrote: > John, > > I think this may be giving the list a somewhat misleading picture of > the actual statistics for encoding of CJK unified ideographs.
The > "500 > characters a year" or "1000 characters a year" limits are > administrative limits set by the IRG for national bodies (and others) > submitting repertoire to the "working set" that the IRG then segments > into chunks for processing to prepare new increments for actual > encoding. > Here I was refering to the number of CJK unified ideogrpahs that the People's Republic of China can submit to IRG, the numbers are of course different for CJK unified ideographs as a whole. A limit of 1,000 a working set means that the number of CJK unified ideographs in the People's Republic of China awaiting submission to IRG is most likely to increase not decreases for decades to come. For other IRG members that still have characters to submit a limit of 1,000 a working set most likely leads to a decrease in the number of CJK unified ideographs awaiting submission over time. In short the administrative limit of 1,000 works to a degree for most IRG members, but not for the People's Republic of China. > In point of fact, if we take 1991 as the base year, the *average* > rate of encoding new CJK unified ideographs now stands at 3379 per > annum (87,860 as of Unicode 10.0). By "encoding" here, I mean, final, > finished publication of the encoded characters -- not the larger > number of potentially unifiable submissions that eventually go into a > publication increment. There is a gradual downward drift in that > number over time, because of the impact on the stats of the "big > bang" > encoding of 42,711 ideographs for Extension B back in 2001, but > recently, the numbers have been quite consistent with an average > incremental rate of about 3000 new ideographs per year: > 1991 to 2001 70,207 that is around seven thousand a year. However 2002 to 2018 only 17,675 so around one thousand a year > 5762 added for Extension E in 2015 > These 5762 were submitted to IRG in 2001, so 14 years from submission to encoding. 
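A quick check of the split-period averages just quoted (the totals are from the message; the exact year spans used as divisors are my assumption):

```python
# 70,207 ideographs encoded 1991-2001; 17,675 encoded 2002-2018
# (totals from the message; the divisors are assumed spans).
early = 70_207 / (2001 - 1991)
late = 17_675 / (2018 - 2001)
print(round(early))  # 7021: "around seven thousand a year"
print(round(late))   # 1040: "around one thousand a year"
```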
> 7463 added for Extension F in 2017 > > ~ 4934 to be added for Extension G, probably to be published in 2020 > > If you run the average calculation including Extension G, assuming > 2020, you end up with a cumulative per annum rate of 3200, not much > different than the calculation done as of today. > > And as for the implication that China, in particular, is somehow > limited by these numbers, one should note that the vast majority of > Extension G is associated with Chinese sources. Although a > substantial > chunk is formally labeled with a "UK" source this time around, almost > all of those characters represent a roll-in of systematic > simplifications, of various sorts, associated with PRC usage. (People > who want to check can take a look at L2/17-366R in the UTC document > registry.) > Extension G was before the 1,000 character per member limit. Whatever the UK characters submitted were, the largest single Chinese source was in fact over one thousand Zhuang characters submitted by the People's Republic of China, not "systematic simplifications". It would certainly be incorrect to think that the vast majority of CJK unified ideographs to be encoded are "systematic simplifications". Regards John > --Ken > > On 3/5/2018 7:13 AM, via Unicode wrote: >> Dear All, >> >> to simplify discussion I have split the points. > [1] > >> >>> >>>> >>>> >>>> On 2018/03/01 12:31, via Unicode wrote: >>>> >>>>> Third, I cannot confirm or deny the "500 characters a year" >>>>> limit, but >>>>> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a >>>>> real need >>>>> to encode more characters, everybody would find a way to handle >>>>> these. >> >> >> Chinese characters for Unicode first go to IRG (or ISO/IEC >> JTC1/SC2/WG2/IRG) website. The limit of 500 a year for China is an
The limit of 500 a year for China is an >> average based on IRG #48 document regarding working set 2017 >> http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf >> which explicitly states "each submission shall not exceed 1,000 >> characters". The People's Republic of China as one member of IRG is >> limited to 1,000 characters, which hopefully we can all agree has a >> population of over 1,000,000,000 , therefore was limited to submitting >> at most 1,000 characters. The earliest possible date for the next >> working set is two or three years later, that is 2019 or 2020, so >> that's an average limit of either 500 or 333 characters a year. >> >> Regards >> John >> >> >> >> From unicode at unicode.org Tue Mar 6 14:52:30 2018 From: unicode at unicode.org (=?utf-8?B?IkouwqBTLiBDaG9pIg==?= via Unicode) Date: Tue, 06 Mar 2018 12:52:30 -0800 Subject: New default emoji presentation in CSS: non-conformance with UTR 51 by web browsers Message-ID: <20EA0846-0690-475A-A8B8-B79E90E1A833@icloud.com> The W3C CSS Working Group is continuing to work on standardizing the default emoji presentation in perhaps the most ubiquitous application of Unicode today, the world wide web. Some recent logs: https://github.com/w3c/csswg-drafts/commit/7a5e0d702b00f8d3df5f2b43c9c65d1c2a2284f6 https://github.com/w3c/csswg-drafts/issues/2304#issuecomment-369323232 Current draft at https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc Currently, the CSS draft specifies three values for emoji that a web author may use to style their content: auto, text, and emoji. The auto value (which is the default) leaves emoji presentation to the discretion of the web browser and system platform itself, rather than conforming strictly to UTR 51. 
https://github.com/w3c/csswg-drafts/issues/1223 proposes that a strict If the authors or experts of UTR 51 believe that the Emoji_presentation property is useful, then they may want to chime in at Issue w3c/csswg-drafts#1223 with their expertise. My opinion is that standardizing the default presentation is important enough to strictly conform to UTR 51. Breakage has already occurred in the past, such as when WebKit in 2015 unexpectedly switched the default presentation of U+21A9 LEFTWARDS ARROW WITH HOOK (↩) from text to emoji, which broke existing websites such as Daring Fireball (see https://daringfireball.net/linked/2015/04/22/unicode-emoji and also http://mts.io/2015/04/21/unicode-symbol-render-text-emoji/ ). See also https://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0016.html and https://github.com/w3c/csswg-drafts/issues/2138 . To give an update on this issue: The CSS WG recently resolved to make all web browsers completely ignore BCP47's -u- extension. If the authors/experts of the BCP47 extension believe that the extension is at all useful, they still may wish to chime in at https://github.com/w3c/csswg-drafts/issues/2138 , but https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc has now been updated to specify the ignoring of the BCP47 extension. Cheers, J. S. Choi -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Mar 6 14:57:21 2018 From: unicode at unicode.org (J. S. Choi via Unicode) Date: Tue, 06 Mar 2018 12:57:21 -0800 Subject: New default emoji presentation in CSS: non-conformance with UTR 51 by web browsers In-Reply-To: <20EA0846-0690-475A-A8B8-B79E90E1A833@icloud.com> References: <20EA0846-0690-475A-A8B8-B79E90E1A833@icloud.com> Message-ID: <34A34966-58C6-4DAC-8CD5-2482D8943161@icloud.com> Apologies for the duplicate threads; I accidentally sent the email as rich text. Here's a version without the duplicate links.
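For context, UTS #51 defines variation selectors that pin a character's presentation in the text itself, independent of the platform default; appending U+FE0E (text presentation) is the kind of fix discussed in the articles linked above. A minimal sketch:

```python
# U+FE0E (VS15) requests text presentation, U+FE0F (VS16) requests emoji
# presentation, for characters like U+21A9 that support both styles.
ARROW = "\u21A9"                 # LEFTWARDS ARROW WITH HOOK
text_style = ARROW + "\uFE0E"    # render as plain text
emoji_style = ARROW + "\uFE0F"   # render as emoji

for s in (text_style, emoji_style):
    print(" ".join(f"U+{ord(c):04X}" for c in s))
# U+21A9 U+FE0E
# U+21A9 U+FE0F
```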
> On Mar 6, 2018, at 12:52 PM, J. S. Choi via Unicode wrote: > > The W3C CSS Working Group is continuing to work on standardizing the default emoji presentation in perhaps the most ubiquitous application of Unicode today, the world wide web. Some recent logs: > > https://github.com/w3c/csswg-drafts/commit/7a5e0d702b00f8d3df5f2b43c9c65d1c2a2284f6 > https://github.com/w3c/csswg-drafts/issues/2304#issuecomment-369323232 > Current draft at https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc > > Currently, the CSS draft specifies three values for emoji that a web author may use to style their content: auto, text, and emoji. The auto value (which is the default) leaves emoji presentation to the discretion of the web browser and system platform itself, rather than conforming strictly to UTR 51. https://github.com/w3c/csswg-drafts/issues/1223 proposes that a strict > > If the authors or experts of UTR 51 believe that the Emoji_presentation property is useful, then they may want to chime in at Issue w3c/csswg-drafts#1223 with their expertise. My opinion is that standardizing the default presentation is important enough to strictly conform to UTR 51. Breakage has already occurred in the past, such as when WebKit in 2015 unexpectedly switched the default presentation of U+21A9 LEFTWARDS ARROW WITH HOOK ??? from text to emoji, which unexpectedly broke existing websites such as Daring Fireball (see https://daringfireball.net/linked/2015/04/22/unicode-emoji and also http://mts.io/2015/04/21/unicode-symbol-render-text-emoji/). > > See also https://www.unicode.org/mail-arch/unicode-ml/y2018-m01/0016.html and https://github.com/w3c/csswg-drafts/issues/2138. To give an update on this issue: The CSS WG recently resolved to make all web browsers completely ignore BCP47?s -u- extension. 
If the authors/experts of the BCP47 extension believe that the extension is at all useful, they still may wish to chime in at https://github.com/w3c/csswg-drafts/issues/2138, but https://drafts.csswg.org/css-fonts-4/#font-variant-emoji-desc has now been updated to specify the ignoring of the BCP47 extension. > > Cheers, > J. S. Choi From unicode at unicode.org Wed Mar 7 14:26:21 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Wed, 7 Mar 2018 20:26:21 +0000 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <447c571bad4174b493e4bd42ee7a41f2@koremail.com> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> Message-ID: <20180307202621.770d1099@JRWUBU2> On Mon, 05 Mar 2018 23:42:15 +0800 via Unicode wrote: > In most cases the answer to the above may well be the same, the > unencoded names of people and places are not new names, How many new characters are being devised per year? Richard. From unicode at unicode.org Wed Mar 7 15:12:41 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Wed, 7 Mar 2018 22:12:41 +0100 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <20180307202621.770d1099@JRWUBU2> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> Message-ID: So most of the growth in Han characters is caused by people inventing and registering new sinograms for their own names, using the basic principles of combining a phonogram and a distinctive semantic character. It would be like encoding in the UCS personal handwritten signatures of our own choosing.
Are these worth encoding? Why can't we just encode most of them as a sequence (phonogram, ideogram, and combining layout character), i.e. mostly what IDS provide, except that they are descriptive but suited for the same purpose? Why can't those IDS be rendered as ligatures, and then have those "characters" being in fact ligatured IDS strings? Shouldn't the IRG better work on providing a dictionary of IDS strings needed for people's names, then allowing font providers in China to render them as ligatures (the "representative glyph" of these ligatures would be the official Chinese personal record for such use, and it would be enough for the Chinese administration)? After all, this is what we are already doing by encoding in Unicode various emoji sequences (then rendered as ligatures in a much more fuzzy way!)... Shouldn't we create a variant of IDS, using combining joiners between Han base glyphs (then possibly augmented by variant selectors if there are significant differences on the simplification of rendered strokes for each component)? What is really limiting us to do that? 2018-03-07 21:26 GMT+01:00 Richard Wordingham via Unicode < unicode at unicode.org>: > On Mon, 05 Mar 2018 23:42:15 +0800 > via Unicode wrote: > > > In most cases the answer to the above may well be the same, the > > unencoded names of people and places are not new names, > > How many new characters are being devised per year? > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Mar 7 15:35:42 2018 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Wed, 7 Mar 2018 13:35:42 -0800 Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: 
References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2>
Message-ID: <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>

On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
> Shouldn't we create a variant of IDS, using combining joiners between
> Han base glyphs (then possibly augmented by variant selectors if there
> are significant differences on the simplification of rendered strokes
> for each component)? What is really limiting us to do that?

Ummm.... ambiguity, lack of precision, complexity of model, pushback by stakeholders, likely failure of uptake by most implementers, duplication of representation, ...

Do you think combining models of Han weren't already thought of years ago? They predated the original encoding of unified CJK in Unicode in 1992. They weren't viable then, and they aren't viable now, either, after 26 years of Unicode implementation of unified CJK as atomic ideographs.

--Ken

From unicode at unicode.org Wed Mar 7 16:04:21 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 7 Mar 2018 23:04:21 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
Message-ID: 

I'm just speaking about the many yearly inventions of sinograms for personal/proper names, not about the use of traditional characters for normal language.

People just start by assembling components with common rules.
Then they enhance the produced character, just as we personalize signatures. But to me, all these look like personal signatures and are not needed for formal encoding; these persons will accept alternate presentations when they are merely being cited (and would not much like you to imitate their personal signature by standardizing it in a worldwide standard: I think many of these encodings have severe privacy issues, and possibly copyright issues as well!).

2018-03-07 22:35 GMT+01:00 Ken Whistler :
> On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
>> Shouldn't we create a variant of IDS, using combining joiners between Han
>> base glyphs (then possibly augmented by variant selectors if there are
>> significant differences on the simplification of rendered strokes for each
>> component)? What is really limiting us to do that?
>
> Ummm.... ambiguity, lack of precision, complexity of model, pushback by
> stakeholders, likely failure of uptake by most implementers, duplication of
> representation, ...
>
> Do you think combining models of Han weren't already thought of years ago?
> They predated the original encoding of unified CJK in Unicode in 1992. They
> weren't viable then, and they aren't viable now, either, after 26 years of
> Unicode implementation of unified CJK as atomic ideographs.
>
> --Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From unicode at unicode.org Wed Mar 7 16:13:31 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 7 Mar 2018 23:13:31 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: 
References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
Message-ID: 

Note: I don't advocate "duplicate encoding" as you think. But probably the current IDS model is not sufficient to describe characters correctly, and it may need to be augmented a bit (using variant codes or some additional joiners or diacritics?).

But IDS strings are suitable for rendering as ligatures and this should be permitted, and should even be the standard way to represent personal names, without making them depend on an unproven single distinctive presentation.

E.g. someone writes his name with some personal strokes and uses it as his registered "signature"; he then does business or is cited in the news with a simplified presentation, and the Chinese authorities also use their own simplifications. All of these designate the same person. But which presentation of the character is correct? In my opinion it is only the one the person invented for themselves, as a personal signature, but that is not suitable for encoding (privacy and copyright issues). All the other presentations are legitimate, and we don't need additional encoding for them: the ligaturing of IDS strings is sufficient even if it does not match the person's signature exactly.

2018-03-07 23:04 GMT+01:00 Philippe Verdy :
> I'm just speaking about the many yearly inventions of sinograms for
> personal/proper names, not about the use of traditional characters for
> normal language.
>
> People just start by assembling components with common rules. Then they
> enhance the produced character just like we personalize signatures.
But for
> me, all these look like personal signatures and are not needed for formal
> encoding, and even these persons will accept alternate presentations if it's
> just to cite them (and would not like much that you imitate their personal
> signature by standardizing it in a worldwide standard: I think many of
> these encodings have severe privacy issues, possibly copyright
> issues as well!).
>
> 2018-03-07 22:35 GMT+01:00 Ken Whistler :
>
>> On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
>>
>>> Shouldn't we create a variant of IDS, using combining joiners between
>>> Han base glyphs (then possibly augmented by variant selectors if there are
>>> significant differences on the simplification of rendered strokes for each
>>> component)? What is really limiting us to do that?
>>>
>> Ummm.... ambiguity, lack of precision, complexity of model, pushback by
>> stakeholders, likely failure of uptake by most implementers, duplication of
>> representation, ...
>>
>> Do you think combining models of Han weren't already thought of years
>> ago? They predated the original encoding of unified CJK in Unicode in 1992.
>> They weren't viable then, and they aren't viable now, either, after 26
>> years of Unicode implementation of unified CJK as atomic ideographs.
>>
>> --Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From unicode at unicode.org Wed Mar 7 16:18:01 2018
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 7 Mar 2018 23:18:01 +0100
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: 
References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
Message-ID: 

Additional note: the UCS will never be large enough to support the personal signatures of the billions of Chinese people living today or born over the millennia, or just those to be born in the next century. There is a need to represent these names using composed strings. A reasonable compositing/ligaturing process can then present almost all of them!

2018-03-07 23:13 GMT+01:00 Philippe Verdy :
> Note: I don't advocate "duplicate encoding" as you think. But probably the
> current IDS model is not sufficient to describe characters correctly, and
> it may need to be augmented a bit (using variant codes or some additional
> joiners or diacritics?).
>
> But IDS strings are suitable for rendering as ligatures and this should be
> permitted, and should even be the standard way to represent personal names
> without making them depend on an unproven single distinctive presentation.
>
> E.g. someone writes his name with some personal strokes and uses it as his
> registered "signature"; he then does business or is cited in the news with a
> simplified presentation, and the Chinese authorities also use their own
> simplifications. All of these designate the same person. But which presentation
> of the character is correct? In my opinion it is only the one the person
> invented for themselves, as a personal signature, but that is not
> suitable for encoding (privacy and copyright issues). All the other
> presentations are legitimate, and we don't need additional encoding for them:
> the ligaturing of IDS strings is sufficient even if it does not match
> the person's signature exactly.
>
> 2018-03-07 23:04 GMT+01:00 Philippe Verdy :
>
>> I'm just speaking about the many yearly inventions of sinograms for
>> personal/proper names, not about the use of traditional characters for
>> normal language.
>>
>> People just start by assembling components with common rules. Then they
>> enhance the produced character just like we personalize signatures. But for
>> me, all these look like personal signatures and are not needed for formal
>> encoding, and even these persons will accept alternate presentations if it's
>> just to cite them (and would not like much that you imitate their personal
>> signature by standardizing it in a worldwide standard: I think many of
>> these encodings have severe privacy issues, possibly copyright
>> issues as well!).
>>
>> 2018-03-07 22:35 GMT+01:00 Ken Whistler :
>>
>>> On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
>>>
>>>> Shouldn't we create a variant of IDS, using combining joiners between
>>>> Han base glyphs (then possibly augmented by variant selectors if there are
>>>> significant differences on the simplification of rendered strokes for each
>>>> component)? What is really limiting us to do that?
>>>>
>>> Ummm.... ambiguity, lack of precision, complexity of model, pushback by
>>> stakeholders, likely failure of uptake by most implementers, duplication of
>>> representation, ...
>>>
>>> Do you think combining models of Han weren't already thought of years
>>> ago? They predated the original encoding of unified CJK in Unicode in 1992.
>>> They weren't viable then, and they aren't viable now, either, after 26
>>> years of Unicode implementation of unified CJK as atomic ideographs.
>>>
>>> --Ken

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From unicode at unicode.org Wed Mar 7 17:32:02 2018
From: unicode at unicode.org (Andrew West via Unicode)
Date: Wed, 7 Mar 2018 23:32:02 +0000
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: 
References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
Message-ID: 

On 7 March 2018 at 22:18, Philippe Verdy via Unicode wrote:
>
> Additional note: the UCS will never be large enough to support the personal
> signatures of the billions of Chinese people living today or born over the
> millennia, or just those to be born in the next century. There is a need to
> represent these names using composed strings. A reasonable
> compositing/ligaturing process can then present almost all of them!

CJK characters invented for writing personal names are extremely rare, and do not constitute a significant fraction of CJK ideographs proposed for encoding. The majority of unencoded modern-use characters in China (that are not systematic simplified forms of existing encoded characters) are used in place names or in Chinese dialects or for writing non-Chinese languages such as Zhuang.

Andrew

From unicode at unicode.org Wed Mar 7 19:27:06 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Thu, 8 Mar 2018 02:27:06 +0100 (CET)
Subject: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
In-Reply-To: 
References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
Message-ID: <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>

On Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:

> There have been significant efforts to "translate" or more precisely "adapt"
> significant parts of the standard with good presentations in Wikipedia and
> various sites for scoped topics.
> So there are alternate charts, and instead
> of translating all, the concepts are summarized, re-explained, but still
> give links to the original version in English every time more info is needed.

Indeed one of the best uses we can make of efforts in Unicode education is to extend and improve the Wikipedia coverage, because that is the first place almost everybody goes. So if a government is considering an investment, donating to Wikimedia and motivating a vast community seems a really good plan. And hiring staffers for this purpose will increase the reliability of the data (given that some corporations misuse the infrastructure for PR).

> The UCD files don't need to be translated; they can also be automatically
> processed to generate alternate presentations or datatables in other
> formats. There's no value in making the effort to translate them manually;
> it's better to develop a tool that will process them into the format users
> can read.

The only UCD file I'd advise to fully translate is the NamesList, as it is the source code of the Code Charts. These are indeed indispensable because of the glyphic information they convey, which can be found nowhere else. Hence all good secondary sources like Wikipedia link to the Unicode Charts. The NamesList per se is useful also in that it provides a minimal amount of information about the characters. But it lacks important hints about bidi mirroring, which would have to be compiled from yet another UCD file. The downside of generating a holistic view is that it generally ends up as an atomic, per-character view. Though anyway it's up to the user to gather an overview tailored to his or her needs. This is catered for by Chinese and Japanese versions of sites such as www.fileformat.info.

[…]

> The only effort is in:
> * naming characters (Wikipedia is great to distribute the effort and have
> articles showing relevant collections of characters and document alternate
> names or disambiguate synonyms).
Naming characters is a real challenge and often runs into multiple issues. First we need to make clear for whom the localization is intended: technical people or UIs. It has happened that a literal translation tuned in accordance with specialists was then handed out to the industry to show up on everyone's computer, while some core characters of the intended locale are named differently in real life, so that students don't encounter what they have learned at school. And the worst thing is that once a translation is released, image considerations lead to seeking stability even where no Unicode (ISO) policy prevents updates.

> * the core text of the standard (section 3 about conformance and
> requirements is the first thing to adapt). There's absolutely no need
> however to do that as a pure translation, it can be rewritten and presented
> with the goals wanted by users. Here again Wikipedia has made significant
> efforts there, in various languages
> * keeping the tools developed in the previous paragraph in sync and
> conformity with the standard (sync the UCD files they use).

Yes, the biggest issue over time, as Ken wrote, is to *maintain* a translation, be it only the NamesList.

Marcel

From unicode at unicode.org Wed Mar 7 19:42:38 2018
From: unicode at unicode.org (via Unicode)
Date: Thu, 08 Mar 2018 09:42:38 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <20180307202621.770d1099@JRWUBU2>
References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2>
Message-ID: <51c12b4974b1cc0d2f476e29db47b99d@koremail.com>

Dear Richard,

To the best of my knowledge, virtually no new characters used just for names are under consideration; all the ones that are under consideration are from before this century.
Some are only being submitted now, but that does not mean they are new in real life, just new to Unicode. Place names tend to be even older.

Regards,
John

On 08.03.2018 04:26, Richard Wordingham via Unicode wrote:
> On Mon, 05 Mar 2018 23:42:15 +0800
> via Unicode wrote:
>
>> In most cases the answer to the above may well be the same, the
>> unencoded names of people and places are not new names,
>
> How many new characters are being devised per year?
>
> Richard.

From unicode at unicode.org Wed Mar 7 20:13:36 2018
From: unicode at unicode.org (via Unicode)
Date: Thu, 08 Mar 2018 10:13:36 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: 
References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2>
Message-ID: <66d5823c8a89c7d6c96a39bf83adf766@koremail.com>

Dear Philippe,

On 08.03.2018 05:12, Philippe Verdy via Unicode wrote:
> So most of the growth in Han characters is caused by people inventing
> and registering new sinograms for their own names, using the basic
> principles of combining a phonogram and a distinctive semantic
> character.

This is not correct. It is certainly not correct for CJK characters added to Unicode, and to the best of my knowledge, if one just makes up a new character for one's name it is now no longer possible to legally register it anywhere that uses Chinese characters. Take Extension F: of its more than seven thousand characters, nearly three thousand are Japanese characters from Buddhist texts, over one thousand are Zhuang characters, and nearly two thousand are characters used in Korean historical texts.

Regards,
John

From unicode at unicode.org Wed Mar 7 20:32:27 2018
From: unicode at unicode.org (via Unicode)
Date: Thu, 08 Mar 2018 10:32:27 +0800
Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: 
References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <6052a482-a4a3-bc4d-379f-137a8e4b8891@att.net>
Message-ID: <6802900325b925e037f79746fc4b5b25@koremail.com>

On 08.03.2018 06:18, Philippe Verdy via Unicode wrote:
> Additional note: the UCS will never be large enough to support the
> personal signatures of the billions of Chinese people living today or born
> over the millennia, or just those to be born in the next century. There is
> a need to represent these names using composed strings. A reasonable
> compositing/ligaturing process can then present almost all of them!

There is no such need. Chinese names are not formed in this way; if one just makes up a character, how would others be able to read it? Slight variants that add style to a character do not count as new characters in Unicode. Furthermore, with government records now all computerised, there are strict rules on babies' names in the People's Republic of China, Taiwan, etc. that prevent one from making up new characters for names. Whilst there are maybe a few thousand name CJK unified ideographs to add to the UCS, there are tens of thousands of non-name CJK unified ideographs yet to be added.

Regards,
John

From unicode at unicode.org Thu Mar 8 02:04:40 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Thu, 8 Mar 2018 09:04:40 +0100 (CET)
Subject: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
In-Reply-To: 
References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net>
Message-ID: <2018652387.2179.1520496281199.JavaMail.www@wwinf1m17>

On Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:
[…]
> * the core text of the standard (section 3 about conformance and requirements is the first thing to adapt).
> There's absolutely no need however to do that as a pure translation, it can be rewritten and presented
> with the goals wanted by users. Here again Wikipedia has made significant efforts there, in various languages

I don't think there is a potential to rewrite the core specs if the goal is making an abstract, given that the original authors already made efforts to keep the language simple. Whenever the goal is to add information, by contrast, e.g. about (yet) non-standard use of superscripts in Latin text, then the added value, clearly tagged as such, will reward the effort. A big part of the core spec is made of script-specific introductions designed to be balanced and handy. Hence part of the information is provided only in the code charts, some in the annexes. Compiling it all and writing up more detailed articles is indeed much more interesting for readers focussing on a script.

Best regards,

Marcel

From unicode at unicode.org Thu Mar 8 02:25:25 2018
From: unicode at unicode.org (fantasai via Unicode)
Date: Thu, 8 Mar 2018 17:25:25 +0900
Subject: Sentence_Break, Semi-colons, and Apparent Miscategorization
Message-ID: 

Given that the comma and colon are categorized as SContinue, why is the semicolon also not SContinue? Also, why is the Greek Question Mark not categorized with the rest of the question marks? Why aren't the vertical presentation forms categorized with the things they are presenting?
Thanks~
~fantasai

From unicode at unicode.org Thu Mar 8 03:03:28 2018
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Thu, 8 Mar 2018 09:03:28 +0000
Subject: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
In-Reply-To: <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17>
Message-ID: <20180308090328.336a734f@JRWUBU2>

On Thu, 8 Mar 2018 02:27:06 +0100 (CET)
Marcel Schneider via Unicode wrote:

> Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> translation, be it only the Nameslist.

For which accurately determined change bars can work wonders. An alternative would be paragraph identification and a list of changed paragraphs. The section number in TUS is too coarse for giving text locations, and page numbers are inherently changeable.

Richard.

From unicode at unicode.org Thu Mar 8 08:18:19 2018
From: unicode at unicode.org (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?= via Unicode)
Date: Thu, 8 Mar 2018 15:18:19 +0100
Subject: metric for block coverage
In-Reply-To: <20180217221825.wovnzpnzftpsjp37@angband.pl>
References: <20180217221825.wovnzpnzftpsjp37@angband.pl>
Message-ID: <81d9e511-f33e-c0ff-39a0-9b9ecbdc937b@gmail.com>

Hi!

I'll just add two points to the various points raised in the previous conversation about block coverage:

On 17/02/2018 at 23:18, Adam Borowski via Unicode wrote:
> Hi!
> As a part of Debian fonts team work, we're trying to improve fonts review:
> ways to organize them, add metadata, pick which fonts are installed by
> default and/or recommended to users, etc.
>
> I'm looking for a way to determine a font's coverage of available scripts.
> It's probably reasonable to do this per Unicode block. [...]
> A naïve way would be to count codepoints present in the font vs the number
> of all codepoints in the block. Alas, there's way too much chaff for such
> an approach to be reasonable: ? or ? count the same as LATIN TURNED CAPITAL
> LETTER SAMPI WITH HORNS AND TAIL WITH SMALL LETTER X WITH CARON.

A slightly less naïve way would be to take into account when the code points were added to Unicode, with the rough idea that the most widely used characters were added first. It also adds the nice feature that this metric is less ambiguous for blocks which are not yet complete. For example, if you have 100% coverage of Armenian for Unicode 10.0 (which I'll call Armenian10.0 for short), it only implies a coverage of 89/91 = 97.8% of Armenian11.0, which will see the addition of two characters used in Armenian dialectology (ARMENIAN SMALL LETTER TURNED AYB and YI WITH STROKE). If you look at the history of the Armenian block (e.g. here https://en.wikipedia.org/wiki/Armenian_(Unicode_block)), most (84) characters were added in 1.0, a ligature was added in 1.0, ARMENIAN HYPHEN was added in 3.0, a currency symbol in 6.1, two decorative symbols in 7.0, and two characters used in dialectology are planned for 11.0. I guess this roughly corresponds to a ranking of the characters from the most used to the least used. To take your examples, both ? and ? have been in Unicode since 1.1 (and, I guess, 1.0), while LATIN TURNED CAPITAL LETTER SAMPI WITH HORNS AND TAIL WITH SMALL LETTER X WITH CARON is not encoded yet, so they are not the same according to this metric... To see what this means for other Latin examples, you can look at the Latin Extended-D block (history here https://en.wikipedia.org/wiki/Latin_Extended-D ), with new characters in 5.0, 5.1, 6.1, 7.0, 8.0, 9.0, some accepted for 11.0 (SMALL CAPITAL Q, CAPITAL/SMALL LETTER U WITH STROKE), and later (15, for Egyptology, Assyriology, medieval English and historical Pinyin).

Of course, this measure is only rough.
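The age-weighted coverage idea above can be automated from the UCD file DerivedAge.txt, which records the Unicode version in which each code point was assigned. A rough sketch (Python; the per-version tally, the toy data excerpt, and the "earlier version means more essential" weighting are my illustrative assumptions, not an official tool):

```python
import re
from collections import defaultdict

# Matches DerivedAge.txt data lines: "XXXX[..YYYY] ; <version> # ..."
AGE_LINE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*([0-9.]+)")

def parse_derived_age(lines):
    """Parse DerivedAge.txt-style lines into {codepoint: version string}."""
    age = {}
    for line in lines:
        m = AGE_LINE.match(line)
        if not m:
            continue  # comments and blank lines
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        for cp in range(start, end + 1):
            age[cp] = m.group(3)
    return age

def coverage_by_age(font_codepoints, block_range, age):
    """Per-version coverage of a block: {version: (supported, assigned)}."""
    tally = defaultdict(lambda: [0, 0])
    for cp in range(block_range[0], block_range[1] + 1):
        if cp not in age:
            continue  # unassigned code point
        t = tally[age[cp]]
        t[1] += 1
        if cp in font_codepoints:
            t[0] += 1
    return {v: tuple(t) for v, t in tally.items()}

# Toy excerpt in the DerivedAge.txt format (real data: UCD DerivedAge.txt)
sample = [
    "# Derived Property: Age",
    "0531..0556    ; 1.1 #  [38] ARMENIAN CAPITAL LETTER AYB..",
    "058A          ; 3.0 #       ARMENIAN HYPHEN",
    "058F          ; 6.1 #       ARMENIAN DRAM SIGN",
]
age = parse_derived_age(sample)
font = set(range(0x0531, 0x0557))  # font covers only the 1.1 capitals
print(coverage_by_age(font, (0x0530, 0x058F), age))
# e.g. {'1.1': (38, 38), '3.0': (0, 1), '6.1': (0, 1)}
```

A font could then be scored highly if it covers everything up to some version, even when later additions to the block are missing.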
A counter-example is in the monetary symbol block, where € U+20AC EURO SIGN (in Unicode since 2.1) is much more used than ₣ U+20A3 FRENCH FRANC SIGN, encoded since Unicode 1.1 (1.0?) but which I have never seen, despite living in France for more than four decades.

> [...]
> I don't think I'm the first to have this question. Any suggestions?

For the Han (CJK) script, the IRG (Ideographic Rapporteur Group) defined a set of fewer than 10k essential Han characters, IICore (International Ideographs Core, https://en.wikipedia.org/wiki/International_Ideographs_Core). This is described in the Unihan database in the Unihan_IRGSources.txt file, kIICore field (https://www.unicode.org/reports/tr38/#kIICore ). This field also includes a letter (A, B or C) indicating a priority value, and some regional information. For Unicode 10.0, a simple grep tells that there are 9810 IICore characters, 7772 of which are priority A, 417 priority B and 1621 priority C. Note that IICore has been stable (as version 2.2) since 2004, but Ken Lunde, from Adobe, has recently proposed an update to it (https://www.unicode.org/L2/L2018/18066-iicore-changes.pdf), affecting only the region tags, not the priorities or the list of characters. However, reading Ken Lunde's associated blog post, it seems a few characters could be added to IICore in the future.

Cheers,

Frédéric

From unicode at unicode.org Thu Mar 8 08:19:24 2018
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Thu, 8 Mar 2018 15:19:24 +0100 (CET)
Subject: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
Message-ID: <1023307540.11630.1520518764800.JavaMail.www@wwinf1m17>

On Thu, 8 Mar 2018 09:03:28 +0000, Richard Wordingham via Unicode wrote:
>
> > Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> > translation, be it only the Nameslist.
>
> For which accurately determined change bars can work wonders.
> An
> alternative would be paragraph identification and a list of changed
> paragraphs. The section number in TUS is too coarse for giving text
> locations, and page numbers are inherently changeable.

Adobe Illustrator doesn't seem to support purple numbers, and Adobe Reader seems unable to accept input of bookmarks as a go-to feature (while that must be proper to Acrobat). Word is reported not to add lasting change bars in an automated way. But all that can be done in HTML, which is not the format of The Unicode Standard, whose web bookmarks are fortunately published in separate collections. When UAXes are updated, an intermediate revision has all changes highlighted and remains available online. We can see delta charts with all changes highlighted, in PDF. Why did the Core Specification not get the benefit of these facilities? Has this already been submitted as formal feedback? (The UTC is known for not considering feedback that has not been submitted via the Contact form or docsubmit at unicode.org, and the mailing lists have explicit caveats.)

Best regards,

Marcel

From unicode at unicode.org Thu Mar 8 09:04:44 2018
From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode)
Date: Thu, 8 Mar 2018 16:04:44 +0100
Subject: Sentence_Break, Semi-colons, and Apparent Miscategorization
In-Reply-To: 
References: 
Message-ID: 

From the first line, I guess you mean that all three questions have to do with the Sentence_Break property values. Namely:
http://www.unicode.org/reports/tr29/proposed.html#Table_Sentence_Break_Property_Values
http://www.unicode.org/reports/tr29/proposed.html#SContinue

Mark

On Thu, Mar 8, 2018 at 9:25 AM, fantasai via Unicode wrote:
> Given that the comma and colon are categorized as SContinue,
> why is the semicolon also not SContinue?
> Also, why is the Greek Question Mark not categorized with
> the rest of the question marks?

As I recall,
both are because the semicolon can also represent a Greek question mark (they are canonically equivalent, so you can't reliably distinguish between them).

BTW, here is a table of property differences for codepoint X, toNfc(X) (if a single character) and toNfkc(X) (again, if a single character).
https://docs.google.com/spreadsheets/d/1ZExxhAujA8kX42F8KBK3okX_So7Dt5YZvyanL8dH8tM/edit#gid=0
It was a quick dump so no guarantees that all the dots are crossed. It skips comparing properties that are purposefully different across NFC (like Decomposition_Mapping) or different code points (like Name or Block), and most CJK properties (ones starting with 'k').

> Why aren't the vertical presentation forms categorized with
> the things they are presenting?

At least some of them are:

U+FE10 ( ︐ ) PRESENTATION FORM FOR VERTICAL COMMA
U+FE11 ( ︑ ) PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
U+FE13 ( ︓ ) PRESENTATION FORM FOR VERTICAL COLON
U+FE31 ( ︱ ) PRESENTATION FORM FOR VERTICAL EM DASH
U+FE32 ( ︲ ) PRESENTATION FORM FOR VERTICAL EN DASH

> Thanks~
> ~fantasai

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From unicode at unicode.org Thu Mar 8 03:25:53 2018
From: unicode at unicode.org (Elsebeth Flarup via Unicode)
Date: Thu, 08 Mar 2018 04:25:53 -0500
Subject: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
In-Reply-To: <20180308090328.336a734f@JRWUBU2>
References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2>
Message-ID: <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch>

For a number of reasons I think translating the standard is a really bad idea.
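Mark's parenthetical above, that the Greek question mark and the semicolon are canonically equivalent, can be checked directly: U+037E has a singleton canonical decomposition to U+003B, so every normalization form folds the two together. A quick check (Python standard library):

```python
import unicodedata

greek_qm = "\u037e"   # GREEK QUESTION MARK
semicolon = "\u003b"  # SEMICOLON

# U+037E is a canonical singleton: it decomposes to U+003B and is
# excluded from recomposition, so all four normalization forms map
# it to the ordinary semicolon.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, greek_qm) == semicolon

print(unicodedata.name(greek_qm))           # GREEK QUESTION MARK
print(unicodedata.decomposition(greek_qm))  # 003B
```

This is why a Sentence_Break implementation cannot rely on the distinction between the two code points surviving normalization.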
As long as there are people interested in maintaining the translation, identifying deltas and easily translating just the deltas would NOT be difficult, however. Modern computer-aided translation tools all use translation memories that automatically translate already-translated segments and present only new/changed segments to the translator. No need for change bars etc. This assumes that somebody would have stewardship of the translation memory, that the people doing the translation would be willing to/capable of using the CAT tools, etc., but the translation technology is available to make this part of the equation not much of an issue. There are other reasons to not do this. Elsebeth ------- Original Message ------- On March 8, 2018 10:03 AM, Richard Wordingham via Unicode wrote: > On Thu, 8 Mar 2018 02:27:06 +0100 (CET) > Marcel Schneider via Unicode unicode at unicode.org wrote: > > Yes the biggest issue over time, as Ken wrote, is to maintain a > > translation, be it only the Nameslist. > For which accurately determined change bars can work wonders. An > alternative would be paragraph identification and a list of changed > paragraphs. The section number in TUS is too coarse for giving text > locations, and page numbers are inherently changeable. > Richard.
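The translation-memory workflow Elsebeth describes, reuse exact matches automatically and route only new or changed segments to a human, can be sketched in a few lines. This is a toy model under obvious assumptions (real CAT tools also segment text, do fuzzy matching, and track context); `pretranslate` and the sample segments are made up for illustration:

```python
def pretranslate(segments, memory):
    """Split a document the way a translation memory does: segments
    already in the memory are translated automatically, and only the
    remainder is presented to the human translator."""
    translated, todo = {}, []
    for seg in segments:
        if seg in memory:
            translated[seg] = memory[seg]   # exact match: reuse
        else:
            todo.append(seg)                # new/changed: human work
    return translated, todo

memory = {
    "The Unicode Standard encodes scripts rather than languages.":
    "Le standard Unicode code des écritures plutôt que des langues.",
}
new_edition = [
    "The Unicode Standard encodes scripts rather than languages.",
    "This paragraph is new in this edition.",
]
done, todo = pretranslate(new_edition, memory)
print(len(done), todo)  # 1 ['This paragraph is new in this edition.']
```

No change bars needed: the unchanged sentence is reused from the memory, and only the new one lands in the translator's queue.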
From unicode at unicode.org Thu Mar 8 12:05:06 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Thu, 8 Mar 2018 19:05:06 +0100 (CET) Subject: Translating the standard (was: Re: Fonts and font sizes used in the Unicode) In-Reply-To: <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> Message-ID: <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> On Thu, 08 Mar 2018 04:25:53 -0500, Elsebeth Flarup via Unicode wrote: > > For a number of reasons I think translating the standard is a really bad idea. > […] > > There are other reasons to not do this. I assume that the reasons you are thinking of are congruent with those that Ken already explained in detail in: http://www.unicode.org/mail-arch/unicode-ml/y2018-m03/0025.html And I think with Ken that the idea in itself isn't bad as such, but that it is no longer feasible. Everybody (supposedly) knows that the Core Spec has really been translated, published in a print edition, scanned into Google Books, and is still for sale: https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1206989878&sr=8-1 https://books.google.fr/books?id=GgbWZNTRncsC&printsec=frontcover&dq=Andries+Patrick&hl=fr&sa=X&ved=0ahUKEwis59Cwp93ZAhUF6RQKHZ1GBlIQ6AEIKjAA#v=onepage&q=Andries%20Patrick&f=false OK, the version number was only half the actual one.
Best regards, Marcel From unicode at unicode.org Thu Mar 8 12:27:47 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Thu, 8 Mar 2018 19:27:47 +0100 Subject: metric for block coverage In-Reply-To: <81d9e511-f33e-c0ff-39a0-9b9ecbdc937b@gmail.com> References: <20180217221825.wovnzpnzftpsjp37@angband.pl> <81d9e511-f33e-c0ff-39a0-9b9ecbdc937b@gmail.com> Message-ID: 2018-03-08 15:18 GMT+01:00 Frédéric Grosshans via Unicode < unicode at unicode.org>: > Le 17/02/2018 à 23:18, Adam Borowski via Unicode a écrit : > >> Of course, this measure is only rough. A counterexample is in the >> monetary symbol block, where € U+20AC EURO SIGN (in Unicode since 2.1) is >> much more used than ₣ U+20A3 FRENCH FRANC SIGN, encoded since Unicode 1.1 >> (1.0?) but that I never saw, despite living in France for more than four >> decades. > > I actually saw a French franc symbol (not necessarily this one, most often a narrowed version of the "Fr." abbreviation) only on mechanical typewriters built in the 1960s-1970s, on some IBM typewriter "balls" in the early 1980s, and on some old printers with rotating wheels. This narrow abbreviation was used by typists in accounting and administrative services, typically in tabular data. It was also sometimes seen indicating prices in newspapers/magazines (but I am not sure it was really a single character, as it was used along with non-monospaced fonts, and was probably only a smaller, narrow font style, without any ligature). I wonder if this symbol was not just used outside of France, or in former colonies before the 1960s (or created later to distinguish the French franc from the CFA franc).
Maybe it has some use today in Africa as an abbreviation of the CFA franc (now pegged to the euro via agreement with the Banque de France and the European Commission for the amount of warranties, collected by CFA members and France, and needed to offer this limited warranty of conversion with the euro on a limited exchange market subject to more restrictive policies), but the two CFA currencies (of the BEAC or BCEAO) are not fractions of the euro. The CFA-EUR conversion rates are not stable and are subject to scheduled changes, by agreements between CFA members, France and the European Commission. And because they are not "liquid" currencies (subject to restrictive conversions and controls), their rates against the euro on open markets vary constantly (but modestly) around the current designated value decided by CFA member banks and partners: the pegged value is then only indicative of the medium rate it should have on markets. For this reason, most international payments and contracts are made in more liquid major currencies (EUR, GBP, USD, CHF, ZAR, SDR, and gold ounces) but at much more variable rates (and with higher transaction fees on open markets than conversions between major currencies). -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Mar 8 17:06:03 2018 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Thu, 8 Mar 2018 23:06:03 +0000 Subject: Unicode Emoji 11.0 characters now ready for adoption!
In-Reply-To: <51c12b4974b1cc0d2f476e29db47b99d@koremail.com> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <51c12b4974b1cc0d2f476e29db47b99d@koremail.com> Message-ID: <20180308230603.3fb73ef6@JRWUBU2> On Thu, 08 Mar 2018 09:42:38 +0800 via Unicode wrote: > to the best of my knowledge virtually no new characters used just for > names are under consideration, all the ones that are under > consideration are from before this century. What I was interested in was the rate of generation of new CJK characters in general, not just those for names. I appreciate that encoding is dominated by the backlog of older characters. Richard. From unicode at unicode.org Thu Mar 8 19:17:32 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 9 Mar 2018 02:17:32 +0100 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <20180308230603.3fb73ef6@JRWUBU2> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <51c12b4974b1cc0d2f476e29db47b99d@koremail.com> <20180308230603.3fb73ef6@JRWUBU2> Message-ID: This still leaves the question about how to write personal names ! IDS alone cannot represent them without enabling some "reasonable" ligaturing (they don't have to match the exact strokes variants for optimal placement, or with all possible simplifications). I'm curious to know how China, Taiwan, Singapore or Japan handle this (for official records or in banks): like our personal signatures (as digital images), and then using a simplified official record (including the registration of romanized names)? 
2018-03-09 0:06 GMT+01:00 Richard Wordingham via Unicode < unicode at unicode.org>: > On Thu, 08 Mar 2018 09:42:38 +0800 > via Unicode wrote: > > > to the best of my knowledge virtually no new characters used just for > > names are under consideration, all the ones that are under > > consideration are from before this century. > > What I was interested in was the rate of generation of new > CJK characters in general, not just those for names. I appreciate that > encoding is dominated by the backlog of older characters. > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Mar 8 19:22:47 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 9 Mar 2018 02:22:47 +0100 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <51c12b4974b1cc0d2f476e29db47b99d@koremail.com> <20180308230603.3fb73ef6@JRWUBU2> Message-ID: As well how Chinese/Japanese post offices handle addresses written with sinograms for personal names ? Is the expanded IDS form acceptable for them, or do they require using Romanized addresses, or phonetic approximations (Bopomofo in China, Kanas in Japan, Hangul in Korea) ? 2018-03-09 2:17 GMT+01:00 Philippe Verdy : > This still leaves the question about how to write personal names ! > IDS alone cannot represent them without enabling some "reasonable" > ligaturing (they don't have to match the exact strokes variants for optimal > placement, or with all possible simplifications). 
> I'm curious to know how China, Taiwan, Singapore or Japan handle this (for > official records or in banks): like our personal signatures (as digital > images), and then using a simplified official record (including the > registration of romanized names)? > > 2018-03-09 0:06 GMT+01:00 Richard Wordingham via Unicode < > unicode at unicode.org>: > >> On Thu, 08 Mar 2018 09:42:38 +0800 >> via Unicode wrote: >> >> > to the best of my knowledge virtually no new characters used just for >> > names are under consideration, all the ones that are under >> > consideration are from before this century. >> >> What I was interested in was the rate of generation of new >> CJK characters in general, not just those for names. I appreciate that >> encoding is dominated by the backlog of older characters. >> >> Richard. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Mar 9 04:48:04 2018 From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode) Date: Fri, 9 Mar 2018 19:48:04 +0900 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <51c12b4974b1cc0d2f476e29db47b99d@koremail.com> <20180308230603.3fb73ef6@JRWUBU2> Message-ID: <28cece14-67ee-b0e5-52f6-34147e13c50a@it.aoyama.ac.jp> On 2018/03/09 10:17, Philippe Verdy via Unicode wrote: > This still leaves the question about how to write personal names ! > IDS alone cannot represent them without enabling some "reasonable" > ligaturing (they don't have to match the exact strokes variants for optimal > placement, or with all possible simplifications). 
> I'm curious to know how China, Taiwan, Singapore or Japan handle this (for > official records or in banks): like our personal signatures (as digital > images), and then using a simplified official record (including the > registration of romanized names)? This question seems to assume more of a difference between alphabetic and ideographic traditions. A name in ideographs, in the same way as a name in alphabetic characters, is defined by the characters that are used, not by stuff like stroke variants, etc. And virtually all names, even before the introduction of computers, and even more after that, use reasonably frequent characters. The difference, at least in Japan, is that some people keep the ideograph before simplification in their official records, but they may or may not insist on its use in everyday practice. In most cases, both a traditional and a simplified variant are available. Examples are ?/?, ?/?, ?/?, and so on. I regularly hit such cases when grading, because our university database uses the formal (old) one, where students may not care about it and enter the new one on some system where they have to enter their name by themselves. Apart from that, at least in Japan, signatures are used extremely rarely; it's mostly stamped seals, which are also kept as images by banks,... Regards, Martin. From unicode at unicode.org Fri Mar 9 04:54:18 2018 From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode) Date: Fri, 9 Mar 2018 19:54:18 +0900 Subject: Unicode Emoji 11.0 characters now ready for adoption! 
In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <51c12b4974b1cc0d2f476e29db47b99d@koremail.com> <20180308230603.3fb73ef6@JRWUBU2> Message-ID: On 2018/03/09 10:22, Philippe Verdy via Unicode wrote: > As well how Chinese/Japanese post offices handle addresses written with > sinograms for personal names ? Is the expanded IDS form acceptable for > them, or do they require using Romanized addresses, or phonetic > approximations (Bopomofo in China, Kanas in Japan, Hangul in Korea) ? They just see the printed form, not an encoding, and therefore no IDS. Many addresses use handwriting, which has its own variability. Variations such as those covered by IDSes are easily recognizable by people as being the same as the 'base' character, and OCR systems, if they are good enough to decipher handwriting, can handle such cases, too. Romanized addresses will be delivered because otherwise it would be difficult for foreigners to send anything. Pure Kana should work in Japan, although the postal employee will have a second look because it's extremely unusual. For Korea, these days, it will be mostly Hangul; I'm not sure whether addresses with Hanja would incur a delay. My guess would be that Bopomofo wouldn't work in mainland China (might work in Taiwan, not sure). Regards, Martin. From unicode at unicode.org Fri Mar 9 05:09:27 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Fri, 9 Mar 2018 12:09:27 +0100 Subject: A sketch with the best-known Swiss tongue twister Message-ID: https://www.youtube.com/watch?v=QOwITNazUKg De Papscht h?t z?Schpi?z s?Schp?kchbschtekch z?schpaat bschtellt. literally: The Pope has [in Spiez] [the bacon cutlery] [too late] ordered. 
Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Mar 9 05:52:33 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 9 Mar 2018 12:52:33 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: References: Message-ID: Is that just for Switzerland in one of the local dialectal variants ? Or more generally Alemannic (also in Northeastern France, South Germany, Western Austria, Liechtenstein, Northern Italy). 2018-03-09 12:09 GMT+01:00 Mark Davis ☕️ via Unicode : > https://www.youtube.com/watch?v=QOwITNazUKg > > De Papscht hät z'Schpiez s'Schpäkchbschtekch z'schpaat bschtellt. > literally: The Pope has [in Spiez] [the bacon cutlery] [too late] ordered. > > Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Mar 9 06:23:23 2018 From: unicode at unicode.org (Otto Stolz via Unicode) Date: Fri, 9 Mar 2018 13:23:23 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: References: Message-ID: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de> 2018-03-09 12:09 GMT+01:00 Mark Davis ☕️ via Unicode : > De Papscht hät z'Schpiez s'Schpäkchbschtekch z'schpaat bschtellt. > literally: The Pope has [in Spiez] [the bacon cutlery] [too late] > ordered. Am 2018-03-09 um 12:52 schrieb Philippe Verdy via Unicode: > Is that just for Switzerland in one of the local dialectal variants ? Basically the same in Central Swabian (I am from Stuttgart): I mäen, mir häbet s Spätzles-Bsteck z spät bstellt. literally: I guess, we have ordered the noodle cutlery too late. And when my niece married a guy with the Polish surname Brzeczek and had asked for cutlery for their wedding present, guess what we have told them. Otto Solution: Zerst hemmer denkt, mir häbet für die Brzeczeks s Bsteck z spät bstellt, aber nå häts doch no glangt.
From unicode at unicode.org Fri Mar 9 06:24:13 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Fri, 9 Mar 2018 13:24:13 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: References: Message-ID: There are definitely many dialects across Switzerland. I think that for *this* phrase it would be roughly the same for most of the population, with minor differences (e.g. 'het' vs 'hät'). But a native speaker like Martin would be able to say for sure. Mark On Fri, Mar 9, 2018 at 12:52 PM, Philippe Verdy wrote: > Is that just for Switzerland in one of the local dialectal variants ? Or > more generally Alemannic (also in Northeastern France, South Germany, > Western Austria, Liechtenstein, Northern Italy). > > 2018-03-09 12:09 GMT+01:00 Mark Davis ☕️ via Unicode > : > >> https://www.youtube.com/watch?v=QOwITNazUKg >> >> De Papscht hät z'Schpiez s'Schpäkchbschtekch z'schpaat bschtellt. >> literally: The Pope has [in Spiez] [the bacon cutlery] [too late] ordered. >> >> Mark >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Mar 9 06:52:54 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 9 Mar 2018 13:52:54 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de> References: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de> Message-ID: So the "best-known Swiss tongue" is still not so much known, and still incorrectly referenced (frequently confused with "Swiss German", which is much like standard High German, unifying with it on most aspects, with only minor orthographic preferences such as capitalization rules or very few Swiss-specific terms, but no alteration of the grammar and no specific characters like in Alemannic dialects; the term "Swiss tongue" in the context given by the video is obviously false).
Note that Schwäbisch is far from it. What looks more like the Swiss dialects of Alemannic is French Alsatian; it is not "Swiss", and don't tell Alsatians that it is "German" when there are clear differences with the language on the other side of the Rhine River, and a lot of differences with Schwäbisch (which is much more a distinct language than a dialect of Alemannic or German). Same remark about Tyrolean and Bavarian (they are probably nearer to Schwäbisch than to Swiss or French Alemannic, or to Standard High German; their difference with Schwäbisch is almost like the difference between Standard Dutch and Limburgish or West Flemish; Standard Dutch, Standard German, French/Swiss Alemannic, and Schwäbisch are differentiated enough to be distinct languages). The term "Alemannic" is way too large, but calling it "Swiss German" is also wrong (even if its ISO 639-3 code is "gsw", probably taken from this incorrect name). 2018-03-09 13:23 GMT+01:00 Otto Stolz via Unicode : > 2018-03-09 12:09 GMT+01:00 Mark Davis ☕️ via Unicode > : > >> De Papscht hät z'Schpiez s'Schpäkchbschtekch z'schpaat bschtellt. >> literally: The Pope has [in Spiez] [the bacon cutlery] [too late] >> ordered. >> > > Am 2018-03-09 um 12:52 schrieb Philippe Verdy via Unicode: > >> Is that just for Switzerland in one of the local dialectal variants ? >> > > Basically the same in Central Swabian (I am from Stuttgart): > I mäen, mir häbet s Spätzles-Bsteck z spät bstellt. > literally: I guess, we have ordered the noodle cutlery too late. > > And when my niece married a guy with the Polish surname Brzeczek > and had asked for cutlery for their wedding present, guess what we > have told them. > > Otto > > Solution: > Zerst hemmer denkt, mir häbet für die Brzeczeks s Bsteck > z spät bstellt, aber nå häts doch no glangt. > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From unicode at unicode.org Fri Mar 9 07:40:08 2018 From: unicode at unicode.org (Tom Gewecke via Unicode) Date: Fri, 9 Mar 2018 06:40:08 -0700 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: References: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de> Message-ID: <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org> > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode wrote: > > So the "best-known Swiss tongue" is still not so much known, and still incorrectly referenced (frequently confused with "Swiss German", which is much like standard High German I think Swiss German is in fact the correct English name for the Swiss dialects, taken from the German Schweizerdeutsch. https://en.wikipedia.org/wiki/Swiss_German From unicode at unicode.org Fri Mar 9 07:55:16 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 9 Mar 2018 14:55:16 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org> References: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de> <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org> Message-ID: English Wikipedia is not a good reference for the name; the GSW wiki states clearly another name and "Alemannic" is attested and correct for the family of dialects. "Schweizerdeutsch" is also wrong like "Swiss German" when it refers to Alsatian (neither Swiss nor German for those speaking it): these expressions only refer to "de-CH", not "gsw". 2018-03-09 14:40 GMT+01:00 Tom Gewecke via Unicode : > > > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode < > unicode at unicode.org> wrote: > > > > So the "best-known Swiss tongue" is still not so much known, and still > incorrectly referenced (frequently confused with "Swiss German", which is > much like standard High German > > I think Swiss German is in fact the correct English name for the Swiss > dialects, taken from the German Schweizerdeutsch. 
> > https://en.wikipedia.org/wiki/Swiss_German > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Mar 9 08:11:49 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Fri, 9 Mar 2018 15:11:49 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org> References: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de> <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org> Message-ID: Yes, the right English names are "Swiss High German" for de-CH, and "Swiss German" for gsw-CH. Mark On Fri, Mar 9, 2018 at 2:40 PM, Tom Gewecke via Unicode wrote: > > > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode < > unicode at unicode.org> wrote: > > > > So the "best-known Swiss tongue" is still not so much known, and still > incorrectly referenced (frequently confused with "Swiss German", which is > much like standard High German > > I think Swiss German is in fact the correct English name for the Swiss > dialects, taken from the German Schweizerdeutsch. > > https://en.wikipedia.org/wiki/Swiss_German > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Mar 9 08:52:16 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 9 Mar 2018 15:52:16 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: References: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de> <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org> Message-ID: In summary you do not object the fact that unqualified "gsw" language code is not (and should not be) named "Swiss German" (as it is only for "gsw-CH", not for any other non-Swiss variants of Alemannic). 
The addition of "High" is optional, unneeded in fact, as it does not remove any ambiguity, in Germany for "de-DE", or in Switzerland for "de-CH", or in Italian South Tyrol for "de-IT", or in Austria for "de-AT", or even for "Standard German" (de). Note also that Alsatian itself ("gsw-FR") is considered part of the "High German" branch of Germanic languages! "High German" refers to the group that includes Standard German and its national variants ("de", "de-DE", "de-CH", "de-AT", "de-IT") as well as the Alemannic group ("gsw", "gsw-FR", "gsw-CH"), possibly extended (this is debatable) to Schwäbisch in Germany and Hungary. My opinion is that even the Swiss variants should preferably be named "Swiss Alemannic" collectively, and not "Swiss German", which causes constant confusion between "de-CH" and "gsw-CH". 2018-03-09 15:11 GMT+01:00 Mark Davis ☕️ via Unicode : > Yes, the right English names are "Swiss High German" for de-CH, and "Swiss > German" for gsw-CH. > > Mark > > On Fri, Mar 9, 2018 at 2:40 PM, Tom Gewecke via Unicode < > unicode at unicode.org> wrote: > >> >> > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode < >> unicode at unicode.org> wrote: >> > >> > So the "best-known Swiss tongue" is still not so much known, and still >> incorrectly referenced (frequently confused with "Swiss German", which is >> much like standard High German >> >> I think Swiss German is in fact the correct English name for the Swiss >> dialects, taken from the German Schweizerdeutsch. >> >> https://en.wikipedia.org/wiki/Swiss_German >> > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From unicode at unicode.org Fri Mar 9 08:58:29 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Fri, 9 Mar 2018 15:58:29 +0100 (CET) Subject: Translating the standard (was: Re: Fonts and font sizes used in the Unicode) In-Reply-To: <20180308183304.GB2050855@phare.normalesup.org> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> Message-ID: <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> On 08/03/18 19:33, Arthur Reutenauer wrote: > > On Thu, Mar 08, 2018 at 07:05:06PM +0100, Marcel Schneider via Unicode wrote: > > https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1206989878&sr=8-1 > > You're linking to the wrong one of Patrick's books :-) The > translation he made of version 3.1 (not 5.0) of the core specification > is available in full at http://hapax.qc.ca/ (“Unicode et ISO 10646 en > français”, middle of page), as well as a few free sample chapters from > his other book. > > Best, > > Arthur > Indeed, thank you very much for the correction, and thanks for the link. I can say that the free online chapters of Patrick Andries' translation of the Unicode standard were my first introduction, more precisely ch. 7 (Punctuation), which I even printed out to get in touch with the various dashes and spaces and learn more about quotation marks. [I didn't have internet and took the copy home from a library.]
Based on this experience, I think there isn't too much extrapolation in supposing that millions of newcomers in all countries could use such a translation. Although the latest version of TUS is obviously more up-to-date, version 3.1 isn't plain wrong at all. Hence I warmly recommend translating at least v3.1, or those chapters of v10.0 that are already in v3.1, while prompting the reader to seek further information on the Unicode website. We note too that Patrick's translation is annotated (footnotes in gray print) with additional information of interest for the target locale. (Here one could mention that Latin script requires preformatted superscript letters for an interoperable representation of current text in some languages.) Some Unicode terminology like “bidi-mirroring” may be hard to adapt, but that isn't more of a challenge than any tech/science writer faces when handling content that was originally produced in the United States and/or, more generally, in English. E.g. in French we may choose from a panel of more conservative through less usual grammatical forms, among which: “réflexion bidi”, “réflexion bidirectionnelle”, “bidi-reflexion” (hyphenated or not), “réflexible” or, simply, “miroir”. Anyway, every locale is expected to localize the full range of Unicode terminology, unless people agree to switch to English whenever the topic is Unicode, even while discussing any other topic currently in Chinese or in Japanese; doing so is not a problem, it's just ethically weird. So we look forward to the concept of a “Unicode in Practice” textbook implemented in Chinese and in Japanese and in any other non-English and non-French locale if it isn't already. As for translating the Core spec as a whole, why did two recent attempts crash even before the maintenance stage, while the 3.1 project succeeded? Some pieces of the puzzle seem to be still missing.
Best regards, Marcel From unicode at unicode.org Fri Mar 9 10:21:31 2018 From: unicode at unicode.org (via Unicode) Date: Sat, 10 Mar 2018 00:21:31 +0800 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <51c12b4974b1cc0d2f476e29db47b99d@koremail.com> <20180308230603.3fb73ef6@JRWUBU2> Message-ID: <60d1d499b4351e3d4285692afb1d792a@koremail.com> On 09.03.2018 09:17, Philippe Verdy via Unicode wrote: > This still leaves the question about how to write personal names ! > IDS alone cannot represent them without enabling some "reasonable" > ligaturing (they don't have to match the exact stroke variants for > optimal placement, or with all possible simplifications). > I'm curious to know how China, Taiwan, Singapore or Japan handle this > (for official records or in banks): like our personal signatures (as > digital images), and then using a simplified official record > (including the registration of romanized names)? > > 2018-03-09 0:06 GMT+01:00 Richard Wordingham via Unicode > : In mainland China the fallback is to use pinyin capitals without tone marks, so ASCII. Passports have names printed in both Chinese characters and capitalised pinyin; both are legally valid. ID cards, which people get when they turn 16, have the names in printed Chinese characters only, so these I assume must be printed using a system that has some characters not in UCS. Banks certainly don't have all these extra characters, so they use capitalised pinyin for any characters they cannot type. For Japan, CJK Ext F had 1,645 characters, which included all characters required for names of people and places.
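The bank-style fallback described just above, capitalised pinyin with the tone marks stripped, is essentially a normalization exercise. A hedged sketch with Python's `unicodedata` (the function name and sample names are made up for illustration; real systems also special-case ü, which is often romanized as YU or V rather than plain U):

```python
import unicodedata as ud

def pinyin_fallback(name):
    """Approximate the fallback: decompose with NFD, drop the
    combining marks (the pinyin tone marks), uppercase the rest."""
    decomposed = ud.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if not ud.combining(c))
    return stripped.upper()

print(pinyin_fallback("Zhāng Wěi"))  # ZHANG WEI
```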
So there should be no need for a fallback system; Unicode is enough now. John Knightley From unicode at unicode.org Fri Mar 9 10:41:35 2018 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Fri, 9 Mar 2018 08:41:35 -0800 Subject: Translating the standard In-Reply-To: <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> Message-ID: <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote: > As of translating the Core spec as a whole, why did two recent attempts crash even > before the maintenance stage, while the 3.1 project succeeded? Essentially because both the Japanese and the Chinese attempts were conceived of as commercial projects, which ultimately did not cost out for the publishers, I think. Both projects attempted to limit the scope of their translation to a subset of the core spec that would focus on East Asian topics, but the core spec is complex enough that it does not abridge well. And I think both projects ran into difficulties in trying to figure out how to deal with fonts and figures. The Unicode 3.0 translation (and the 3.1 update) by Patrick Andries was a labor of love. In this arena, a labor of love is far more likely to succeed than a commercial translation project, because it doesn't have to make financial sense.
By the way, as a kind of annotation to an annotated translation, people should know that the 3.1 translation on Patrick's site is not a straight translation of 3.1, but a kind of interpreted adaptation. In particular, it incorporated a translation of UAX #15, Unicode Normalization Forms, Version 3.1.0, as a Chapter 6 of the translation, which is not the actual structure of Unicode 3.1. And there are other abridgements and alterations, where they make sense -- compare the resources section of the Preface, for example. This is not a knock on Patrick's excellent translation work, but it does illustrate the inherent difficulties of trying to approach a complete translation project for *any* version of the Unicode Standard. --Ken From unicode at unicode.org Fri Mar 9 11:29:07 2018 From: unicode at unicode.org (via Unicode) Date: Sat, 10 Mar 2018 01:29:07 +0800 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <20180308230603.3fb73ef6@JRWUBU2> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <51c12b4974b1cc0d2f476e29db47b99d@koremail.com> <20180308230603.3fb73ef6@JRWUBU2> Message-ID: <269e1fb39a96b080981fd2748b6bfc65@koremail.com> Dear Richard, On 09.03.2018 07:06, Richard Wordingham via Unicode wrote: > On Thu, 08 Mar 2018 09:42:38 +0800 > via Unicode wrote: > >> to the best of my knowledge virtually no new characters used just >> for >> names are under consideration, all the ones that are under >> consideration are from before this century. > > What I was interested in was the rate of generation of new > CJK characters in general, not just those for names. I appreciate > that > encoding is dominated by the backlog of older characters. > Impossible to give an accurate answer or even a reasonable guess. 
As to those that would be candidates for Unicode, my guess would be not more than a few dozen a year. New characters are not permitted in legal names. Fantasy Chinese characters used for an alien language or a mystery novel would not usually be suitable for encoding. Most new words in Chinese have more than one syllable and do not require any new characters. Documented increases, such as scientific terms for new elements, flora and fauna, would seem to be not more than one or two dozen a year. Regards John Knightley > Richard. From unicode at unicode.org Fri Mar 9 12:46:23 2018 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Fri, 9 Mar 2018 10:46:23 -0800 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <269e1fb39a96b080981fd2748b6bfc65@koremail.com> References: <5A95D192.5050608@unicode.org> <91680448.22170.1519824152519@ox.hosteurope.de> <83722fa3ed05a8b0989a963b3f26833a@koremail.com> <31a6d3ce-d2c2-03eb-4c63-79679b68a245@it.aoyama.ac.jp> <447c571bad4174b493e4bd42ee7a41f2@koremail.com> <20180307202621.770d1099@JRWUBU2> <51c12b4974b1cc0d2f476e29db47b99d@koremail.com> <20180308230603.3fb73ef6@JRWUBU2> <269e1fb39a96b080981fd2748b6bfc65@koremail.com> Message-ID: On 3/9/2018 9:29 AM, via Unicode wrote: > Documented increases, such as scientific terms for new elements, flora > and fauna, would seem to be not more than one or two dozen a year. Indeed. Of the "urgently needed characters" added to the unified CJK ideographs for Unicode 11.0, two were obscure place name characters needed to complete mapping for the Japanese IT mandatory use of the Moji Joho collection. The other three were newly standardized Chinese characters for superheavy elements that now have official designations from the IUPAC (as of December 2015): Nihonium (113), Tennessine (117) and Oganesson (118). The Chinese characters coined for those 3 were encoded at U+9FED, U+9FEC, and U+9FEB, respectively.
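As a quick illustration (a sketch, not from the thread): the three code point assignments listed above can be spot-checked programmatically. Each sits at the top of the BMP's unified-ideograph range, so each costs three bytes in UTF-8; the element labels in the dictionary are taken from the message.

```python
# Spot-check of the three element characters mentioned above
# (U+9FED nihonium, U+9FEC tennessine, U+9FEB oganesson).
elements = {
    0x9FED: "nihonium (113)",
    0x9FEC: "tennessine (117)",
    0x9FEB: "oganesson (118)",
}

for cp, label in elements.items():
    ch = chr(cp)
    utf8 = ch.encode("utf-8")
    # All three are single BMP code points in the 3-byte UTF-8 range.
    print(f"U+{cp:04X} {ch}  {label}  UTF-8: {utf8.hex()} ({len(utf8)} bytes)")
```

Note that a font with Unicode 11.0 CJK coverage is needed to actually see the glyphs; the code points themselves are valid regardless.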
Oganesson, in particular, is of interest, as the heaviest known element produced to date. It is the subject of thousands of hours of intense experimentation and of hundreds of scientific papers, but: ... since 2005, only five (possibly six) atoms of the nuclide ^294Og have been detected. But we already have a Chinese character (pronounced ào) for Og, and a standardized Unicode code point for it: U+9FEB. Next up: unobtanium and hardtofindium --Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Fri Mar 9 15:19:46 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Fri, 9 Mar 2018 22:19:46 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: References: <42104177-7b70-a4a1-c1e9-6d376f3bb5cc@uni-konstanz.de> <042AD1BF-3709-43AA-B4BC-8E748B37CE62@bluesky.org> Message-ID: > In summary you do not object to the fact that unqualified "gsw" language code Whether I object or not makes no difference. Whether for good or for bad, the gsw code (clearly originally for German-Swiss from the code letters) has been expanded beyond the borders of Switzerland. There are also separate codes for Schwäbisch and Walliserdütsch, so outside of Switzerland 'gsw' mainly extends to Elsassisch (Alsace, ~0.5M speakers). So gsw-CH works to limit the scope to Switzerland (~4.5M speakers). > My opinion is that even the Swiss variants should be preferably named "Swiss Alemannic" collectively... That's clearly also not going to happen for the English term. Good luck with the French equivalent... Mark On Fri, Mar 9, 2018 at 3:52 PM, Philippe Verdy wrote: > In summary you do not object to the fact that unqualified "gsw" language code > is not (and should not be) named "Swiss German" (as it is only for > "gsw-CH", not for any other non-Swiss variants of Alemannic).
> > The addition of "High" is optional, unneeded in fact, as it does not > remove any ambiguity, in Germany for "de-DE", or in Switzerland for > "de-CH", or in Italian South Tyrol for "de-IT", or in Austria for "de-AT", > or even for "Standard German" (de) > > Note also that Alsatian itself ("gsw-FR") is considered part of the "High > German" branch of Germanic languages! > > "High German" refers to the group that includes Standard German and its > national variants ("de", "de-DE", "de-CH", "de-AT", "de-IT") as > well as the Alemannic group ("gsw", "gsw-FR", "gsw-CH"), possibly extended > (this is debatable) to Schwäbisch in Germany and Hungary. > > My opinion is that even the Swiss variants should be preferably named > "Swiss Alemannic" collectively, and not "Swiss German" which causes > constant confusion between "de-CH" and "gsw-CH". > > > 2018-03-09 15:11 GMT+01:00 Mark Davis ?? via Unicode > : > >> Yes, the right English names are "Swiss High German" for de-CH, and >> "Swiss German" for gsw-CH. >> >> Mark >> >> On Fri, Mar 9, 2018 at 2:40 PM, Tom Gewecke via Unicode < >> unicode at unicode.org> wrote: >> >>> >>> > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode < >>> unicode at unicode.org> wrote: >>> > >>> > So the "best-known Swiss tongue" is still not so much known, and still >>> incorrectly referenced (frequently confused with "Swiss German", which is >>> much like standard High German >>> >>> I think Swiss German is in fact the correct English name for the Swiss >>> dialects, taken from the German Schweizerdeutsch. >>> >>> https://en.wikipedia.org/wiki/Swiss_German >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From unicode at unicode.org Sat Mar 10 05:26:42 2018 From: unicode at unicode.org (philip chastney via Unicode) Date: Sat, 10 Mar 2018 11:26:42 +0000 (UTC) Subject: A sketch with the best-known Swiss tongue twister References: <392967520.14691005.1520681202416.ref@mail.yahoo.com> Message-ID: <392967520.14691005.1520681202416@mail.yahoo.com> it is not clear whether you are quoting from some agreed standard, quoting from some other authority, or constructing a classification of your own. Whatever the classification, it should be descriptive, and it is best not to be too pedantic, because practice can vary from region to region, from individual to individual within the same region, and from context to context for an individual. I would make the following observations on terminology in practice: -- the newspapers in Zurich advertised courses in "Schweizerdeutsch", meaning the contemporary spoken language -- in Wengen (pronounced with a [w] not a [v]), I tried to explain to the man behind the counter that my ski binding needed fixing, using my best High German (with a Stuttgart accent, according to my tutor - he came from Hannover, so I don't think it was intended as a compliment) with a muttered "momenta", the owner dived into the back of the shop, to fetch the technician, whose skills included conversation in High German -- I told him my problem, he told me it wasn't worth fixing, and I said, "Oh, bugger" at this point, they realised I was a Brit, and (at their request) we switched to English ("so much easier", the owner said) -- for all 3 of us, High German was a foreign language -- in Romansch-speaking St.
Moritz, the hotels claim to be able to accommodate those who speak High German, as well as those who speak Swiss German (because the two languages are not always mutually intelligible) -- the newspapers in Zurich advertised courses in "Hoch Deutsch", for those who needed to deal with foreigners -- when I lived that way, the French-speaking population of Nancy referred to the language of their German-speaking compatriots as "platt deutsch" (the way they used the term, it did not extend any further east than Alsace) -- in Luxemburg, the same language was referred to as Luxemburgish (or Letzeburgesch, which is Luxemburgish for "Luxemburgish") (I forget what the Belgians called the language spoken in Ostbelgien) -- I was assured by a Luxemburgish-speaking car mechanic, with a Swiss German speaking wife, that the two languages (dialects?) were practically identical, except for the names of some household items. In short, there seems little point in making distinctions which cannot be precisely identified in practice. There appear to be significant differences between High German and (what the natives call) Swiss German; there are far fewer significant differences between Swiss German and the other spoken Germanic languages found on the borders of Germany /phil -------------------------------------------- On Fri, 9/3/18, Philippe Verdy via Unicode wrote: Subject: Re: A sketch with the best-known Swiss tongue twister To: "Mark Davis ??" Cc: "Tom Gewecke" , "unicode Unicode Discussion" Date: Friday, 9 March, 2018, 2:52 PM In summary you do not object to the fact that unqualified "gsw" language code is not (and should not be) named "Swiss German" (as it is only for "gsw-CH", not for any other non-Swiss variants of Alemannic).
The addition of "High" is optional, unneeded in fact, as it does not remove any ambiguity, in Germany for "de-DE", or in Switzerland for "de-CH", or in Italian South Tyrol for "de-IT", or in Austria for "de-AT", or even for "Standard German" (de) Note also that Alsatian itself ("gsw-FR") is considered part of the "High German" branch of Germanic languages! "High German" refers to the group that includes Standard German and its national variants ("de", "de-DE", "de-CH", "de-AT", "de-IT") as well as the Alemannic group ("gsw", "gsw-FR", "gsw-CH"), possibly extended (this is debatable) to Schwäbisch in Germany and Hungary. My opinion is that even the Swiss variants should be preferably named "Swiss Alemannic" collectively, and not "Swiss German" which causes constant confusion between "de-CH" and "gsw-CH". From unicode at unicode.org Sat Mar 10 06:16:48 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Sat, 10 Mar 2018 13:16:48 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: <392967520.14691005.1520681202416@mail.yahoo.com> References: <392967520.14691005.1520681202416.ref@mail.yahoo.com> <392967520.14691005.1520681202416@mail.yahoo.com> Message-ID: 2018-03-10 12:26 GMT+01:00 philip chastney : > > -- when I lived that way, the French-speaking population of Nancy referred > to the language of their German-speaking compatriots as "platt deutsch" > (the way they used the term, it did not extend any further east than Alsace) > Note that this is what you heard in Lorraine, and there's some competition between Lorraine and Alsace.
If you lived in Alsace, you would know they absolutely don't like to have their language named "German" or "Deutsch" or "platt Deutsch"; it is "alsacien" for them and nothing else, even if people in Lorraine (who use other regional oil languages, based not on a Germanic substrate but on a Romance one) refer to Alsatian as "platt deutsch", with even more confusion as it actually means "Low German", inviting confusion with "nds", which is spoken much further to the north (north-western Germany and the Netherlands) and not at all in France, not even in the Nord department (where Flemish, i.e. a local variant of Dutch = "nl-FR", is spoken by a small aging minority around Dunkerque, is nearly extinct now everywhere else in French Flanders, and is extinct in Lille, long since replaced by the popular Lillois variant of Picard locally named "ch'timi"). -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sat Mar 10 12:26:32 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Sat, 10 Mar 2018 19:26:32 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: <20180310180235.GB3698923@phare.normalesup.org> References: <392967520.14691005.1520681202416.ref@mail.yahoo.com> <392967520.14691005.1520681202416@mail.yahoo.com> <20180310180235.GB3698923@phare.normalesup.org> Message-ID: 2018-03-10 19:02 GMT+01:00 Arthur Reutenauer < arthur.reutenauer at normalesup.org>: > Philippe, > > So many approximations and misinterpretations ... > > > Note that this is what you heard in Lorraine, and there's some > competition > > between Lorraine and Alsace. If you lived in Alsace they absolutely don't > > like to have their language named "German" or "Deutsch" or "platt > Deutsch", > > this is "alsacien" for them and nothing else > > Condescending, are we? This can of course be a delicate issue, > especially if expressed insensitively, but most people are also able to > recognise objective truths.
I never heard anyone deny that Alsatian was > a dialect of German, except the totally misinformed. There is even a > good feeling of connection with the dialects beyond the border, in Baden > in particular (not so much in Switzerland) -- and an acknowledgement > that dialects become quite different further inland. > > > even if people in Lorraine > > (that use other regional oil languages, not based on the Germanic > substrate > > but on Romance substrate) refer to Alsatians as "platt deutsch" with even > > more confusion as it actually mean "low German" and confusing with "nds" > > spoken much further to the North (North-western Germany and Netherlands) > > Where do I start? > > 1. That's not what Philip said > 2. There is a Germanic dialect in Lorraine, with a large number of > speakers > The dialect of Lorraine with the large number of speakers is not the one you think about; yes, it is a Romance/Oïl language and not Germanic at all. The one you are referring to is only in a very small tiny part of Lorraine and almost extinct. 3. Platt just means dialect in German > 4. Nobody is confusing Lothringer Platt with Low German, except perhaps > you > You are confusing it with the "parler lorrain" (as I said, "Lothringer Platt", part of "Francique", is nearly extinct in Lorraine; this is not the case of the "Parler lorrain", also known in Belgium as "Gaumais" and very close to "Wallon"). > 5. If you're going to write "oïl languages" in English you could at > least put the diaeresis on the "i", otherwise it really looks silly > Sorry, my message was posted in English; I had not realized that "Oil" with the capital would look so silly without the diaeresis and in this context, as if we were speaking about olives or burnable energy. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From unicode at unicode.org Sat Mar 10 14:44:14 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Sat, 10 Mar 2018 21:44:14 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: <20180310193359.GA3818257@phare.normalesup.org> References: <392967520.14691005.1520681202416.ref@mail.yahoo.com> <392967520.14691005.1520681202416@mail.yahoo.com> <20180310180235.GB3698923@phare.normalesup.org> <20180310193359.GA3818257@phare.normalesup.org> Message-ID: Apparently you just trust Wikipedia, which uses old sources. A very populated area does not mean it is populated by native speakers. There were lots of migrants who never spoke anything but standard French, or French slightly "creolized" with foreign languages (but these adaptations are also disappearing in the younger generations of people born in France to migrants). "Francique" is not so popular, much less than Alsatian (Alsace is very densely populated too), and "Francique" is not the same as Alsatian and does not have the same level of protection from cultural institutions (there's no national support at all, only regional initiatives or initiatives taken by municipalities to support schools, and some museums or local universities with linguistic study branches). Apparently you've never been in France: regional languages have low levels of support (lower than the support for English or Standard German or Spanish at higher levels of education, or even Arabic, Latin and Hebrew, sponsored by private educational institutions where a minimum "trunk" of standard French is still mandatory for most domains). I really doubt you can find 400,000 speakers of Francique in Lorraine, except in a very narrow band near Luxembourg, in rural areas, in an aging population. I've lived and worked in Nancy and Metz, and in fact almost never heard a word in that language, only French and a few regional words.
By contrast, the Alsatian language (French Alemannic) is very much alive in Alsace (including in Strasbourg), not correlated with the Alemannic languages of Switzerland, and far enough from standard German to be distinguished. 2018-03-10 20:33 GMT+01:00 Arthur Reutenauer < arthur.reutenauer at normalesup.org>: > > The dialect of Lorraine with the large number of speakers is not the one > > you think about, yes it is a Romance/Oïl language and not Germanic at > all. > > You are not reading what I write, so you can't know what I'm thinking. > > > The one you are referring to is only in a very small tiny part of Lorraine > > and almost extinct. > > Yes, and that's the language Philip was talking about, reportedly > called Plattdeutsch by French speakers. What's your source for "almost > extinct"? Ethnologue 20th ed. has 400,000 speakers (2013); even > accounting for possible exaggerations, that's hardly extinct. The "very > small tiny part" where it's spoken (3,300 km² according to The Dialects > of Modern German, Charles Russ ed., Routledge 1990) is very populous > because of the former mining industry. > > > You are confusing it with the "parler lorrain" (as I said, "Lothringer > > Platt", part of "Francique" is nearly extinct in Lorraine, this is not > the > > case of the "Parler lorrain", also known in Belgium as "Gaumais" and very > > near from "Wallon"). > > You are condescending and your pseudo-erudition gets in the way of the > conversation. Nobody except you mentioned Romance dialects, you just > drifted there on your own. > > Arthur > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sat Mar 10 23:04:25 2018 From: unicode at unicode.org (Keith Turner via Unicode) Date: Sun, 11 Mar 2018 00:04:25 -0500 Subject: base1024 encoding using Unicode emojis Message-ID: I created a neat little project based on Unicode emojis. I thought some on this list may find it interesting.
It encodes arbitrary data as 1024 emojis. The project is called Ecoji and is hosted on github at https://github.com/keith-turner/ecoji Below are some examples of encoding and decoding. $ echo 'Unicode emojis are awesome!!' | ecoji ???????????????????????????????????????????????? $ echo ???????????????????????????????????????????????? | ecoji -d Unicode emojis are awesome!! I would eventually like to create a base4096 version when there are more emojis. Keith From unicode at unicode.org Sun Mar 11 09:46:54 2018 From: unicode at unicode.org (Mathias Bynens via Unicode) Date: Sun, 11 Mar 2018 15:46:54 +0100 Subject: base1024 encoding using Unicode emojis In-Reply-To: References: Message-ID: Neat! Prior art: - https://github.com/watson/base64-emoji - https://github.com/nate-parrott/emojicode On Sun, Mar 11, 2018 at 6:04 AM, Keith Turner via Unicode < unicode at unicode.org> wrote: > I created a neat little project based on Unicode emojis. I thought > some on this list may find it interesting. It encodes arbitrary data > as 1024 emojis. The project is called Ecoji and is hosted on github > at https://github.com/keith-turner/ecoji > > Below are some examples of encoding and decoding. > > $ echo 'Unicode emojis are awesome!!' | ecoji > ???????????????????????????????????????????????? > > $ echo ???????????????????????????????????????????????? | ecoji -d > Unicode emojis are awesome!! > > I would eventually like to create a base4096 version when there are more > emojis. > > Keith > > -------------- next part -------------- An HTML attachment was scrubbed... 
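The emoji in the examples above did not survive the archive's transcoding, so as a stand-in, here is a hedged sketch of the base-1024 idea behind Ecoji: pack the input into 10-bit groups and map each group onto a 1024-symbol alphabet. The alphabet below is a placeholder run of BMP CJK ideographs, not Ecoji's actual emoji set, and the zero-padding of the final group is simplified compared to Ecoji's real padding rules.

```python
# Placeholder 1024-symbol alphabet: NOT Ecoji's emoji set, just 1024
# consecutive assigned BMP ideographs (U+4E00..U+51FF) for illustration.
ALPHABET = [chr(0x4E00 + i) for i in range(1024)]
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}

def encode_b1024(data: bytes) -> str:
    """Pack bytes into 10-bit groups, one alphabet symbol per group."""
    bits = nbits = 0
    out = []
    for byte in data:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 10:
            nbits -= 10
            out.append(ALPHABET[(bits >> nbits) & 0x3FF])
    if nbits:  # final partial group, zero-padded on the right
        out.append(ALPHABET[(bits << (10 - nbits)) & 0x3FF])
    return "".join(out)

def decode_b1024(text: str, nbytes: int) -> bytes:
    """Inverse of encode_b1024; takes the original length to drop padding."""
    bits = nbits = 0
    out = bytearray()
    for ch in text:
        bits = (bits << 10) | DECODE[ch]
        nbits += 10
        while nbits >= 8 and len(out) < nbytes:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    return bytes(out)

msg = b"Unicode emojis are awesome!!"
packed = encode_b1024(msg)
assert decode_b1024(packed, len(msg)) == msg
# 28 bytes -> 23 ten-bit symbols; each BMP CJK symbol costs 3 bytes in
# UTF-8, while a supplementary-plane emoji symbol would cost 4 bytes.
print(len(msg), "bytes ->", len(packed), "symbols,",
      len(packed.encode("utf-8")), "bytes as UTF-8")
```

For comparison, the same 28-byte message is 40 characters (including padding) in base64, and the 23 symbols above would occupy 92 UTF-8 bytes with an emoji alphabet; this is the size trade-off raised in the replies.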
URL: From unicode at unicode.org Sun Mar 11 10:25:13 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Sun, 11 Mar 2018 16:25:13 +0100 Subject: base1024 encoding using Unicode emojis In-Reply-To: References: Message-ID: Ideally, the purpose of such base-1024 encoding is to allow compacting arbitrary data into plain text which can be safely preserved through Unicode normalization and through transforms such as UTF-8 encoding. But then we want to do that in a way that minimizes the UTF-8 string sizes (emojis are probably not the best set to use, since most of them lie in supplementary planes). You can choose another arbitrary set of 1024 code points in the BMP that is preserved by normalization (no decomposition, combining class=0) and by text filters (no controls, no whitespace, possibly no punctuation, only letters or digits) and which is still simple to compute with a basic algorithm requiring no table lookup (only a few tests for some boundary values, or a very small lookup table with 16 entries, one for each subset of 64 values). Also, some frequent binary data (notably runs of null bytes) should be able to use shorter UTF-8 sequences from the ASCII set, so my opinion is that the first 64 codes should be the same as standard Base64; others can be taken easily from CJK blocks, or the PUA block in the BMP, but you can also select some blocks below the U+0800 code point so that they get encoded as 2 bytes rather than the 3 needed for the rest of the BMP (and 4 bytes for most emojis, where 10 bits become 32 bits, with a huge waste of storage space in UTF-8). So the real need is to find the smallest set of subranges of 64 consecutive code points, with minimal values, that contain only letters or digits and where all positions are assigned with such general properties. Emojis will unlikely be part of them!
With this goal, you can even avoid using any PUAs (which are likely to be filtered/forbidden by some protocols), or compatibility characters (likely to be transformed by NFKC/NFKD). And even within just the BMP, you could reach more than 10-bit encoding (base-1024) and can probably find 12-bit encoding (base 4096) or more (CJK blocks of the BMP offer wide ranges of suitable characters, as well as some extended Latin or extended Cyrillic blocks) If you want to use supplementary characters that are already encoded, then you can certainly use CJK blocks in the large supplementary ideographic plane and create a 16-bit encoding (base 65536). Only some legacy Emojis in the BMP will be used before that. 2018-03-11 6:04 GMT+01:00 Keith Turner via Unicode : > I created a neat little project based on Unicode emojis. I thought > some on this list may find it interesting. It encodes arbitrary data > as 1024 emojis. The project is called Ecoji and is hosted on github > at https://github.com/keith-turner/ecoji > > Below are some examples of encoding and decoding. > > $ echo 'Unicode emojis are awesome!!' | ecoji > ???????????????????????????????????????????????? > > $ echo ???????????????????????????????????????????????? | ecoji -d > Unicode emojis are awesome!! > > I would eventually like to create a base4096 version when there are more > emojis. > > Keith > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Mar 11 12:07:27 2018 From: unicode at unicode.org (Keith Turner via Unicode) Date: Sun, 11 Mar 2018 13:07:27 -0400 Subject: base1024 encoding using Unicode emojis In-Reply-To: References: Message-ID: On Sun, Mar 11, 2018 at 11:25 AM, Philippe Verdy wrote: > Ideally, the purpose of such base-1024 encoding is to allow compacting > arbitrary data into plain-text which can be safely preserved including by > Unicode normalization and transforms by encoding like UTF-8. 
> But then we want to do that in a way that minimizes the > UTF-8 string sizes (emojis are probably not the best set to use, since most of > them lie in supplementary planes). > Yeah, it certainly results in larger utf8 strings. For example, a sha256 hash is 112 bytes when encoded as Ecoji utf8. For base64, sha256 is 44 bytes. Even though it's more bytes, Ecoji has fewer visible characters than base64 for sha256: Ecoji has 28 visible characters and base64 has 44. So that makes me wonder which one would be quicker for a human to verify on average? Also, which one is more accurate for a human to verify? I have no idea. For accuracy, it seems like a lot of thought was put into the visual uniqueness of Unicode emojis. > > You can choose another arbitrary set of 1024 codepoints in the BMP that is > preserved by normalization (no decomposition, combining class=0) and text > filters (no controls, no whitespaces, possibly no punctuation, only letters > or digits) and which is still simple to compute with a basic algorithm not > requiring any table lookup (only a few tests for some boundary values or a > very small lookup table with 16 entries, one entry for each subset of 64 > values). > > As well some frequent binary data (notably runs of null bytes) should be > able to use shorter UTF-8 sequences from the ASCII set, so my opinion is > that the 64 first codes should be the same as standard Base-64, others can > be taken easily from CJK blocks, or the PUA block in the BMP, but you can > also select some blocks below the U+0800 codepoint so that they get encoded > as 2 bytes and not 3 for the rest of the BMP (and 4 bytes for most emojis, > where 10 bits become 32 bits with a huge waste of storage space in UTF-8) > > So the real need is to find the smallest set of subranges with 64 > consecutive codepoints with minimal values that contain only letters or > digits and where all positions are assigned with such general properties. > Emojis will unlikely be part of them !
With this goal, you can even avoid > using any PUAs (which are likely to be filtered/forbidden by some > protocols), or compatibility characters (likely to be transformed by > NFKC/NFKD). > > And even within just the BMP, you could reach more than 10-bit encoding > (base-1024) and can probably find 12-bit encoding (base 4096) or more (CJK > blocks of the BMP offer wide ranges of suitable characters, as well as some > extended Latin or extended Cyrillic blocks) > > If you want to use supplementary characters that are already encoded, then > you can certainly use CJK blocks in the large supplementary ideographic > plane and create a 16-bit encoding (base 65536). Only some legacy Emojis in > the BMP will be used before that. > > > > 2018-03-11 6:04 GMT+01:00 Keith Turner via Unicode : > >> I created a neat little project based on Unicode emojis. I thought >> some on this list may find it interesting. It encodes arbitrary data >> as 1024 emojis. The project is called Ecoji and is hosted on github >> at https://github.com/keith-turner/ecoji >> >> Below are some examples of encoding and decoding. >> >> $ echo 'Unicode emojis are awesome!!' | ecoji >> ???????????????????????????????????????????????? >> >> $ echo ???????????????????????????????????????????????? | ecoji -d >> Unicode emojis are awesome!! >> >> I would eventually like to create a base4096 version when there are more >> emojis. >> >> Keith >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Mar 11 12:32:40 2018 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Sun, 11 Mar 2018 11:32:40 -0600 Subject: base1024 encoding using Unicode emojis In-Reply-To: References: Message-ID: <6FAF09B530744A87AA1B3086073F8748@DougEwell> Oh, let him have a little fun. At least he's using emoji for something related to characters, instead of playing Mr. Potato Head. 
Incidentally, more prior art on large-base encoding: https://sites.google.com/site/markusicu/unicode/base16k -- Doug Ewell | Thornton, CO, US | ewellic.org From unicode at unicode.org Sun Mar 11 13:35:11 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Sun, 11 Mar 2018 19:35:11 +0100 (CET) Subject: Translating the standard Message-ID: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17> On Fri, 9 Mar 2018 08:41:35 -0800, Ken Whistler wrote: > > > On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote: > > As of translating the Core spec as a whole, why did two recent attempts crash even > > before the maintenance stage, while the 3.1 project succeeded? > > Essentially because both the Japanese and the Chinese attempts were > conceived of as commercial projects, which ultimately did not cost out > for the publishers, I think. I immediately thought of these projects as government-funded initiatives, which is most coherent with the importance of Unicode's work for these nations, given that the unified CJK repertoire has always consumed the most of the Consortium's resources, I figure. However, looking into early translations on the Unicode site, only those governments that are close to the United Kingdom are unveiled (or not) to have helped promote Unicode education. And from the one among the three terminological vocabularies that I'm able to parse, as well as from the 60+ What-is-Unicode translations, we gain the chilling impression that once the early enthusiasm had passed away, any level of effort dropped down to zero. To such an extent that even the link to the translation guidelines has been removed from the first place: http://www.unicode.org/help/translation.html | | Although its working language is English, the Unicode Consortium strives to reach as many people | and organizations in as many countries as possible around the world.
One way of doing that is by | encouraging the translation of Unicode material into languages other than English. | | This page guides volunteers who wish to contribute a translation of any Unicode material | they deem interesting to their local audiences. I fail to understand why increasing complexity decreases the need to be widely understood. Recurrent threads show how slowly Unicode education is spreading among native English speakers; others incidentally complained about Unicode-educational issues in African countries. *Not* translating the Standard – in whatever way – won't help steepen the curve. Best regards, Marcel [To be continued; sorry for the delay.]
From unicode at unicode.org Sun Mar 11 16:14:18 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Sun, 11 Mar 2018 22:14:18 +0100 (CET) Subject: Translating the standard In-Reply-To: <20180311200503.GB216921@phare.normalesup.org> References: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17> <20180311200503.GB216921@phare.normalesup.org> Message-ID: <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17> On 11/03/18 21:05, Arthur Reutenauer wrote: > > On Sun, Mar 11, 2018 at 07:35:11PM +0100, Marcel Schneider via Unicode wrote: > > I fail to understand why increasing complexity decreases the need to be > > widely understood. > > I'm pretty sure that everybody will agree that the need gets all the > greater as Unicode and connected technologies get more complex. But you > can hopefully see that the cost also increases, and that's incentive > enough to refrain from doing it – as it already was very costly fifteen > years ago, it's likely to be prohibitive today. > > > Recurrent threads show how slowly Unicode education > > is spreading among native English speakers; others incidentally complained > > about Unicode-educational issues in African countries. *Not* translating > > the Standard – in whatever way – won't help steepen the curve.
> > Nobody is saying "let's not translate the Unicode Standard"; what > several people here have pointed out is that it pays to have more modest > and manageable goals. Besides, you're hinting yourself that the > problems are not only with translation, since they also affect native > English speakers. Indeed, to be fair. And for implementers, documenting themselves in English may scarcely ever have much of a problem, no matter what's the locale. Today's policy is that we are welcome to browse Wikipedia: http://www.unicode.org/standard/WhatIsUnicode.html Fundamentally that's true (although the wording could use some fixes as to the difference between *using* Unicode and *documenting* Unicode), and it's consistent with actual trends. As for the cost – it still seems to me that we're far from the last word… Best regards, Marcel
From unicode at unicode.org Mon Mar 12 02:39:53 2018 From: unicode at unicode.org (Alastair Houghton via Unicode) Date: Mon, 12 Mar 2018 07:39:53 +0000 Subject: Translating the standard In-Reply-To: <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17> References: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17> <20180311200503.GB216921@phare.normalesup.org> <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17> Message-ID: On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode wrote: > > Indeed, to be fair. And for implementers, documenting themselves in English > may scarcely ever have much of a problem, no matter what's the locale. Agreed. Implementers will already understand English; you can't write computer software without it, since almost all documentation is in English, almost all computer languages are based on English, and, to be frank, a large proportion of the software market is itself English speaking. I have yet to meet a software developer who didn't speak English.
That's not to say that people wouldn't appreciate a translation of the standard, but there are, as others have pointed out, obvious maintenance problems, not to mention the issue that plagues some international institutions, namely the fact that translations are necessarily non-canonical and so those who really care about the details of the rules usually have to refer to a version in a particular language (sometimes that language might be French rather than English; very occasionally there are two versions declared, for political reasons, to both be canonical, which is obviously risky as there's a chance they might differ subtly on some point, perhaps even because of punctuation). In terms of widespread understanding of the standard, which is where I think translation is perhaps more important, I'm not sure translating the actual standard itself is really the way forward. It'd be better to ensure that there are reliable translations of books like Unicode Demystified or Unicode Explained - or, quite possibly, other books aimed more at the general public rather than the software community per se. Kind regards, Alastair.
-- http://alastairs-place.net
From unicode at unicode.org Mon Mar 12 02:59:48 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Mon, 12 Mar 2018 08:59:48 +0100 (CET) Subject: Translating the standard In-Reply-To: <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> Message-ID: <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> On Fri, 9 Mar 2018 08:41:35 -0800, Ken Whistler wrote: > > > On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote: > > As of translating the Core spec as a whole, why did two recent attempts crash even > > before the maintenance stage, while the 3.1 project succeeded? > > Essentially because both the Japanese and the Chinese attempts were > conceived of as commercial projects, which ultimately did not cost out > for the publishers, I think. Both projects attempted limiting the scope > of their translation to a subset of the core spec that would focus on > East Asian topics, but the core spec is complex enough that it does not > abridge well. And I think both projects ran into difficulties in trying > to figure out how to deal with fonts and figures. This is normally catered for by Unicode, whose fonts are donated and licensed for the sole purpose of documenting the Standard. See FAQ. Templates of any material to be translated are sent by Unicode, aren't they?
The Unicode home page reads: "An essential part of our mission is to educate and engage academic and scientific communities, and the general public." Therefore, translators should just have to translate e.g. the NamesList following Ken's sample localization (TN #24) – which is already a hard piece of work – and send the file to Unicode, to get a localized version of the Code Charts. Likewise ISO/IEC 10646 is available in a French version or at least, it should have an official French version like all ISO standards. If Unicode doesn't own the tooling yet, Apple should be happy to donate the funding to get Unicode in a position to fulfill their mission thoroughly, like Apple (supposedly) donates non-trivial amounts to many vendors to get them to remove old software from the internet. Using such localized NamesLists with Unibook to browse the Code Charts locally is another question, since that supposes handing the fonts out to the general public. So that is clearly a non-starter. But browsing localized Code Charts in Adobe Reader would be a nice facility. Best regards, Marcel
From unicode at unicode.org Mon Mar 12 03:34:01 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Mon, 12 Mar 2018 09:34:01 +0100 (CET) Subject: Translating the standard Message-ID: <789893273.3371.1520843642008.JavaMail.www@wwinf1m17> On Mon, 12 Mar 2018 07:39:53 +0000, Alastair Houghton wrote: > > On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode wrote: > > > > Indeed, to be fair. And for implementers, documenting themselves in English > > may scarcely ever have much of a problem, no matter what's the locale. > > Agreed. Implementers will already understand English; you can't write computer software > without it, since almost all documentation is in English, almost all computer languages are > based on English, and, to be frank, a large proportion of the software market is itself > English speaking. I have yet to meet a software developer who didn't speak English.
> > That's not to say that people wouldn't appreciate a translation of the standard, but there are, > as others have pointed out, obvious maintenance problems, not to mention the issue that > plagues some international institutions, namely the fact that translations are necessarily > non-canonical and so those who really care about the details of the rules usually have to refer > to a version in a particular language (sometimes that language might be French rather than > English; very occasionally there are two versions declared, for political reasons, to both be > canonical, which is obviously risky as there's a chance they might differ subtly on some point, > perhaps even because of punctuation). Sometimes it occurred in the EU that the French version was so sloppy that it transformed the issue into entirely another one, but at the Unicode–ISO/IEC merger the bad will was clearly on the other side… > > In terms of widespread understanding of the standard, which is where I think translation is > perhaps more important, I'm not sure translating the actual standard itself is really the way > forward. It'd be better to ensure that there are reliable translations of books like > Unicode Demystified or Unicode Explained - or, quite possibly, other books aimed more at > the general public rather than the software community per se. Good point. What we need most of all is a complete terminology, as well as full ranges of character names in every language, to enable people to talk about it after reading in English. Best regards, Marcel
From unicode at unicode.org Mon Mar 12 04:11:09 2018 From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode) Date: Mon, 12 Mar 2018 18:11:09 +0900 Subject: base1024 encoding using Unicode emojis In-Reply-To: References: Message-ID: <3b0f9914-9d09-5fb9-9e3e-e68493a81e8a@it.aoyama.ac.jp> On 2018/03/12 02:07, Keith Turner via Unicode wrote: > Yeah, it certainly results in larger utf8 strings.
For example, a sha256 > hash is 112 bytes when encoded as Ecoji utf8. For base64, sha256 is 44 > bytes. > > Even though it's more bytes, Ecoji has fewer visible characters than base64 > for sha256: Ecoji has 28 visible characters and base64 44. So that makes > me wonder which one would be quicker for a human to verify on average? > Also, which one is more accurate for a human to verify? I have no idea. For > accuracy, it seems like a lot of thought was put into the visual uniqueness > of Unicode emojis. Using emoji to help people verify security information is an interesting idea. What I'm afraid of is that even if emoji are designed with distinctiveness in mind, some people may have difficulties distinguishing all the various face variants. Also, while emoji get designed so that in-font distinguishability is high, the same may not apply across fonts (e.g. if one has to compare a printed version with a version on-screen). Regards, Martin. >> 2018-03-11 6:04 GMT+01:00 Keith Turner via Unicode : >> >>> I created a neat little project based on Unicode emojis. I thought >>> some on this list may find it interesting. It encodes arbitrary data >>> as 1024 emojis. The project is called Ecoji and is hosted on github >>> at https://github.com/keith-turner/ecoji >>> >>> Below are some examples of encoding and decoding. >>> >>> $ echo 'Unicode emojis are awesome!!' | ecoji >>> ???????????????????????????????????????????????? >>> >>> $ echo ???????????????????????????????????????????????? | ecoji -d >>> Unicode emojis are awesome!! >>> >>> I would eventually like to create a base4096 version when there are more >>> emojis.
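The figures in this exchange are easy to cross-check with a little arithmetic. A SHA-256 digest is 32 bytes; base64 spends one ASCII byte per 6 bits, while a 10-bit emoji code spends one symbol per 10 bits, and each such emoji costs 4 bytes in UTF-8 since it sits outside the BMP. The 5-byte-group assumption below (four symbols per group) is an inference consistent with the 28-character result, not something stated in the thread.

```python
import base64
import hashlib
from math import ceil

digest = hashlib.sha256(b'example input').digest()
assert len(digest) == 32  # SHA-256 is always 32 bytes

# base64: every 3-byte input group becomes 4 ASCII characters (1 byte each).
b64_chars = 4 * ceil(len(digest) / 3)      # 44 characters = 44 UTF-8 bytes

# base-1024 emoji code: every 5-byte (40-bit) group becomes 4 symbols,
# and each emoji serializes to 4 bytes of UTF-8.
emoji_chars = 4 * ceil(len(digest) / 5)    # 28 visible characters
emoji_utf8_bytes = 4 * emoji_chars         # 112 bytes

print(b64_chars, emoji_chars, emoji_utf8_bytes)
```

So the emoji form is fewer glyphs for a human to compare, but more bytes on the wire, matching both halves of the observation above.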
From unicode at unicode.org Mon Mar 12 05:00:16 2018 From: unicode at unicode.org (Andrew West via Unicode) Date: Mon, 12 Mar 2018 10:00:16 +0000 Subject: Translating the standard In-Reply-To: <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> Message-ID: On 12 March 2018 at 07:59, Marcel Schneider via Unicode wrote: > > Likewise ISO/IEC 10646 is available in a French version No it is not, and never has been. Why don't you check your facts before making misleading statements to this list? > or at least, it should have an official French version like all ISO standards. That is also blatantly untrue. Only six of the publicly available ISO standards listed at http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html have French versions, and one has a Russian version. You will notice that there is no French version of ISO/IEC 10646. 
Andrew
From unicode at unicode.org Mon Mar 12 06:55:54 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Mon, 12 Mar 2018 12:55:54 +0100 (CET) Subject: Translating the standard In-Reply-To: References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> Message-ID: <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17> On Mon, 12 Mar 2018 10:00:16 +0000, Andrew West wrote: > > On 12 March 2018 at 07:59, Marcel Schneider via Unicode > wrote: > > > > Likewise ISO/IEC 10646 is available in a French version > > No it is not, and never has been. > > Why don't you check your facts before making misleading statements to this list? > > > or at least, it should have an official French version like all ISO standards. > > That is also blatantly untrue. > > Only six of the publicly available ISO standards listed at > http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html > have French versions, and one has a Russian version. You will notice > that there is no French version of ISO/IEC 10646. > > Andrew Since ISO has made a business of standards, all prior versions are removed from the internet, so that they don't show up even in that list (which I'd used to grab a free copy, just to check the differences). Because if they had public archives of the free standards, not having any for the pay standards would stand out even more.
This is why, if you need an older version for reference, you need to find a good soul in the organization who will be so kind as to make a copy for you in the archives at the headquarters. The last published French version of ISO/IEC 10646 – to which you contributed – is still available on Patrick's site: http://hapax.qc.ca/Tableaux-5.0.htm Actually, the French version has no chief redactor, and for a time, the French version of the NamesList was maintained only so far as to add the new names (for use in ISO 14651). For Unicode 10.0.0, the French translation has again been fully updated to Code Charts production level: http://hapax.qc.ca/ListeNoms-10.0.0.txt (I'd noticed that the contributors' list has slightly shrunk without being able to find out why.) The Code Charts have not been produced, however (because there is actually no redactor-in-chief, as already stated, and also because of budget cuts the government is not in a position to pay the non-trivial amount of money asked for by Unicode for use of the fonts and/or [just trying to be as precise as I can this time] the owner of the tooling needed). Having said that, I still believe that all ISO standards should have a French version, shouldn't they? :) Best regards, Marcel
From unicode at unicode.org Mon Mar 12 08:58:32 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Mon, 12 Mar 2018 14:58:32 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: <20180311195150.GA216921@phare.normalesup.org> References: <392967520.14691005.1520681202416.ref@mail.yahoo.com> <392967520.14691005.1520681202416@mail.yahoo.com> <20180310180235.GB3698923@phare.normalesup.org> <20180310193359.GA3818257@phare.normalesup.org> <20180311195150.GA216921@phare.normalesup.org> Message-ID: > Incidentally, since you have very strong opinions on what things > > should and shouldn't be called: I don't see the phrase "French > Alemannic"
catching on at all :-) > I've not used that terminology. In France this is just called "alsacien" (Alsatian in English) and described as one of the Alemannic languages/dialects, and never German, nor Swiss, nor a combination of these! -------------- next part -------------- An HTML attachment was scrubbed... URL:
From unicode at unicode.org Mon Mar 12 09:30:39 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Mon, 12 Mar 2018 14:30:39 +0000 Subject: Unicode 11.0 and 12.0 Cover Design Art Message-ID: One of my project students has an art gallery as a client – surfacegallery.org. This gallery is also a focal point for a collective of local artists. This morning I had a project meeting with this student. I suggested that Surface Gallery artists might like to submit entries. I showed the Unicode character set to the student and she was well impressed. I also suggested possible cover design art. The basic principle of my suggestions was that the artwork should be constructed from Unicode characters and only Unicode characters. My suggestions included: plants, animals, portraits, cityscape, zoo, farm ...etc... If the artists' collective use my suggestions then the Unicode cover artwork they submit will most definitely feature Unicode. Recent Unicode cover artwork has not featured Unicode (well, not in any way that I can determine) and I think it should, and it should feature it prominently and obviously. I do not know who or how the artwork is judged, but I think it would be good if members of this list could vote on the submitted cover artwork. André Schappo -------------- next part -------------- An HTML attachment was scrubbed...
URL:
From unicode at unicode.org Mon Mar 12 09:55:28 2018 From: unicode at unicode.org (Michel Suignard via Unicode) Date: Mon, 12 Mar 2018 14:55:28 +0000 Subject: Translating the standard In-Reply-To: <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17> References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17> Message-ID: Time to correct some facts. The French version of ISO/IEC 10646 (2003 version) was done in a separate effort by the Canada and France NBs and not within SC2 proper. National bodies are always welcome to try to transpose and translate an ISO standard. But unless this is done by the ISO sub-committee (SC2 here) itself, this is not a long-term solution. This was almost 15 years ago. I should know, I have been project editor for 10646 since October 2000 (I started as project editor in 1997 for part 2, and have been involved in both Unicode and SC2 since 1990). Now to some alternative facts: >Since ISO has made a business of standards, all prior versions are removed from the internet, >so that they don't show up even in that list (which I'd used to grab a free copy, just to check > the differences). Because if they had public archives of the free standards, not having any >for the pay standards would stand out even more.
>This is why, if you need an older version for reference, you need to find a good soul in > the organization who will be so kind as to make a copy for you in the archives at the > headquarters. OK, yes, the old versions are removed from the ISO site. Andrew probably has easier access to older versions than you, through BSI. He has been involved directly in SC2 work for many years. The 2003 version is completely irrelevant now anyway and, again, was not done by the SC; there was never a project editor for a French version of 10646. >The last published French version of ISO/IEC 10646 – to which you contributed – is still available on > Patrick's site: > >http://hapax.qc.ca/Tableaux-5.0.htm The only live part of that page is the code chart, and it does not correspond to 10646:2003 itself (they are in fact Unicode 5.0 charts, however close to 10646:2003 and its first 2 amendments). I am not sure the original 10646:2003 (F) and the 2 translated amendments (1 and 2) are available anywhere, and they are totally obsolete today anyway. Only Canada and/or Afnor may still have archived versions. >(I'd noticed that the contributors' list has slightly shrunk without being able to find out why.) > The Code Charts have not been produced, however (because there is actually no > redactor-in-chief, as already stated, and also because of budget cuts the government is not in > a position to pay the non-trivial amount of money asked for by Unicode for use of the fonts > and/or [just trying to be as precise as I can this time] the owner of the tooling needed). A bunch of speculation here: there never was a 'redactor-in-chief' for the French version, and Unicode never asked for money because, first of all, it does not own the tool (it is licensed by the tool owner, who btw does this work as a giant goodwill gesture, based on the money received and the amount of work required to get this to work).
In a previous message you also made some speculation about Apple's role or possibilities that has no relationship with reality. >Having said that, I still believe that all ISO standards should have a French version, shouldn't they? You are welcome to contribute to that. Good luck though. On a side note, I have been working with the same team of French volunteers to revive the French names list. So this may re-appear on the Unicode web site at some point. Because I also produce the original code charts (in cooperation with Rick McGowan) for both ISO and Unicode, it is a bit easier for me (although non-trivial). It also helps that I can read the French list :-). But the names list is probably as far as you want to go, and even that requires a serious amount of work in terms of term definitions and production. Michel
From unicode at unicode.org Mon Mar 12 11:22:25 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Mon, 12 Mar 2018 17:22:25 +0100 Subject: Unicode 11.0 and 12.0 Cover Design Art In-Reply-To: References: Message-ID: The problem with some recent covers is that they either - had no meaning (not even implied); they were just marble textures, - or were too culturally centered, showing some scripts or a specific projection of the Earth. I have sent a proposal for something that is culturally neutral: it evokes a chart of characters, but without using any actual glyph, and suggests some maps with continents/islands, but not real maps, and is easily scalable/croppable at any resolution. (It should be noted that the edition will have several volumes and that the central vertical part of the image may be variable.) As well, I avoided implying a horizontal or vertical layout, placing the grid at uneven angles (about 30 degrees, so that it scales smoothly without visible artefacts).
Also, the pattern used is never repeated (all tiles are unique but share some common general aspect, as if it were a regular structure, but still with irregular shapes, never twice the same, yet aligning cleanly with a semi-regular structure). I was inspired by the beautiful blue mosaics I saw in Portugal. You may of course have other ideas. But characters encoded in Unicode are now very rich (and the glyphs for representing them and combining them are even richer if we also add the introduction of significant colors). And the general principle was that this was just a background texture that should not obscure the text/titles put on top of it (so it should have low-contrast lines, be mostly unicolor, and be reasonably dark or pale, yet still attractive, avoiding low saturation levels of grays). As several concepts are requested for several editions, we may vary these ideas/concepts, including on the central cover border area, where the Unicode logos and titles in smaller fonts should also be clearly distinctive. The same goes for the fine print (e.g. the name of the editor, or a small abstract text on the background side, the left part of the suggested image canvas), without necessarily having to map a uniform background panel onto it (a uniform white rectangle will be needed for getting a clear black & white barcode, and such an insert should not distract from the titles either, meaning that the titles will most probably be white if we want to avoid a uniform background behind them, and this suggests a moderately dark or medium-light colored texture). Some photos may be used of course, or some assembly, but it's hard to predict the exact placement/centering of the photo if the cover size must be adapted to the effective size of the central border area (depending on the number of pages of each volume and the quality/grammage of paper used for the printed pages in the book).
Given the small diffusion of the book and its price, I think that cheap paper will be used to limit production costs and allow "printing on demand" by publishers (or directly by some online resellers such as Amazon, if they are permitted to print books themselves via their partner publishers in the world, to save shipping and storage costs). It is also very likely that most sales will now be of electronic editions. Complex patterns of contrasting lines should be limited so as not to cover the whole area and should still allow easy placement of large titles at the placement suggested by the described template, and should avoid touching the central vertical area (cover border) as well as some places needed for the usual small print. 2018-03-12 15:30 GMT+01:00 Andre Schappo via Unicode : > > One of my project students has an art gallery as a client – > surfacegallery.org. This gallery is also a focal point for a collective of > local artists. > > This morning I had a project meeting with this student. I suggested that > Surface Gallery artists might like to submit entries. > > I showed the Unicode character set to the student and she was well > impressed. I also suggested possible cover design art. > > The basic principle of my suggestions was that the artwork should be > constructed from Unicode characters and only Unicode characters. My > suggestions included: plants, animals, portraits, cityscape, zoo, farm > ...etc... If the artists' collective use my suggestions then the Unicode > cover artwork they submit will most definitely feature Unicode. > > Recent Unicode cover artwork has not featured Unicode (well, not in any way > that I can determine) and I think it should, and it should feature it > prominently and obviously. > > I do not know who or how the artwork is judged, but I think it would be > good if members of this list could vote on the submitted cover artwork. > > André
Schappo > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From unicode at unicode.org Mon Mar 12 14:21:49 2018 From: unicode at unicode.org (Rick McGowan via Unicode) Date: Mon, 12 Mar 2018 12:21:49 -0700 Subject: IUC 42 - abstract submission deadline extended to March 16 Message-ID: <5AA6D34D.3080702@unicode.org> Hello everyone, The submission deadline for IUC 42 abstracts has been extended to Friday, March 16. http://www.unicodeconference.org/call-for-participation.htm Hope you can join us in September. Regards, Rick
From unicode at unicode.org Mon Mar 12 18:31:24 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 13 Mar 2018 00:31:24 +0100 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: <20180312202825.GA1207055@phare.normalesup.org> References: <392967520.14691005.1520681202416.ref@mail.yahoo.com> <392967520.14691005.1520681202416@mail.yahoo.com> <20180310180235.GB3698923@phare.normalesup.org> <20180310193359.GA3818257@phare.normalesup.org> <20180311195150.GA216921@phare.normalesup.org> <20180312202825.GA1207055@phare.normalesup.org> Message-ID: 2018-03-12 21:28 GMT+01:00 Arthur Reutenauer < arthur.reutenauer at normalesup.org>: > On Mon, Mar 12, 2018 at 02:58:32PM +0100, Philippe Verdy via Unicode wrote: > >> should and shouldn't be called: I don't see the phrase "French > >> Alemannic" catching on at all :-) > > > > I've not used that terminology. > > That's true, you misspelt it as "French Allemanic". > False, I've not used that expression at all! I only cited what you wrote, so there was no typo at all, except possibly by you (the citation above that you wrote yourself). -------------- next part -------------- An HTML attachment was scrubbed...
URL:
From unicode at unicode.org Mon Mar 12 19:18:31 2018 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Mon, 12 Mar 2018 17:18:31 -0700 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: References: <392967520.14691005.1520681202416.ref@mail.yahoo.com> <392967520.14691005.1520681202416@mail.yahoo.com> <20180310180235.GB3698923@phare.normalesup.org> <20180310193359.GA3818257@phare.normalesup.org> <20180311195150.GA216921@phare.normalesup.org> <20180312202825.GA1207055@phare.normalesup.org> Message-ID: <745ca474-d259-bc95-7d0d-11fdd3f43f40@ix.netcom.com> An HTML attachment was scrubbed... URL:
From unicode at unicode.org Mon Mar 12 21:49:04 2018 From: unicode at unicode.org (=?UTF-8?B?WWlmw6FuIFfDoW5n?= via Unicode) Date: Tue, 13 Mar 2018 11:49:04 +0900 Subject: Translating the standard In-Reply-To: References: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17> <20180311200503.GB216921@phare.normalesup.org> <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17> Message-ID: 2018-03-12 16:39 GMT+09:00 Alastair Houghton via Unicode : > On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode wrote: >> >> Indeed, to be fair. And for implementers, documenting themselves in English >> may scarcely ever have much of a problem, no matter what's the locale. > > Agreed. Implementers will already understand English; you can't write computer software without it, since almost all documentation is in English, almost all computer languages are based on English, and, to be frank, a large proportion of the software market is itself English speaking. I have yet to meet a software developer who didn't speak English. Somewhat digressing from the topic, but I'd like to make some comments on this part, as I smell a persistent myth among some (hopefully small) number of software engineers in the Anglosphere.
First, the fact that computer languages are written using English words doesn't mean that programmers are supposed to have proportional English knowledge. Take the word of Matz, the creator of the Ruby language: "The English skill is a super-powerful rare card (in the career path of a Japanese engineer)!" He then continues that you should keep up with the most up-to-date overseas info/trends in order to be a high-tier engineer, and so on. It's far from a "requirement". http://eikaiwa.dmm.com/blog/3826/ I've also read somewhere a memoir of a middle-aged programmer who was already into BASIC in childhood. One day he thought he'd dashed off a "great" program and printed it on paper, but to his surprise, an auntie who took a look at it immediately decoded the program and roughly understood what it was meant to do; she knew English, and he didn't. Programming, as such, is just like a Chinese room with English substituted, where you sit inside a cramped room night after night, communicating with a computer by typing in the English words the bulky reference guide teaches you. Most East Asian countries are blessed with a tremendous number of translated technical publications (e.g. O'Reilly) each year, not to mention firsthand writings in their own languages. So the documentation is easily available even if you don't speak English the language. Second, that English is the lingua franca doesn't necessarily mean the English spoken in the wild is. The aviation industry is another field which employs English as the common language, but they exert the utmost effort to keep the system working. Namely, they have a controlled word set with semantics as disambiguated as possible, called ASD-STE100, for technical documentation, such as maintenance manuals, to minimize errors caused by limited English knowledge. Unicode, on the other hand, is merely written in the free style used when English speakers who (almost) graduated from college write to English speakers who (almost) graduated from college.
Having such a level of proficiency as a non-native speaker isn't trivial, unless one is constantly in contact with an English-speaking community. (And the programming community isn't contained inside the English-speaking community at all.) That said, I agree with almost everything Alastair said after. If I have to add one more thing, a monolingual text is usually tightly coupled with its language, more than engineers may believe, even if the writer carefully chose their words to be context-neutral. Thus it's a hard job to say no more and no less than the original text in another language, especially when exactitude matters. It's one of the problems that keep fully automated translation from being a thing, I guess. Best regards, Yifan 2018-03-12 16:39 GMT+09:00 Alastair Houghton via Unicode : > On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode wrote: >> >> Indeed, to be fair. And for implementers, documenting themselves in English >> may scarcely ever have much of a problem, no matter what's the locale. > > Agreed. Implementers will already understand English; you can't write computer software without, since almost all documentation is in English, almost all computer languages are based on English, and, to be frank, a large proportion of the software market is itself English speaking. I have yet to meet a software developer who didn't speak English.
> > That's not to say that people wouldn't appreciate a translation of the standard, but there are, as others have pointed out, obvious maintenance problems, not to mention the issue that plagues some international institutions, namely the fact that translations are necessarily non-canonical and so those who really care about the details of the rules usually have to refer to a version in a particular language (sometimes that language might be French rather than English; very occasionally there are two versions declared, for political reasons, to both be canonical, which is obviously risky as there's a chance they might differ subtly on some point, perhaps even because of punctuation). > > In terms of widespread understanding of the standard, which is where I think translation is perhaps more important, I'm not sure translating the actual standard itself is really the way forward. It'd be better to ensure that there are reliable translations of books like Unicode Demystified or Unicode Explained - or, quite possibly, other books aimed more at the general public rather than the software community per se. > > Kind regards, > > Alastair. > > -- > http://alastairs-place.net > > From unicode at unicode.org Tue Mar 13 02:52:57 2018 From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode) Date: Tue, 13 Mar 2018 16:52:57 +0900 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: <392967520.14691005.1520681202416@mail.yahoo.com> References: <392967520.14691005.1520681202416.ref@mail.yahoo.com> <392967520.14691005.1520681202416@mail.yahoo.com> Message-ID: On 2018/03/10 20:26, philip chastney via Unicode wrote: > I would make the following observations on terminology in practice: > -- the newspapers in Zurich advertised courses in "Hoch Deutsch", for those who needed to deal with foreigners This should probably be written 'the newspapers in Zurich advertised courses in "Hochdeutsch", for foreigners'.
Hochdeutsch (Standard German) is the language used in school, and in writing, and while there may be some specialized courses for Swiss people who didn't do well throughout grade school and want to catch up, that's not what the advertisements are about. > -- in Luxemburg, the same language was referred to as Luxemburgish (or Letzeburgesch, which is Luxemburgish for "Luxemburgish") > (I forget what the Belgians called the language spoken in Ostbelgien) > > -- I was assured by a Luxemburgish-speaking car mechanic, with a Swiss German speaking wife, that the two languages (dialects?) were practically identical, except for the names of some household items I can't comment on this, because I don't remember ever having listened to somebody speaking Letzeburgesch. > in short, there seems little point in making distinctions which cannot be precisely identified in practice > > there appear to be significant differences between High German and (what the natives call) Swiss German > > there are far fewer significant differences between Swiss German and the other spoken Germanic languages found on the borders of Germany In terms of linguistic analysis, that may be true. But virtually every native Swiss German speaker would draw a clear line between Swiss German (including the dialect(s) spoken in the upper Valais (Oberwallis), which are classified differently by linguists) and other varieties such as Swabian, Alsatian, Vorarlbergian, or even Letzeburgesch (which I have never seen classified as Alemannic). The reason for this is not so much basic linguistics, but much more a) vocabulary differences ranging from food to administrative terms, and b) the fact that people hear many different Swiss dialects on Swiss Radio and Television, while that's not the case for the dialects from outside the borders.
So in practice, Swiss German can be delineated quite precisely, but more from a sociolinguistic and vocabulary perspective than from a purely evolutionary/historic linguistic perspective. [Disclaimer: I'm not a linguist.] Regards, Martin. From unicode at unicode.org Tue Mar 13 03:39:57 2018 From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode) Date: Tue, 13 Mar 2018 17:39:57 +0900 Subject: A sketch with the best-known Swiss tongue twister In-Reply-To: References: Message-ID: On 2018/03/09 21:24, Mark Davis ☕️ wrote: > There are definitely many dialects across Switzerland. I think that for > *this* phrase it would be roughly the same for most of the population, with > minor differences (e.g. 'het' vs 'hät'). But a native speaker like Martin > would be able to say for sure. Yes indeed. The differences would be in the vowels (not necessarily minor, but your mileage may vary), and the difficulty of this tongue twister is very much on the consonants. Regards, Martin. From unicode at unicode.org Mon Mar 12 23:46:40 2018 From: unicode at unicode.org (Lisa Moore via Unicode) Date: Mon, 12 Mar 2018 21:46:40 -0700 Subject: Unicode 11.0 and 12.0 Cover Design Art In-Reply-To: References: Message-ID: <6b9ce906-6d98-4bba-3f97-8b841dbc65ce@lisamoore.us.com> Dear Andre, Please encourage her and other artists to make a submission. The judges take in many different perspectives, some more character oriented and some more abstract. All submissions are welcome. Thank you, Lisa On 3/12/2018 7:30 AM, Andre Schappo via Unicode wrote: > surface gallery artists might like to submit entries.
> > I showed the Unicode character set to the student and she From unicode at unicode.org Tue Mar 13 13:20:55 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Tue, 13 Mar 2018 19:20:55 +0100 (CET) Subject: Translating the standard In-Reply-To: References: <1877f868cf8e46bd9ce9d1f42827a33e@OS2PR01MB1147.jpnprd01.prod.outlook.com> <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17> Message-ID: <894159522.24446.1520965256249.JavaMail.www@wwinf1p23> On Mon, 12 Mar 2018 14:55:28 +0000, Michel Suignard wrote: > > Time to correct some facts. > The French version of ISO/IEC 10646 (2003 version) was done in a separate effort by the Canada and France NBs and not within SC2 proper. > National bodies are always welcome to try to transpose and translate an ISO standard. But unless this is done by the ISO Sub-committee > (SC2 here) itself, this is not a long-term solution. This was almost 15 years ago. I should know, I have been project editor for 10646 since > October 2000 (I started as project editor in 1997 for part-2, and have been involved in both Unicode and SC2 since 1990). Then it can be referred to as "French version of ISO/IEC 10646", but I've got Andrew's point, too.
> > Now to some alternative facts: > >Since ISO has made of standards a business, all prior versions are removed from the internet, > >so that they don't show up even in that list (which I'd used to grab a free copy, just to check > > the differences). Because if they had public archives of the free standards, not having any > >for the pay standards would stand out even more. > >This is why if you need an older version for reference, you need to find a good soul in > > the organization, who will be so kind to make a copy for you in the archives at the > > headquarters. > > OK, yes, the old versions are removed from the ISO site. Andrew probably has easier access to older versions than you through BSI. > He has been involved directly in SC2 work for many years. The 2003 version is completely irrelevant now anyway and again was not > done by the SC; there was never a project editor for a French version of 10646. Call him whatever, how can a project thrive without a head? I think relevance is not the only criterion in evaluating a translation. The most important would probably be usefulness. Older versions are an appropriate means to get in touch with Unicode, as discussed when some old core specs were proposed on this list. > > >The last published French version of ISO/IEC 10646, to which you contributed, is still available on > > Patrick's site: > > > >http://hapax.qc.ca/Tableaux-5.0.htm > > The only live part of that page is the code chart, and it does not correspond to 10646:2003 itself (they are in fact Unicode 5.0 charts, > however close to 10646:2003 and its first 2 amendments). I am not sure the original 10646:2003 (F) and the 2 translated amendments > (1 and 2) are available anywhere, and they are totally obsolete today anyway. Only Canada and/or Afnor may still have archived versions. Given that each time some benevolent people have their names-list translation ready for print, they have to pay for the tool and the fonts -- just plainly disgusting.
No wonder once you get such a localized Code Charts edition printed out in PDF, it has everlasting value! > > >(I'd noticed that the contributors' list has slightly shrunk without being able to find out why.) > > The Code Charts have not been produced, however (because there is actually no > > redactor-in-chief, as already stated, and also because of budget cuts the government is not in > > a position to pay the non-trivial amount of money asked for by Unicode for use of the fonts > > and/or [just trying to be as precise as I can this time] the owner of the tooling needed). > > A bunch of speculation here; there never was a 'redactor-in-chief' for the French version, and Unicode never asked for money because first of all > it does not own the tool (it is licensed by the tool owner who btw does this work as a giant goodwill gesture, based on the money received > and the amount of work required to get this to work). Shame! Unicode should manage to get the funding -- no problem for Apple! (but for Microsoft who had to fire many employees) -- so that the developer is fully paid and rewarded. Why has Unicode no unlimited license? Because of the stinginess of those corporate members that have plenty of money to waste. I'll save that off-topic rant but without ceasing to insist that he must be paid, fully paid and paid back and paid in the future, the more as the Code Charts are now printed annually and grow bigger and bigger. It's really up to the Consortium to gather the full license fee from their corporate members for the English version and any other interested locale. Unicode's claim of mission logically encompasses making available for free as many localized Code Charts and whatever else, so far as benevolent people translate the sources. Shouldn't that have been clear from the beginning on? > In a previous message you also made some speculation about Apple's role or possibilities that have no relationship with reality.
> > >Having said that, I still believe that all ISO standards should have a French version, shouldn't they? > > You are welcome to contribute to that. Good luck though. > > On a side note, I have been working with the same team of French volunteers to revive the French name list. So, this may re-appear > in the Unicode web site at some point. Because I also produce the original code chart (in cooperation with Rick McGowan) for both ISO > and Unicode it is a bit easier for me (although non-trivial). It also helps that I can read the French list :-). But the names list is probably > as far as you want to go, and even that requires a serious amount of work in terms of term definition and production. Indeed. I experience how true that is. There is a lot of discordance about how to call things. E.g. didn't you first translate TURNED (R and so on) to RETOURNÉ or to TOURNÉ? (I furiously hate CULBUTÉ). I welcome your effort in updating the French part of the Unicode site. Actually this is so outdated that it is even disallowed to search engines! Here: unicode.org/robots.txt | | Disallow: /fr/ # obsolete pages and charts Shame over shame. And when a guy works over the holidays to get the mess fixed, it is ignored by the UTC.
Best regards, Marcel From unicode at unicode.org Tue Mar 13 13:38:20 2018 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Tue, 13 Mar 2018 11:38:20 -0700 Subject: Translating the standard In-Reply-To: <894159522.24446.1520965256249.JavaMail.www@wwinf1p23> References: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17> <894159522.24446.1520965256249.JavaMail.www@wwinf1p23> Message-ID: <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com> An HTML attachment was scrubbed...
URL: From unicode at unicode.org Tue Mar 13 14:55:01 2018 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 13 Mar 2018 20:55:01 +0100 Subject: Translating the standard In-Reply-To: <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com> References: <1eb17932-8a9d-3dfe-5448-750861aa415b@hiroshima-u.ac.jp> <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17> <894159522.24446.1520965256249.JavaMail.www@wwinf1p23> <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com> Message-ID: It is then a version of the matching standards from Canadian and French standard bodies. This does not make a big difference, except that those national standards (last editions in 2003) are not kept in sync with evolutions of the ISO/IEC standard. So it can be said that this was a version for the 2003 version of the ISO/IEC standard, supported and sponsored by some of their national members. 2018-03-13 19:38 GMT+01:00 Asmus Freytag via Unicode : > On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote: > > On Mon, 12 Mar 2018 14:55:28 +0000, Michel Suignard wrote: > > Time to correct some facts. > The French version of ISO/IEC 10646 (2003 version) was done in a separate effort by the Canada and France NBs and not within SC2 proper. > ... > > Then it can be referred to as "French version of ISO/IEC 10646", but I've got Andrew's point, too. > > Correction: if a project is not carried out by SC2 (the proper ISO/IEC > subcommittee) then it is not a "version" of the ISO/IEC standard.
> > A./ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Mar 13 16:59:50 2018 From: unicode at unicode.org (John H. Jenkins via Unicode) Date: Tue, 13 Mar 2018 15:59:50 -0600 Subject: Unicode 11.0 and 12.0 Cover Design Art In-Reply-To: <6b9ce906-6d98-4bba-3f97-8b841dbc65ce@lisamoore.us.com> References: <6b9ce906-6d98-4bba-3f97-8b841dbc65ce@lisamoore.us.com> Message-ID: <6450641E-C9F1-4EDF-8715-C22D9A1D533F@apple.com> Maybe we should just throw in the towel and put "DON'T PANIC" on the cover in big, friendly letters. ?? From unicode at unicode.org Tue Mar 13 18:48:51 2018 From: unicode at unicode.org (Asmus Freytag (c) via Unicode) Date: Tue, 13 Mar 2018 16:48:51 -0700 Subject: Translating the standard In-Reply-To: References: <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17> <894159522.24446.1520965256249.JavaMail.www@wwinf1p23> <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com> Message-ID: On 3/13/2018 12:55 PM, Philippe Verdy wrote: > It is then a version of the matching standards from Canadian and > French standard bodies. This does not make a big difference, except > that those national standards (last editions in 2003) are not kept in > sync with evolutions of the ISO/IEC standard. So it can be said that > this was a version for the 2003 version of the ISO/IEC standard, > supported and sponsored by some of their national members. 
There is a way to transpose international standards to national standards, but they then pick up a new designation, e.g. ANSI for US or DIN for German or EN for European Norm. A./ > > 2018-03-13 19:38 GMT+01:00 Asmus Freytag via Unicode > >: > > On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote: >> On Mon, 12 Mar 2018 14:55:28 +0000, Michel Suignard wrote: >>> Time to correct some facts. >>> The French version of ISO/IEC 10646 (2003 version) was done in a separate effort by the Canada and France NBs and not within SC2 proper. >>> ... >> Then it can be referred to as "French version of ISO/IEC 10646", but I've got Andrew's point, too. > Correction: if a project is not carried out by SC2 (the proper > ISO/IEC subcommittee) then it is not a "version" of the ISO/IEC > standard. > > A./ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Mar 13 23:37:09 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Wed, 14 Mar 2018 05:37:09 +0100 (CET) Subject: Translating the standard In-Reply-To: References: <5cba0f20-a89b-ce0c-6efb-5154d27e2e17@att.net> <944626039.25442.1520472427152.JavaMail.www@wwinf1m17> <20180308090328.336a734f@JRWUBU2> <2eVJnayaB1fI6v03wDXiNHjzi3sd4iQVcP3B6zKo0jNlJZbIv3NBHTTxbNF09FVxQnyHi-in9vx9eReFVRc2Cg==@protonmail.ch> <53251075.18695.1520532306737.JavaMail.www@wwinf1m17> <20180308183304.GB2050855@phare.normalesup.org> <1296021581.14406.1520607509951.JavaMail.www@wwinf1m17> <8f0369d7-4fea-8685-1770-cfb5b895fbae@att.net> <1359906050.2399.1520841589070.JavaMail.www@wwinf1m17> <2130020491.9884.1520855755457.JavaMail.www@wwinf1m17> <894159522.24446.1520965256249.JavaMail.www@wwinf1p23> <8995b710-0e68-9a53-ad39-39c668564840@ix.netcom.com> Message-ID: <10963408.182.1521002229592.JavaMail.www@wwinf1m17> On Tue, 13 Mar 2018 16:48:51 -0700, Asmus Freytag (c) via Unicode wrote: On 3/13/2018 12:55 PM, Philippe Verdy wrote: It is then a version of the matching standards from
Canadian and French standard bodies. This does not make a big difference, except that those national standards (last editions in 2003) are not kept in sync with evolutions of the ISO/IEC standard. So it can be said that this was a version for the 2003 version of the ISO/IEC standard, supported and sponsored by some of their national members. There is a way to transpose international standards to national standards, but they then pick up a new designation, e.g. ANSI for US or DIN for German or EN for European Norm. A./ 2018-03-13 19:38 GMT+01:00 Asmus Freytag via Unicode: On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote: On Mon, 12 Mar 2018 14:55:28 +0000, Michel Suignard wrote: Time to correct some facts. The French version of ISO/IEC 10646 (2003 version) was done in a separate effort by the Canada and France NBs and not within SC2 proper. ... Then it can be referred to as "French version of ISO/IEC 10646", but I've got Andrew's point, too. Correction: if a project is not carried out by SC2 (the proper ISO/IEC subcommittee) then it is not a "version" of the ISO/IEC standard. A./ Thanks for the correction. And I confess and apologize that on Patrick's French Unicode 5.0 Code Charts page ( http://hapax.qc.ca/Tableaux-5.0.htm ), there is no instance of "version", although the item is referred to as "ISO 10646:2003 (F)", from which it can ordinarily be inferred that "ISO" did back the project and that it is considered as the French version of the standard. I wasn't aware that this kind of parsing of the facts is somewhat informal and shouldn't be handled on mailing lists without a caveat. That said, the French transposition of ISO/IEC 10646 was not carried out as just a sort of joint venture of Canada and France (which btw has stepped out, leaving Québec alone supporting the cost of future editions! Really ugly), given that it got feedback from numerous countries, part of which was written in French, and went through a heavy ballot process.
Thus, getting it changed is not easy since it was approved at the time, and any change requests should be documented and are chiefly harmful insofar as they threaten stability. Name changes affecting rare characters prove to be feasible, while on the other hand, syncing the French name of U+202F with common practice and TUS is obviously more complicated, which in turn compromises usability in UIs, where we're therefore likely to use descriptors, i.e. altered names, for roughly half of the characters bearing a specific name. Somehow the same rationale as for UTN #24, but somewhat less apposite given that the French transposition is not constrained by stability policies. Best regards, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Mar 14 06:55:08 2018 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Wed, 14 Mar 2018 11:55:08 +0000 Subject: Translating the standard In-Reply-To: References: <217796935.14420.1520793311427.JavaMail.www@wwinf1m17> <20180311200503.GB216921@phare.normalesup.org> <1354541898.17674.1520802858975.JavaMail.www@wwinf1m17> Message-ID: On 13 Mar 2018, at 02:49, Yifán Wáng via Unicode wrote: Somewhat digressing from the topic, but I'd like to make some comments on this part, as I smell a persistent myth among some, hopefully small, number of software engineers in the Anglosphere. First, the fact that computer languages are written using English words doesn't mean that programmers are supposed to have proportional English knowledge. Take the word of Matz, the creator of the Ruby language: "The English skill is a super-powerful rare card (in the career path of a Japanese engineer)!" He then continues that you should keep up with the most up-to-date overseas info/trends in order to be a high-tier engineer, and so on. It's far from a "requirement".
http://eikaiwa.dmm.com/blog/3826/ I've also read somewhere a memoir of a middle-aged programmer who was already into BASIC in childhood. One day he thought he'd dashed off a "great" program and printed it on paper, but to his surprise, an auntie who took a look at it immediately decoded the program and roughly understood what it was meant to do; she knew English, and he didn't. Programming, as such, is just like a Chinese room with English substituted, where you sit inside a cramped room night after night, communicating with a computer by typing in the English words the bulky reference guide teaches you. Most East Asian countries are blessed with a tremendous number of translated technical publications (e.g. O'Reilly) each year, not to mention firsthand writings in their own languages. So the documentation is easily available even if you don't speak English the language. Second, that English is the lingua franca doesn't necessarily mean the English spoken in the wild is. The aviation industry is another field which employs English as the common language, but they exert the utmost effort to keep the system working. Namely, they have a controlled word set with semantics as disambiguated as possible, called ASD-STE100, for technical documentation, such as maintenance manuals, to minimize errors caused by limited English knowledge. Unicode, on the other hand, is merely written in the free style used when English speakers who (almost) graduated from college write to English speakers who (almost) graduated from college. Having such a level of proficiency as a non-native speaker isn't trivial, unless one is constantly in contact with an English-speaking community. (And the programming community isn't contained inside the English-speaking community at all.) That said, I agree with almost everything Alastair said after.
If I have to add one more thing, a monolingual text is usually tightly coupled with its language, more than engineers may believe, even if the writer carefully chose their words to be context-neutral. Thus it's a hard job to say no more and no less than the original text in another language, especially when exactitude matters. It's one of the problems that keep fully automated translation from being a thing, I guess. Best regards, Yifan When it comes to program identifiers, a language such as Chinese has a huge advantage, as it is much more compact than English. So one can write meaningful identifier names with a small number of Chinese characters. In an all-Chinese development team, producing software for the Chinese market, why not have the program identifiers written in Chinese? Or maybe this does happen? Over the years I have talked with many Chinese students about this, and usually they tell me something like: "Our lecturers in China tell us to always use English for program identifiers". I make use of several languages for my program identifiers - jsfiddle.net/user/coas/fiddles My use of non-English languages for program identifiers is somewhat random. André Schappo -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Thu Mar 15 16:06:32 2018 From: unicode at unicode.org (Adam Borowski via Unicode) Date: Thu, 15 Mar 2018 22:06:32 +0100 Subject: Unicode 11.0 and 12.0 Cover Design Art In-Reply-To: <6450641E-C9F1-4EDF-8715-C22D9A1D533F@apple.com> References: <6b9ce906-6d98-4bba-3f97-8b841dbc65ce@lisamoore.us.com> <6450641E-C9F1-4EDF-8715-C22D9A1D533F@apple.com> Message-ID: <20180315210632.t55kom7rwh2s3xgj@angband.pl> On Tue, Mar 13, 2018 at 03:59:50PM -0600, John H. Jenkins via Unicode wrote: > Maybe we should just throw in the towel and put "DON'T PANIC" on the cover > in big, friendly letters. ?? But what script would you use?
[The message repeated "But what script would you use?" in many scripts and typographic styles; the non-Latin text did not survive the archive's transcoding.] -- A dumb species has no way to open a tuna can. A smart species invents a can opener. A master species delegates. From unicode at unicode.org Fri Mar 16 19:56:09 2018 From: unicode at unicode.org (Ed Borgquist via Unicode) Date: Fri, 16 Mar 2018 17:56:09 -0700 Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones Message-ID: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net> Hello All, The Full Emoji List [1] had, in the past, displayed Emoji with all skin tone variants. It seems that this is no longer the case. Does anyone know if it is possible that this could return in the future?
This data was useful to me: scraping it allowed me to identify "homographic" Emoji from a variety of vendors. Additionally, I could see how vendors approached skin tone variants for difficult-to-distinguish Emoji (for example, SNOWBOARDER often features a person with no visible skin). [1] https://unicode.org/emoji/charts/full-emoji-list.html Kindest Regards, Ed Borgquist .WS Registry From unicode at unicode.org Sat Mar 17 07:19:54 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Sat, 17 Mar 2018 13:19:54 +0100 Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones In-Reply-To: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net> References: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net> Message-ID: We were getting so much traffic on the emoji pages that we had to produce an abbreviated version to reduce the load (without skin tones, it is about half the size). We are looking at improvements to the infrastructure and/or chart design that would let us restore them, but people are busy with other Unicode projects right now. Mark On Sat, Mar 17, 2018 at 1:56 AM, Ed Borgquist via Unicode < unicode at unicode.org> wrote: > Hello All, > > The Full Emoji List [1] had, in the past, displayed Emoji with all skin > tone variants. It seems that this is no longer the case. Does anyone know > if it is possible that this could return in the future? > > This data was useful for myself, as scraping this data allowed for me to > identify "homographic" Emoji from a variety of vendors. Additionally, I > could see how vendors approached skin tone variants for > difficult-to-distinguish Emoji (for example, SNOWBOARDER often features a > person with no visible skin). > > [1] https://unicode.org/emoji/charts/full-emoji-list.html > > Kindest Regards, > > Ed Borgquist > .WS Registry > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Sat Mar 17 11:43:49 2018 From: unicode at unicode.org (Ed Borgquist via Unicode) Date: Sat, 17 Mar 2018 09:43:49 -0700 Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones References: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net> Message-ID: <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net> Thanks for the information. Does Unicode make public the source images received from vendors? Or, is there somewhere else you would recommend for me to look? Kindest Regards, Ed Borgquist .WS Registry From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis ?? Sent: Saturday, March 17, 2018 5:20 AM To: Ed Borgquist Cc: Unicode Public Subject: Re: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones We were getting so much traffic on the emoji pages that we had to produce an abbreviated version to reduce the load (without skin tones, it is about half the size). We are looking at improvements to the infrastructure and/or chart design that would let us restore them, but people are busy with other Unicode projects right now. Mark On Sat, Mar 17, 2018 at 1:56 AM, Ed Borgquist via Unicode wrote: Hello All, The Full Emoji List [1] had, in the past, displayed Emoji with all skin tone variants. It seems that this is no longer the case. Does anyone know if it is possible that this could return in the future? This data was useful for myself, as scraping this data allowed for me to identify "homographic" Emoji from a variety of vendors. Additionally, I could see how vendors approached skin tone variants for difficult-to-distinguish Emoji (for example, SNOWBOARDER often features a person with no visible skin). [1] https://unicode.org/emoji/charts/full-emoji-list.html Kindest Regards, Ed Borgquist .WS Registry -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Sat Mar 17 12:23:51 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Sat, 17 Mar 2018 17:23:51 +0000 Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones In-Reply-To: <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net> References: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net> <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net> Message-ID: You can take a look at emojipedia. They have a good set of information about emoji glyphs. {phone} On Sat, Mar 17, 2018, 17:44 Ed Borgquist wrote: > Thanks for the information. Does Unicode make public the source images > received from vendors? Or, is there somewhere else you would recommend for > me to look? > > > > Kindest Regards, > > > > Ed Borgquist > > .WS Registry > > > > *From:* mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] *On > Behalf Of *Mark Davis ?? > *Sent:* Saturday, March 17, 2018 5:20 AM > *To:* Ed Borgquist > *Cc:* Unicode Public > *Subject:* Re: Full Emoji List Chart No Longer Displaying Emoji with > Skin-tones > > > > We were getting so much traffic on the emoji pages that we had to produce > an abbreviated version to reduce the load (without skin tones, it is about > half the size). > > > > We are looking at improvements to the infrastructure and/or chart design > that would let us restore them, but people are busy with other Unicode > projects right now. > > > Mark > > > > On Sat, Mar 17, 2018 at 1:56 AM, Ed Borgquist via Unicode < > unicode at unicode.org> wrote: > > Hello All, > > The Full Emoji List [1] had, in the past, displayed Emoji with all skin > tone variants. It seems that this is no longer the case. Does anyone know > if it is possible that this could return in the future? > > This data was useful for myself, as scraping this data allowed for me to > identify "homographic" Emoji from a variety of vendors. 
Additionally, I > could see how vendors approached skin tone variants for > difficult-to-distinguish Emoji (for example, SNOWBOARDER often features a > person with no visible skin). > > [1] https://unicode.org/emoji/charts/full-emoji-list.html > > Kindest Regards, > > Ed Borgquist > .WS Registry > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Mar 19 10:00:04 2018 From: unicode at unicode.org (=?UTF-8?Q?Christoph_P=C3=A4per?= via Unicode) Date: Mon, 19 Mar 2018 16:00:04 +0100 (CET) Subject: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones In-Reply-To: <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net> References: <9DBCE90001255C4C8852C768DFC98D0F5F4919@ex2ksrv.ncsdi.net> <9DBCE90001255C4C8852C768DFC98D0F5F491A@ex2ksrv.ncsdi.net> Message-ID: <1583983326.32942.1521471604856@ox.hosteurope.de> Ed Borgquist: > > Thanks for the information. Does Unicode make public the source images received from vendors? Or, is there somewhere else you would recommend for me to look? Besides Emojipedia.org, you will find most current sets at and many old sets at . From unicode at unicode.org Fri Mar 23 08:00:36 2018 From: unicode at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via Unicode) Date: Fri, 23 Mar 2018 14:00:36 +0100 Subject: Unicode Utilities Message-ID: For testing, the Unicode Utilities now support the Unicode beta properties (with some caveats). Example: \p{gc?=Lu}-\p{gc=Lu} . Thanks to Sascha for helping to move to a different infrastructure for the utilities... Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Mon Mar 26 11:51:55 2018 From: unicode at unicode.org (William_J_G Overington via Unicode) Date: Mon, 26 Mar 2018 17:51:55 +0100 (BST) Subject: Accessibility Emoji In-Reply-To: <8570587.39967.1522077150247.JavaMail.root@webmail02.bt.ext.cpcloud.co.uk> References: <8570587.39967.1522077150247.JavaMail.root@webmail02.bt.ext.cpcloud.co.uk> Message-ID: <17804074.45213.1522083115554.JavaMail.defaultUser@defaultHost> I have been looking with interest at the following publication: Proposal For New Accessibility Emoji, by Apple Inc. www.unicode.org/L2/L2018/18080-accessibility-emoji.pdf I am supportive of the proposal. Indeed, please have more such emoji as well. In relation to the two dogs: my own (limited) experience of guide dogs for people with a vision disability, gained just from seeing them in the street and on television, is that in the United Kingdom the dogs often wear a yellow protective coat with silvery strips so that they can be more easily seen. That may also help each of them to be more readily recognised as a guide dog. The dogs tend to be of rather wider aspect ratio, if that is the way to put it, than the dog in the sample glyph in the proposal document. The dogs tend to be a creamy yellow colour, though there was a famous guide dog who was all black, famous because the dog was allowed to accompany a then Member of Parliament into the House of Commons Chamber in London. So, while the two-rod guide handle, contrasted with a floppy lead, is a good disambiguation cue for the two types of assistance dog, I suggest that using the presence of what the proposal terms a vest for disambiguation may not be appropriate. Also, the word vest appears to have different meanings in British English and American English; jacket might be a better choice of word than vest for the standards document. What about the colour and type of the dog? Perhaps easier to add in now than later?
What about a person with a hidden disability? Many people have a hidden disability yet do not have a service dog, because the nature of the particular hidden disability, or disabilities, does not call for the help of a service dog. Should there be an emoji for a person with a hidden disability? Or maybe more than one such emoji, so as to disambiguate the types of hidden disability, always remembering to have an "other hidden disability" emoji so as to include all types? Those questions, and indeed the whole proposal document, lead one to ask for what purposes these emoji are envisioned being used. For example, a person with a hidden disability might not like to be referred to as such, yet may like to describe himself or herself as having a hidden disability when trying to find facilities relevant to the particular disability, such as an accessible toilet with its additional facilities, access to a chair or a first-aid room, help with opening a door, or maybe a special diet, such as a gluten-free diet. How could the accessibility emoji in the proposal be used in practice? William Overington Monday 26 March 2018 From unicode at unicode.org Thu Mar 29 05:38:51 2018 From: unicode at unicode.org (William_J_G Overington via Unicode) Date: Thu, 29 Mar 2018 11:38:51 +0100 (BST) Subject: Accessibility Emoji In-Reply-To: <10505691.13022.1522317252542.JavaMail.root@webmail01.bt.ext.cpcloud.co.uk> References: <10505691.13022.1522317252542.JavaMail.root@webmail01.bt.ext.cpcloud.co.uk> Message-ID: <16510571.16509.1522319931314.JavaMail.defaultUser@defaultHost> I have been thinking about issues around the proposal. http://www.unicode.org/L2/L2018/18080-accessibility-emoji.pdf There is a sentence in that document that starts as follows. > Emoji are a universal language and a powerful tool for communication, .... It seems to me that what is lacking with emoji are verbs and pronouns.
For example, "to be", "to have" and "to need". The verb "to need" might well be of particular importance in relation to accessibility considerations. How could verbs be introduced into emoji? The verb "to love" can already be indicated using a heart symbol. Should abstract designs be used? Or should emoji always be pictographic? If abstract designs were introduced, would it be possible for the standards documents to include the meanings, or would the standards documents need to simply use a geometrical description, with the meanings regarded as a higher-level protocol outside of the standard? For, if abstract emoji were introduced with the intention that they be used as verbs in a universal language, it would be of benefit if the meanings were in the standard. If abstract designs were used then the meanings would need to be learned. Yet if the meanings were universal, that could be a useful development. I have wondered whether verb tenses could be usefully expressed using some of the existing combining accent characters following an emoji verb character. For example, U+0302 COMBINING CIRCUMFLEX ACCENT to indicate that the verb is in the future tense, U+0304 COMBINING MACRON to indicate that the verb is in the present tense, U+030C COMBINING CARON to indicate that the verb is in the past tense, U+0303 COMBINING TILDE to indicate that the verb is in the conditional tense. The desirability of pronouns was raised by a gentleman in the audience of a lecture at the Internationalization and Unicode Conference in 2015. I tried to produce some designs. I could not find a way to do that with conventional illustrative pictures, though I did produce a set of abstract designs that could possibly be useful in application; they could be displayed in colourful emoji style yet also in monochrome without ambiguity. Yet they are abstract designs, so meanings would need to be learned rather than indicated by the picture itself.
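The combining-mark tense scheme described above can be exercised with existing characters today. A minimal Python sketch (standard library only; using HEAVY BLACK HEART as a stand-in emoji "verb" is an assumption for illustration) showing that the four sequences are distinct and stable under normalization:

```python
import unicodedata

# A heart standing in for the verb "to love", per the discussion above.
verb = "\u2764"                 # HEAVY BLACK HEART

tense_marks = {
    "future":      "\u0302",    # COMBINING CIRCUMFLEX ACCENT
    "present":     "\u0304",    # COMBINING MACRON
    "past":        "\u030C",    # COMBINING CARON
    "conditional": "\u0303",    # COMBINING TILDE
}

forms = {tense: verb + mark for tense, mark in tense_marks.items()}

# No precomposed heart-plus-accent characters exist, so NFC normalization
# leaves each two-character sequence unchanged, and all four stay distinct.
for tense, form in forms.items():
    assert unicodedata.normalize("NFC", form) == form
    print(tense, [unicodedata.name(c) for c in form])
```

Whether renderers would stack an accent legibly over a colour emoji glyph is, of course, a separate question from whether the sequences are representable at all.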
Yet if the meanings were universal, that could be useful. Should there be abstract emoji or should emoji only be conventional pictures? William Overington Thursday 29 March 2018 From unicode at unicode.org Thu Mar 29 17:23:06 2018 From: unicode at unicode.org (fantasai via Unicode) Date: Thu, 29 Mar 2018 15:23:06 -0700 Subject: Sentence_Break, Semi-colons, and Apparent Miscategorization In-Reply-To: References: Message-ID: <42a17725-bda9-0268-0296-e88b1b3c26a3@inkedblade.net> On 03/08/2018 07:04 AM, Mark Davis ☕️ wrote: > From the first line, I guess you mean that all three questions have to do with the Sentence_Break property values. Namely: > > http://www.unicode.org/reports/tr29/proposed.html#Table_Sentence_Break_Property_Values > http://www.unicode.org/reports/tr29/proposed.html#SContinue Yes. > On Thu, Mar 8, 2018 at 9:25 AM, fantasai via Unicode > wrote: > > > Given that the comma and colon are categorized as SContinue, > > why is the semicolon also not SContinue? > > > Also, why is the Greek Question Mark not categorized with > > the rest of the question marks? > > As I recall, both are because the semicolon can also represent a Greek question mark > (they are canonically equivalent, so you can't reliably distinguish between them). :/ I'm guessing this is why all other semicolons (which don't have this problem) are also categorized as Other instead of SContinue? Given SContinue is a set of punctuation that's "softer" than STerm, it seems to me it would make more sense to categorize them all (including the Greek question mark) as SContinue, and then allow implementations to tailor the Greek question mark and semicolon to STerm as needed. Leaving them all under Other means that all semicolons would have to be individually tailored out of Other, which seems much more error-prone. > > Why aren't the vertical presentation forms categorized with > > the things they are presenting? > > At least some of them are: > U+FE10 ( ︐ 
) PRESENTATION FORM FOR VERTICAL COMMA > U+FE11 ( ︑ ) PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA > U+FE13 ( ︓ ) PRESENTATION FORM FOR VERTICAL COLON > U+FE31 ( ︱ ) PRESENTATION FORM FOR VERTICAL EM DASH > U+FE32 ( ︲ ) PRESENTATION FORM FOR VERTICAL EN DASH Yes, but others aren't: ︒ U+FE12 PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP ︕ U+FE15 PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK ︖ U+FE16 PRESENTATION FORM FOR VERTICAL QUESTION MARK https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AGeneral_category%3DPo%3A%5D&g=Sentence_Break&i= I'm also wondering about Armenian, Coptic, and Ethiopic: * Armenian exclamation mark and question mark are Other, whereas Latin (ASCII) places them as STerm. * None of the Coptic punctuation is categorized as non-Other, not even the full stop, which I'd expect under STerm. * Ethiopic comma and colon are not grouped with commas and colons in general under SContinue. Were these intentionally or accidentally placed under Other? ~fantasai From unicode at unicode.org Thu Mar 29 19:31:48 2018 From: unicode at unicode.org (Marcel Schneider via Unicode) Date: Fri, 30 Mar 2018 02:31:48 +0200 (CEST) Subject: Accessibility Emoji Message-ID: <32693654.14872.1522369908394.JavaMail.www@wwinf1h30> William, On 29/03/18 17:03 William_J_G Overington via Unicode wrote: > > I have been thinking about issues around the proposal. > http://www.unicode.org/L2/L2018/18080-accessibility-emoji.pdf > There is a sentence in that document that starts as follows. > > > Emoji are a universal language and a powerful tool for communication, .... That is clearly overstating the capabilities of emoji, and it ignores the borderline between verbal and pictographic expression. The appropriateness of each depends mainly on semantics and context. The power of emoji may lie in their being polysemic, escaping censorship, as already discussed during past years. > > It seems to me that what is lacking with emoji are verbs and pronouns. 
Along with these, one would need more nouns too, setting up an autonomous language. That, however, is not the goal of emoji and is outside the scope of Unicode. > > For example, "to be", "to have" and "to need". The verb "to need" might well be of particular importance in relation to accessibility considerations. When accessibility matters, devices may be missing, and then symbol charts are most appropriate, as seen. When somebody is pointing at an object, the "need" case is most obvious anyway. Impaired persons may use a bundle of cards including textual messages. None of these justifies encoding extra emoji. E.g. when somebody wishes a relative to buy more bread while returning from work, the appropriate number of loaves followed by an exclamation mark and a smile or heart may do it. > > How could verbs be introduced into emoji? The verb "to love" can already be indicated using a heart symbol. This is the one that people are likely to be most embarrassed typing out. > > Should abstract designs be used? Or should emoji always be pictographic? Yes, they should always be highly iconic, as Asmus explained in detail. See: http://www.unicode.org/mail-arch/unicode-ml/y2015-m08/0014.html > > If abstract designs were introduced would it be possible for the standards documents to include the meanings > or would the standards documents need to simply use a geometrical description and then the meanings be > regarded as a higher level protocol outside of the standard? On one hand, Unicode does not encode semantics; but on the other hand, at the character level, semantics are part of the documentation accompanying a number of characters in the Charts. There is a balance between polysemy and disambiguation. As a rule of thumb: characters are disambiguated to ensure correct processing of the data, insofar as the cost induced by handling multiple characters doesn't outweigh the benefit. 
In putting your question, you already answered it, except that there are geometric figures encoded for UIs, which therefore already have a meaning, yet are mostly generically named, leaving the door open to alternate semantics. > > For, if abstract emoji were introduced with the intention of them to be of use as verbs in a universal language, > it would be of benefit if the meanings were in the standard. But such a language has clearly been stated as being out of the scope of Unicode, and we aren't even allowed to further discuss that particular topic, given the mass of threads and e-mails already dedicated to it in the past. > > If abstract designs were used then the meanings would need to be learned. Yet if the meanings were > universal that could be a useful development. It would not, because automatic translation tools already cater for these needs, and possibly better. See: http://unicode.org/pipermail/unicode/2015-October/003005.html > > I have wondered whether verb tenses could be usefully expressed using some of the existing combining > accent characters following an emoji verb character. First of all, users would need to adopt the scheme in a fairly predictable way. I'm ignoring actual trends and can only repeat what has been said on this list: communities are missing, and so is interest. Hence, sadly, there is little to no point in elaborating further. Personally I'm poorly armed to help build a user community, as I don't have a smartphone, while being very busy with more and more tasks, leaving little time for many experiments. Sorry. Best regards, Marcel > > For example, U+0302 COMBINING CIRCUMFLEX ACCENT to indicate that the verb is in the future tense, U+0304 COMBINING MACRON to indicate that the verb is in the present tense, U+030C COMBINING CARON to indicate that the verb is in the past tense, U+0303 COMBINING TILDE to indicate that the verb is in the conditional tense. > 
> The desirability of pronouns was raised by a gentleman in the audience of a lecture at the Internationalization and Unicode Conference in 2015. > > I tried to produce some designs. I could not find a way to do that with conventional illustrative pictures, though I did produce a set of abstract designs that could possibly be useful in application; they could be displayed in colourful emoji style yet also in monochrome without ambiguity. Yet they are abstract designs, so meanings would need to be learned rather than indicated by the picture itself. Yet if the meanings were universal, that could be useful. Should there be abstract emoji or should emoji only be conventional pictures? > > William Overington > > Thursday 29 March 2018 > -------------- next part -------------- An HTML attachment was scrubbed... URL: