From doug at ewellic.org Tue Sep 1 11:37:03 2015
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 01 Sep 2015 09:37:03 -0700
Subject: Dark beer emoji
Message-ID: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>

Document L2/15-211, "Letter in support of dark beer emoji", is a request submitted by Cuauhtémoc Moctezuma, a Mexican brewery.

The letter refers to a petition with more than 22,000 signatures supporting such an emoji, and may have at least some commercial motivation ("We want the dark beer to be part of peoples conversations").

As an alternative to this proposal that may provide more flexibility, I propose adapting the Fitzpatrick skin-tone modifiers from U+1F3FB to U+1F3FF to be valid for use following U+1F37A BEER MUG or U+1F37B CLINKING BEER MUGS.

This could be done by establishing a normative correlation between the Fitzpatrick scale and the Standard Reference Method (SRM), Lovibond, and/or European Brewery Convention (EBC) beer color scales.

This mechanism would allow the entire spectrum of beer styles to be depicted, instead of dividing beers arbitrarily into "light" and "dark," in the same way (and for the same reason) that Unicode already supports a variety of skin tones.

For example, a Budweiser or similar lager could be represented as 🍺🏻 <1F37A, 1F3FB>, while a Newcastle Brown Ale might be 🍺🏽 <1F37A, 1F3FD>. U+1F3FF could denote imperial stout or Baltic porter. There might be a need to encode an additional "Type 0" color modifier to extend the "light" end of the scale, such as for non-alcoholic brews, or for Coors Light.

U+1F37B could be used to denote two beers of the same style, but for beers of different colors, the mechanism described in UTR #51, Section 2.2.1 ("Multi-Person Groupings"), involving ZWJ, could be utilized. So a toast between drinkers of the two beers above could be encoded as 🍺🏻‍🍺🏽 <1F37A, 1F3FB, 200D, 1F37A, 1F3FD>.
Longer sequences would also be possible, such as for beer samplers offered in some pubs and restaurants.

I have no idea whether my proposal is more or less serious, or more or less likely to be adopted, than the original.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

From Shawn.Steele at microsoft.com Tue Sep 1 12:29:56 2015
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Tue, 1 Sep 2015 17:29:56 +0000
Subject: Dark beer emoji
In-Reply-To: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>
Message-ID:

Ugh, should've encoded that Martian green skin-tone. Then we'd've been prepared for St. Patty's Day beers.

From asmus-inc at ix.netcom.com Tue Sep 1 12:36:11 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Tue, 1 Sep 2015 10:36:11 -0700
Subject: Dark beer emoji
In-Reply-To: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>
Message-ID: <55E5E20B.2090908@ix.netcom.com>

An HTML attachment was scrubbed...

From everson at evertype.com Tue Sep 1 12:40:13 2015
From: everson at evertype.com (Michael Everson)
Date: Tue, 1 Sep 2015 18:40:13 +0100
Subject: Dark beer emoji
In-Reply-To:
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>
Message-ID:

On 1 Sep 2015, at 18:29, Shawn Steele wrote:
>
> Ugh, should've encoded that Martian green skin-tone. Then we'd've been prepared for St. Patty's Day beers.

Recte: St.
Paddy's Day

Michael Everson * http://www.evertype.com/

From Shawn.Steele at microsoft.com Tue Sep 1 12:50:56 2015
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Tue, 1 Sep 2015 17:50:56 +0000
Subject: Dark beer emoji
In-Reply-To:
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>

Message-ID:

Thanks

From doug at ewellic.org Tue Sep 1 13:13:13 2015
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 01 Sep 2015 11:13:13 -0700
Subject: Dark beer emoji
Message-ID: <20150901111313.665a7a7059d7ee80bb4d670165c8327d.701569f03c.wbe@email03.secureserver.net>

Asmus Freytag (t) wrote:

> Well, you didn't consider that each style of beer may be served in a
> different style glass. :)

Yay, emoji modifier chaining:

U+1F37A BEER MUG
U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2
U+1Fxxx EMOJI MODIFIER WEIZEN GLASS

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

From public at khwilliamson.com Tue Sep 1 13:37:26 2015
From: public at khwilliamson.com (Karl Williamson)
Date: Tue, 1 Sep 2015 12:37:26 -0600
Subject: Dark beer emoji
In-Reply-To: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>
Message-ID: <55E5F066.3020502@khwilliamson.com>

On 09/01/2015 10:37 AM, Doug Ewell wrote:
> I have no idea whether my proposal is more or less serious, or more or
> less likely to be adopted, than the original.

When I read this, I wondered if it was April 1 instead of September 1.
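
[Editorial illustration, not part of any message.] The code point sequences quoted in this thread can be assembled mechanically. The following is a minimal Python sketch under stated assumptions: it assumes Python 3.5 or later (so that the unicodedata tables know the Unicode 8.0 modifiers), and the helper names seq, describe and utf16_units are invented here for illustration. It only builds and inspects the strings; how a renderer would display a beer mug followed by a Fitzpatrick modifier is outside the sketch, since the proposal itself is hypothetical.

```python
import unicodedata

# Code points quoted in the thread.
BEER_MUG = 0x1F37A
TYPE_1_2 = 0x1F3FB   # EMOJI MODIFIER FITZPATRICK TYPE-1-2
TYPE_4 = 0x1F3FD     # EMOJI MODIFIER FITZPATRICK TYPE-4
ZWJ = 0x200D         # ZERO WIDTH JOINER


def seq(*code_points):
    """Join code points into one string (hypothetical helper)."""
    return "".join(chr(cp) for cp in code_points)


def describe(text):
    """List each code point with its Unicode character name."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in text]


def utf16_units(text):
    """Count UTF-16 code units, as a UTF-16-based system would."""
    return len(text.encode("utf-16-le")) // 2


lager = seq(BEER_MUG, TYPE_1_2)      # <1F37A, 1F3FB>
brown_ale = seq(BEER_MUG, TYPE_4)    # <1F37A, 1F3FD>
toast = seq(BEER_MUG, TYPE_1_2, ZWJ, BEER_MUG, TYPE_4)

if __name__ == "__main__":
    for line in describe(toast):
        print(line)
    print(len(toast), "code points,", utf16_units(toast), "UTF-16 code units")
```

Run as a script, this lists the five code points of the proposed "toast" sequence by name, which is a quick way to check that a sequence contains what its angle-bracket notation claims.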
From Shawn.Steele at microsoft.com Tue Sep 1 13:44:00 2015
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Tue, 1 Sep 2015 18:44:00 +0000
Subject: Dark beer emoji
In-Reply-To: <55E5F066.3020502@khwilliamson.com>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <55E5F066.3020502@khwilliamson.com>
Message-ID:

It's my birthday, so I knew it wasn't April. :) It'd be a fun font easter egg though...

From doug at ewellic.org Tue Sep 1 13:53:09 2015
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 01 Sep 2015 11:53:09 -0700
Subject: Dark beer emoji
Message-ID: <20150901115309.665a7a7059d7ee80bb4d670165c8327d.5a8578f302.wbe@email03.secureserver.net>

Karl Williamson wrote:

> When I read this, I wondered if it was April 1 instead of September 1.

The opportunity wouldn't have lasted another seven months.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

From richard.wordingham at ntlworld.com Tue Sep 1 15:42:45 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 1 Sep 2015 21:42:45 +0100
Subject: Dark beer emoji
In-Reply-To: <20150901111313.665a7a7059d7ee80bb4d670165c8327d.701569f03c.wbe@email03.secureserver.net>
References: <20150901111313.665a7a7059d7ee80bb4d670165c8327d.701569f03c.wbe@email03.secureserver.net>
Message-ID: <20150901214245.52957528@JRWUBU2>

On Tue, 01 Sep 2015 11:13:13 -0700
"Doug Ewell" wrote:

> Asmus Freytag (t) wrote:
>
> > Well, you didn't consider that each style of beer may be served in a
> > different style glass. :)
>
> Yay, emoji modifier chaining:
>
> U+1F37A BEER MUG
> U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2
> U+1Fxxx EMOJI MODIFIER WEIZEN GLASS

How is that to be equated to ? Or is some rendering difference to be expected?

Richard.

From Shawn.Steele at microsoft.com Tue Sep 1 15:56:06 2015
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Tue, 1 Sep 2015 20:56:06 +0000
Subject: Dark beer emoji
In-Reply-To: <20150901214245.52957528@JRWUBU2>
References: <20150901111313.665a7a7059d7ee80bb4d670165c8327d.701569f03c.wbe@email03.secureserver.net> <20150901214245.52957528@JRWUBU2>
Message-ID:

In one version the beer is inside the glass, in the other, the beer is outside the glass.

From steve at swales.us Tue Sep 1 16:01:21 2015
From: steve at swales.us (Steve Swales)
Date: Tue, 1 Sep 2015 14:01:21 -0700
Subject: Dark beer emoji
In-Reply-To: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net>
Message-ID: <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us>

Personally, I love this idea, and would like to claim first authorship.
Here's a snippet from the email I sent to my old colleagues at Apple back on April 15th (not the 1st):

> Hi, Apple iOS/Keyboard/Design//I18n folks,
>
> Just wanted to say, nice work on the new Emoji keyboard design and expanded repertoire. I desperately wish the skin tone modifiers would work on the beer emoji, however. Need my porter and stout. Maybe next update? For old times' sake?

-steve

From verdy_p at wanadoo.fr Wed Sep 2 03:12:24 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 2 Sep 2015 10:12:24 +0200
Subject: Dark beer emoji
In-Reply-To: <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us>
Message-ID:

Now it's time to create variations for coffee cups, ice creams, more cakes, various forms of burgers, roasted meats, sausages, chickens/turkeys, and eggs, breads... We've put our finger into an infinitely deep hole of images.

The initial emojis were used to express essential feelings in interpersonal communication; now we see attempts to use them to sell various branded products which are not even intercultural.

Do we need them in plain text, when their representation will be too poor to show their uniqueness? Images transported separately are better. Otherwise we'll use text to give real names, brands, and product descriptions and characteristics.

I don't like the proliferation of emojis for products that will fall out of use or that are too heavily protected and not for general sale. I don't like exclusive claims of authorship that come with those proposals.

From charupdate at orange.fr Wed Sep 2 12:20:14 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 2 Sep 2015 19:20:14 +0200 (CEST)
Subject: Dark beer emoji
Message-ID: <916520143.14433.1441214414817.JavaMail.www@wwinf1h08>

On 01 Sep 2015 at 19:40, Shawn Steele wrote:

> Ugh, should've encoded that Martian green skin-tone. Then we'd've been prepared for St. Patty's Day beers.

On 19 Aug 2015 at 22:18, Mark E. Shoulson wrote:

> And is there an emoji for GRAIN OF SALT? (Actually, that could almost
> be useful... or even just a geometric CUBE...)

I see that you mock rather often and not only in one circumstance. Had I been aware, I wouldn't have got started the way I did. Sorry. I hope that these apologies of yours have reached Mrs Haneys' mailbox. :)

Cheers,
Marcel

From charupdate at orange.fr Wed Sep 2 12:30:34 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 2 Sep 2015 19:30:34 +0200 (CEST)
Subject: Effectiveness of locale support (was: Re: Custom source samples)
Message-ID: <750143234.14655.1441215034410.JavaMail.www@wwinf1h08>

I don't want to pull interminable threads, and I even thought of leaving the List, thinking I had nothing else to contribute. But finally I'm pleased to stay tuned and would like to draw your attention to a topic I brought in when I committed myself to dig up some full answer to why people are prevented from taking full control over their keyboard layout. And as it's about locale support, this old-new issue even meets the core of Unicode, and I'm hopeful that it would make a good thread.

I've formally promised to stop definitely criticizing other people's work on the Unicode Mailing List. So I've worked hard to turn this into a constructive comment.

As we know and have been refreshed by the two cited blog posts (which I don't cite again...), French-speaking users in Québec are not fully granted the means of writing their language, as the keyboard layout preferred by the OEMs and their OS supplier (and allegedly by the local population, but that's untrue, they just aren't given the choice) does not allow writing French. The most outstanding defect is that the French letter Œœ is missing.

These two blog posts are seemingly just the tip of the iceberg of that criticism of other people's work that must be current practice among Apple's competitors when the matter is what keyboard to offer in Québec. The funny side is that they do worse, not better (while they should), thus missing precisely what is commonly supposed to be the condition of any criticism. So *if* they want to insist on selling that keyboard they're selling, then they *at least* have to add Œœ on AltGr+Oo, and Ææ on AltGr+Aa. [They must have been told this quite a number of times. Voilà
once more, in case they're monitoring this Mailing List.]

About the alternative so-called French traditional layout that ships with Windows for use in Canada, it must be said that to make it at least Latin-1, one should re-add the superscript two that seems to have been replaced with the at sign (while superscript one and three are there), and the masculine ordinal indicator that seems to have been replaced with the micro sign (while the feminine ordinal indicator is there). And to make it Latin-9 and definitely Unicode, one should add the Œœ ligature, e.g. on the ?? key, which is empty on AltGr at this time.

I wonder whether they noticed the criticism of locale keyboard support flowing in at Microsoft that is mirrored in this blog post:

http://www.siao2.com/2005/01/01/345222.aspx

IMHO one cannot do such a bad job AND bully the Canadian Multilingual Standard keyboard at the same time, I'm sure everybody agrees. (Please see my next e-mail. To avoid sending one overly long e-mail, I've split the material in two.)

Nevertheless, any utterances are very useful to decipher, to learn about the inner thoughts that finally determine what companies are doing or not doing, regardless of the companies' size. It's like French etnographer Germaine Tillion said in an interview: One must *understand* what oppresses you. And she related this to her personal interpretation of the verb "to exist" (based on its Latin etymology).

This reminds me that French people in Canada are a minority. Actually, Québec is likely to be overrun by the steamroller of uniformization and big business that is eager to shape the market to make it fit into its business strategy and its stock flow management, by removing key #102, the Applications key, and actually the Right Control key. Too long a space bar, poor ergonomics (with Right Alt too far to the right). And by unsupporting the Canadian Multilingual keyboard.

Would Microsoft, Hewlett-Packard, and the other manufacturers please grant Québec full support, and help it to fully exist?

Thanks,
Marcel

From charupdate at orange.fr Wed Sep 2 12:45:47 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 2 Sep 2015 19:45:47 +0200 (CEST)
Subject: Effectiveness of locale support
Message-ID: <128861395.14966.1441215947475.JavaMail.www@wwinf1h08>

In my previous e-mail I've... a typo. Please read *ethnographer* with two Greek h's, not one.

And I've started proving that the Canadian Multilingual Standard keyboard is far better than its competitors. No wonder: the only reason it was created was to make something better suited for French (and many other languages) to be written in Canada and particularly in Québec. And the reason it was standardized was that the users who were asked for their opinion clearly preferred the new keyboard over the existing ones (even if at the beginning it wasn't multilingual, and was restricted by the gaps in Latin-1).

One could even make it a rule: usually one is likely to consider that a national government and standards body are in a far better place to learn about, and cater for, user preferences than anybody else.

Denying people the ability to write their language correctly and to use their preferred keyboard is illegal discrimination. If there isn't any legal provision prohibiting this discrimination in Québec, that's probably because Canada is not a part of the European Union, I infer from what Richard wrote on 28 Aug 2015 at 00:09:

> I may have scared them into
> silence by noting that people changing code because of one particular
> *new* sentence in Section 23.2, namely:
> > P2S4: Note in particular that the word joiner is ignored for word
> > segmentation.
> are at risk (but see below) of putting themselves in breach of the UK's
> 'Equality Act 2010'; more generally, they may be in breach of
> transpositions of the EU Racial Equality Directive (2000/43/EC). You
> don't need to have racialist intentions to be in breach.

http://www.unicode.org/mail-arch/unicode-ml/y2015-m08/0222.html

To get a better sense of what's going on, I invite you to take a glance at some details:

The keyboard symbols that puzzle strangers to the point that they may ask "where is the right Control key" are on the keycaps because they are in ISO 9995-7 and allow for a-linguality instead of bilingual overload. However, as they stay missing evidence, one is about to set keycaps back to text.

I wrote that the Canadian Standard keyboard is a genuine ISO 9995 implementation. That's true for the original standard. Unfortunately, this was altered when it was implemented on Windows. This issue is about the group selector, which should be Shift+AltGr, not right Ctrl, and should be remanent.

The ISO standard only allows for THREE levels per group; hence, again, no Shift+AltGr level. That fully ISO 9995 conformant keyboards are restricted to three levels per group is an accessibility issue: no user must be forced to type his language with more than *two* fingers. (Many people, including me, were startled when learning about this fact, as this is also a counter-productive limitation, but I'm not discussing an ISO standard here.)

Furthermore, the ISO keyboard standard 9995 always considered that all characters for national use must fit into Group 1, while Group 2 (and above) is to contain supplemental characters for all other supported languages. Like it or not, this principle is deeply embedded in ISO 9995.

Now we may ask "Why isn't the Œœ therein?" Because Œœ
had been excluded from ISO 8859-1 on the word of French representatives (who didn't really represent France but only one manufacturer, at least the more vocal of the two), and regardless of the Canadian representative asking for its inclusion because Œœ is *necessary* in French. Standards need to be read carefully prior to making statements on what is missing in a given implementation. And sometimes you even have to investigate. [To put it in people's words following the above blog post: CSA didn't do like MS is supposed to have done...]

That's how Ian James altered the keyboard's ergonomics, given that many dead keys, all to the right, are now to be pressed along with Right Control! It's as if he were too tired to add some code conferring the specified behavior to the Right Alt key. I dimly suggest that there could be a relation to what is discussed in another blog post; I would say that for being disliked, the standard has been implemented carelessly:

http://www.siao2.com/2008/10/23/9013000.aspx

Never let other people make an OS implementation of your standard!

That's why we need to access the C sources of Windows keyboard drivers. And that's why we need to get our drivers compiled from C sources (as opposed to KLC sources). In the scope of Unicode implementation, feeding KLC files into KbdUTool is not too bad as a method, as this allows for chained dead keys, and for ligatures under a five units length ceiling even when missing defines are added in kbd.h. But this way, locale support is suboptimal, because Windows' potential is not fully available.

Best regards,
Marcel

From mike.mcglothlin at gmail.com Wed Sep 2 12:59:18 2015
From: mike.mcglothlin at gmail.com (Michael McGlothlin)
Date: Wed, 2 Sep 2015 12:59:18 -0500
Subject: Dark beer emoji
In-Reply-To: <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us>
Message-ID:

It should be applied to all emoji. Could be fun with the poo one.

Thanks,
Michael McGlothlin
Sent from my iPhone

From charupdate at orange.fr Wed Sep 2 14:13:20 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 2 Sep 2015 21:13:20 +0200 (CEST)
Subject: Dark beer emoji
In-Reply-To:
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us>
Message-ID: <666369174.13853.1441221200621.JavaMail.www@wwinf2233>

On 02 Sep 2015 at 20:07, Michael McGlothlin wrote:

> It should be applied to all emoji. Could be fun with the poo one.
>
> >> On Sep 1, 2015, at 9:37 AM, Doug Ewell wrote:
> >>
> >> As an alternative to this proposal that may provide more flexibility, I
> >> propose adapting the Fitzpatrick skin-tone modifiers from U+1F3FB to
> >> U+1F3FF to be valid for use following U+1F37A BEER MUG or U+1F37B
> >> CLINKING BEER MUGS.

With U+1F35E BREAD it will be particularly useful to denote completeness. Whole bread vs white bread -- and all crumb tone levels between.

Note: This isn't a mockery. I've thought of this when Asmus mentioned emoji for milk and bread:

http://www.unicode.org/mail-arch/unicode-ml/y2015-m08/0017.html

Thank you Doug and all who responded.

Marcel

From gwalla at gmail.com Wed Sep 2 15:56:42 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Wed, 2 Sep 2015 13:56:42 -0700
Subject: Dark beer emoji
In-Reply-To: <666369174.13853.1441221200621.JavaMail.www@wwinf2233>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us> <666369174.13853.1441221200621.JavaMail.www@wwinf2233>
Message-ID:

On Wed, Sep 2, 2015 at 12:13 PM, Marcel Schneider wrote:
> On 02 Sep 2015 at 20:07, Michael McGlothlin wrote:
>
>> It should be applied to all emoji. Could be fun with the poo one.
>>
>> >> On Sep 1, 2015, at 9:37 AM, Doug Ewell wrote:
>> >>
>> >> As an alternative to this proposal that may provide more flexibility, I
>> >> propose adapting the Fitzpatrick skin-tone modifiers from U+1F3FB to
>> >> U+1F3FF to be valid for use following U+1F37A BEER MUG or U+1F37B
>> >> CLINKING BEER MUGS.
>
> With U+1F35E BREAD it will be particularly useful to denote completeness.
> Whole bread vs white bread -- and all crumb tone levels between.

TYPE 1-2: White bread
TYPE 3: Potato bread
TYPE 4: Whole wheat
TYPE 5: Multigrain
TYPE 6: Pumpernickel

But why stop there?
They could also be applied to U+1F382 BIRTHDAY CAKE: TYPE 1-2: Angel food TYPE 3: Carrot cake TYPE 4: German's chocolate TYPE 5: Red velvet TYPE 6: Devil's food > Note: This isn't a mockery. I've thought at this when Asmus mentioned emoji > for milk and bread: How would the skin tone modifiers affect milk, I wonder? There's chocolate milk, sure, but shades? Would one of them be strawberry milk? From gwalla at gmail.com Wed Sep 2 16:00:25 2015 From: gwalla at gmail.com (Garth Wallace) Date: Wed, 2 Sep 2015 14:00:25 -0700 Subject: Dark beer emoji In-Reply-To: References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us> Message-ID: On Wed, Sep 2, 2015 at 10:59 AM, Michael McGlothlin wrote: > It should be applied to all emoji. Could be fun with the poo one. Who was it who proposed a set of Bristol stool scale modifiers for U+1F4A9? From doug at ewellic.org Wed Sep 2 16:26:07 2015 From: doug at ewellic.org (Doug Ewell) Date: Wed, 02 Sep 2015 14:26:07 -0700 Subject: Dark beer emoji Message-ID: <20150902142607.665a7a7059d7ee80bb4d670165c8327d.3f31c9e5b7.wbe@email03.secureserver.net> Garth Wallace wrote: > TYPE 1-2: White bread > TYPE 3: Potato bread > TYPE 4: Whole wheat > TYPE 5: Multigrain > TYPE 6: Pumpernickel While trying to construct a rejoinder involving soft drinks (variously "soda" or "pop"), I discovered that Unicode has no such emoji. This is an outrage, of course. I can't believe Unicode even calls itself a coded character set without an emoji for soft drinks. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? 
From andrewcwest at gmail.com Wed Sep 2 17:37:31 2015 From: andrewcwest at gmail.com (Andrew West) Date: Wed, 2 Sep 2015 23:37:31 +0100 Subject: Dark beer emoji In-Reply-To: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> Message-ID: On 1 September 2015 at 17:37, Doug Ewell wrote: > > As an alternative to this proposal that may provide more flexibility, I > propose adapting the Fitzpatrick skin-tone modifiers from U+1F3FB to > U+1F3FF to be valid for use following U+1F37A BEER MUG or U+1F37B > CLINKING BEER MUGS. > > This could be done by establishing a normative correlation between the > Fitzpatrick scale and the Standard Reference Method (SRM), Lovibond, > and/or European Brewery Convention (EBC) beer color scales > . > > This mechanism would allow the entire spectrum of beer styles to be > depicted, instead of dividing beers arbitrarily into "light" and "dark," > in the same way (and for the same reason) that Unicode already supports > a variety of skin tones. > > For example, a Budweiser or similar lager could be represented as > ???? <1F37A, 1F3FB>, while a Newcastle Brown Ale might be ???? > <1F37A, 1F3FD>. U+1F3FF could denote imperial stout or Baltic porter. > There might be a need to encode an additional "Type 0" color modifier to > extend the "light" end of the scale, such as for non-alcoholic brews, or > for Coors Light. Yet more blatant anti-ginger discrimination. Yet another reason to encode a ginger emoji modifier at the earliest opportunity (see https://www.change.org/p/apple-redheads-should-have-emoji-too), which could then be applied to U+1F37A BEER MUG in order to depict ginger beer. 
Andrew From doug at ewellic.org Wed Sep 2 17:45:07 2015 From: doug at ewellic.org (Doug Ewell) Date: Wed, 02 Sep 2015 15:45:07 -0700 Subject: Dark beer emoji Message-ID: <20150902154507.665a7a7059d7ee80bb4d670165c8327d.08be4b2b0f.wbe@email03.secureserver.net> Andrew West wrote: > Yet more blatant anti-ginger discrimination. Yet another reason to > encode a ginger emoji modifier at the earliest opportunity (see > https://www.change.org/p/apple-redheads-should-have-emoji-too), which > could then be applied to U+1F37A BEER MUG in order to depict ginger > beer. Quote from the change.org page: "We can hardly believe that our petition to get ginger emoji has received over 10,000 signatures! There's still work to be done but one step closer to getting a redhead emoji!" That's the perception. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From olopierpa at gmail.com Wed Sep 2 18:09:24 2015 From: olopierpa at gmail.com (Pierpaolo Bernardi) Date: Thu, 3 Sep 2015 01:09:24 +0200 Subject: Dark beer emoji In-Reply-To: <20150902154507.665a7a7059d7ee80bb4d670165c8327d.08be4b2b0f.wbe@email03.secureserver.net> References: <20150902154507.665a7a7059d7ee80bb4d670165c8327d.08be4b2b0f.wbe@email03.secureserver.net> Message-ID: A warm beer expresses a very different concept from a cold beer. I propose a range of temperature modifiers. On Thu, Sep 3, 2015 at 12:45 AM, Doug Ewell wrote: > Andrew West wrote: > >> Yet more blatant anti-ginger discrimination. Yet another reason to >> encode a ginger emoji modifier at the earliest opportunity (see >> https://www.change.org/p/apple-redheads-should-have-emoji-too), which >> could then be applied to U+1F37A BEER MUG in order to depict ginger >> beer. > > Quote from the change.org page: > > "We can hardly believe that our petition to get ginger emoji has > received over 10,000 signatures! There's still work to be done but one > step closer to getting a redhead emoji!" > > That's the perception. 
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
>
>

From A.Schappo at lboro.ac.uk Thu Sep 3 03:22:33 2015
From: A.Schappo at lboro.ac.uk (Andre Schappo)
Date: Thu, 3 Sep 2015 08:22:33 +0000
Subject: Decomposition/Compatibility Mapping Issue
Message-ID:

So ............... I was looking at http://unicode.org/cldr/utility/regex.jsp?a=%5Cp%7Bscript%3DHan%7D&b=? and getting a cool looking Modified Regex Pattern. The last range is CJK Compatibility Ideographs Supplement, U+2F800-2FA1D.

[?-??-??-????-??-??-??-??-??-???-????-????-????-????-??]

So ....... then ....... I decided to copy/paste the above Modified Regex Pattern into Richard Ishida's Uniview http://r12a.github.io/uniview/

So ........ I then noticed that U+2F800 was listed as U+4E3D [CJK Unified Ideographs]

Thus the decomposition/compatibility mapping U+4E3D was being substituted for the original U+2F800.

I was using Safari on OS X Yosemite. I repeated the above with Chrome and Firefox and there was no problem, no substitution occurred. Thus it appears to be a copy/paste problem with Safari or code used by Safari.

I could have so easily missed this problem. I wonder if there are similar decomposition/compatibility mapping issues.

André Schappo

From charupdate at orange.fr Thu Sep 3 04:09:23 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 3 Sep 2015 11:09:23 +0200 (CEST)
Subject: Dark beer emoji
In-Reply-To:
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us> <666369174.13853.1441221200621.JavaMail.www@wwinf2233>
Message-ID: <655390662.4885.1441271363853.JavaMail.www@wwinf1h23>

On Wed, 2 Sep 2015 14:00:25 -0700, Garth Wallace wrote:
> > With U+1F35E BREAD it will be particularly useful to denote completeness.
> > Whole bread vs white bread -- and all crumb tone levels between.
>
> TYPE 1-2: White bread
> TYPE 3: Potato bread
> TYPE 4: Whole wheat
> TYPE 5: Multigrain
> TYPE 6: Pumpernickel
>
> But why stop there? They could also be applied to U+1F382 BIRTHDAY CAKE:
>
> TYPE 1-2: Angel food
> TYPE 3: Carrot cake
> TYPE 4: German's chocolate
> TYPE 5: Red velvet
> TYPE 6: Devil's food
>
> > Note: This isn't a mockery. I've thought at this when Asmus mentioned emoji
> > for milk and bread:
>
> How would the skin tone modifiers affect milk, I wonder? There's
> chocolate milk, sure, but shades?

While TYPE 3 or TYPE 4, when applied to a future MILK emoji, could primarily denote "coffee with milk", I'd prefer it to be WHOLE SUGAR MILK. There are two reasons for that. First, nutritionists unanimously warn us that a mix of coffee and milk is harmful. Second, the mineral content of whole sugar makes it a balanced food, by contrast with depleted sugar (whether this be refined or not, with or without caramel).

> Would one of them be strawberry milk?

I would like that.

TYPE 5: Strawberry milk

Asmus already told us that there'll be no soy beans emoji, for lack of iconicity. However, could there be a generic BEANS emoji along with the upcoming MILK emoji? A two-emoji sequence MILK, BEANS or BEANS, MILK would then denote tonyu.

Marcel

From asmus-inc at ix.netcom.com Thu Sep 3 04:45:26 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Thu, 3 Sep 2015 02:45:26 -0700
Subject: Dark beer emoji
In-Reply-To: <655390662.4885.1441271363853.JavaMail.www@wwinf1h23>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us> <666369174.13853.1441221200621.JavaMail.www@wwinf2233> <655390662.4885.1441271363853.JavaMail.www@wwinf1h23>
Message-ID: <55E816B6.8000308@ix.netcom.com>

An HTML attachment was scrubbed...
URL:

From charupdate at orange.fr Thu Sep 3 04:48:45 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 3 Sep 2015 11:48:45 +0200 (CEST)
Subject: Dark beer emoji
In-Reply-To: <20150902142607.665a7a7059d7ee80bb4d670165c8327d.3f31c9e5b7.wbe@email03.secureserver.net>
References: <20150902142607.665a7a7059d7ee80bb4d670165c8327d.3f31c9e5b7.wbe@email03.secureserver.net>
Message-ID: <2086574028.5787.1441273725652.JavaMail.www@wwinf1h23>

On Wed, 02 Sep 2015 14:26:07 -0700, Doug Ewell wrote:
> Garth Wallace wrote:
>
> > TYPE 1-2: White bread
> > TYPE 3: Potato bread
> > TYPE 4: Whole wheat
> > TYPE 5: Multigrain
> > TYPE 6: Pumpernickel
>
> While trying to construct a rejoinder involving soft drinks (variously
> "soda" or "pop"), I discovered that Unicode has no such emoji.
>
> This is an outrage, of course. I can't believe Unicode even calls itself
> a coded character set without an emoji for soft drinks.

Sorry to contradict. So far, not encoding soft drink emoji is IMHO a wise decision, in accordance with governmental and public health authorities' action against soft drink consumption (along with that against alcoholic beverage consumption).

The issue with soft drinks is that they're made of water and depleted sugar instead of water and complete sugar (see my previous e-mail), while in many cases the sugar's colour tone has no impact at all on the beverage's final colour. By the way, the whole sugar's taste may pleasingly complete the overall aroma.

So not having encoded any soft drink emoji before a number of other food and beverage emoji are encoded does not detract from Unicode being a coded character set.

On the other hand, I believe that encoding a glass that is not meant to contain alcoholic beverages, say, a soft drink glass, which is almost the same as a milk glass, could be a very useful proposal. We would then have a new GLASS OF MILK emoji which, being polysemic, could denote lemon soda when yellow, and so on across the colour spectrum.
And of course, the same will be used for FRUIT JUICE! Most probably when preceded by a fruit emoji like DURIAN to depict a DURIAN SMOOTHIE. Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: From charupdate at orange.fr Thu Sep 3 05:01:25 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 3 Sep 2015 12:01:25 +0200 (CEST) Subject: Dark beer emoji In-Reply-To: References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> Message-ID: <1084893606.6075.1441274485337.JavaMail.www@wwinf1h23> On Wed, 2 Sep 2015 23:37:31 +0100, Andrew West wrote: > On 1 September 2015 at 17:37, Doug Ewell wrote: > > > > As an alternative to this proposal that may provide more flexibility, I > > propose adapting the Fitzpatrick skin-tone modifiers from U+1F3FB to > > U+1F3FF to be valid for use following U+1F37A BEER MUG or U+1F37B > > CLINKING BEER MUGS. > > > > This could be done by establishing a normative correlation between the > > Fitzpatrick scale and the Standard Reference Method (SRM), Lovibond, > > and/or European Brewery Convention (EBC) beer color scales > > . > > > > This mechanism would allow the entire spectrum of beer styles to be > > depicted, instead of dividing beers arbitrarily into "light" and "dark," > > in the same way (and for the same reason) that Unicode already supports > > a variety of skin tones. > > > > For example, a Budweiser or similar lager could be represented as > > ???? <1F37A, 1F3FB>, while a Newcastle Brown Ale might be ???? > > <1F37A, 1F3FD>. U+1F3FF could denote imperial stout or Baltic porter. > > There might be a need to encode an additional "Type 0" color modifier to > > extend the "light" end of the scale, such as for non-alcoholic brews, or > > for Coors Light. > > Yet more blatant anti-ginger discrimination. 
Yet another reason to
> encode a ginger emoji modifier at the earliest opportunity (see
> https://www.change.org/p/apple-redheads-should-have-emoji-too), which
> could then be applied to U+1F37A BEER MUG in order to depict ginger
> beer.

Given all the colours to be encoded as a modifier, could there be a way to encode a MULTICOLOUR EMOJI MODIFIER? It could consist of two hex digits for the code range, and three hex digits for the colour (based on the HTML three digit hex codes for colours). This would solve at once all colour requirements in a reasonable resolution (as I believe that defining emoji tones with six digits is uselessly precise).

Marcel

From charupdate at orange.fr Thu Sep 3 06:20:15 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 3 Sep 2015 13:20:15 +0200 (CEST)
Subject: Dark beer emoji
In-Reply-To: <55E816B6.8000308@ix.netcom.com>
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us> <666369174.13853.1441221200621.JavaMail.www@wwinf2233> <655390662.4885.1441271363853.JavaMail.www@wwinf1h23> <55E816B6.8000308@ix.netcom.com>
Message-ID: <304482719.8028.1441279215723.JavaMail.www@wwinf1j19>

On Thu, 3 Sep 2015 02:45:26 -0700, Asmus Freytag (t) wrote:
> On 9/3/2015 2:09 AM, Marcel Schneider wrote:
> > Asmus already told us that there'll be no soy beans emoji, by lack of iconicity. However, could there be a generic BEANS emoji
> A coffee bean has a very recognizable shape, and I personally would consider it of sufficient "iconicity" to be considered, although, not uncommonly the representations seem to involve more than a single coffee bean...

IMHO *three* beans per emoji may be a suitable compromise between iconicity and the idea of a plural.

So will we have a COFFEE BEANS emoji along with a generic BEANS emoji?
Supposing that a coffee bean has so peculiar a shape that it cannot represent any other species. But the *two* would be very useful, each one in its domain. Thanks, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: From wjgo_10009 at btinternet.com Thu Sep 3 07:00:56 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Thu, 3 Sep 2015 13:00:56 +0100 (BST) Subject: Dark beer emoji In-Reply-To: <1084893606.6075.1441274485337.JavaMail.www@wwinf1h23> References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <1084893606.6075.1441274485337.JavaMail.www@wwinf1h23> Message-ID: <27446542.29776.1441281656629.JavaMail.defaultUser@defaultHost> Marcel Schneider wrote as follows: > Given all the colours to be encoded as a modifier, could there be a way to encode a MULTICOLOUR EMOJI MODIFIER? It could consist of two hex digits for the code range, and three hex digits for the colour (based on the HTML three digit hex codes for colours). This would solve at once all colour requirements in a reasonable resolution (as I believe that defining emoji tones with six digits is uselessly precise). May I refer you to the following thread please? Tag characters and in-line graphics (from Tag characters) The first post in the thread is as follows. http://www.unicode.org/mail-arch/unicode-ml/y2015-m05/0218.html There is within that post a suggested format for specifying a custom colour for a local palette using tag characters. That is for use within an inline-graphic yet could be adapted to applying one or more colours to the glyph of an individual character. With the in-line graphic there would be a particular base character used for the in-line graphic. For colourizing an individual character the character itself would be the base character. I would prefer base 10 numbers to specify colour components, as used in many graphics programs. 
192R224G64B2s means store as local palette colour 2 the colour (R=192, G=224, B=64).

For each glyph with more than one colour used within the glyph, The Unicode Standard would need to state the palette colour number for each part of the glyph.

William Overington

3 September 2015

From ken.shirriff at gmail.com Thu Sep 3 09:27:41 2015
From: ken.shirriff at gmail.com (Ken Shirriff)
Date: Thu, 3 Sep 2015 07:27:41 -0700
Subject: Upcoming proposal for Bitcoin sign
Message-ID:

I'm putting together a proposal for the Bitcoin sign to be added to Unicode, so I wanted to check here if people have any comments/concerns/objections.

I'm aware of the previously rejected proposal L2/11-130 and I address the issues from its rejection. In particular, my proposal includes many examples of the symbol in running text. I also checked with bitcoin.org that they have no trademark on the logo.

Please let me know of any other potential issues.

Ken

From daniel.buenzli at erratique.ch Thu Sep 3 10:06:01 2015
From: daniel.buenzli at erratique.ch (Daniel Bünzli)
Date: Thu, 3 Sep 2015 16:06:01 +0100
Subject: Technical or encoding sub mailing list ?
Message-ID: <8B7636C94F06431C85F9DF6C8C67C020@erratique.ch>

Hello,

Since I implement parts of the Unicode standard I'm interested in keeping in touch with discussions about the standard and its evolution from a technical point of view.

I'm however not interested in the encoding point of view and all the discussions of whichever pet symbol or concept random people from the internet want to assign an integer to.

With respect to these interests the amount of noise and off-topic threads I get from this list is considerable and I'm considering unsubscribing.
Before I do so I would like to ask the moderators of this mailing list if they would consider creating either a more technically focused mailing list for implementers or, alternatively, forking off encoding discussions to a dedicated mailing list. Thanks, Daniel From Shawn.Steele at microsoft.com Thu Sep 3 11:15:19 2015 From: Shawn.Steele at microsoft.com (Shawn Steele) Date: Thu, 3 Sep 2015 16:15:19 +0000 Subject: Dark beer emoji In-Reply-To: <655390662.4885.1441271363853.JavaMail.www@wwinf1h23> References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us> <666369174.13853.1441221200621.JavaMail.www@wwinf2233> <655390662.4885.1441271363853.JavaMail.www@wwinf1h23> Message-ID: If we have a bunch of ingredients emoji, then do yeast + grain + hops emoji combine into beer emoji? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rick at unicode.org Thu Sep 3 11:32:42 2015 From: rick at unicode.org (Rick McGowan) Date: Thu, 03 Sep 2015 09:32:42 -0700 Subject: The proposed update LDML specification for CLDR Release 28 now available for review Message-ID: <55E8762A.5010101@unicode.org> A proposed update to the LDML specification (UTS #35) will be available for review as of Monday, September 7 at 06:00 GMT. The open review period closes on Monday, September 14 at 06:00 GMT. (This is a short review period, because CLDR 28 is scheduled for release in the week of September 16.) 
The proposed update will be at http://unicode.org/reports/tr35/proposed.html To report bugs in the specification, please use http://unicode.org/cldr/trac/newticket From doug at ewellic.org Thu Sep 3 11:41:39 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 03 Sep 2015 09:41:39 -0700 Subject: Technical or encoding sub mailing list =?UTF-8?Q?=3F?= Message-ID: <20150903094139.665a7a7059d7ee80bb4d670165c8327d.3fabbe5441.wbe@email03.secureserver.net> Daniel B?nzli wrote: > Since I implement parts of the Unicode standard I'm interested in > keeping in touch with discussions about the standard and its evolution > from a technical point of view. > > I'm however not interested in the encoding point of view and all the > discussions of whichever pet symbol or concept random people from the > internet want to assign an integer to. > > With respect to these interests the amount of noise and off-topic > threads I get from this list is considerate and I'm considering > unsubscribing. > > Before I do so I would like to ask the moderators of this mailing list > if they would consider creating either a more technically focused > mailing list for implementers or, alternatively, forking off encoding > discussions to a dedicated mailing list. Well, that's not elitist or anything. Many of us are also implementers of the Unicode Standard, have been on the Unicode list for a long time (17 years in my case), and hardly think of ourselves as "random people from the internet." -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From daniel.buenzli at erratique.ch Thu Sep 3 11:59:59 2015 From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=) Date: Thu, 3 Sep 2015 17:59:59 +0100 Subject: Technical or encoding sub mailing list ? In-Reply-To: <20150903094139.665a7a7059d7ee80bb4d670165c8327d.3fabbe5441.wbe@email03.secureserver.net> References: <20150903094139.665a7a7059d7ee80bb4d670165c8327d.3fabbe5441.wbe@email03.secureserver.net> Message-ID: Le jeudi, 3 septembre 2015 ? 
17:41, Doug Ewell a ?crit : > Well, that's not elitist or anything. > > Many of us are also implementers of the Unicode Standard, have been on > the Unicode list for a long time (17 years in my case), and hardly think > of ourselves as "random people from the internet." If that can reassure you I do consider myself a random person from the internet on this list. It just turns out that random persons from the internet do have different interests, hence my request. Daniel From doug at ewellic.org Thu Sep 3 12:33:32 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 03 Sep 2015 10:33:32 -0700 Subject: Technical or encoding sub mailing list =?UTF-8?Q?=3F?= Message-ID: <20150903103332.665a7a7059d7ee80bb4d670165c8327d.78816c6054.wbe@email03.secureserver.net> For 75 USD per year, or about 73 CHF, you can join the Unicode Consortium as an individual member, and thereby have full access to the Unicore list and other internal technical discussion lists. There are discounts on membership rates if you pay for 3 or more years at a time. http://www.unicode.org/consortium/levels.html -- Doug Ewell | http://ewellic.org | Thornton, CO ???? -------- Original Message -------- Subject: Re: Technical or encoding sub mailing list ? From: Daniel_B?nzli Date: Thu, September 03, 2015 10:59 am To: Doug Ewell Cc: Unicode Mailing List Le jeudi, 3 septembre 2015 ? 17:41, Doug Ewell a ?crit : > Well, that's not elitist or anything. > > Many of us are also implementers of the Unicode Standard, have been on > the Unicode list for a long time (17 years in my case), and hardly think > of ourselves as "random people from the internet." If that can reassure you I do consider myself a random person from the internet on this list. It just turns out that random persons from the internet do have different interests, hence my request. 
Daniel

From unicode at maxtruxa.com Thu Sep 3 13:40:35 2015
From: unicode at maxtruxa.com (Max Truxa)
Date: Thu, 3 Sep 2015 20:40:35 +0200
Subject: Technical or encoding sub mailing list ?
In-Reply-To: <55e89284.b2b2320a.183f4.fffffe12SMTPIN_ADDED_MISSING@mx.google.com>
References: <20150903103332.665a7a7059d7ee80bb4d670165c8327d.78816c6054.wbe@email03.secureserver.net> <55e89284.b2b2320a.183f4.fffffe12SMTPIN_ADDED_MISSING@mx.google.com>
Message-ID:

On Sep 3, 2015 5:11 PM, "Daniel Bünzli" wrote:
>
> Hello,
>
> Since I implement parts of the Unicode standard I'm interested in keeping in touch with discussions about the standard and its evolution from a technical point of view.
>
> I'm however not interested in the encoding point of view and all the discussions of whichever pet symbol or concept random people from the internet want to assign an integer to.
>
> With respect to these interests the amount of noise and off-topic threads I get from this list is considerable and I'm considering unsubscribing.
>
> Before I do so I would like to ask the moderators of this mailing list if they would consider creating either a more technically focused mailing list for implementers or, alternatively, forking off encoding discussions to a dedicated mailing list.
>
> Thanks,
>
> Daniel
>
>

I feel you. Even though I probably would have worded such a request a little less offensively (or at least in a way people are less likely to take offense at).

Personally, I find many non-technical discussions very interesting to read, but the effort required to parse all that information to find something that is actually technically relevant can be quite large at times.

On Sep 3, 2015 7:39 PM, "Doug Ewell" wrote:
>
> For 75 USD per year, or about 73 CHF, you can join the Unicode
> Consortium as an individual member, and thereby have full access to the
> Unicore list and other internal technical discussion lists.
> > There are discounts on membership rates if you pay for 3 or more years > at a time. > > http://www.unicode.org/consortium/levels.html > > -- > Doug Ewell | http://ewellic.org | Thornton, CO ???? > I didn't know it was that simple to get access to the Unicore list. Thank you very much! Best regards, Max Truxa From daniel.buenzli at erratique.ch Thu Sep 3 13:42:37 2015 From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=) Date: Thu, 3 Sep 2015 19:42:37 +0100 Subject: Technical or encoding sub mailing list ? In-Reply-To: <20150903103332.665a7a7059d7ee80bb4d670165c8327d.78816c6054.wbe@email03.secureserver.net> References: <20150903103332.665a7a7059d7ee80bb4d670165c8327d.78816c6054.wbe@email03.secureserver.net> Message-ID: <6F47D1886B614A0282BDA84FAF76445A@erratique.ch> Le jeudi, 3 septembre 2015 ? 18:33, Doug Ewell a ?crit : > For 75 USD per year, or about 73 CHF, you can join the Unicode > Consortium as an individual member, and thereby have full access to the > Unicore list and other internal technical discussion lists. Well that sounds elitist... Joke apart, I still think that most of the time a good distinction can be made between the standard and the encoding process. The latter being a much more political procedure to which as an implementer I prefer to remain neutral to (and am not interested in following). Best, Daniel From doug at ewellic.org Thu Sep 3 13:53:12 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 03 Sep 2015 11:53:12 -0700 Subject: Technical or encoding sub mailing list =?UTF-8?Q?=3F?= Message-ID: <20150903115312.665a7a7059d7ee80bb4d670165c8327d.4f20bad301.wbe@email03.secureserver.net> Daniel B?nzli wrote: >> For 75 USD per year, or about 73 CHF, you can join the Unicode >> Consortium as an individual member, and thereby have full access to >> the Unicore list and other internal technical discussion lists. > > Well that sounds elitist... FWIW, I'm not a member due to the cost. 
> Joke apart, I still think that most of the time a good distinction can > be made between the standard and the encoding process. The latter > being a much more political procedure to which as an implementer I > prefer to remain neutral to (and am not interested in following). Most Internet mailing lists contain threads that may not be of interest to every subscriber. The Delete button is your friend. Waiting to see if Sarasvati decides to weigh in on the proposal to split the list. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From charupdate at orange.fr Thu Sep 3 14:30:34 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 3 Sep 2015 21:30:34 +0200 (CEST) Subject: Technical or encoding sub mailing list ? In-Reply-To: <20150903094139.665a7a7059d7ee80bb4d670165c8327d.3fabbe5441.wbe@email03.secureserver.net> References: <20150903094139.665a7a7059d7ee80bb4d670165c8327d.3fabbe5441.wbe@email03.secureserver.net> Message-ID: <1737048593.21455.1441308634210.JavaMail.www@wwinf1f21> On Thu, 03 Sep 2015 09:41:39 -0700, Doug Ewell wrote: > > Daniel B?nzli wrote: > > > Since I implement parts of the Unicode standard I'm interested in > > keeping in touch with discussions about the standard and its evolution > > from a technical point of view. > > > > I'm however not interested in the encoding point of view and all the > > discussions of whichever pet symbol or concept random people from the > > internet want to assign an integer to. > > > > With respect to these interests the amount of noise and off-topic > > threads I get from this list is considerate and I'm considering > > unsubscribing. > > > > Before I do so I would like to ask the moderators of this mailing list > > if they would consider creating either a more technically focused > > mailing list for implementers or, alternatively, forking off encoding > > discussions to a dedicated mailing list. > > Well, that's not elitist or anything. 
>
> Many of us are also implementers of the Unicode Standard, have been on
> the Unicode list for a long time (17 years in my case), and hardly think
> of ourselves as "random people from the internet."

I believe that Daniel is rather targeting people like me, who am new to the List and have (unfortunately) never been a Unicode staff member. Nevertheless, I don't believe that anybody's subscription to this List results from anything "random".

To meet Daniel's request, the "technical" threads Daniel is likely to be interested in might be given a "(TECHNICAL)" attribute in the Subject at some point of the thread, so that it will be easy to filter them and follow them back in the Archive.

I hope that helps...

Marcel
("implementing" Unicode on a keyboard layout hopefully designed for a national standard)

From asmus-inc at ix.netcom.com Thu Sep 3 14:41:42 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Thu, 3 Sep 2015 12:41:42 -0700
Subject: Dark beer emoji
In-Reply-To:
References: <20150901093703.665a7a7059d7ee80bb4d670165c8327d.70846b3fe6.wbe@email03.secureserver.net> <61BE6BAF-A38E-4D20-BF27-57F81F1AB531@swales.us> <666369174.13853.1441221200621.JavaMail.www@wwinf2233> <655390662.4885.1441271363853.JavaMail.www@wwinf1h23>
Message-ID: <55E8A276.8020304@ix.netcom.com>

An HTML attachment was scrubbed...
URL:

From asmus-inc at ix.netcom.com Thu Sep 3 14:54:30 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Thu, 3 Sep 2015 12:54:30 -0700
Subject: Technical or encoding sub mailing list ?
In-Reply-To: <20150903094139.665a7a7059d7ee80bb4d670165c8327d.3fabbe5441.wbe@email03.secureserver.net>
References: <20150903094139.665a7a7059d7ee80bb4d670165c8327d.3fabbe5441.wbe@email03.secureserver.net>
Message-ID: <55E8A576.3000409@ix.netcom.com>

An HTML attachment was scrubbed...
URL: From doug at ewellic.org Fri Sep 4 10:06:24 2015 From: doug at ewellic.org (Doug Ewell) Date: Fri, 04 Sep 2015 08:06:24 -0700 Subject: Another attempt at plain language Message-ID: <20150904080624.665a7a7059d7ee80bb4d670165c8327d.43d7553337.wbe@email03.secureserver.net> Mark Davis ?? wrote: > However, if it ends up not being added as a BCP47 variant, one could > file a ticket for consideration as a BCP47 locale variant. The syntax > would be a bit different, eg en-u-va-plain vs en-plain. To clarify, this would be an extension-U subtag in accordance with RFC 6067. I'm confused how "plain English" (or German or what have you) represents any sort of aspect of a locale. Is "special variant" that open-ended? -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From doug at ewellic.org Fri Sep 4 10:10:52 2015 From: doug at ewellic.org (Doug Ewell) Date: Fri, 04 Sep 2015 08:10:52 -0700 Subject: Please disregard my post to the wrong list Message-ID: <20150904081052.665a7a7059d7ee80bb4d670165c8327d.7b7ff9941c.wbe@email03.secureserver.net> From chris.fynn at gmail.com Fri Sep 4 12:22:33 2015 From: chris.fynn at gmail.com (Christopher Fynn) Date: Fri, 4 Sep 2015 22:52:33 +0530 Subject: "Unicode of Death" In-Reply-To: <20150528075342.665a7a7059d7ee80bb4d670165c8327d.f8c9f482c0.wbe@email03.secureserver.net> References: <20150528075342.665a7a7059d7ee80bb4d670165c8327d.f8c9f482c0.wbe@email03.secureserver.net> Message-ID: On 28 May 2015 at 20:23, Doug Ewell wrote: .... > "Every character you use has a unicode value which tells your phone what > to display. One of the unicode values is actually never-ending and so > when the phone tries to read it it goes into an infinite loop which > crashes it." > > I've read TUS Chapter 4 and UTR #23 and I still can't find the > "never-ending" Unicode property. > > Perhaps astonishingly to some, the string displays fine on all my > Windows devices. Not all apps get the directionality right, but no > crashes. 
> Well isn't Apple's street address "Infinite Loop"?
From chris.fynn at gmail.com Fri Sep 4 12:31:09 2015 From: chris.fynn at gmail.com (Christopher Fynn) Date: Fri, 4 Sep 2015 23:01:09 +0530 Subject: "Unicode of Death" In-Reply-To: References: <20150528075342.665a7a7059d7ee80bb4d670165c8327d.f8c9f482c0.wbe@email03.secureserver.net>

Message-ID: Perhaps there should be a "tounge in cheek" emoji to indicate this On 30 May 2015 at 04:50, Andrew Cunningham wrote: > Geez Philippe, > > It was tounge in cheek. > > A. > > > On Saturday, 30 May 2015, Philippe Verdy wrote: > > > > 2015-05-28 23:36 GMT+02:00 Andrew Cunningham : > >> > >> Not the first time unicode crashes things. There was the google chrome > bug on osx that crashed the tab for any syriac text. > > > > "Unicode crashes things"? Unicode has nothing to do with those crashes, > caused by bugs in applications that make incorrect assumptions (in fact not > even related to characters themselves but to the supposed behavior of the > layout engine). Programmers and designers for example VERY frequently forget > the constraints for RTL languages and make incorrect assumptions about left > and right sides when sizing objects, or they don't expect that the cursor > will advance backward and forget that some measurements can be negative: if > they use this negative value to compute the size of a bitmap rendering > surface, they'll get out of memory, unchecked null pointers returned, then > they will crash assuming the buffer was effectively allocated. > > These are the same kind of bugs as with the too common buffer overruns > with unchecked assumptions: the code is kept because "it works as is" in > their limited immediate tests. > > Producing full coverage tests is a difficult and lengthy task, that > programmers do not always have the time to do, when they are urged to produce > a workable solution for some clients and then given no time to improve the > code before the same code is distributed to a wider range of clients. > > Commercial staff do that frequently, they can't even read the technical > limitations even when they are documented by programmers... in addition the > commercial staff like selling software that will cause customers to ask > for support... that will be billed!
After that, programmers are > overwhelmed by bug reports and support requests, and have even less time to > design other things that they are working on and still have to produce. QA > tools may help programmers in this case by providing statistics about the > effective costs of producing new software with better quality, and the cost > of supporting it when it contains too many bugs: commercial teams like > those statistics because they can convert them to costs, commercial > margins, and billing rates. (When such QA tools are not used, programmers > will rapidly leave the place, they are fed up by the growing pressure to do > always more in the same time, with also a growing number of "urgent" > support requests.) > > Those that say "Unicode crashes things" do the same thing: they make > broad unchecked assumptions about how things are really made or how things > are actually working. > > > > -- > Andrew Cunningham > Project Manager, Research and Development > (Social and Digital Inclusion) > Public Libraries and Community Engagement > State Library of Victoria > 328 Swanston Street > Melbourne VIC 3000 > Australia > > Ph: +61-3-8664-7430 > Mobile: 0459 806 589 > Email: acunningham at slv.vic.gov.au > lang.support at gmail.com > > http://www.openroad.net.au/ > http://www.mylanguage.gov.au/ > http://www.slv.vic.gov.au/ > >
From charupdate at orange.fr Fri Sep 4 13:11:09 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 4 Sep 2015 20:11:09 +0200 (CEST) Subject: "Unicode of Death" Message-ID: <627754852.23518.1441390269789.JavaMail.www@wwinf1e22> On Fri, 4 Sep 2015 23:01:09 +0530, Christopher Fynn wrote: [...] >> On Saturday, 30 May 2015, Philippe Verdy wrote: [...] >>> Those that say "Unicode crashes things" do the same thing: they make broad unchecked assumptions about how things are really made or how things are actually working. Voilà a very large part of the answer to my various questions. I've joined up too late... >>> commercial staff like selling software that will cause customers to ask for support... that will be billed! That was my suspicion when I faced so many problems. So there's nothing more to expect? Thanks Philippe! Marcel
From charupdate at orange.fr Sat Sep 5 02:35:15 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 5 Sep 2015 09:35:15 +0200 (CEST) Subject: "Unicode of Death" In-Reply-To: References: <20150528075342.665a7a7059d7ee80bb4d670165c8327d.f8c9f482c0.wbe@email03.secureserver.net>

Message-ID: <1731639002.1272.1441438515747.JavaMail.www@wwinf1g33> On Fri, 4 Sep 2015 23:01:09 +0530, Christopher Fynn wrote: > Perhaps there should be a "tounge in cheek" emoji to indicate this I didn't notice the joke. Did you mean "tongue in cheek"? (I've checked there are two spellings.) You may feel free to laugh. I did too, at Shawn's and Asmus' joke :-D http://unicode.org/mail-arch/unicode-ml/y2015-m09/0042.html [but not, of course, when I read about people having their devices crashing] :-( Thanks! Marcel
From mark at macchiato.com Sat Sep 5 09:14:31 2015 From: mark at macchiato.com (Mark Davis ☕️) Date: Sat, 5 Sep 2015 16:14:31 +0200 Subject: Upcoming proposal for Bitcoin sign In-Reply-To: References: Message-ID: At one point, the proposal states: Another alternative is ฿ THAI CURRENCY SYMBOL BAHT. This has the advantage of already being in Unicode and somewhat resembling the Bitcoin sign. A major disadvantage is this symbol is already in use as a currency symbol for a different currency, so using it to represent Bitcoin will lead to confusion. The Baht and the Bitcoin sign are two different symbols for two different currencies. Currency symbols are quite often used for very different currencies, with very different values. The $, for example, is used for currencies all over the world, including many not called 'dollar'. I'd suggest that you amend your proposal to address why the case of Bitcoin and Baht is different from the case of Dollar and Peso (and other currencies using $). Mark *Il meglio è l'inimico del bene* On Thu, Sep 3, 2015 at 4:27 PM, Ken Shirriff wrote: > I'm putting together a proposal for the Bitcoin sign to be added to > Unicode, so I wanted to check here if people have any > comments/concerns/objections. > > I'm aware of the previous rejected proposal L2/11-130 > and I address the > issues from its rejection > .
In particular, my > proposal includes many examples of the symbol in running text. I also > checked with bitcoin.org that they have no trademark on the logo. > > Please let me know of any other potential issues. > > Ken >
From ken.shirriff at gmail.com Sat Sep 5 10:24:44 2015 From: ken.shirriff at gmail.com (Ken Shirriff) Date: Sat, 5 Sep 2015 08:24:44 -0700 Subject: Upcoming proposal for Bitcoin sign In-Reply-To: References: Message-ID: Thanks for your comment, Mark. I've rewritten the baht section. Let me know if this addresses your concerns. Another alternative is ฿ THAI CURRENCY SYMBOL BAHT. The bitcoin sign and baht symbol are two unrelated symbols that have some visual similarity. They are not variants of the same symbol, unlike single-bar and double-bar dollar signs. Some websites use the baht symbol to represent bitcoins due to the lack of the bitcoin symbol in Unicode. However, this is considered by some to be "hijacking" and "stealing" of the baht symbol. [footnote] While the same symbol can be used for two currencies (e.g. $ for dollars and pesos), reusing the baht symbol for bitcoin is not a good solution when two different symbols currently exist. Footnote: Some Bitcoin enthusiasts want to hijack the symbol for Thailand's currency, Tech in Asia. https://www.techinasia.com/bitcoin-enthusiasts-steal-symbol-thailands-currency/ To ? or not to ?: Bitcoin debates stealing Thai baht's identity. http://bangkok.coconuts.co/2014/04/22/bh-or-not-b-bitcoin-movement-debates-stealing-thai-bahts-identity Ken
From duerst at it.aoyama.ac.jp Mon Sep 7 00:10:45 2015 From: duerst at it.aoyama.ac.jp (Martin J. Dürst) Date: Mon, 7 Sep 2015 14:10:45 +0900 Subject: Upcoming proposal for Bitcoin sign In-Reply-To: References:

Message-ID: <55ED1C55.7060702@it.aoyama.ac.jp> Hello Ken, You write "The bitcoin sign and baht symbol are two unrelated symbols that have some visual similarity.", but don't really give any supporting information for that claim. For example, searching for images of bitcoin and baht symbols shows that the Bitcoin usually has two vertical bars, which however show only above and below the B, whereas the baht sign usually has one bar going through the B. But first, this distinction is not always maintained. Second, I extremely strongly doubt that people are making the distinction in handwriting. The 'baht form' of the symbol is much easier to write by hand than the 'bitcoin form', and so most people in handwriting will use the former even for bitcoins. Just try to correctly write the four little strokes of the 'bitcoin form', and you will understand easily. Regards, Martin.
From richard.wordingham at ntlworld.com Mon Sep 7 01:23:21 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Mon, 7 Sep 2015 07:23:21 +0100 Subject: String Ranges in Unicode Sets In-Reply-To: <55E8762A.5010101@unicode.org> References: <55E8762A.5010101@unicode.org> Message-ID: <20150907072321.48321560@JRWUBU2> On Thu, 03 Sep 2015 09:32:42 -0700 Rick McGowan wrote: > A proposed update to the LDML specification (UTS #35) will be > available for review as of Monday, September 7 at 06:00 GMT. The open > review period closes on Monday, September 14 at 06:00 GMT. (This is a > short review period, because CLDR 28 is scheduled for release in the > week of September 16.) > > The proposed update will be at > http://unicode.org/reports/tr35/proposed.html > > To report bugs in the specification, please use > http://unicode.org/cldr/trac/newticket > Have the implications of adding string ranges to Unicode sets been considered? I'm mentioning them on the list because their impact goes beyond locales, and I haven't worked out their implications myself. By my reading, adding string ranges will initially make regular expression engines that don't use ICU non-compliant with Level 1 of UTS#18 Unicode Regular Expressions, in particular RL1.3 'subtraction and intersection'. I don't imagine the extra work of set operations on Unicode sets containing string ranges will be popular. It may be worst for the minority of regular expression engines that use the regularity of regular expressions. I note that the safety feature of requiring the start and end points to have the same length has been removed from their design. String ranges seem particularly vulnerable to the ill-effects of unpredictable normalisation. Richard.
From asmus-inc at ix.netcom.com Mon Sep 7 01:24:51 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Sun, 6 Sep 2015 23:24:51 -0700 Subject: Upcoming proposal for Bitcoin sign In-Reply-To: <55ED1C55.7060702@it.aoyama.ac.jp> References:

<55ED1C55.7060702@it.aoyama.ac.jp> Message-ID: <55ED2DB3.5040009@ix.netcom.com> An HTML attachment was scrubbed...
From mark at macchiato.com Mon Sep 7 09:54:16 2015 From: mark at macchiato.com (Mark Davis ☕️) Date: Mon, 7 Sep 2015 16:54:16 +0200 Subject: String Ranges in Unicode Sets In-Reply-To: <20150907072321.48321560@JRWUBU2> References: <55E8762A.5010101@unicode.org> <20150907072321.48321560@JRWUBU2> Message-ID: Thanks for the feedback.
> By my reading, adding string ranges will initially make regular expression engines that don't use ICU non-compliant with Level 1 of UTS#18 Unicode Regular Expressions, in particular RL1.3 'subtraction and intersection'.
I don't see where you are getting that. UTS 35 isn't referenced by UTS 18 except for some examples of possible extensions in 1.2.3 Other Properties, and locale id syntax in level 3. I may be missing something, however. Can you tell me where #18 is referencing UnicodeSet?
> I don't imagine the extra work of set operations
String ranges need not be implemented internally (and I don't think the CLDR committee would expect them to be, in general). They are simply a way of expressing the *string format* of a UnicodeSet in a more compact fashion. (And UnicodeSets themselves can have a variety of different implementations, in any event).
> String ranges seem particularly vulnerable to the ill-effects of unpredictable normalisation.
UnicodeSets are low level constructs, as are their string representations. Like all strings, the string format of a UnicodeSet may change if it is normalized. That is nothing new.
- The string format "[a-Ω]" (that is, U+0061 LATIN SMALL LETTER A through U+2126 OHM SIGN) represents a UnicodeSet that contains 8,390 code points.
- Under NFC it would change to "[a-Ω]" (that is, U+0061 LATIN SMALL LETTER A through U+03A9 GREEK CAPITAL LETTER OMEGA), and contain 841 code points.
You really don't want to normalize the string format of UnicodeSets.
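The two counts above can be checked with a short sketch. This is an illustration in Python only, not ICU's UnicodeSet implementation: it treats the range as a plain span of code points, and the helper names `range_size` and `escape_code_point` are made up for this example. It also shows why a `\x{...}` escaped form is stable: once escaped, the string is pure ASCII, so NFC has nothing to change.

```python
import unicodedata

OHM_SIGN = "\u2126"       # U+2126 OHM SIGN
CAPITAL_OMEGA = "\u03A9"  # U+03A9 GREEK CAPITAL LETTER OMEGA


def range_size(first: str, last: str) -> int:
    """Number of code points in the inclusive range first..last (hypothetical helper)."""
    return ord(last) - ord(first) + 1


def escape_code_point(ch: str) -> str:
    """Render one character in \\x{HHHH} form (hypothetical helper)."""
    return f"\\x{{{ord(ch):04X}}}"


# NFC maps OHM SIGN to its canonical equivalent, GREEK CAPITAL LETTER OMEGA.
normalized_end = unicodedata.normalize("NFC", OHM_SIGN)

size_before = range_size("a", OHM_SIGN)        # range a..U+2126
size_after = range_size("a", normalized_end)   # range a..U+03A9

# The escaped spelling contains only ASCII, so normalization leaves it alone.
escaped = f"[a-{escape_code_point(OHM_SIGN)}]"
escaped_after_nfc = unicodedata.normalize("NFC", escaped)

print(size_before, size_after, escaped)
```

The design point is that the range endpoints, not the set's members, are what normalization rewrites, so a single changed character can silently shrink or grow the set.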
Or if you suspect that those string formats might be normalized, then just use the escaped format \x{...} for anything that might change under normalization. === Note that while it is fine to bring up topics for discussion here (or, better yet, on the "cldr-users at unicode.org" list), anything that requires a change will have to be filed as a CLDR ticket. Richard, I'm sure you know this, and also raised this topic here because of the relation to UTS18, so this is a reminder for others. Mark *Il meglio è l'inimico del bene*
From wjgo_10009 at btinternet.com Mon Sep 7 09:11:12 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Mon, 7 Sep 2015 15:11:12 +0100 (BST) Subject: A song in Esperanto Message-ID: <32856325.49589.1441635072798.JavaMail.defaultUser@defaultHost> A song in Esperanto I have written a song in Esperanto and published it on the web. http://www.users.globalnet.co.uk/~ngo/song1023.htm The publication process was interesting and I applied information that I found in the following Unicode code chart. Latin Extended-A http://www.unicode.org/charts/PDF/U0100.pdf I used the following two characters from that code chart. U+011D LATIN SMALL LETTER G WITH CIRCUMFLEX U+015D LATIN SMALL LETTER S WITH CIRCUMFLEX I wrote the HTML code directly into WordPad and saved it as a Text Document from WordPad. I encoded each of the two accented characters by using an ampersand followed by a U+0023 NUMBER SIGN character followed by an x followed by the four-hexadecimal-digit code point followed by a semicolon. I have also published some other songs on the web. There is an index page as follows. http://www.users.globalnet.co.uk/~ngo/song0001.htm Two of the songs are as a result of topics on this mailing list. They are on the following pages. http://www.users.globalnet.co.uk/~ngo/song1018.htm http://www.users.globalnet.co.uk/~ngo/song1021.htm There is also the following which is about colour fonts. http://www.users.globalnet.co.uk/~ngo/une_chanson.pdf William Overington 7 September 2015
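The encoding described in the message above (ampersand, NUMBER SIGN, x, four hexadecimal digits, semicolon) is the HTML hexadecimal numeric character reference. As a minimal sketch, assuming Python rather than hand-editing in WordPad, the references for the two characters can be built and decoded like this; the helper name `numeric_reference` is hypothetical, while `html.unescape` is the standard-library decoder.

```python
import html


def numeric_reference(ch: str) -> str:
    """Build '&#xHHHH;' for a single character (hypothetical helper)."""
    return f"&#x{ord(ch):04X};"


g_ref = numeric_reference("\u011D")  # LATIN SMALL LETTER G WITH CIRCUMFLEX
s_ref = numeric_reference("\u015D")  # LATIN SMALL LETTER S WITH CIRCUMFLEX

# Decoding the references gives back the original characters.
decoded = html.unescape(g_ref + s_ref)

print(g_ref, s_ref)
```

Because the references use only ASCII characters, the page survives being saved in a legacy single-byte encoding, which is presumably why the approach works from a plain Text Document.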
From ken.shirriff at gmail.com Mon Sep 7 11:26:59 2015 From: ken.shirriff at gmail.com (Ken Shirriff) Date: Mon, 7 Sep 2015 09:26:59 -0700 Subject: Upcoming proposal for Bitcoin sign In-Reply-To: <55ED1C55.7060702@it.aoyama.ac.jp> References:

<55ED1C55.7060702@it.aoyama.ac.jp> Message-ID: On Sun, Sep 6, 2015 at 10:10 PM, Martin J. Dürst wrote: > Hello Ken, > > You write "The bitcoin sign and baht symbol are two unrelated symbols that > have some visual similarity.", but don't really give any supporting > information for that claim. > Thanks for your comments, Martin. Asmus Freytag gave a detailed response, but I'd like to add a few things. The bitcoin sign is unrelated to the baht in origin. The bitcoin sign was first used in an icon replacing the software's "BC" logo with the bitcoin sign logo, showing the roots of the bitcoin sign are the letter B. There's no historical connection to the baht, unlike the multiple uses of $ which are historically related. The baht sign and the bitcoin sign are viewed as two distinct symbols by most of the Bitcoin community. Evidence for this is the bitcoin.org forum, which implemented a special mechanism to insert the bitcoin sign in text. This was done because the baht sign and the bitcoin sign are considered different by the community. If the bitcoin sign were considered interchangeable with ฿, it would have been much easier to just use ฿. Other evidence is the development of special fonts to display the bitcoin sign. I believe (based on my reading) that the Thai community views the baht sign and the bitcoin sign as two distinct symbols. I have never seen the bitcoin sign used to represent baht (except one case widely viewed as a mistake). As a thought experiment, consider a font that rendered the baht sign with the bitcoin glyph. I expect this would be extremely unpopular in Thailand, showing the bitcoin sign is not just a glyph variant of the baht sign. I linked to a couple of articles from Thailand criticizing use of ฿ as "stealing" the baht sign, but use of the bitcoin sign is not viewed as a problem, showing that the bitcoin sign is not viewed in Thailand as a variant of the baht sign. Visually, the bitcoin sign and baht sign are distinct. The bitcoin sign is almost invariably represented with two vertical bars, which are not visible through the center of the B. This is how it is described on the bitcoin wiki. The baht sign is almost invariably represented with one vertical bar, which is visible through the B. (I couldn't find any official definition of the baht sign.) This is a different situation from the dollar sign, where single-bar and double-bar forms are interchangeable. A font can't provide a single glyph that will be satisfactory for both baht and bitcoin signs. To summarize, the bitcoin community and the Thai community both view the bitcoin sign and baht sign as two separate symbols. They shouldn't be unified. Ken
From everson at evertype.com Mon Sep 7 12:27:55 2015 From: everson at evertype.com (Michael Everson) Date: Mon, 7 Sep 2015 19:27:55 +0200 Subject: Upcoming proposal for Bitcoin sign In-Reply-To: References:

<55ED1C55.7060702@it.aoyama.ac.jp>
Message-ID: <844DFF91-522F-4D55-BDCB-53267BBC640C@evertype.com>

Just want to say, I don't think this one is a runner right now. I spent many months recently working with people associated with Bitcoin and they could not decide what they wanted to do.

Michael Everson * http://www.evertype.com/

From unicode at mva.name Mon Sep 7 12:49:03 2015
From: unicode at mva.name (Vadim A. Misbakh-Soloviov)
Date: Mon, 07 Sep 2015 23:49:03 +0600
Subject: [RFC] Discussion about chances of some characters to be added in Unicode
Message-ID: <1866954.ovXpklIGud@hp>

Hello there!

First of all, I'm sorry in advance if my message's tone is not suitable for this mailing list.
Next, I'd like to discuss the chances of some characters being added to Unicode at all.
I am most interested in:

1) Full-height right and left isosceles triangles, positioned at the edges of the glyph space (so that, when one is concatenated with a space symbol whose background is the same colour as the triangle's foreground, the two look seamless [ref: the triangles_demo attachment; there are font rendering artefacts anyway, but I hope I have described the idea clearly]).
ref: the symbols at both edges of the attached "pwl" picture

2) A "forking" character (not the math one, but the VCS one).
ref: in the middle of the attached "pwl" picture.

3) A "pause" (media) character (Unicode already has characters for "play/pause" and "play", but none for "pause"). There are "cheats" like using two vertical bars instead, but that usually looks very ugly.

4) "Power" (as on the power buttons of electronic devices)

And, actually, IMHO it would also be nice to have all of the symbols from the picture in Unicode.

P.S. I'd also ask about some more symbols that are "missing" in everyday life and substituted with glyphicons on the web (but, you know, it is impossible to use glyphicons in CLI/console applications), like "cart", "exit", "barcode" (ideally also including "qr" and "datamatrix" ones), and more, and more, but let's first talk about the ones I mentioned initially.

P.P.S.: I also think it would be nice to have "icon" symbols for the major OS brands (at least Windows, MacOS, Linux, FreeBSD), to stop them (the first two) from using the Private Use Area for that.

--
Best regards,
mva
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pwl.png
Type: image/png
Size: 2611 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: triangles_demo.png
Type: image/png
Size: 3640 bytes
Desc: not available
URL:

From srl at icu-project.org Mon Sep 7 13:18:12 2015
From: srl at icu-project.org (Steven R. Loomis)
Date: Mon, 7 Sep 2015 11:18:12 -0700
Subject: [RFC] Discussion about chances of some characters to be added in Unicode
In-Reply-To: <1866954.ovXpklIGud@hp>
References: <1866954.ovXpklIGud@hp>
Message-ID:

Hello!
The power symbol was already accepted, see http://unicode.org/alloc/Pipeline.html

Steven

Sent from our iPhone.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From richard.wordingham at ntlworld.com Mon Sep 7 14:46:06 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 7 Sep 2015 20:46:06 +0100
Subject: String Ranges in Unicode Sets
In-Reply-To:
References: <55E8762A.5010101@unicode.org> <20150907072321.48321560@JRWUBU2>
Message-ID: <20150907204606.799fa7c0@JRWUBU2>

On Mon, 7 Sep 2015 16:54:16 +0200
Mark Davis ☕️ wrote:

> On Mon, Sep 7, 2015 at 8:23 AM, Richard Wordingham <
> richard.wordingham at ntlworld.com> wrote:

>> By my reading, adding string ranges will initially make regular
>> expression engines that don't use ICU non-compliant with Level 1 of
>> UTS#18 Unicode Regular Expressions, in particular RL1.3 'subtraction
>> and

> I don't see where you are getting that. UTS 35 isn't referenced by
> UTS 18 except for some examples of possible extensions in 1.2.3 Other
> Properties, and locale id syntax in level 3. I may be missing
> something, however. Can you tell me where #18 is referencing
> UnicodeSet?

In http://unicode.org/mail-arch/unicode-ml/y2014-m05/0052.html ,
you stated that the Unicode sets referred to in UTS#18 RL1.3 are the
Unicode sets defined in UTS #35. We are now waiting for you to add the
reference under Action 141-A76 - 'Make changes in UTS #18 based on
general feedback in
L2/14-277' (http://www.unicode.org/L2/L2014/14277-pubrev-ovrflw.html).
I presume no change has been made yet because there are no *urgent*
changes for UTS #18.

> String ranges need not be implemented internally (and I don't think
> the CLDR committee would expect them to be, in general). They are
> simply a way of expressing the *string format* of a UnicodeSet in a
> more compact fashion. (And UnicodeSets themselves can have a variety
> of different implementations, in any event).

[\x{0000 0000 0000 0000} - \x{DFFFF DFFFF DFFFF DFFFF}] is a
very compact way of expressing a lot of strings. You wouldn't
decompose that into a list of strings.

>> String
>> ranges seem particularly vulnerable to the ill-effects of
>> unpredictable

> UnicodeSets are low level constructs, as are their string
> representations. Like all strings, the string format of a UnicodeSet
> may change if it is normalized. That is nothing new.

> - The string format "[a-Ω]" (that is, U+0061 LATIN SMALL LETTER A
> through U+2126 OHM SIGN) represents a UnicodeSet that contains 8,390
> code points.
> - Under NFC it would change to "[a-Ω]" (that is, U+0061 LATIN
> SMALL LETTER A through U+03A9 GREEK CAPITAL LETTER OMEGA), and
> contain 841 code points.

At least this gives the same range whether normalised to NFC or to
NFD. Using NFD, the preferred normalisation for regular
expressions semi-respecting canonical equivalence, [{x?}-{?}] would
not include the 2-character string "xa", as both bounds would decompose
to two characters. Using NFC, the preferred normalisation for LDML
(and for XML, I think), this would be a contraction for [{x?}-{x?}],
and would include the 2-character string "xa". If the two strings had
to have the same length, [{x?}-{?}] would be flagged as erroneous if
interpreted in NFC, and with any luck, similar errors that were not
detected would then also be corrected. It's not perfect, but il meglio
è l'inimico del bene.

> You really don't want to normalize the string format of UnicodeSets.
> Or if you suspect that those string formats might be normalized, then
> just use escaped format \x{...} for anything that might change under
> normalization.

It would probably be sensible to issue a warning if the specification
of a string bound had more than one canonical equivalent.

I'm thinking of accidents. While an XML processor must not be Unicode
compliant, I thought most regular expression engine environments were
allowed to be Unicode compliant.

TUS 8.0 Chapter 3 C6: "A process shall not assume that the
interpretations of two canonical-equivalent character sequences are
distinct."
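
The two figures quoted above (8,390 and 841 code points) and the remark that NFC and NFD give the same range can be checked with a short sketch. It assumes nothing beyond Python's standard `unicodedata` module; the helper names `range_size` and `normalised_range` are made up for this illustration and are not part of ICU or CLDR.

```python
import unicodedata

OHM_SIGN = "\u2126"  # U+2126 OHM SIGN
OMEGA = "\u03A9"     # U+03A9 GREEK CAPITAL LETTER OMEGA


def range_size(first: str, last: str) -> int:
    """Number of code points in the inclusive range first..last."""
    return ord(last) - ord(first) + 1


def normalised_range(first: str, last: str, form: str) -> tuple[str, str]:
    """Normalise both single-character bounds of a character range.

    Hypothetical helper: it only works when each bound is still a single
    code point after normalisation, which holds for the bounds used here.
    """
    return (unicodedata.normalize(form, first),
            unicodedata.normalize(form, last))


if __name__ == "__main__":
    # The range as written: U+0061..U+2126.
    print("as written:", range_size("a", OHM_SIGN))

    # OHM SIGN has a singleton canonical decomposition to OMEGA, so the
    # upper bound moves under both NFC and NFD and the set shrinks.
    for form in ("NFC", "NFD"):
        lo, hi = normalised_range("a", OHM_SIGN, form)
        print(form, f"U+{ord(lo):04X}..U+{ord(hi):04X}", range_size(lo, hi))
```

The two bounds are canonically equivalent as characters, yet as range endpoints they select very different sets, which is the tension with C6 being discussed.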
> Note that while it is fine to bring up topics for discussion here (or,
> better yet, on the "cldr-users at unicode.org"
> list),

As this impacts regular expressions in general, I think this is the
better list for the impact on Unicode sets outside CLDR.

> anything that requires a change will have to be filed as a
> CLDR ticket. Richard, I'm sure you know this, and also raised this
> topic here because of the relation to UTS18, so this is a reminder
> for others.

Exactly.

Richard.

From richard.wordingham at ntlworld.com Mon Sep 7 16:43:01 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 7 Sep 2015 22:43:01 +0100
Subject: Upcoming proposal for Bitcoin sign
In-Reply-To:
References:

<55ED1C55.7060702@it.aoyama.ac.jp> Message-ID: <20150907224301.10bc5aaf@JRWUBU2> On Mon, 7 Sep 2015 09:26:59 -0700 Ken Shirriff wrote: > The bitcoin sign is unrelated to the baht in origin. The bitcoin sign > was first used in an icon replacing > > the software's "BC" logo with the bitcoin sign logo, showing the > roots of the bitcoin sign are the letter B. There's no historical > connection to the baht, unlike the multiple uses of $ which are > historically related. The bitcoin sign and the baht sign are very closely related. Both are a combination of 'B' and the vertical strokes of the dollar symbol. Indeed, if you look at the first picture at http://www.goabroad.com/articles/study-abroad/thai-cuisine-the-spicy-truth , you can see a plain 'B' on the left and in the middle what looks like a B with two strokes below. A lot of handwritten baht signs end with a rightward flourish from the centre. It would seem that the preferred visible currency sign in Thailand is actually the two-character string ".-"! In a lot of cases, there's either no indicator of currency, or the word is written out in full. Perhaps a saving argument is the two forms of the pound sign - U+00A3 POUND SIGN and U+20A4 LIRA SIGN. Proper blue five pound notes had the two-barred form U+20A4 (which is how I learnt to write the pound sign); as the notes became greener, their lesser value was indicated by the use of the one-barred form U+00A3. The code chart notes that the preferred form for the lira is POUND SIGN, and I can tell you that my preferred form for the pound sterling is the so-called LIRA SIGN. Richard. From asmus-inc at ix.netcom.com Mon Sep 7 17:11:44 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Mon, 7 Sep 2015 15:11:44 -0700 Subject: String Ranges in Unicode Sets In-Reply-To: <20150907072321.48321560@JRWUBU2> References: <55E8762A.5010101@unicode.org> <20150907072321.48321560@JRWUBU2> Message-ID: <55EE0BA0.9020105@ix.netcom.com> An HTML attachment was scrubbed... 
URL:

From mark at macchiato.com Tue Sep 8 02:14:44 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Tue, 8 Sep 2015 09:14:44 +0200
Subject: String Ranges in Unicode Sets
In-Reply-To: <20150907204606.799fa7c0@JRWUBU2>
References: <55E8762A.5010101@unicode.org> <20150907072321.48321560@JRWUBU2> <20150907204606.799fa7c0@JRWUBU2>
Message-ID:

Mark

*Il meglio è l'inimico del bene*

On Mon, Sep 7, 2015 at 9:46 PM, Richard Wordingham <
richard.wordingham at ntlworld.com> wrote:

> On Mon, 7 Sep 2015 16:54:16 +0200
> Mark Davis ☕️ wrote:
>
> > On Mon, Sep 7, 2015 at 8:23 AM, Richard Wordingham <
> > richard.wordingham at ntlworld.com> wrote:
>
> >> By my reading, adding string ranges will initially make regular
> >> expression engines that don't use ICU non-compliant with Level 1 of
> >> UTS#18 Unicode Regular Expressions, in particular RL1.3 'subtraction
> >> and
>
> > I don't see where you are getting that. UTS 35 isn't referenced by
> > UTS 18 except for some examples of possible extensions in 1.2.3 Other
> > Properties, and locale id syntax in level 3. I may be missing
> > something, however. Can you tell me where #18 is referencing
> > UnicodeSet?
>
> In http://unicode.org/mail-arch/unicode-ml/y2014-m05/0052.html ,
> you stated that the Unicode sets referred to in UTS#18 RL1.3 are the
> Unicode sets defined in UTS #35. We are now waiting for you to add the
> reference under Action 141-A76 - 'Make changes in UTS #18 based on
> general feedback in
> L2/14-277' (http://www.unicode.org/L2/L2014/14277-pubrev-ovrflw.html).
>

Good point. I tend to think that any new syntax would need to be approached carefully, and might only be mentioned as optional at first. But you'll get a chance for public review once you see them.

> I presume no change has been made yet because there are no *urgent*
> changes for UTS #18.
>

Right, it was backed up behind Unicode 8.0.

> > String ranges need not be implemented internally (and I don't think
> > the CLDR committee would expect them to be, in general). They are
> > simply a way of expressing the *string format* of a UnicodeSet in a
> > more compact fashion. (And UnicodeSets themselves can have a variety
> > of different implementations, in any event).
>
> [\x{0000 0000 0000 0000} - \x{DFFFF DFFFF DFFFF DFFFF}] is a
> very compact way of expressing a lot of strings. You wouldn't
> decompose that into a list of strings.
>

Clearly there will be various memory/performance issues that would need to be taken into account. Not every implementation will be designed to handle extreme cases, and some may simply not allow the creation of such a set. Not every string can be parsed by a BigDecimal system, etc. Not every regex expression can be used (without DOS) on common implementations, and so on.

> >> String
> >> ranges seem particularly vulnerable to the ill-effects of
> >> unpredictable
>
> > UnicodeSets are low level constructs, as are their string
> > representations. Like all strings, the string format of a UnicodeSet
> > may change if it is normalized. That is nothing new.
>
> > - The string format "[a-Ω]" (that is, U+0061 LATIN SMALL LETTER A
> > through U+2126 OHM SIGN) represents a UnicodeSet that contains 8,390
> > code points.
> > - Under NFC it would change to "[a-Ω]" (that is, U+0061 LATIN
> > SMALL LETTER A through U+03A9 GREEK CAPITAL LETTER OMEGA), and
> > contain 841 code points.
>
> At least this gives the same range whether normalised to NFC or to
> NFD. Using NFD, the preferred normalisation for regular
> expressions semi-respecting canonical equivalence, [{x?}-{?}] would
> not include the 2-character string "xa", as both bounds would decompose
> to two characters. Using NFC, the preferred normalisation for LDML
> (and for XML, I think), this would be a contraction for [{x?}-{x?}],
> and would include the 2-character string "xa".
> If the two strings had
> to have the same length, [{x?}-{?}] would be flagged as erroneous if
> interpreted in NFC,

If you look at the text in http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#Lists_of_Code_Points, there was already a restriction on the lengths.

> and with any luck, similar errors that were not
> detected would then also be corrected. It's not perfect, but

I think that would just give people a false sense of security. Normalizing the string format of a UnicodeSet (or regex) can change what the set matches, pretty dramatically, and is to be avoided (or, as I said, one should use escaped strings where it can't be avoided).

> il meglio
> è l'inimico del bene.
>

LOL

> > You really don't want to normalize the string format of UnicodeSets.
> > Or if you suspect that those string formats might be normalized, then
> > just use escaped format \x{...} for anything that might change under
> > normalization.
>
> It would probably be sensible to issue a warning if the specification
> of a string bound had more than one canonical equivalent.
>

Issuing a warning works in a UI. Not necessarily so well in production code...

>
> I'm thinking of accidents. While an XML processor must not be Unicode
> compliant, I thought most regular expression engine environments were
> allowed to be Unicode compliant.
>
> TUS 8.0 Chapter 3 C6: "A process shall not assume that the
> interpretations of two canonical-equivalent character sequences are
> distinct."
>

A compiler will take source code containing String x="?"; and compile it to a certain binary. If that same source code is NFD'd, the compiler will produce a different result.

Do you really think that such a compiler is not compliant with Unicode? If so, then we should add some more clarifications around C6.

> > Note that while it is fine to bring up topics for discussion here (or,
> > better yet, on the "cldr-users at unicode.org"
> > list),
>
> As this impacts regular expressions in general, I think this is the
> better list for the impact on Unicode sets outside CLDR.
>
> > anything that requires a change will have to be filed as a
> > CLDR ticket. Richard, I'm sure you know this, and also raised this
> > topic here because of the relation to UTS18, so this is a reminder
> > for others.
>
> Exactly.
>
> Richard.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From asmus-inc at ix.netcom.com Tue Sep 8 02:53:47 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Tue, 8 Sep 2015 00:53:47 -0700
Subject: String Ranges in Unicode Sets
In-Reply-To:
References: <55E8762A.5010101@unicode.org> <20150907072321.48321560@JRWUBU2> <20150907204606.799fa7c0@JRWUBU2>
Message-ID: <55EE940B.2060103@ix.netcom.com>

An HTML attachment was scrubbed...
URL:

From mark at macchiato.com Tue Sep 8 06:46:48 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Tue, 8 Sep 2015 13:46:48 +0200
Subject: String Ranges in Unicode Sets
In-Reply-To: <55EE940B.2060103@ix.netcom.com>
References: <55E8762A.5010101@unicode.org> <20150907072321.48321560@JRWUBU2> <20150907204606.799fa7c0@JRWUBU2> <55EE940B.2060103@ix.netcom.com>
Message-ID:

On Tue, Sep 8, 2015 at 9:53 AM, Asmus Freytag (t) wrote:

> it is implied the String Range formulation is a compact form.
>
> Can you prove that it doesn't create any set of strings that can't be
> specified in other ways (other than full enumeration of the strings?).
>

It is simply a compact string representation, and is defined semantically by what it expands to. Just like character ranges, [a-z], etc. Of course, the underlying implementation *could* differ, but that doesn't affect the semantics.

> What about set operations on sets with string ranges?
>

Again, the range notation is just a formatting issue. Anything you can do with [{ax}-{bz}...] you can also do with [{ax}{ay}{az}{bx}{by}{bz}...], and vice versa, since the former is defined to be equivalent to the latter. These are just string representations of the same *logical* underlying implementation.

> Can they be expressed (other than working them out and writing down the
> full enumeration of the resulting set)?
>

I'm not quite sure what you mean. That's like asking, "Can [a-z] be expressed, other than by writing out the full enumeration [a b c d e ... z]?". Well, yes. You could represent [a-z] in many ways: [\p{ASCII}&\p{Ll}], for example. Or [\u0061 \u0062 ...]. Or....

But I'm probably misunderstanding what you are trying to say.

Mark

*Il meglio è l'inimico del bene*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From frederic.grosshans at gmail.com Tue Sep 8 07:08:26 2015
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Tue, 8 Sep 2015 14:08:26 +0200
Subject: [RFC] Discussion about chances of some characters to be added in Unicode
In-Reply-To:
References: <1866954.ovXpklIGud@hp>
Message-ID: <55EECFBA.1060308@gmail.com>

On 07/09/2015 20:18, Steven R. Loomis wrote:
> Hello!
> The power symbol was already accepted, see
> http://unicode.org/alloc/Pipeline.html
>

And the proposal for the power symbol(s) is here: http://www.unicode.org/L2/L2014/14009r-power-symbol.pdf .

Frédéric

From frederic.grosshans at gmail.com Tue Sep 8 07:43:12 2015
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Tue, 8 Sep 2015 14:43:12 +0200
Subject: [RFC] Discussion about chances of some characters to be added in Unicode
In-Reply-To: <1866954.ovXpklIGud@hp>
References: <1866954.ovXpklIGud@hp>
Message-ID: <55EED7E0.6060804@gmail.com>

On 07/09/2015 19:49, Vadim A. Misbakh-Soloviov wrote:
> Hello there!
>
> First of all, I'm sorry in advance, if my message's tone is not suitable for
> that mail list.
> Next, I'd like to discuss the chances of some characters to be added in
> Unicode at all.
> Most of all I interested about:
>
> 1) Full-height right and left isosceles triangles, positioned in the edges of
> the glyph space (so, when concatenated with space symbol on the background of
> same color of it's foreground, it looks integrally [ref: triangles_demo
> attach, although there is font rendering artefacts anyway, but, I hope, I
> clearly described the idea]).
> ref: symbols on the both edges on the attached "pwl" picture

Your description looks more like a glyph specification than a (more semantic) character description. I suspect that ⏴ U+23F4 BLACK MEDIUM LEFT-POINTING TRIANGLE and ⏵ U+23F5 BLACK MEDIUM RIGHT-POINTING TRIANGLE, introduced in Unicode 7.0 as interface symbols (does anyone remember which proposal it was?), are what you are looking for.

>
> 2) "Forking" character (not the math one, but VCS one).
> ref: in the middle on the attached "pwl" picture.

This one seems legit to me, but the "external link sign" seemed legit to me and was rejected (see http://unicode.org/alloc/nonapprovals.html ).

>
> 3) "Pause" (media) character (it is ones for "play/pause" and "play" in the
> unicode already, but it not for "pause"). There is "cheats" like using two
> vertical bars instead, usually it looks very ugly.

You are looking for ⏸ U+23F8 DOUBLE VERTICAL BAR (alternate name: pause), introduced in Unicode 7.0 for that specific purpose (I don't remember the proposal).

>
> 4) "Power" (like on power buttons on electronic devices)

As said by Steven, this one is already in the pipeline, even if not accepted yet.

>
> And, actually, imho, it also be nice to have all of symbols from the picture
> in the Unicode.

Things like 🔒 U+1F512 LOCK?

>
> P.S. I'd also ask about some more symbols, which is "missed" in everyday life
> and substituted with glyphicons on the web (but, you know, it is impossible to
> use glyphicons in CLI/console applications),

That is not a Unicode problem, it is an interface problem, arguably a bug in CLI/console development.

> like: "cart", "exit", "barcode"

The shopping cart is currently under consideration (see http://www.unicode.org/L2/L2015/15195r2-emoji-add-tranche6.pdf, as U+1F6D2).

> (ideally, including also "qr" and "datamatrix" ones), and more, and more, but
> let's initially talk about that ones I talked initially
>
> P.P.S.: and also it would be nice, I think, to have "icons" symbols of major
> OS brands (at least, Windows, MacOS, Linux, FreeBSD) to stop them (first two
> ones) of using Private set for that.

That's been a big No since 1999: these symbols are logos, and excluded from Unicode. No one wants to deal with the legal nightmares of encoding them.

Frédéric

From wjgo_10009 at btinternet.com Tue Sep 8 09:05:03 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Tue, 8 Sep 2015 15:05:03 +0100 (BST)
Subject: Technical or encoding sub mailing list ?
In-Reply-To: <55E8A576.3000409@ix.netcom.com>
References: <20150903094139.665a7a7059d7ee80bb4d670165c8327d.3fabbe5441.wbe@email03.secureserver.net> <55E8A576.3000409@ix.netcom.com>
Message-ID: <23571788.47434.1441721103109.JavaMail.defaultUser@defaultHost>

Asmus Freytag wrote as follows:

> There is a small set of people who like to hi-jack the list for their personal agendas, even after being told that the audience on the list has no interest. Some compound the issue by letting loose an inordinate number of posts in a short time, or don't know how to write anything short of a novella.

I wonder if I may comment please.

In the following post

http://www.unicode.org/mail-arch/unicode-ml/y2014-m12/0032.html

Asmus wrote as follows:

quote

... Unicode has matured to the point of being the only game in town.
end quote So there is a balance between the ways of regarding posts by an enthusiastic individual who is seeking to make progress with his or her research and who is seeking advice and constructive helpful comments on what he or she is suggesting should be encoded. As if in a research common room and floating ideas to experts in a variety of specialties, such as encoding, linguistics and software programming, seeking opinions, while each participant is sat enjoying a hot beverage, be it tea, coffee, hot chocolate or peppermint tea. > A bit of occasional "water-cooler" style banter, on the other hand, while off-topic and distracting, is also amusing and diverting. It's the social-media part of Unicode and goes back to before "social media" was a term. Yes, indeed. Fine. > I would agree that the former at times feels abusive, but the latter is tradition. Well, that the former feeling is felt is unfortunate. For myself, that is not my intention. I am seeking to make progress with my research. I want to submit a proposal to encode one character into regular Unicode so that it can be used with the base character followed by a sequence of tag characters method that was recently invented for encoding flags: a method that can have application for various purposes, including in-line graphics encoded in a plain text document. Yet discussion of my ideas in this mailing list is not allowed at present and maybe it never will be allowed. This makes it difficult for me to have discussions prior to submitting a proposal document. May I mention that if anyone is interested in viewing my latest research there are four transcripts available at the following place? http://www.users.globalnet.co.uk/~ngo/locsetag.htm William Overington 8 September 2015 -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From doug at ewellic.org Tue Sep 8 10:19:03 2015
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 08 Sep 2015 08:19:03 -0700
Subject: String Ranges in Unicode Sets
Message-ID: <20150908081903.665a7a7059d7ee80bb4d670165c8327d.295ea8ba4b.wbe@email03.secureserver.net>

Mark Davis ☕️ wrote:

>> TUS 8.0 Chapter 3 C6: "A process shall not assume that the
>> interpretations of two canonical-equivalent character sequences are
>> distinct."
>
> A compiler will take source code containing String x="?"; and compile
> it to a certain binary. If that same source code is NFD'd, the
> compiler will produce a different result.
>
> Do you really think that such compiler is not compliant to Unicode?
> If so, then we should add some more clarifications around C6.

I agree. The word "interpretations" in C6 can't have been intended to include the interpretation of code points qua code points. That would make a great many internal processes impossible.

I think of C6 as meaning that spell-checkers, for example, should not treat José (NFC, four code points) and José (NFD, five code points) as separate entries.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

From richard.wordingham at ntlworld.com Tue Sep 8 16:41:08 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 8 Sep 2015 22:41:08 +0100
Subject: String Ranges in Unicode Sets
In-Reply-To: <20150908081903.665a7a7059d7ee80bb4d670165c8327d.295ea8ba4b.wbe@email03.secureserver.net>
References: <20150908081903.665a7a7059d7ee80bb4d670165c8327d.295ea8ba4b.wbe@email03.secureserver.net>
Message-ID: <20150908224108.07969c71@JRWUBU2>

On Tue, 08 Sep 2015 08:19:03 -0700
"Doug Ewell" wrote:

> Mark Davis ☕️ wrote:
>
> >> TUS 8.0 Chapter 3 C6: "A process shall not assume that the
> >> interpretations of two canonical-equivalent character sequences are
> >> distinct."
> >
> > A compiler will take source code containing String x="?"; and
> > compile it to a certain binary. If that same source code is NFD'd,
> > the compiler will produce a different result.
> >
> > Do you really think that such compiler is not compliant to Unicode?
> > If so, then we should add some more clarifications around C6.

It's not me who put mens rea into the conformance requirements. If a
compiler does no more than check strings for validity, then it may
simply naively copy the sequence of scalar values without being
non-compliant, so long as the *intent* is not to preserve differences.

For example, if a process changes strings to preferred canonically
equivalent strings, but treats characters with ccc=9 as though they had
ccc=0, it probably is in breach. On the other hand, if it treated
characters with ccc=9 as though they had ccc=300 (not a possible value
of ccc), it is compliant. I think it is quite possible to have two
identical pieces of code of which one is compliant and the other is
non-compliant. It all depends on the code's motive, which I can only
think refers to the motives of the intelligent entity that caused the
code to be as it is.

> I agree. The word "interpretations" in C6 can't have been intended to
> include the interpretation of code points qua code points. That would
> make a great many internal processes impossible.

I would make it even more extreme by saying that the intent is that the
rule apply to encoded text, as opposed to mere strings of code units.
The problem is that some procedures allow a character to represent
itself even where that is not consistent because the data will be seen
as text. For example, it is my opinion that combining marks and control
characters only belong in the representation of Unicode sets when they
are part of a non-defective string element.

> I think of C6 as meaning that spell-checkers, for example, should not
> treat José (NFC, four code points) and José (NFD, five code points)
> as separate entries.

C6 does not prohibit spell-checkers from neglecting to normalise. The
authors of the code of a spell-checker could take the view that the
database writers should have included all canonically equivalent forms.
Practically, that allows a spell-checker to enforce normalisation.

There's another, subtle feature for spell checkers. By any reading, C6
does not require a spell-checker to realise that 'find' might be spelt
with U+FB01 LATIN SMALL LIGATURE FI. Applying NFKC or NFKD to the Thai
word for 'water' would be wrong, for that converts to , which is wrong
and looks quite different. Moreover, U+FB01 is not an acceptable
alternative to in Turkish.

Richard.

From richard.wordingham at ntlworld.com Tue Sep 8 17:01:35 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 8 Sep 2015 23:01:35 +0100
Subject: String Ranges in Unicode Sets
In-Reply-To:
References: <55E8762A.5010101@unicode.org> <20150907072321.48321560@JRWUBU2> <20150907204606.799fa7c0@JRWUBU2> <55EE940B.2060103@ix.netcom.com>
Message-ID: <20150908230135.314377eb@JRWUBU2>

On Tue, 8 Sep 2015 13:46:48 +0200
Mark Davis ☕️ wrote:

> On Tue, Sep 8, 2015 at 9:53 AM, Asmus Freytag (t)
> wrote:

> > What about set operations on sets with string ranges?

> Again, the range notation is just a formatting issue. Anything you
> can do with [{ax}-{bz}...] you can also do with
> [{ax}{ay}{az}{bx}{by}{bz}...], and vice versa, since the former is
> defined to be equivalent to the latter. These are just string
> representations of the same *logical* underlying implementation.

> > Can they be expressed (other than working them out and writing down
> > the full enumeration of the resulting set)?

> I'm not quite sure what you mean. That's like asking, "Can [a-z] be
> expressed, other than by writing out the full enumeration [a b c d
> e ... z]?". Well, yes. You could represent [a-z] in many ways:
> [\p{ASCII}&\p{Ll}], for example. Or [\u0061 \u0062 ...]. Or....

> But I'm probably misunderstanding what you are trying to say.
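
The equivalence quoted above, that a string range is defined by what it expands to, can be sketched in a few lines. This is only an illustration under stated assumptions: both bounds have the same length, and each position ranges independently over code points. The function name `expand_string_range` is made up here and is not an ICU or CLDR API; the actual rules are those in UTS #35.

```python
from itertools import product


def expand_string_range(first: str, last: str) -> list[str]:
    """Enumerate the strings denoted by a string range such as {ax}-{bz}.

    Assumption for this sketch: the bounds have equal length and position
    i of every result lies between first[i] and last[i] inclusive.
    """
    if len(first) != len(last):
        raise ValueError("this sketch only handles bounds of equal length")
    columns = []
    for lo, hi in zip(first, last):
        if ord(lo) > ord(hi):
            raise ValueError(f"empty range at position: {lo!r} > {hi!r}")
        # All code points allowed at this position, in code point order.
        columns.append([chr(cp) for cp in range(ord(lo), ord(hi) + 1)])
    # Cartesian product of the per-position ranges, leftmost varying slowest.
    return ["".join(chars) for chars in product(*columns)]


if __name__ == "__main__":
    print(expand_string_range("ax", "bz"))
```

Because the compact form and the enumerated form denote the same set, ordinary set operations can be carried out on the expansion, at the cost of memory for large ranges.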

I think Asmus is asking if there is a more compact representation of
the result of a string operation than just listing all the string
elements. The answer would then be yes. Just as [a-z]~~[e-s] can be
written (and represented internally) as [a-dt-z], so
[{aa}-{zz}]-[{ee}-{ss}] can be written (and represented internally) as
the union of four non-overlapping string ranges
[{aa}-{dz} {ea}-{sd} {et}-{sz} {ta}-{zz}].

Fortunately, unions of string ranges of the same length commute, which
is not necessarily the case for Unicode sets. (It is possible that
[[a][{ab}]] might preferentially match "a" while [[{ab}][a]]
preferentially matched "ab".)

Richard.

From petercon at microsoft.com Thu Sep 10 13:04:33 2015
From: petercon at microsoft.com (Peter Constable)
Date: Thu, 10 Sep 2015 18:04:33 +0000
Subject: [somewhat off topic] straw poll
Message-ID:

I was having an offline discussion with someone regarding certain topics that may show up on this list on occasion, and the question came up of what evidence we might have of sentiment on the list. So, I thought I'd conduct a simple straw poll - respond if you feel inclined.

The questions are framed around this hypothetical scenario: Suppose I were to post a message to the list describing some experiment I did, creating a Web page containing (say) some Latin characters - not obscure, just-added-in-Unicode-8 characters, but ones that have been in the standard for some time; that my process for creating the file was to use (say) Notepad and entering HTML numeric character references; and that my findings were that it worked.

Q1: Would you find that to be an interesting post that makes your participation in the list more useful, or would you find it a noisy distraction that reduces the value you get from participating in the list?

Q2: If I were to send messages along that line on a regular basis, would that add value to your participation in the list, or reduce it?

Q3: If 50 people (still a small portion of the list membership) were to send messages along that line on a regular basis, would that add value to your participation in the list, or reduce it?

Peter

From frederic.grosshans at gmail.com Thu Sep 10 13:21:25 2015
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Thu, 10 Sep 2015 18:21:25 +0000
Subject: [somewhat off topic] straw poll
In-Reply-To:
References:
Message-ID:

Q1: neutral
Q2: annoying
Q3: reducing value of the list for me

From asmus-inc at ix.netcom.com Thu Sep 10 13:23:25 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Thu, 10 Sep 2015 11:23:25 -0700
Subject: [somewhat off topic] straw poll
In-Reply-To:
References:
Message-ID: <55F1CA9D.5090200@ix.netcom.com>

An HTML attachment was scrubbed...

From Shawn.Steele at microsoft.com Thu Sep 10 13:29:45 2015
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Thu, 10 Sep 2015 18:29:45 +0000
Subject: [somewhat off topic] straw poll
In-Reply-To:
References:
Message-ID:

Q1: I ignore threads that aren't of interest (Outlook even has a handy "ignore thread" button, though lists like this tend to break it).

Q2: If they get too annoying and don't have useful content, then I make a rule to send that person's mail to the trashcan. I include their name in the body to catch replies as well.

Q3: If there were too many of those folks, then I'd have more rules.

From petercon at microsoft.com Thu Sep 10 13:33:57 2015
From: petercon at microsoft.com (Peter Constable)
Date: Thu, 10 Sep 2015 18:33:57 +0000
Subject: [somewhat off topic] straw poll
In-Reply-To: <55F1CA9D.5090200@ix.netcom.com>
References: <55F1CA9D.5090200@ix.netcom.com>
Message-ID:

Asmus, this came out of a friendly conversation meant to understand what kinds of topics do or don't seem interesting to people, and how people might react. There was real interest in getting some indication of list sentiment. I certainly don't mean to cause offense, or get too off topic. But I won't push this if it's felt to be that - I am certainly willing to follow the sentiments of list members on this and on whether any other topics are appropriate.

Peter

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Asmus Freytag (t)
Sent: Thursday, September 10, 2015 11:23 AM
To: unicode at unicode.org
Subject: Re: [somewhat off topic] straw poll

On 9/10/2015 11:04 AM, Peter Constable wrote:
> I was having an offline discussion with someone regarding certain topics that may show up on this list on occasion, and the question came up of what evidence we might have of sentiment on the list. So, I thought I'd conduct a simple straw poll - respond if you feel inclined.

This whole exercise strikes me as off topic. :)

A./

From KalvesmakiJ at doaks.org Thu Sep 10 13:44:47 2015
From: KalvesmakiJ at doaks.org (Kalvesmaki, Joel)
Date: Thu, 10 Sep 2015 18:44:47 +0000
Subject: [somewhat off topic] straw poll
In-Reply-To:
References: <55F1CA9D.5090200@ix.netcom.com>
Message-ID:

Dear Peter,

This is the sort of inquiry that would be more efficiently conducted as a poll independent of the listserv, say with Google Forms, to get a broader, more representative response from list members, many of whom wish neither to post nor to read individual responses on the listserv.

jk

From richard.wordingham at ntlworld.com Thu Sep 10 14:49:06 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 10 Sep 2015 20:49:06 +0100
Subject: [somewhat off topic] straw poll
In-Reply-To:
References:
Message-ID: <20150910204906.067d1fe0@JRWUBU2>

On Thu, 10 Sep 2015 18:04:33 +0000
Peter Constable wrote:

> The questions are framed around this hypothetical scenario: Suppose I
> were to post a message to the list describing some experiment I did,
> creating a Web page containing (say) some Latin characters - not
> obscure, just-added-in-Unicode-8 characters, but ones that have been
> in the standard for some time; that my process for creating the file
> was to use (say) Notepad and entering HTML numeric character
> references; and that my findings were that it worked.
>
> Q1: Would you find that to be an interesting post that makes
> your participation in the list more useful, or would you find it a
> noisy distraction that reduces the value you get from participating
> in the list?

Q1. It would tell me nothing I didn't know. That is because the usual expectation is now that Unicode works, so failures are of greater interest. Of course, news that significant hold-outs against Unicode had seen the light would also be useful.

On the other hand, some people might respond with useful alternative tricks for arbitrary text entry - keyboards that will take hex input as an exceptional case (e.g. m17n Unicode for BMP on many Linux systems), alt/x in Word, a reminder of the existence of MSKLC and Tavultesoft Keyman for making one's own keyboards and so on, and it could become a useful thread for some lurkers.

> Q2: If I were to send messages along that line on a regular basis,
> would that add value to your participation in the list, or reduce it?

If nothing useful happened, it would probably reduce it, but a hint-of-the-week thread would be tolerable and excused by the thought that it may be helping some people. And then I would probably learn something useful, and my attitude would become more favourable.

> Q3: If 50 people (still a small portion of the list membership) were
> to send messages along that line on a regular basis, would that add
> value to your participation in the list, or reduce it?

They'd probably soon run out of new, useful or interesting things to say.

So, what has become of Sarasvati? She hasn't scolded list participants for a long time.

Richard.

From charupdate at orange.fr Fri Sep 11 02:10:33 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 11 Sep 2015 09:10:33 +0200 (CEST)
Subject: [somewhat off topic] straw poll
Message-ID: <858151477.2818.1441955433203.JavaMail.www@wwinf1f13>

On 10 Sep 2015 at 20:30, Asmus Freytag (t) wrote:

> On 9/10/2015 11:04 AM, Peter Constable wrote:
>> I was having an offline discussion with someone regarding certain topics that may show up on this list on occasion, and the question came up of what evidence we might have of sentiment on the list. So, I thought I'd conduct a simple straw poll - respond if you feel inclined.
>
> This whole exercise strikes me as off topic. :)
>
> A./
>
>> The questions are framed around this hypothetical scenario: Suppose I were to post a message to the list describing some experiment I did, creating a Web page containing (say) some Latin characters - not obscure, just-added-in-Unicode-8 characters, but ones that have been in the standard for some time; that my process for creating the file was to use (say) Notepad and entering HTML numeric character references; and that my findings were that it worked.
>> Q1: Would you find that to be an interesting post that makes your participation in the list more useful, or would you find it a noisy distraction that reduces the value you get from participating in the list?
>> Q2: If I were to send messages along that line on a regular basis, would that add value to your participation in the list, or reduce it?
>> Q3: If 50 people (still a small portion of the list membership) were to send messages along that line on a regular basis, would that add value to your participation in the list, or reduce it?

I'm not about to add to the frightening number of metadiscussions that have arisen since I've been mailing to the List, but after having posted all my main concerns and thanked people for the answers, I see myself faced with the need for some kind of debrief, since an influential subscriber started using the straw-man technique to gather testimonies against another subscriber. I can't find another explanation for puffing up the issue by asking for statements about *fifty* persons sharing basic experiences on Unicode use, while AFAIK there have never been more than two, William Overington and myself, of whom *only one* is left. Talking about a multitude of people is a totally unrealistic scenario, as Richard Wordingham outlined, because the material then inevitably runs out very soon:

http://www.unicode.org/mail-arch/unicode-ml/y2015-m09/0079.html

Making any decisions based upon opinions gathered by this technique amounts to using an unfair methodology.

I've been stating that only *one* person is left, and I'm happy to add my response, which doesn't fit any of the three artificially built-up questions, but rather the one that tacitly underlies each of them: I've been glad to learn how William Overington is using HTML character hex codes. IIRW, it's even in the wake of that that I added the &#x sequence on Shift + Numpad 0 when KanaLock is on, and the semicolon on + (while hex digits, and U+ and 0x, have been on my numpad for a longer time). That's what I'll use when creating my next web page, as professionals are said to use text editors to achieve this (and I already did for charupdate.info; except that now it'll be Notepad++ instead of the Notepad that Peter Constable cites again).
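
For readers unfamiliar with the notation mentioned above: an HTML hexadecimal numeric character reference is `&#x`, then the code point in hexadecimal, then `;`. The following is a minimal Python sketch of producing such references programmatically, offered only as an illustration; it is not the keyboard-based workflow described above, and the function name `to_hex_ncr` is hypothetical.

```python
import html


def to_hex_ncr(text: str) -> str:
    """Replace every non-ASCII character with an HTML hexadecimal
    numeric character reference of the form &#xHHHH;.

    ASCII characters (including & and <) are left untouched, so this
    is not a general-purpose HTML escaper.
    """
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};"
        for ch in text
    )


if __name__ == "__main__":
    encoded = to_hex_ncr("Frédéric")
    print(encoded)
    # html.unescape (Python standard library) decodes the references again.
    print(html.unescape(encoded))
```

A browser reading the encoded form from an ASCII-only file displays the same text as the original string, which is why this notation can be typed in a plain text editor.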

To conclude, I wonder how Microsoft, which should ship a whole bunch of fully completed Unicode keyboard layouts with Windows now that Unicode is thriving, can justify its cynicism about seeing people discover, each one for himself, what MSFT should have hurried to serve on a tray to all users, given that Windows is the productivity work tool it claims to be. Well, basically this List is not the right spot to place that criticism. This is why I have to thank William and Peter for having brought up the occasion, each one in his way. I confess that I prefer William's. By far.

Best wishes,

Marcel

From wjgo_10009 at btinternet.com Fri Sep 11 08:06:49 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 11 Sep 2015 14:06:49 +0100 (BST)
Subject: [somewhat off topic] straw poll
In-Reply-To: <20150910204906.067d1fe0@JRWUBU2>
References: <20150910204906.067d1fe0@JRWUBU2>
Message-ID: <26872019.36726.1441976809728.JavaMail.defaultUser@defaultHost>

Richard Wordingham wrote:

> So, what has become of ...

I hope that that does not start again. It is unfair dealing.

Please look at the way that I was treated by a person or persons unknown.

http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0208.html

I do not understand why the request for a moratorium was not made either to Unicode Inc. or to one or more of the people named on the following web page.

http://www.unicode.org/consortium/directors.html

I do not know why the moratorium was imposed by a person or persons unknown.

I have been put back onto moderated post status as a result of the moratorium being imposed and I am still on it.

Although it is called a moratorium there was no indication of whether or how the moratorium would be removed.

I have simply had to try to make progress in other ways than by posting to the Unicode list.

I am hoping to send a document to the Unicode Technical Committee about encoding one character into Unicode so as to enable my invention to become implemented.

The moratorium prevents discussion in the mailing list prior to submission.

I am hoping that the moratorium will be removed, yet it is something that I cannot apply for as I do not know where to apply!

William Overington

11 September 2015

!

From wjgo_10009 at btinternet.com Fri Sep 11 09:13:15 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 11 Sep 2015 15:13:15 +0100 (BST)
Subject: [somewhat off topic] straw poll
In-Reply-To: <858151477.2818.1441955433203.JavaMail.www@wwinf1f13>
References: <858151477.2818.1441955433203.JavaMail.www@wwinf1f13>
Message-ID: <31826964.42623.1441980795598.JavaMail.defaultUser@defaultHost>

I am grateful to Marcel for his comments.

I received some email responses to my post entitled

A song in Esperanto

http://www.unicode.org/mail-arch/unicode-ml/y2015-m09/0056.html

Only one of the email responses was in any way whatsoever critical of me posting that post.

I responded and there was a continuing exchange of emails for a short while.

I wrote two emails, the other person wrote three emails.

There is the issue of netiquette so I feel that there is little more that I can add.

However, I am entirely happy and indeed would be pleased for the other person who participated in the email exchange to publish to this mailing list a full, unedited transcript of the five emails if that person is willing to do so.

I feel that in the discussion in this present thread it is important to remember the scope that the rules for the mailing list state.

William Overington

11 September 2015

From eik at iki.fi Fri Sep 11 10:14:04 2015
From: eik at iki.fi (Erkki I Kolehmainen)
Date: Fri, 11 Sep 2015 18:14:04 +0300
Subject: VS: [somewhat off topic] straw poll
In-Reply-To: <26872019.36726.1441976809728.JavaMail.defaultUser@defaultHost>
References: <20150910204906.067d1fe0@JRWUBU2> <26872019.36726.1441976809728.JavaMail.defaultUser@defaultHost>
Message-ID: <000301d0eca4$7fd91830$7f8b4890$@fi>

I, for one, don't see any reason to lift the moratorium on that particular worn-out topic.

Sincerely,

Erkki I. Kolehmainen

From webalorixa at gmail.com Fri Sep 11 10:45:01 2015
From: webalorixa at gmail.com (Luis de la Orden)
Date: Fri, 11 Sep 2015 16:45:01 +0100
Subject: [somewhat off topic] straw poll
In-Reply-To: <31826964.42623.1441980795598.JavaMail.defaultUser@defaultHost>
References: <858151477.2818.1441955433203.JavaMail.www@wwinf1f13> <31826964.42623.1441980795598.JavaMail.defaultUser@defaultHost>
Message-ID:

Q1: It doesn't matter. These problems are inherent in the format to which the discussions are constrained: mailing lists. Mailing lists are like the old postal service in the countryside: the mailman would lazily get to the house at the crossroads and dump everyone's mail there for the whole community to collect.

Q2: It doesn't matter. Mailing lists are not organised around topics; it is your free and conscious choice of topics that matters. I have no interest in anything other than African languages written in Latin characters, but a mailing list forces us all to receive everyone else's emails. I neither want to receive everyone's emails nor send emails to everyone. I want a little corner where perhaps every other month someone will come by and talk about Yoruba, Igbo, even off-topic religious concepts of Yoruba divination, etc., and where I will not hear about anything else until I *decide* to browse around.

The biggest confusion going on here is that the model treats this list's topic as "Unicode" whilst the reality is that Unicode is a universe of diverse topics. Another thing is that Unicode makes up 10-15% of my main professional interests as a User Experience Architect. For those amongst us who work 70-100% of the time with Unicode technology, email selection and deletion makes sense, but for others this is another stream of messages that needs to be managed side by side with what they really work with.

Q3: It doesn't matter. Since we all now know I am just interested in African languages, anyone can assume that the value I am getting is the number of times these African-language topics appear, minus the number of times all the other topics do multiplied by the number of times I have to delete something I am not interested in.

And now testing an old functionality of mailing lists:

UNSUBSCRIBE

From wjgo_10009 at btinternet.com Fri Sep 11 10:33:31 2015
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 11 Sep 2015 16:33:31 +0100 (BST)
Subject: VS: [somewhat off topic] straw poll
In-Reply-To: <000301d0eca4$7fd91830$7f8b4890$@fi>
References: <20150910204906.067d1fe0@JRWUBU2> <26872019.36726.1441976809728.JavaMail.defaultUser@defaultHost> <000301d0eca4$7fd91830$7f8b4890$@fi>
Message-ID: <5920554.49990.1441985611875.JavaMail.defaultUser@defaultHost>

Erkki I. Kolehmainen wrote:

> I, for one, don't see any reason to lift the moratorium on that particular worn-out topic.

One reason is that there is the new idea of using the technique of a base character followed by a sequence of tag characters to represent each localizable sentence. Thus only one new character would need to be encoded in regular Unicode.

Yet there are other reasons too.

One is that the moratorium was not stated as being imposed either by Unicode Inc. or by any of the people named on the following web page.

http://www.unicode.org/consortium/directors.html

If Unicode Inc. chooses to impose a moratorium on discussing this development in information technology then Unicode Inc. should say so officially and post a policy document and not have this unfair imposition of a moratorium by a person or persons unknown.

Also, it is not a worn-out topic. It is a wonderful possibility for the future.

William Overington

11 September 2015

From richard.wordingham at ntlworld.com Fri Sep 11 12:04:04 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 11 Sep 2015 18:04:04 +0100
Subject: [somewhat off topic] straw poll
In-Reply-To: <26872019.36726.1441976809728.JavaMail.defaultUser@defaultHost>
References: <20150910204906.067d1fe0@JRWUBU2> <26872019.36726.1441976809728.JavaMail.defaultUser@defaultHost>
Message-ID: <20150911180404.378279ed@JRWUBU2>

On Fri, 11 Sep 2015 14:06:49 +0100 (BST)
William_J_G Overington wrote:

> Richard Wordingham wrote:
> > So, what has become of ...

> Please look at the way that I was treated by a person or persons
> unknown.

> http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0208.html

I'd forgotten that posting.

As to who Sarasvati is, try Wikipedia: https://en.wikipedia.org/wiki/Saraswati . Recording her role on the Unicode list would probably count as 'original research'. Sarasvati appears to be in charge of the email lists, though I will admit I'm not sure where to find a statement of this. I am quite sure that Sarasvati enjoys the confidence of the Unicode Consortium.

Richard.

From frederic.grosshans at gmail.com Fri Sep 11 12:18:34 2015
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Fri, 11 Sep 2015 19:18:34 +0200
Subject: [somewhat off topic] straw poll
In-Reply-To: <20150911180404.378279ed@JRWUBU2>
References: <20150910204906.067d1fe0@JRWUBU2> <26872019.36726.1441976809728.JavaMail.defaultUser@defaultHost> <20150911180404.378279ed@JRWUBU2>
Message-ID: <55F30CEA.1050806@gmail.com>

On 11/09/2015 19:04, Richard Wordingham wrote:
> As to who Sarasvati is, try Wikipedia:
> https://en.wikipedia.org/wiki/Saraswati . Recording her role on the
> Unicode list would probably count as 'original research'

Thanks for this link! I'm ashamed to confess I did not know her identity, and I thought she was an employee of some IT company, managing the mailing list.

Frédéric

From doug at ewellic.org Fri Sep 11 12:25:44 2015
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 11 Sep 2015 10:25:44 -0700
Subject: VS: [somewhat off topic] straw poll
Message-ID: <20150911102544.665a7a7059d7ee80bb4d670165c8327d.ddb725c04d.wbe@email03.secureserver.net>

William_J_G Overington wrote:

> If Unicode Inc. chooses to impose a moratorium on discussing this
> development in information technology then Unicode Inc. should say so
> officially and post a policy document and not have this unfair
> imposition of a moratorium by a person or persons unknown.

Finally, something on which William and I can agree.

I absolutely agree that UTC -- the technical committee, not the corporation -- should issue a formal statement expressing its position as to:

1. Generally, whether novel and untested concepts, particularly those for which a sizable body of popular support has not been established, are viewed by UTC as suitable and appropriate candidates for encoding in the Unicode Standard, on the basis of their perceived future usefulness. (I believe this statement has been made already; if so, a reference that can be easily cited would serve the purpose.)

2. Specifically, whether the particular concept that William proposes, to encode entities that are not characters into the Unicode Standard on the basis of their perceived future usefulness, is viewed by UTC as being suitable for and appropriate to the standard.

Whichever position is taken by this statement, pro or con, this list should honor it.

> Also, it is not a worn-out topic. It is a wonderful possibility for
> the future.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

From mark at macchiato.com Fri Sep 11 12:35:58 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Fri, 11 Sep 2015 19:35:58 +0200
Subject: VS: [somewhat off topic] straw poll
In-Reply-To: <20150911102544.665a7a7059d7ee80bb4d670165c8327d.ddb725c04d.wbe@email03.secureserver.net>
References: <20150911102544.665a7a7059d7ee80bb4d670165c8327d.ddb725c04d.wbe@email03.secureserver.net>
Message-ID:

I suggest that you create a proposal for the UTC so that it can go on record; I suspect it will get a favorable reception.

Mark

*-- Il meglio è l'inimico del bene --*

From mark at macchiato.com Fri Sep 11 12:37:42 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Fri, 11 Sep 2015 19:37:42 +0200
Subject: VS: [somewhat off topic] straw poll
In-Reply-To:
References: <20150911102544.665a7a7059d7ee80bb4d670165c8327d.ddb725c04d.wbe@email03.secureserver.net>
Message-ID:

BTW, the only way I see anything from Overington is when a message is quoted by someone else, since I long ago filtered those out of my email inbox.

Mark

*-- Il meglio è l'inimico del bene --*

From root at unicode.org Fri Sep 11 12:51:37 2015
From: root at unicode.org (Sarasvati)
Date: Fri, 11 Sep 2015 12:51:37 -0500
Subject: [somewhat off topic] straw poll
Message-ID: <201509111751.t8BHpbrs029759@sarasvati.unicode.org>

Greetings to all:

Mr Wordingham wondered,

> So, what has become of Sarasvati?
> She hasn't scolded list participants for a long time.

Most list participants continue to behave in a civil manner that doesn't require much scolding. Although sometimes individuals may be escorted to the woodshed where I store my lart.

Let me take this opportunity to remind everyone to please remain tolerably on-topic, an admittedly wide range. As people stray further into realms of meta-discussion, other subscribers become increasingly annoyed.

Mr Overington wondered,

> there was no indication of whether or how
> the moratorium would be removed.

Once a moratorium has been declared here, it will not be lifted. The topic to which Mr Overington refers will never be suitable for discussion here.
Your ever-watchful-even-when-silent, -- Sarasvati From rick at unicode.org Fri Sep 11 13:07:47 2015 From: rick at unicode.org (Rick McGowan) Date: Fri, 11 Sep 2015 11:07:47 -0700 Subject: VS: [somewhat off topic] straw poll In-Reply-To: <20150911102544.665a7a7059d7ee80bb4d670165c8327d.ddb725c04d.wbe@email03.secureserver.net> References: <20150911102544.665a7a7059d7ee80bb4d670165c8327d.ddb725c04d.wbe@email03.secureserver.net> Message-ID: <55F31873.9030009@unicode.org> Doug, et al -- The primordial statement you're looking for is in TUS, Chapter 1 and has been there forever. See: http://www.unicode.org/versions/Unicode8.0.0/ch01.pdf In section 1.1, page 3: *Note, however, that the Unicode Standard does not encode idiosyncratic, personal, novel, or private-use characters, nor does it encode logos or graphics.* I'm not sure UTC has ever made any specific pronouncement on the topic, but they do sometimes add things to the notice of non-approvals, which can generally be taken as a precedent. http://unicode.org/alloc/nonapprovals.html If there is any such statement from the UTC, Ken Whistler would probably be the one who could put his hand upon it most quickly. :-) R. On 9/11/2015 10:25 AM, Doug Ewell wrote: > I absolutely agree that UTC -- the technical committee, not the > corporation -- should issue a formal statement expressing its position > as to: > > 1. Generally, whether novel and untested concepts, particularly those > for which a sizable body of popular support has not been established, > are viewed by UTC as suitable and appropriate candidates for encoding in > the Unicode Standard, on the basis of their perceived future usefulness. > (I believe this statement has been made already; if so, a reference that > can be easily cited would serve the purpose.) > > 2.
Specifically, whether the particular concept that William proposes, > to encode entities that are not characters into the Unicode Standard on > the basis of their perceived future usefulness, is viewed by UTC as > being suitable for and appropriate to the standard. -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Fri Sep 11 13:11:16 2015 From: doug at ewellic.org (Doug Ewell) Date: Fri, 11 Sep 2015 11:11:16 -0700 Subject: VS: [somewhat off topic] straw poll Message-ID: <20150911111116.665a7a7059d7ee80bb4d670165c8327d.cce00ab0d1.wbe@email03.secureserver.net> Mark Davis ?? wrote: > I suggest that you create a proposal for the UTC so that it can go on > record; I suspect it will get a favorable reception. I assume this was not meant for me personally. I have no authority to speak for UTC. The closest I ever got to that was when I got UTN #14 published. I'm serious about this (unlike the beer color modifiers). This statement needs to come officially and formally from UTC, as William suggested, not from randoms like me. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From petercon at microsoft.com Fri Sep 11 13:26:06 2015 From: petercon at microsoft.com (Peter Constable) Date: Fri, 11 Sep 2015 18:26:06 +0000 Subject: VS: [somewhat off topic] straw poll In-Reply-To: <20150911111116.665a7a7059d7ee80bb4d670165c8327d.cce00ab0d1.wbe@email03.secureserver.net> References: <20150911111116.665a7a7059d7ee80bb4d670165c8327d.cce00ab0d1.wbe@email03.secureserver.net> Message-ID: UTC can act on documents submitted to it, or to input submitted to it via the contact form (http://www.unicode.org/reporting.html), but will not act in response solely to topics discussed in this list. -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Doug Ewell Sent: Friday, September 11, 2015 11:11 AM To: Mark Davis ?? 
Cc: Unicode Mailing List Subject: RE: VS: [somewhat off topic] straw poll Mark Davis ?? wrote: > I suggest that you create a proposal for the UTC so that it can go on > record; I suspect it will get a favorable reception. I assume this was not meant for me personally. I have no authority to speak for UTC. The closest I ever got to that was when I got UTN #14 published. I'm serious about this (unlike the beer color modifiers). This statement needs to come officially and formally from UTC, as William suggested, not from randoms like me. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From doug at ewellic.org Fri Sep 11 13:34:37 2015 From: doug at ewellic.org (Doug Ewell) Date: Fri, 11 Sep 2015 11:34:37 -0700 Subject: VS: [somewhat off topic] straw poll Message-ID: <20150911113437.665a7a7059d7ee80bb4d670165c8327d.d978b6f58c.wbe@email03.secureserver.net> Rick McGowan wrote: > In section 1.1, page 3: > > *Note, however, that the Unicode Standard does not encode > idiosyncratic, personal, novel, or private-use characters, nor does it > encode logos or graphics.* Is there a statement anywhere about entities that aren't characters in any sense, other than having an arbitrary glyph assigned to them in a font somewhere? What about encoding things on speculation of future use, without a clear indication of imminent adoption -- the criterion applied to the euro sign, and more recently to emoji? > I'm not sure UTC has ever made any specific pronouncement on the > topic, but they do sometimes add things to the notice of non-approvals, which > can generally be taken as a precedent. Unfortunately for those hoping for a definitive statement, even non-approvals are occasionally overturned; U+1E9E?LATIN CAPITAL LETTER SHARP S leaps to mind. Evidently nothing short of a specific pronouncement on this specific topic will suffice. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? 
From petercon at microsoft.com Fri Sep 11 13:36:58 2015 From: petercon at microsoft.com (Peter Constable) Date: Fri, 11 Sep 2015 18:36:58 +0000 Subject: [somewhat off topic] straw poll In-Reply-To: <201509111751.t8BHpbrs029759@sarasvati.unicode.org> References: <201509111751.t8BHpbrs029759@sarasvati.unicode.org> Message-ID: I did not intend to create a disturbance. Nor did I intend to do anything that might possibly be perceived as seeking action from the list administrator. (I mention that since Sarasvati was invoked.) And I certainly was not intending in any way to bring up moratoria that may have been declared on past topics or to suggest moratoria on new topics. (I mention that since somehow a previously-declared moratorium was raised in a reply to my original post.) I was merely seeking an indication of sentiment on the list regarding certain topics. This arose from an off-list discussion with one list member who has on occasion posted on certain topics and who indicated interest in seeing an indication of sentiment from the list. But it seem like my approach may be stirring up trouble and hence was not well-conceived. Hence, I apologize to the list and to any individuals I may have offended by this. Peter From richard.wordingham at ntlworld.com Fri Sep 11 14:46:15 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 11 Sep 2015 20:46:15 +0100 Subject: [somewhat off topic] straw poll In-Reply-To: References: <201509111751.t8BHpbrs029759@sarasvati.unicode.org> Message-ID: <20150911204615.39bbe997@JRWUBU2> On Fri, 11 Sep 2015 18:36:58 +0000 Peter Constable wrote: > I did not intend to create a disturbance. Nor did I intend to do > anything that might possibly be perceived as seeking action from the > list administrator. (I mention that since Sarasvati was invoked.) > But it seem like my approach may be stirring up trouble and hence was > not well-conceived. That's the trouble with staying civil! 
We have to guess when others are angry, and guess wrong. > Hence, I apologize to the list and to any individuals I may have > offended by this. Accepted, but with the observation that you are blameless. Richard. From daniel.buenzli at erratique.ch Fri Sep 11 18:14:10 2015 From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=) Date: Sat, 12 Sep 2015 00:14:10 +0100 Subject: VS: [somewhat off topic] straw poll In-Reply-To: References: <20150911102544.665a7a7059d7ee80bb4d670165c8327d.ddb725c04d.wbe@email03.secureserver.net> Message-ID: On Friday, 11 September 2015 at 18:37, Mark Davis ☕️ wrote: > BTW, the only way I see anything from Overington is when a message is quoted by someone else, since I long ago filtered those out of my email inbox. When I read this message [1] (which I disagree with but that's another issue) I thought you were a moderator on this list. If that is the case then I don't think you should base your moderation on having your own personal filter over the mailing list. If you are not the actual moderator for the list then forget about this message. Whoever the moderator is on this list, I think (s)he is doing a pretty bad job at it. Best, Daniel [1] http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0249.html From textexin at xencraft.com Fri Sep 11 18:26:40 2015 From: textexin at xencraft.com (Tex Texin) Date: Fri, 11 Sep 2015 16:26:40 -0700 Subject: the wheels on the bus Message-ID: <006001d0ece9$504468a0$f0cd39e0$@xencraft.com> Why do so many of the threads on this list seem best described as wheels coming off the bus? (Where is the emoji for that?) It is all too common for a thread to start, its appropriateness questioned, and then meta, policy and legalistic analysis ensue to no real end.
I understand we often enter gray areas for what is appropriate for Unicode to include and what is of interest to a diverse list ranging from newbies to experts and longstanding members, innovative and pragmatic folks, so we get discussion at all levels and we want to be extremely tolerant. (OK, we means me. Not sure what you all think, but this is how I interpret the policies here and the comments being made). Since we don't want to ban people but we want to improve the quality of the discussions, perhaps we can do the following. Create another list for meta, policy, and topics that are not directly encoding related. If a thread starts here, and a number of voices indicate it is off topic or if the mighty Sarasvati deems so, the discussion gets moved to the "meta list" (by Sarasvati or a UTC delegate). There the idea can evolve, be debated, or die on the vine. At some point, if it becomes a proposal to the UTC or is refined enough, Sarasvati or some delegate ordained by Unicode can bring the idea back to this list. But it should only come back if authorized. Violating that policy is grounds for banishment. By "move" I do not mean deleted from this list. We just need to stipulate further discussion is on the "meta" list. An approach like this gives ideas that are not of obvious interest or relevance to this list a place to go. And yes the decision as to which subjects should be moved over is still gray and Solomon-like, but since the discussion has a home those who want to pursue it can do so, so the practice isn't harmful. And it should reduce the urge for advocates to keep bringing the unwanted subjects up on this list. The other benefit: I, and I am sure many others, wanted to echo Asmus's and others' comments about the poll or other topics being off topic. I didn't respond, as me-too messages make the problem worse. If an off topic thread is moved over, then even the "so glad it moved" messages can go there.
Or messages of a new type "Please bring this off topic thread from the Unicode list over here..." Ok, I have rolled out a new bus and I know the wheels are coming off. -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Doug Ewell Sent: Friday, September 11, 2015 11:35 AM To: Rick McGowan Cc: Unicode Mailing List Subject: RE: VS: [somewhat off topic] straw poll Rick McGowan wrote: > In section 1.1, page 3: > > *Note, however, that the Unicode Standard does not encode > idiosyncratic, personal, novel, or private-use characters, nor does it > encode logos or graphics.* Is there a statement anywhere about entities that aren't characters in any sense, other than having an arbitrary glyph assigned to them in a font somewhere? What about encoding things on speculation of future use, without a clear indication of imminent adoption -- the criterion applied to the euro sign, and more recently to emoji? > I'm not sure UTC has ever made any specific pronouncement on the > topic, but they do sometimes add things to the notice of > non-approvals, which can generally be taken as a precedent. Unfortunately for those hoping for a definitive statement, even non-approvals are occasionally overturned; U+1E9E LATIN CAPITAL LETTER SHARP S leaps to mind. Evidently nothing short of a specific pronouncement on this specific topic will suffice. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From root at unicode.org Fri Sep 11 19:31:01 2015 From: root at unicode.org (Sarasvati) Date: Fri, 11 Sep 2015 19:31:01 -0500 Subject: VS: [somewhat off topic] straw poll Message-ID: <201509120031.t8C0V1Dx017549@sarasvati.unicode.org> Good morning everyone! This topic has probably now received enough attention, and thank you to all who have contributed. Let us please move along to something else. 
Everyone should now point their web browsers at the list policies and guidelines to refresh themselves: http://unicode.org/policies/mail_policy.html The main point to remember in the current context is that discussions of mail list policy are out of scope for this list. If you have problems with a subscriber, or with how a topic is unfolding, please write to the staff, not to the list. Moderation on this list has always been very light, and mainly to assure a basic level of civility in the discussions. If you have a problem with that, please consider filing a complaint via the contact form or contacting the offending user privately. http://www.unicode.org/reporting.html Your, -- Sarasvati From otto.stolz at uni-konstanz.de Sat Sep 12 06:21:22 2015 From: otto.stolz at uni-konstanz.de (Otto Stolz) Date: Sat, 12 Sep 2015 13:21:22 +0200 Subject: [somewhat off topic] straw poll In-Reply-To: References: Message-ID: <55F40AB2.7050303@uni-konstanz.de> On 10 September 2015 at 20:04, Peter Constable wrote: > […] creating a Web page containing (say) some Latin characters > - not obscure, […] to use (say) Notepad and entering HTML > numeric character references; and that my findings were that > it worked. > Q1: Would you find that to be an interesting post […] A1: No, because the scenario given is about a standard technique that every list participant is supposed to be aware of. I'd simply ignore a message of this type. If, however, a message were asking a question on this technique, I'd probably send the author a short reply pointing to the pertinent FAQ entry, or HTML tutorial. > Q2: If I were to send messages along that line on a regular basis, > would that add value to your participation in the list, or reduce it? A2: Neither. If a particular author became notorious for this sort of contribution, I'd start to ignore his messages altogether. If his messages developed into a nuisance, I'd add him to the filter rules of my e-mail client.
> Q3: If 50 people (still a small portion of the list membership) > were to send messages along that line on a regular basis, would > that add value to your participation in the list, or reduce it? A3: All of them would not start doing so at the same time, would they? Hence, A2 would apply, on a per-case basis, without much ado. Best wishes, Otto From daniel.buenzli at erratique.ch Tue Sep 15 20:45:27 2015 From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=) Date: Wed, 16 Sep 2015 02:45:27 +0100 Subject: Grapheme clusters and east asian width Message-ID: Hello, Is there any guidance on how to combine the information given by grapheme clusters and the east asian width property to do fixed-width layouts in terminal emulators ? For example if we have: U+AC01 ( 각 ) HANGUL SYLLABLE GAG This will delimit a single grapheme cluster with east asian width W and hence 2 columns in a tty. However if we have it as the sequence: U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK U+1161 ( ᅡ ) HANGUL JUNGSEONG A U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK This will delimit a single grapheme cluster, but if I try to add up their east asian widths (W, N, N), this would result in 4 columns. Does something naïve like looking up only the east asian width of the first scalar value in the grapheme cluster and use 2 columns for it if this is F or W and 1 column otherwise work or are there counter examples where this breaks ? Or is there anything more clever that can be done ? Thanks, Daniel From daniel.buenzli at erratique.ch Wed Sep 16 12:44:38 2015 From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=) Date: Wed, 16 Sep 2015 18:44:38 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: <55F9A2A3.2060500@bayarea.net> References: <55F9A2A3.2060500@bayarea.net> Message-ID: On Wednesday, 16 September 2015 at 18:10, Edwin Hoogerbeets wrote: > Have you looked into the Unicode Normalization Algorithm?
Since in general a precomposed character cannot always be found, I'll still need to apply the Unicode segmentation algorithm for finding grapheme clusters and I'd rather not add one more layer of processing if I can avoid it. Best, Daniel From richard.wordingham at ntlworld.com Wed Sep 16 14:33:51 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 16 Sep 2015 20:33:51 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: References: Message-ID: <20150916203351.50403e3e@JRWUBU2> On Wed, 16 Sep 2015 02:45:27 +0100 Daniel Bünzli wrote: > This will delimit a single grapheme cluster, but if I try to add up > their east asian widths (W, N, N), this would result in 4 columns. > Does something naïve like looking up only the east asian width of the > first scalar value in the grapheme cluster and use 2 columns for it > if this is F or W and 1 column otherwise work or are there counter > examples where this breaks ? Or is there anything more clever that > can be done ? The silence is a bit worrying, but I can't see why that wouldn't work for normal text in CJK scripts. (Hangul LLLLLVVVVTTTT would probably cause some problems!) Have you addressed the issue of Indic scripts? There are discontiguous grapheme clusters composed of indecomposable code points (e.g. U+17C4 KHMER VOWEL SIGN OO) and of decomposable code points (e.g. U+0BCA TAMIL VOWEL SIGN OO), and whether consonant + virama + consonant is one cell or two may even depend on the font (e.g. Devanagari). How are you handling ligatures between grapheme clusters, e.g. English ? There are Tamil and Tai Tham examples of compulsory ligatures, shri and naa. Looking further ahead, there are characters in the pipeline that should be either Mc or Mn depending on what the base consonant is! You have dealt with grapheme clusters with a width of one cell and a depth of two, haven't you? Actually, there's a good argument for some grapheme clusters occupying cells above and below the line! Richard.
From lyratelle at gmx.de Wed Sep 16 15:27:25 2015 From: lyratelle at gmx.de (Dominikus Dittes Scherkl) Date: Wed, 16 Sep 2015 22:27:25 +0200 Subject: Grapheme clusters and east asian width In-Reply-To: References: Message-ID: <55F9D0AD.1010400@gmx.de> On 16.09.2015 at 03:45, Daniel Bünzli wrote: > Hello, > > Is there any guidance on how to combine the information given by > grapheme clusters and the east asian width property to do fixed-width > layouts in terminal emulators ? > > For example if we have: > > U+AC01 ( 각 ) HANGUL SYLLABLE GAG > > This will delimit a single grapheme cluster with east asian width W > and hence 2 columns in a tty. However if we have it as the sequence: > > U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK U+1161 ( ᅡ ) HANGUL JUNGSEONG A > U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK > > > > This will delimit a single grapheme cluster, but if I try to add up > their east asian widths (W, N, N), this would result in 4 columns. > Why add them up? I think every grapheme cluster of hangul syllables would simply have width 2 - that is the concept of CJK characters. -- Dominikus Dittes Scherkl From asmus-inc at ix.netcom.com Wed Sep 16 16:14:11 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Wed, 16 Sep 2015 14:14:11 -0700 Subject: Grapheme clusters and east asian width In-Reply-To: References: Message-ID: <55F9DBA3.20400@ix.netcom.com> An HTML attachment was scrubbed... URL: From daniel.buenzli at erratique.ch Wed Sep 16 16:34:17 2015 From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=) Date: Wed, 16 Sep 2015 22:34:17 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: <55F9D0AD.1010400@gmx.de> References: <55F9D0AD.1010400@gmx.de> Message-ID: <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> On Wednesday, 16 September 2015 at 21:27, Dominikus Dittes Scherkl wrote: > Why add them up? > I think every grapheme cluster of hangul syllables would simply have > width 2 - that is the concept of CJK characters.
I don't personally know how CJK characters behave in general w.r.t. width, that's why I'm asking. I'm just trying to find a simple, best-effort, data-driven algorithm for the problem at hand by using standard properties and possibly without making built-in assumptions about scripts. On Wednesday, 16 September 2015 at 20:33, Richard Wordingham wrote: > Have you addressed the issue of Indic scripts? There are > discontiguous grapheme clusters composed of indecomposable code points > (e.g. U+17C4 KHMER VOWEL SIGN OO) and of decomposable code points (e.g. > U+0BCA TAMIL VOWEL SIGN OO), Not sure I understand what you mean here. > and whether consonant + virama + consonant is one cell or two may even depend on the font (e.g. > Devanagari). Well anything that is related to font metrics is out of scope from the point of view of a tty as I can't get the information. For example it seems that U+1F400 to U+1F579 have an east-asian width of N but will actually occupy two columns in the built-in OS X terminal; of course these scalar values are not east asian text per se. > How are you handling ligatures between grapheme clusters, > e.g. English ? Here again I'd need font information for that, I expect the tty not to make ligatures between f and i. Of course the best way would be to be able to hand out a string to the tty for it to measure. But then it already seems impossible to test whether a terminal is able to handle UTF-8 or not? Maybe trying to use that east asian width property was not a good idea to start with. Best, Daniel
From daniel.buenzli at erratique.ch Wed Sep 16 16:56:42 2015 From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=) Date: Wed, 16 Sep 2015 22:56:42 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: <55F9DBA3.20400@ix.netcom.com> References: <55F9DBA3.20400@ix.netcom.com> Message-ID: On Wednesday, 16 September 2015 at 22:14, Asmus Freytag (t) wrote: > "N" doesn't mean "narrow" but "neutral" - that is, the width is given by other considerations. Ah right ! Thanks. Narrow is Na. So a refined algorithm would be to actually do the summation in each grapheme cluster as I initially wanted to do with the mapping (F, W -> 2), (Na, H -> 1), (N -> 0) and if I get a 0, fall back on 1 or maybe try to make an educated guess according to the script/block. Best, Daniel From richard.wordingham at ntlworld.com Wed Sep 16 19:19:39 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 17 Sep 2015 01:19:39 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: References: <55F9DBA3.20400@ix.netcom.com> Message-ID: <20150917011939.22861725@JRWUBU2> On Wed, 16 Sep 2015 22:56:42 +0100 Daniel Bünzli wrote: > On Wednesday, 16 September 2015 at 22:14, Asmus Freytag (t) wrote: > > "N" doesn't mean "narrow" but "neutral" - that is, the width is > > given by other considerations. > > Ah right ! Thanks. Narrow is Na. > > So a refined algorithm would be to actually do the summation in each > grapheme cluster as I initially wanted to do with the mapping (F, W > -> 2), (Na, H -> 1), (N -> 0) and if I get a 0, fall back on 1 or maybe > try to make an educated guess according to the script/block. I think you have a problem with U+302E HANGUL SINGLE DOT TONE MARK and U+302F HANGUL DOUBLE DOT TONE MARK, contrary to what I said earlier. They are preposed combining marks with Grapheme_Extend=Yes and EAW=Wide. I'm not sure whether the (legacy & extended) grapheme cluster should occupy 2, 3 or 4 cells. I think 2 cells is wrong, so summation works better, contrary to what I said earlier. Does anyone know how EAW=Wide was derived for these characters? Apparently they were wide even when they were non-spacing marks (gc=Mn), e.g. in Unicode Version 5.0, so I suspect they were not given individual consideration. I suspect they should be EAW=A(mbiguous). Richard.
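[Archive note] The summation heuristic discussed in the two messages above can be sketched roughly as follows. This is only an illustration, not code posted by any list participant: the names EAW_COLUMNS, cluster_columns and line_columns are hypothetical, the input is assumed to be already split into grapheme clusters (Python's standard library has no segmenter), and treating EAW=A (ambiguous) like N is an assumption the thread does not settle.

```python
import unicodedata

# Columns contributed by each East_Asian_Width value, following the mapping
# proposed in the thread: (F, W -> 2), (Na, H -> 1), (N -> 0).
# "A" (ambiguous) is not covered there; counting it as 0, so that it falls
# through to the 1-column fallback, is an assumption made for this sketch.
EAW_COLUMNS = {"F": 2, "W": 2, "Na": 1, "H": 1, "N": 0, "A": 0}


def cluster_columns(cluster: str) -> int:
    """Estimate the tty columns taken by one already-segmented grapheme cluster."""
    total = sum(EAW_COLUMNS[unicodedata.east_asian_width(ch)] for ch in cluster)
    # A cluster whose code points all count as 0 falls back to 1 column.
    return total if total > 0 else 1


def line_columns(clusters) -> int:
    """Sum the per-cluster estimates over an iterable of grapheme clusters."""
    return sum(cluster_columns(c) for c in clusters)
```

With this mapping both U+AC01 and the conjoining sequence U+1100 U+1161 U+11A8 should come out as 2 columns, since only the leading jamo is Wide. A cluster containing U+302E or U+302F would come out as 4, which is exactly the open question raised above, and nothing here accounts for what a given font or terminal actually does.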
From richard.wordingham at ntlworld.com Wed Sep 16 20:25:47 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 17 Sep 2015 02:25:47 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> References: <55F9D0AD.1010400@gmx.de> <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> Message-ID: <20150917022547.5640ee26@JRWUBU2> On Wed, 16 Sep 2015 22:34:17 +0100 Daniel Bünzli wrote: > On Wednesday, 16 September 2015 at 20:33, Richard Wordingham wrote: > > Have you addressed the issue of Indic scripts? There are > > discontiguous grapheme clusters composed of indecomposable code > > points (e.g. U+17C4 KHMER VOWEL SIGN OO) and of decomposable code > > points (e.g. U+0BCA TAMIL VOWEL SIGN OO), > > Not sure I understand what you mean here. In Khmer, a sequence is rendered with glyphs in the order /sign E, KA, sign AA/, and in Tamil a sequence is rendered with the glyphs in the order /sign EE, KA, sign AA/. All the glyphs have non-zero advance width. In both cases splits into two legacy grapheme clusters , but are a single extended grapheme cluster. In Tamil, is in NFC but not in NFD, and splits into > > and whether consonant + virama + consonant is one cell or two may > > even depend on the font (e.g. Devanagari). > > Well anything that is related to font metrics is out of scope from > the point of view of a tty as I can't get the information. You asked, "Is there any guidance on how to combine the information given by grapheme clusters and the east asian width property to do fixed-width layouts in terminal emulators ?". From this, I deduced that you are trying to write a terminal emulator. Are you actually trying to work out how a terminal emulator someone else wrote will position characters? Whether consonant + virama + consonant is one cell or two isn't a question of font metrics. For example, consider the sequence . This is composed of two legacy and extended grapheme clusters, and .
In the 'Lohit Hindi' font, the two consonants are arranged vertically with no other representation of VIRAMA; horizontally, this is a single cell. In the 'gargi' font, one gets two instances of DDA side by side, with VIRAMA visible below the first. Both fonts are fully compliant with Unicode. If the terminal you are working with emulates a VT100, I believe it should be possible to ask it what the current cursor position is. At http://www.ccs.neu.edu/research/gpc/VonaUtils/vona/terminal/VT100_Escape_Codes.html , the query and response are called getcursor DSR and cursor CPR. > For > example it seems that U+1F400 to U+1F579 have an east-asian width of > N but will actually occupy two columns in the built-in osx terminal; > of course these scalar values are not east asian text per se. In so far as the property is useful, they probably should be ea=Wide. > Of course the best way would be to be able to hand out a string to > the tty for it to measure. But then it already seems impossible to > test whether a terminal is able to handle UTF-8 or not? > Maybe trying to use that east asian width property, was not a good > idea to start with. If you're trying to work out what a particular emulator will do, the starting point is its documentation. For many, the useful documentation may turn out to be the source code, which is not always available. However, a successful dialogue with the terminal would avoid these problems. It may even offer a solution to the problems of terminal size and text wrapping behaviour. Richard. 
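[Archive note] The cursor-position dialogue suggested above can be sketched as follows. On VT100-style terminals the query (DSR) is the sequence ESC [ 6 n and the reply (CPR) has the form ESC [ row ; column R. The sketch covers only building the query and parsing replies; actually writing to and reading from the tty (raw mode, timeouts) is left out, and the names parse_cpr and columns_advanced are hypothetical.

```python
import re

# "getcursor DSR": ask the terminal where the cursor is.
DSR_CURSOR_QUERY = "\x1b[6n"
# "cursor CPR": the terminal answers ESC [ row ; column R (both 1-based).
_CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: str):
    """Return (row, column) from a CPR reply, or None if none is found."""
    match = _CPR_REPLY.search(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def columns_advanced(before: str, after: str):
    """Columns the cursor moved between two CPR replies on the same row.

    Returns None if either reply is unparsable or the row changed
    (for instance because the text wrapped).
    """
    start, end = parse_cpr(before), parse_cpr(after)
    if start is None or end is None or start[0] != end[0]:
        return None
    return end[1] - start[1]
```

The idea would be to send the query, print the string, send the query again, and compare the two replies. Whether a particular emulator answers at all is something only that emulator can tell you.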
From daniel.buenzli at erratique.ch Thu Sep 17 04:00:29 2015 From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=) Date: Thu, 17 Sep 2015 10:00:29 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: <20150917022547.5640ee26@JRWUBU2> References: <55F9D0AD.1010400@gmx.de> <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> <20150917022547.5640ee26@JRWUBU2> Message-ID: <8523F8113D4A42CABC613375C5F95639@erratique.ch> On Thursday, 17 September 2015 at 02:25, Richard Wordingham wrote: > Are you actually trying to work out how a terminal emulator someone else wrote will position > characters? Yes. Basically, given a (let's say single-line) UTF-8 string to output to (let's say) an ANSI tty, I'd like to compute its visual extents. > In so far as the property is useful, they probably should be ea=Wide. This seems consistent with what is written in UAX #11 6.4 though. > If you're trying to work out what a particular emulator will do, the > starting point is its documentation. Unfortunately *many* emulators. Thanks, Daniel From richard.wordingham at ntlworld.com Thu Sep 17 07:27:31 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 17 Sep 2015 13:27:31 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: <8523F8113D4A42CABC613375C5F95639@erratique.ch> References: <55F9D0AD.1010400@gmx.de> <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> <20150917022547.5640ee26@JRWUBU2> <8523F8113D4A42CABC613375C5F95639@erratique.ch> Message-ID: <20150917132731.77680f77@JRWUBU2> On Thu, 17 Sep 2015 10:00:29 +0100 Daniel Bünzli wrote: > On Thursday, 17 September 2015 at 02:25, Richard Wordingham wrote: > > If you're trying to work out what a particular emulator will do, the > > starting point is its documentation. > Unfortunately *many* emulators. The best estimator is probably the POSIX function wcswidth(). The terminal emulator might actually use that function to do its layout. Some do.
If you need accuracy, you may have to resort to asking the terminal where the cursor is. Of course the latter might not work if only the general concept of a terminal (perhaps, better, teletype) is being emulated. I wouldn't expect either to work for an application being run from the emacs shell program, which works with 'proportional' fonts, though one might get a pleasant surprise. (For example, emacs *might* *convert* the cursor position to nominal cell widths.) Richard. From eliz at gnu.org Thu Sep 17 09:47:53 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Thu, 17 Sep 2015 17:47:53 +0300 Subject: Grapheme clusters and east asian width In-Reply-To: <20150917132731.77680f77@JRWUBU2> References: <55F9D0AD.1010400@gmx.de> <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> <20150917022547.5640ee26@JRWUBU2> <8523F8113D4A42CABC613375C5F95639@erratique.ch> <20150917132731.77680f77@JRWUBU2> Message-ID: <83vbb9qezq.fsf@gnu.org> > Date: Thu, 17 Sep 2015 13:27:31 +0100 > From: Richard Wordingham > > The best estimator is probably the POSIX function wcswidth(). Only on glibc-based systems, I'm quite sure. > The > terminal emulator might actually use that function to do its layout. > Some do. If you need accuracy, you may have to resort to asking the > terminal where the cursor is. Of course the latter might not work > if only the general concept of a terminal (perhaps, better, teletype) > is being emulated. I wouldn't expect either to work for an application > being run from the emacs shell program, which works with 'proportional' > fonts, though one might get a pleasant surprise. (For example, emacs > *might* *convert* the cursor position to nominal cell widths.) When Emacs displays on a text terminal, it's up to the terminal to handle the font; Emacs speaks to the terminal in character cell units. When Emacs displays on a graphics terminal, it works in pixels, so cursor position in character units is not useful. 
In any case, where do you think Emacs gets its idea of the width of every character? What database other than the UCD could it use that can be relied upon on any of the modern OSes? From daniel.buenzli at erratique.ch Thu Sep 17 10:51:03 2015 From: daniel.buenzli at erratique.ch (Daniel Bünzli) Date: Thu, 17 Sep 2015 16:51:03 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: <83vbb9qezq.fsf@gnu.org> References: <55F9D0AD.1010400@gmx.de> <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> <20150917022547.5640ee26@JRWUBU2> <8523F8113D4A42CABC613375C5F95639@erratique.ch> <20150917132731.77680f77@JRWUBU2> <83vbb9qezq.fsf@gnu.org> Message-ID: On Thursday, 17 September 2015 at 15:47, Eli Zaretskii wrote: > > Date: Thu, 17 Sep 2015 13:27:31 +0100 > > From: Richard Wordingham > > > > The best estimator is probably the POSIX function wcswidth(). > Only on glibc-based systems, I'm quite sure. Is there a formal definition of the algorithm used? This [1] is not very helpful. Best, Daniel [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/wcswidth.html From eliz at gnu.org Thu Sep 17 11:24:02 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Thu, 17 Sep 2015 19:24:02 +0300 Subject: Grapheme clusters and east asian width In-Reply-To: References: <55F9D0AD.1010400@gmx.de> <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> <20150917022547.5640ee26@JRWUBU2> <8523F8113D4A42CABC613375C5F95639@erratique.ch> <20150917132731.77680f77@JRWUBU2> <83vbb9qezq.fsf@gnu.org> Message-ID: <83oah1qajh.fsf@gnu.org> > Date: Thu, 17 Sep 2015 16:51:03 +0100 > From: Daniel Bünzli > Cc: Richard Wordingham , unicode at unicode.org > > > > Date: Thu, 17 Sep 2015 13:27:31 +0100 > > > From: Richard Wordingham > > > > > > The best estimator is probably the POSIX function wcswidth(). > > Only on glibc-based systems, I'm quite sure. > > Is there a formal definition of the algorithm used? This [1] is not very helpful. They just use a table of values, AFAIK.
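The "table of values" idea can be illustrated with a rough sketch. It does not reproduce glibc's wcswidth() or its tables. It assumes Python's standard unicodedata module as the table, counts East Asian Wide and Fullwidth characters as two columns, and counts combining marks and format characters as zero. The names char_width and estimate_width are hypothetical, and control characters are not treated specially.

```python
import unicodedata


def char_width(ch):
    """Estimated number of terminal columns for a single character."""
    # Combining marks and format characters are assumed to take no column.
    if unicodedata.combining(ch) != 0:
        return 0
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) are assumed to take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def estimate_width(text):
    """Sum of per-character widths: an estimate only, not what a terminal does."""
    return sum(char_width(ch) for ch in text)
```

Because the per-character values are simply added up, the result is only as good as the table and says nothing about how a particular emulator handles clusters or ambiguous-width characters.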
From daniel.buenzli at erratique.ch Thu Sep 17 11:25:34 2015 From: daniel.buenzli at erratique.ch (Daniel Bünzli) Date: Thu, 17 Sep 2015 17:25:34 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: <83oah1qajh.fsf@gnu.org> References: <55F9D0AD.1010400@gmx.de> <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> <20150917022547.5640ee26@JRWUBU2> <8523F8113D4A42CABC613375C5F95639@erratique.ch> <20150917132731.77680f77@JRWUBU2> <83vbb9qezq.fsf@gnu.org> <83oah1qajh.fsf@gnu.org> Message-ID: <85B20FFE60FC4FBA9EE6E867D19CEB90@erratique.ch> On Thursday, 17 September 2015 at 17:24, Eli Zaretskii wrote: > > Is there a formal definition of the algorithm used? This [1] is not very helpful. > > They just use a table of values, AFAIK. But is it standardized, or does everyone have their own table? Daniel From eliz at gnu.org Thu Sep 17 11:30:41 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Thu, 17 Sep 2015 19:30:41 +0300 Subject: Grapheme clusters and east asian width In-Reply-To: <85B20FFE60FC4FBA9EE6E867D19CEB90@erratique.ch> References: <55F9D0AD.1010400@gmx.de> <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> <20150917022547.5640ee26@JRWUBU2> <8523F8113D4A42CABC613375C5F95639@erratique.ch> <20150917132731.77680f77@JRWUBU2> <83vbb9qezq.fsf@gnu.org> <83oah1qajh.fsf@gnu.org> <85B20FFE60FC4FBA9EE6E867D19CEB90@erratique.ch> Message-ID: <83mvwlqa8e.fsf@gnu.org> > Date: Thu, 17 Sep 2015 17:25:34 +0100 > From: Daniel Bünzli > Cc: richard.wordingham at ntlworld.com, unicode at unicode.org > > On Thursday, 17 September 2015 at 17:24, Eli Zaretskii wrote: > > > Is there a formal definition of the algorithm used? This [1] is not very helpful. > > > > They just use a table of values, AFAIK. > > But is it standardized, or does everyone have their own table? I don't know, but I'm sure you will find out if you look into the glibc sources. They are publicly available.
From fantasai.lists at inkedblade.net Thu Sep 17 12:16:36 2015 From: fantasai.lists at inkedblade.net (fantasai) Date: Thu, 17 Sep 2015 13:16:36 -0400 Subject: [CSSWG][css-inline] Updated WD of CSS Inline Layout Message-ID: <55FAF574.2050501@inkedblade.net> The CSS WG has published an updated Working Draft of the CSS Inline Layout Module Level 3 http://www.w3.org/TR/css-inline-3/ This module covers inline vertical alignment and special typographic effects for initial letters, such as drop caps. Changes since the previous WD include: * Addition of initial drafts for 'dominant-baseline' as well as 'vertical-align' and its SVG longhands 'alignment-baseline' and 'baseline-shift'. http://www.w3.org/TR/css-inline-3/#line-height * Addition of the 'initial-letter-wrap' property. http://www.w3.org/TR/css-inline-3/#initial-letter-wrapping * A redesign of the 'initial-letter-align' property. http://www.w3.org/TR/css-inline-3/#aligning-initial-letter * A large variety of fixes, clarifications, and improvements to the initial letter layout model. http://www.w3.org/TR/css-inline-3/#initial-letter-styling We're actively looking for review on all aspects of the draft, and in particular need help with handling non-Western scripts. Please send any comments to the www-style mailing list, , http://lists.w3.org/Archives/Public/www-style/ and please, prefix the subject line with [css-inline] (as I did on this message). 
For the CSS WG, ~fantasai From richard.wordingham at ntlworld.com Thu Sep 17 13:59:04 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 17 Sep 2015 19:59:04 +0100 Subject: Grapheme clusters and east asian width In-Reply-To: <83mvwlqa8e.fsf@gnu.org> References: <55F9D0AD.1010400@gmx.de> <804C0EBEE0E2487B91D1115BEB97922D@erratique.ch> <20150917022547.5640ee26@JRWUBU2> <8523F8113D4A42CABC613375C5F95639@erratique.ch> <20150917132731.77680f77@JRWUBU2> <83vbb9qezq.fsf@gnu.org> <83oah1qajh.fsf@gnu.org> <85B20FFE60FC4FBA9EE6E867D19CEB90@erratique.ch> <83mvwlqa8e.fsf@gnu.org> Message-ID: <20150917195904.51d3cda2@JRWUBU2> On Thu, 17 Sep 2015 19:30:41 +0300 Eli Zaretskii wrote: > > Date: Thu, 17 Sep 2015 17:25:34 +0100 > > From: Daniel Bünzli > > Cc: richard.wordingham at ntlworld.com, unicode at unicode.org > > > > On Thursday, 17 September 2015 at 17:24, Eli Zaretskii wrote: > > > > Is there a formal definition of the algorithm used? This [1] > > > > is not very helpful. > > > > > > They just use a table of values, AFAIK. > > > > But is it standardized, or does everyone have their own table? > > I don't know, but I'm sure you will find out if you look into the > glibc sources. They are publicly available. Shouldn't that be the locale sources? That then makes sense, for ambiguous width is resolved differently in Eastern and Western traditions. However, the calculation from single character width to string width is quite naïve - they are just added up, at least in some version of glibc! This doesn't work when a spacing mark decomposes into two spacing marks - gets a length of 2, while the canonically equivalent string gets a length of 3! This affects the positioning of text following them in gnome-terminal. Richard. From rwhlk142 at gmail.com Fri Sep 18 19:56:26 2015 From: rwhlk142 at gmail.com (Robert Wheelock) Date: Fri, 18 Sep 2015 20:56:26 -0400 Subject: Choton Alphabet Message-ID: Hello!
Would anybody have a picture with the complete Choton alphabet script?! This conlang also uses a German-based transliteration alphabet ( for /x/ with for , while is for /z/ and stands in for /?/ ...). Initial words in sentences, and proper nouns (at least) get capitalized, like in German. Thank You! Robert Lloyd Wheelock INTERNATIONAL SYMBOLISM RESEARCH INSTITUTE Harmony, ME U.S.A. Augusta, ME U.S.A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists+unicode at seantek.com Sun Sep 20 09:48:01 2015 From: lists+unicode at seantek.com (Sean Leonard) Date: Sun, 20 Sep 2015 07:48:01 -0700 Subject: Concise term for non-ASCII Unicode characters Message-ID: <55FEC721.7040008@seantek.com> What is the most concise term for characters or code points outside of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as "extended characters" or "non-ASCII Unicode" but I do not find those terms precise. We are talking about the code points U+0080 - U+10FFFF. I suppose that this also refers to code points/scalar values that are not formally Unicode characters, such as U+FFFF. Basically, I am looking for a concise term for values that would require multiple UTF-8 octets if encoded in UTF-8 (without referring to UTF-8 encoding specifically). "Non-ASCII" is not precise enough since character sets like Shift-JIS are non-ASCII. Also a citation to a relevant standard (whether Unicode or otherwise) would be helpful. The terms "supplementary character" and "supplementary code point" are defined in the Unicode standard, referring to characters or code points above U+FFFF. I am looking for something like those, but for characters or code points above U+007F. 
Thank you, Sean From petercon at microsoft.com Sun Sep 20 11:52:29 2015 From: petercon at microsoft.com (Peter Constable) Date: Sun, 20 Sep 2015 16:52:29 +0000 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <55FEC721.7040008@seantek.com> References: <55FEC721.7040008@seantek.com> Message-ID: You already have been using "non-ASCII Unicode", which is about as concise and sufficiently accurate as you'll get. There's no term specifically defined in any standard or conventionally used for this. Peter -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Sean Leonard Sent: Sunday, September 20, 2015 7:48 AM To: unicode at unicode.org Subject: Concise term for non-ASCII Unicode characters What is the most concise term for characters or code points outside of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as "extended characters" or "non-ASCII Unicode" but I do not find those terms precise. We are talking about the code points U+0080 - U+10FFFF. I suppose that this also refers to code points/scalar values that are not formally Unicode characters, such as U+FFFF. Basically, I am looking for a concise term for values that would require multiple UTF-8 octets if encoded in UTF-8 (without referring to UTF-8 encoding specifically). "Non-ASCII" is not precise enough since character sets like Shift-JIS are non-ASCII. Also a citation to a relevant standard (whether Unicode or otherwise) would be helpful. The terms "supplementary character" and "supplementary code point" are defined in the Unicode standard, referring to characters or code points above U+FFFF. I am looking for something like those, but for characters or code points above U+007F. 
Thank you, Sean From addison at lab126.com Sun Sep 20 12:05:29 2015 From: addison at lab126.com (Phillips, Addison) Date: Sun, 20 Sep 2015 17:05:29 +0000 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: References: <55FEC721.7040008@seantek.com> Message-ID: <2d89da431be946d7a3ec5085928e019f@EX13D08UWB002.ant.amazon.com> I agree, although I note that sometimes the additional (redundant) specificity of "non-7-bit-ASCII characters" is needed when talking to people unclear on what "ASCII" means. Addison > -----Original Message----- > From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Peter > Constable > Sent: Sunday, September 20, 2015 9:52 AM > To: Sean Leonard; unicode at unicode.org > Subject: RE: Concise term for non-ASCII Unicode characters > > You already have been using "non-ASCII Unicode", which is about as concise > and sufficiently accurate as you'll get. There's no term specifically defined in > any standard or conventionally used for this. > > > Peter > > -----Original Message----- > From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Sean > Leonard > Sent: Sunday, September 20, 2015 7:48 AM > To: unicode at unicode.org > Subject: Concise term for non-ASCII Unicode characters > > What is the most concise term for characters or code points outside of the > US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as > "extended characters" or "non-ASCII Unicode" but I do not find those terms > precise. We are talking about the code points U+0080 - U+10FFFF. I suppose > that this also refers to code points/scalar values that are not formally > Unicode characters, such as U+FFFF. Basically, I am looking for a concise term > for values that would require multiple UTF-8 octets if encoded in UTF-8 > (without referring to UTF-8 encoding specifically). > "Non-ASCII" is not precise enough since character sets like Shift-JIS are non- > ASCII. 
> > Also a citation to a relevant standard (whether Unicode or otherwise) would > be helpful. > > The terms "supplementary character" and "supplementary code point" are > defined in the Unicode standard, referring to characters or code points > above U+FFFF. I am looking for something like those, but for characters or > code points above U+007F. > > Thank you, > > Sean From steve at swales.us Sun Sep 20 12:59:52 2015 From: steve at swales.us (Steve Swales) Date: Sun, 20 Sep 2015 10:59:52 -0700 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <2d89da431be946d7a3ec5085928e019f@EX13D08UWB002.ant.amazon.com> References: <55FEC721.7040008@seantek.com> <2d89da431be946d7a3ec5085928e019f@EX13D08UWB002.ant.amazon.com> Message-ID: Exactly. I think the reason that non-ASCII feels non-concise is that there is widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is widely confused with Windows-1252). -steve Sent from my iPhone > On Sep 20, 2015, at 10:05 AM, Phillips, Addison wrote: > > I agree, although I note that sometimes the additional (redundant) specificity of "non-7-bit-ASCII characters" is needed when talking to people unclear on what "ASCII" means. > > Addison > >> -----Original Message----- >> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Peter >> Constable >> Sent: Sunday, September 20, 2015 9:52 AM >> To: Sean Leonard; unicode at unicode.org >> Subject: RE: Concise term for non-ASCII Unicode characters >> >> You already have been using "non-ASCII Unicode", which is about as concise >> and sufficiently accurate as you'll get. There's no term specifically defined in >> any standard or conventionally used for this. 
>> >> >> Peter >> >> -----Original Message----- >> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Sean >> Leonard >> Sent: Sunday, September 20, 2015 7:48 AM >> To: unicode at unicode.org >> Subject: Concise term for non-ASCII Unicode characters >> >> What is the most concise term for characters or code points outside of the >> US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as >> "extended characters" or "non-ASCII Unicode" but I do not find those terms >> precise. We are talking about the code points U+0080 - U+10FFFF. I suppose >> that this also refers to code points/scalar values that are not formally >> Unicode characters, such as U+FFFF. Basically, I am looking for a concise term >> for values that would require multiple UTF-8 octets if encoded in UTF-8 >> (without referring to UTF-8 encoding specifically). >> "Non-ASCII" is not precise enough since character sets like Shift-JIS are non- >> ASCII. >> >> Also a citation to a relevant standard (whether Unicode or otherwise) would >> be helpful. >> >> The terms "supplementary character" and "supplementary code point" are >> defined in the Unicode standard, referring to characters or code points >> above U+FFFF. I am looking for something like those, but for characters or >> code points above U+007F. >> >> Thank you, >> >> Sean > > From petercon at microsoft.com Sun Sep 20 14:24:14 2015 From: petercon at microsoft.com (Peter Constable) Date: Sun, 20 Sep 2015 19:24:14 +0000 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: References: <55FEC721.7040008@seantek.com> <2d89da431be946d7a3ec5085928e019f@EX13D08UWB002.ant.amazon.com> Message-ID: Well, if the point is to refer to characters that would require two or more code units in UTF-8, then _accurate_ expressions would be, "Unicode characters beyond the Basic Latin block" or "Unicode characters above U+007F". 
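The equivalence between "above U+007F" and "two or more code units in UTF-8" can be checked mechanically. The following is a small sketch in Python that relies only on the built-in UTF-8 codec. The function names are illustrative, and surrogate code points are deliberately outside its scope, since they cannot be encoded in UTF-8 at all.

```python
def is_beyond_basic_latin(code_point):
    """True for code points above U+007F, i.e. outside the Basic Latin block."""
    return 0x80 <= code_point <= 0x10FFFF


def utf8_length(code_point):
    """Number of UTF-8 code units (octets) used for a non-surrogate code point."""
    return len(chr(code_point).encode("utf-8"))
```

For every non-surrogate code point, is_beyond_basic_latin(cp) holds exactly when utf8_length(cp) is 2 or more.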
Peter -----Original Message----- From: Steve Swales [mailto:steve at swales.us] Sent: Sunday, September 20, 2015 11:00 AM To: Phillips, Addison Cc: Peter Constable ; Sean Leonard ; unicode at unicode.org Subject: Re: Concise term for non-ASCII Unicode characters Exactly. I think the reason that non-ASCII feels non-concise is that there is widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is widely confused with Windows-1252). -steve Sent from my iPhone > On Sep 20, 2015, at 10:05 AM, Phillips, Addison wrote: > > I agree, although I note that sometimes the additional (redundant) specificity of "non-7-bit-ASCII characters" is needed when talking to people unclear on what "ASCII" means. > > Addison > >> -----Original Message----- >> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Peter >> Constable >> Sent: Sunday, September 20, 2015 9:52 AM >> To: Sean Leonard; unicode at unicode.org >> Subject: RE: Concise term for non-ASCII Unicode characters >> >> You already have been using "non-ASCII Unicode", which is about as >> concise and sufficiently accurate as you'll get. There's no term >> specifically defined in any standard or conventionally used for this. >> >> >> Peter >> >> -----Original Message----- >> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Sean >> Leonard >> Sent: Sunday, September 20, 2015 7:48 AM >> To: unicode at unicode.org >> Subject: Concise term for non-ASCII Unicode characters >> >> What is the most concise term for characters or code points outside >> of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to >> these as "extended characters" or "non-ASCII Unicode" but I do not >> find those terms precise. We are talking about the code points U+0080 >> - U+10FFFF. I suppose that this also refers to code points/scalar >> values that are not formally Unicode characters, such as U+FFFF. 
>> Basically, I am looking for a concise term for values that would >> require multiple UTF-8 octets if encoded in UTF-8 (without referring to UTF-8 encoding specifically). >> "Non-ASCII" is not precise enough since character sets like Shift-JIS >> are non-ASCII. >> >> Also a citation to a relevant standard (whether Unicode or otherwise) >> would be helpful. >> >> The terms "supplementary character" and "supplementary code point" >> are defined in the Unicode standard, referring to characters or code >> points above U+FFFF. I am looking for something like those, but for >> characters or code points above U+007F. >> >> Thank you, >> >> Sean > > From daniel.buenzli at erratique.ch Sun Sep 20 14:57:10 2015 From: daniel.buenzli at erratique.ch (Daniel Bünzli) Date: Sun, 20 Sep 2015 20:57:10 +0100 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: References: <55FEC721.7040008@seantek.com> <2d89da431be946d7a3ec5085928e019f@EX13D08UWB002.ant.amazon.com> Message-ID: On Sunday, 20 September 2015 at 18:59, Steve Swales wrote: > Exactly. I think the reason that non-ASCII feels non-concise is that there is widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is widely confused with Windows-1252). For this reason I usually use the term US-ASCII, which is the IANA name for the 7-bit-ASCII characters [1]. A reference to the non-US-ASCII scalar values of Unicode would make precise sense to me. But then maybe Peter's very last suggestion is actually the most precise you can get. Also, if you are talking about UTF-8, I would use the term scalar values rather than "characters" or "code points", since surrogates can't be encoded in UTF-8.
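The distinction between code points and scalar values can be made concrete with a short sketch. It assumes Python's built-in strict UTF-8 codec, which refuses surrogate code points. The helper names is_scalar_value and can_encode_utf8 are illustrative only.

```python
def is_scalar_value(code_point):
    """True for code points that are Unicode scalar values (not surrogates)."""
    if not 0 <= code_point <= 0x10FFFF:
        return False
    return not 0xD800 <= code_point <= 0xDFFF


def can_encode_utf8(code_point):
    """True if the strict UTF-8 codec accepts the code point."""
    try:
        chr(code_point).encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```

Across the valid code point range, the two functions agree: exactly the scalar values are encodable.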
Best, Daniel [1] http://www.iana.org/assignments/character-sets From cph13 at case.edu Sun Sep 20 19:13:01 2015 From: cph13 at case.edu (Clive Hohberger) Date: Sun, 20 Sep 2015 19:13:01 -0500 Subject: Obituary for Adrian Frutiger Message-ID: http://www.nytimes.com/2015/09/20/arts/design/adrian-frutiger-dies-at-87-his-type-designs-show-you-the-way.html -- Clive P. Hohberger, PhD MBA Managing Director Clive Hohberger, LLC +1 847 910 8794 cph13 at case.edu From duerst at it.aoyama.ac.jp Sun Sep 20 19:51:32 2015 From: duerst at it.aoyama.ac.jp (Martin J. Dürst) Date: Mon, 21 Sep 2015 09:51:32 +0900 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <55FEC721.7040008@seantek.com> References: <55FEC721.7040008@seantek.com> Message-ID: <55FF5494.602@it.aoyama.ac.jp> Hello Sean, On 2015/09/20 23:48, Sean Leonard wrote: > What is the most concise term for characters or code points So we already have two different things we might need a term for. > outside of > the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these > as "extended characters" Most of the characters outside the US-ASCII range are perfectly simple and basic characters. I don't think the term 'extended' fits well here. It gives the impression that everything except US-ASCII is somewhat extraordinary, which in this day and age shouldn't be the case anymore. > or "non-ASCII Unicode" but I do not find those > terms precise. We are talking about the code points U+0080 - U+10FFFF. I > suppose that this also refers to code points/scalar values that are not > formally Unicode characters, such as U+FFFF. Again we may need different terms depending on whether these are included or not. > Basically, I am looking for > a concise term for values that would require multiple UTF-8 octets if > encoded in UTF-8 (without referring to UTF-8 encoding specifically).
> "Non-ASCII" is not precise enough since character sets like Shift-JIS > are non-ASCII. Well, the non-ASCII characters in Shift-JIS are also contained in Unicode, so depending on exactly what you want to talk about, Non-ASCII characters may be good enough. > Also a citation to a relevant standard (whether Unicode or otherwise) > would be helpful. > > The terms "supplementary character" and "supplementary code point" are > defined in the Unicode standard, referring to characters or code points > above U+FFFF. I am looking for something like those, but for characters > or code points above U+007F. And then in some cases, you may want to exclude the C0 area (U+0000-001F), or part of it, or some syntactically significant characters (e.g. punctuation) in the remaining part. Anyway, what I wanted to show is that depending on what you need it for, there are so many different variations that it doesn't pay off to create specific short terms for all of them, and the term you use currently may be short enough. Regards, Martin. From lists+unicode at seantek.com Mon Sep 21 03:22:14 2015 From: lists+unicode at seantek.com (Sean Leonard) Date: Mon, 21 Sep 2015 01:22:14 -0700 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <55FF5494.602@it.aoyama.ac.jp> References: <55FEC721.7040008@seantek.com> <55FF5494.602@it.aoyama.ac.jp> Message-ID: <55FFBE36.5030104@seantek.com> First of all, thank you all for the responses thus far. On 9/20/2015 5:51 PM, Martin J. Dürst wrote: > Hello Sean, > > On 2015/09/20 23:48, Sean Leonard wrote: >> What is the most concise term for characters or code points > > So we already have two different things we might need a term for. > [...] >> >> The terms "supplementary character" and "supplementary code point" are >> defined in the Unicode standard, referring to characters or code points >> above U+FFFF. I am looking for something like those, but for characters >> or code points above U+007F.
> Anyway, what I wanted to show is that depending on what you need it > for, there are so many different variations that it doesn't pay off to > create specific short terms for all of them, and the term you use > currently may be short enough. Well what I am getting at is that when writing standards documents in various SDOs (or any other computer science text, for that matter), it is helpful to identify these characters/code points. I think we can limit our inquiry to "characters" and "code points". Both of those are well-defined in Unicode (see ). A [Unicode] code point is any value in the range 0 - 0x10FFFF. A [Unicode] character is an abstract character that is actually assigned a [Unicode] scalar value. Therefore the space is Unicode code point > Unicode scalar value > Unicode character. "supplementary" means outside the BMP, i.e., 0x10000 - 0x10FFFF. "BMP" means inside the Basic Multilingual Plane, i.e., 0x0 - 0xFFFF. The problem is that the BMP / supplementary distinction makes sense in a UCS-2 / UTF-16 universe. But for much interchange these days, UTF-8 is the way to go. I wish that "non-ASCII characters" and "non-ASCII code points" (and non-ASCII scalar values) were sufficient for me. Maybe they can be. However, in contexts where ASCII is getting extended or supplemented (e.g., in the DNS or in e-mail), one needs to be really clear that the octets 0x80 - 0xFF are Unicode (specifically UTF-8, I suppose), and not something else. The expressions "beyond [...] ASCII" or "beyond the ASCII range" (as in, characters beyond ASCII, code points beyond ASCII) have some support in the Unicode Standard; see, e.g., Section 2.5 "ASCII Transparency" paragraph. Additionally as Peter stated, an expression including "Basic Latin block" (e.g., characters beyond the Basic Latin block) could work. FWIW, the term "non-ASCII" is used in e-mail address internationalization ("EAI") in the IETF; its opposite is "all-ASCII" (or simply "ASCII"). (RFCs 6530, 6531, 6532). 
The term also appears in RFC 2047 from November 1996 but there it has the more expansive meaning (i.e., not limited or targeted to Unicode). Sean From Tony at Jollans.com Mon Sep 21 06:46:48 2015 From: Tony at Jollans.com (Tony Jollans) Date: Mon, 21 Sep 2015 12:46:48 +0100 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <55FFBE36.5030104@seantek.com> References: <55FEC721.7040008@seantek.com> <55FF5494.602@it.aoyama.ac.jp> <55FFBE36.5030104@seantek.com> Message-ID: <003f01d0f463$32ed5a60$98c80f20$@Jollans.com> As an interested outsider may I suggest that the term "ASCII", indeed the concept of ASCII, is only of historical interest and should not be used in any modern context. Computing is riddled with terms, "word" being another in similar vein, that are used to mean something they are not and would be best forgotten. These days, it is pretty sloppy coding that cares how many bytes an encoding of something requires, although there may be many circumstances where legacy support is required. You say that, in some contexts, one needs to be really clear that the octets 0x80 - 0xFF are Unicode. Either something "is" Unicode, or it isn't. Either something uses a recognised encoding, or it doesn't. Using these octets to represent Unicode code points is not ASCII, is not UTF-8, and is not UCS-2/UTF-16; it could, perhaps, be EBCDIC. Whatever it is, say so clearly and explicitly and, if necessary, say why; don't look for some mealy-mouthed expression to avoid so saying. Just my twopenn'orth, and no offence meant, but I can't help thinking you're looking for something that shouldn't exist. Best regards, Tony Jollans -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Sean Leonard Sent: 21 September 2015 09:22 To: unicode at unicode.org Subject: Re: Concise term for non-ASCII Unicode characters First of all, thank you all for the responses thus far. On 9/20/2015 5:51 PM, Martin J. 
Dürst wrote: > Hello Sean, > > On 2015/09/20 23:48, Sean Leonard wrote: >> What is the most concise term for characters or code points > > So we already have two different things we might need a term for. > [...] >> >> The terms "supplementary character" and "supplementary code point" >> are defined in the Unicode standard, referring to characters or code >> points above U+FFFF. I am looking for something like those, but for >> characters or code points above U+007F. > Anyway, what I wanted to show is that depending on what you need it > for, there are so many different variations that it doesn't pay off to > create specific short terms for all of them, and the term you use > currently may be short enough. Well what I am getting at is that when writing standards documents in various SDOs (or any other computer science text, for that matter), it is helpful to identify these characters/code points. I think we can limit our inquiry to "characters" and "code points". Both of those are well-defined in Unicode (see ). A [Unicode] code point is any value in the range 0 - 0x10FFFF. A [Unicode] character is an abstract character that is actually assigned a [Unicode] scalar value. Therefore the space is Unicode code point > Unicode scalar value > Unicode character. "supplementary" means outside the BMP, i.e., 0x10000 - 0x10FFFF. "BMP" means inside the Basic Multilingual Plane, i.e., 0x0 - 0xFFFF. The problem is that the BMP / supplementary distinction makes sense in a UCS-2 / UTF-16 universe. But for much interchange these days, UTF-8 is the way to go. I wish that "non-ASCII characters" and "non-ASCII code points" (and non-ASCII scalar values) were sufficient for me. Maybe they can be. However, in contexts where ASCII is getting extended or supplemented (e.g., in the DNS or in e-mail), one needs to be really clear that the octets 0x80 - 0xFF are Unicode (specifically UTF-8, I suppose), and not something else. The expressions "beyond [...]
ASCII" or "beyond the ASCII range" (as in, characters beyond ASCII, code points beyond ASCII) have some support in the Unicode Standard; see, e.g., Section 2.5 "ASCII Transparency" paragraph. Additionally as Peter stated, an expression including "Basic Latin block" (e.g., characters beyond the Basic Latin block) could work. FWIW, the term "non-ASCII" is used in e-mail address internationalization ("EAI") in the IETF; its opposite is "all-ASCII" (or simply "ASCII"). (RFCs 6530, 6531, 6532). The term also appears in RFC 2047 from November 1996 but there it has the more expansive meaning (i.e., not limited or targeted to Unicode). Sean From daniel.buenzli at erratique.ch Mon Sep 21 06:55:04 2015 From: daniel.buenzli at erratique.ch (Daniel Bünzli) Date: Mon, 21 Sep 2015 12:55:04 +0100 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <55FFBE36.5030104@seantek.com> References: <55FEC721.7040008@seantek.com> <55FF5494.602@it.aoyama.ac.jp> <55FFBE36.5030104@seantek.com> Message-ID: <7FCFDF52D20A4BABA6389154520D5A37@erratique.ch> On Monday, 21 September 2015 at 09:22, Sean Leonard wrote: > I think we can limit our inquiry to "characters" and "code points". Both > of those are well-defined in Unicode (see > ). I wouldn't say so. If you actually have a look at the definition of character on this page, there are at least 4 different definitions for the notion of character, and if you take the one that has a formal definition attached, i.e. synonym for abstract character (D7), then an abstract character can actually be represented by a *sequence* of Unicode scalar values. If you are operating in the context of a standard or technical documentation, please do use either code points (D9, D10) or scalar values (D76). These notions have precise definitions, which makes for saner discussions and understanding. > I wish that "non-ASCII characters" and "non-ASCII code points" (and > non-ASCII scalar values) were sufficient for me. Maybe they can be.
> However, in contexts where ASCII is getting extended or supplemented > (e.g., in the DNS or in e-mail), one needs to be really clear that the > octets 0x80 - 0xFF are Unicode (specifically UTF-8, I suppose), and not > something else. So it seems that you want terminology to talk about the *encoding* of Unicode scalar values, rather than scalar values themselves. Then I think you should specifically avoid terminology like "octets of 0x80-0xFF are Unicode" since this doesn't really make sense; there is no Unicode property on octets. You should rather say something like "these octets may belong to the UTF-8 encoding scheme (D95) of Unicode scalar values greater than U+007F". Best, Daniel From doug at ewellic.org Mon Sep 21 10:42:38 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 21 Sep 2015 08:42:38 -0700 Subject: Concise term for non-ASCII Unicode characters Message-ID: <20150921084238.665a7a7059d7ee80bb4d670165c8327d.3d8fca1ad4.wbe@email03.secureserver.net> Sean Leonard wrote: > Additionally as Peter stated, an expression including "Basic Latin > block" (e.g., characters beyond the Basic Latin block) could work. I was thinking that something like "non-Basic-Latin Unicode" might be useful. It avoids the confusion of referring to ASCII as a range of code points instead of a separate encoding standard. -- Doug Ewell | http://ewellic.org | Thornton, CO 
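As a rough illustration of the ranges discussed in this thread (ASCII / Basic Latin, the rest of the BMP, supplementary code points), and of the point that octets 0x80 - 0xFF are not "Unicode" in themselves but occur only inside UTF-8 encodings of non-ASCII scalar values, here is a minimal Python sketch. The names `classify` and `utf8_octets` are made up for this example; they are not Unicode terminology or part of any library.

```python
def classify(cp: int) -> str:
    """Place a code point in the ranges named in this thread."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate code point (not a scalar value)"
    if cp <= 0x7F:
        return "ASCII range (Basic Latin block)"
    if cp <= 0xFFFF:
        return "non-ASCII, BMP"
    return "non-ASCII, supplementary"


def utf8_octets(cp: int) -> list[int]:
    """UTF-8 encoding of a scalar value, as a list of octets."""
    return list(chr(cp).encode("utf-8"))


if __name__ == "__main__":
    for cp in (0x41, 0x7F, 0x80, 0xE9, 0x20AC, 0x1F37A):
        octets = utf8_octets(cp)
        # For scalar values above U+007F every octet is in 0x80-0xFF;
        # for the ASCII range the single octet is below 0x80.
        high = all(o >= 0x80 for o in octets)
        print(f"U+{cp:04X}: {classify(cp)}; "
              f"octets {[hex(o) for o in octets]}; all >= 0x80: {high}")
```

The sketch treats "non-ASCII" purely as a property of scalar values; whether a given octet stream really is UTF-8 is a separate question that the octets alone cannot answer.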
From richard.wordingham at ntlworld.com Mon Sep 21 13:18:29 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Mon, 21 Sep 2015 19:18:29 +0100 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <003f01d0f463$32ed5a60$98c80f20$@Jollans.com> References: <55FEC721.7040008@seantek.com> <55FF5494.602@it.aoyama.ac.jp> <55FFBE36.5030104@seantek.com> <003f01d0f463$32ed5a60$98c80f20$@Jollans.com> Message-ID: <20150921191829.16a502d3@JRWUBU2> On Mon, 21 Sep 2015 12:46:48 +0100 "Tony Jollans" wrote: > These days, it is pretty sloppy coding that cares how many bytes an > encoding of something requires, although there may be many > circumstances where legacy support is required. Wow! Are you saying that code chopping up arbitrary character sequences for legibility (and editability!) and to avoid buffering issues should generally assume it will be read as UTF-8, and avoid splitting well-formed UTF-8 characters? (If the text is actually Windows-1252, there may be a lot of apparently ill-formed UTF-8 characters/gibberish.) > You say that, in some > contexts, one needs to be really clear that the octets 0x80 - 0xFF > are Unicode. Either something "is" Unicode, or it isn't. Either > something uses a recognised encoding, or it doesn't. Using these > octets to represent Unicode code points is not ASCII, is not UTF-8, > and is not UCS-2/UTF-16; it could, perhaps, be EBCDIC. But most of these octets *are* used to represent non-ASCII scalar values. It's just that they have to operate in combinations for UTF-8. Richard. 
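The case Richard raises, code that chops a byte stream into pieces and does care where UTF-8 sequences begin and end, can be sketched as follows. This is only an illustration under stated assumptions: the input is assumed to be well-formed UTF-8, the function name `split_utf8` is hypothetical, and it avoids splitting the encoding of a single scalar value only (it does nothing about combining sequences).

```python
def split_utf8(data: bytes, limit: int) -> list[bytes]:
    """Split data into chunks of at most `limit` bytes without cutting
    through a multi-byte UTF-8 sequence.

    Assumes `data` is well-formed UTF-8. `limit` must be at least 4,
    the length of the longest UTF-8 sequence.
    """
    if limit < 4:
        raise ValueError("limit must be at least 4")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + limit, len(data))
        cut = end
        # Continuation octets have the bit pattern 10xxxxxx (0x80-0xBF).
        # Back up until the next chunk would start on a lead octet.
        while cut < len(data) and cut > start and (data[cut] & 0xC0) == 0x80:
            cut -= 1
        if cut == start:
            # Only reachable with ill-formed input (e.g. text that is
            # really Windows-1252): fall back to a hard cut.
            cut = end
        chunks.append(data[start:cut])
        start = cut
    return chunks


if __name__ == "__main__":
    for chunk in split_utf8("a\U0001F37Ab".encode("utf-8"), 4):
        print(chunk, chunk.decode("utf-8"))
```

Each chunk of well-formed input decodes on its own, and joining the chunks gives back the original bytes.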
From Tony at Jollans.com Mon Sep 21 14:54:23 2015 From: Tony at Jollans.com (Tony Jollans) Date: Mon, 21 Sep 2015 20:54:23 +0100 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <20150921191829.16a502d3@JRWUBU2> References: <55FEC721.7040008@seantek.com> <55FF5494.602@it.aoyama.ac.jp> <55FFBE36.5030104@seantek.com> <003f01d0f463$32ed5a60$98c80f20$@Jollans.com> <20150921191829.16a502d3@JRWUBU2> Message-ID: <000801d0f4a7$5643e3f0$02cbabd0$@Jollans.com> Goodness, sorry, no, I didn't mean that at all!!! What I meant was that a recognised encoding should be used consistently, regardless of the number of bytes required, and all encodings of Unicode code points are necessarily potentially multi-byte. Single-byte encodings may save a little bit of space, and may be Windows-1252, or Windows-1253, or one of many other encodings but not, in any sense, Unicode encodings. Windows code pages and their ilk predate Unicode, and I would only ever expect to see them used in environments where legacy support is needed, and would not expect a significant amount of new documentation about them to be written. When it is necessary to describe them, one should do so fully and properly, which is whatever it is, but they really have no meaning in a Unicode context. Nor, as far as I'm aware, do the 0x80 to 0xFF octets have any special meaning in Unicode that would require there to be a recognisable term to describe them. Code that processes arbitrary *character* sequences (for legibility or any other reason) should, surely, work with characters, which may be sequences of code points, each of which may be a sequence of bytes. I can think of no reason for chopping up byte sequences except where they are going to be recombined later, by the reverse treatment, and code, if required, that does so probably has no idea of, and need not have any idea of, meaning, and can only, surely, work with bytes. 
The actual octets are, of course, used in combinations, but not singly in any way that requires them to be described in Unicode terms. Or am I missing something fundamental? Best, Tony
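The layering described in this thread, a character that may be a sequence of code points, each of which is in turn a sequence of bytes in UTF-8, can be seen with a small Python example. It uses only the standard `unicodedata` module; the choice of "é" as the sample character is arbitrary.

```python
import unicodedata

# One abstract character, written as two scalar values:
# LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# NFC normalization yields the single scalar value U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

for label, s in (("decomposed", decomposed), ("composed", composed)):
    scalars = [f"U+{ord(c):04X}" for c in s]
    octets = s.encode("utf-8")
    print(f"{label}: {len(s)} scalar value(s) {scalars}, "
          f"{len(octets)} octet(s) {octets.hex(' ')}")
```

Both strings display as the same character, yet they differ in the number of scalar values and in the number of UTF-8 octets, which is why counts at one level say little about the others.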
From lists+unicode at seantek.com Mon Sep 21 15:51:42 2015 From: lists+unicode at seantek.com (Sean Leonard) Date: Mon, 21 Sep 2015 13:51:42 -0700 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <55FEC721.7040008@seantek.com> References: <55FEC721.7040008@seantek.com> Message-ID: <56006DDE.2020408@seantek.com> Related question as I am researching this: How can I acquire (cheaply or free) the latest and most official copy of US-ASCII, namely, the version that Unicode references? The Unicode Standard 8.0 refers to the following document: ANSI X3.4: American National Standards Institute. Coded character set - 7-bit American national standard code for information interchange. New York: 1986. (ANSI X3.4-1986). (See page 294.) A quick Google search did not yield results. There are public/university library hard copies but they are hundreds of miles away from my location. Sean From verdy_p at wanadoo.fr Mon Sep 21 16:34:19 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 21 Sep 2015 23:34:19 +0200 Subject: Concise term for non-ASCII Unicode characters In-Reply-To: <000801d0f4a7$5643e3f0$02cbabd0$@Jollans.com> References: <55FEC721.7040008@seantek.com> <55FF5494.602@it.aoyama.ac.jp> <55FFBE36.5030104@seantek.com> <003f01d0f463$32ed5a60$98c80f20$@Jollans.com> <20150921191829.16a502d3@JRWUBU2> <000801d0f4a7$5643e3f0$02cbabd0$@Jollans.com> Message-ID: