From c933103 at gmail.com Wed Jul 1 03:15:28 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Wed, 1 Jul 2015 16:15:28 +0800 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) Message-ID: The UTC is considering a proposal to extend the types of flags which can be reliably represented by certain sequences of Unicode characters. In addition to the current mechanism using pairs of regional indicator symbols -- already widely implemented -- the proposal would use sequences of the TAG characters in the range U+E0030..U+E005A to represent other types of flags. The proposal also provides guidelines to specify valid sequences of TAG characters and how to interpret them. Full details of the proposal are provided in the background document. The UTC welcomes feedback on this proposed new mechanism. Feedback could consist of an indication of support or opposition to the proposal, with reasons why, or could consist of suggestions for improvement of the proposal. For further information, please see the Public Review Issues page. http://blog.unicode.org/2015/06/representing-additional-types-of-flags.html

From charupdate at orange.fr Wed Jul 1 03:47:55 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 1 Jul 2015 10:47:55 +0200 (CEST) Subject: WORD JOINER vs ZWNBSP In-Reply-To: <20150630223305.67b8da0f@JRWUBU2> References: <1851400009.9981.1435420121813.JavaMail.www@wwinf1d10> <20150630074746.79ff7cf7@JRWUBU2> <1430770470.10024.1435656344025.JavaMail.www@wwinf1m18> <20150630223305.67b8da0f@JRWUBU2> Message-ID: <1398757226.8479.1435740475172.JavaMail.www@wwinf1m18> On Tue, Jun 30, 2015, Richard Wordingham wrote: > On Tue, 30 Jun 2015 11:25:43 +0200 (CEST) > Marcel Schneider wrote: > > > On Mon, Jun 30, 2015, Richard Wordingham wrote: > > > I tested on Microsoft Word 2010 Starter running on Windows 7 Starter, > > on a netbook. This software being based on the full versions, the > > interpretation of U+FEFF must be the standard behavior. I tested in > > Latin script. You may wish to redo the tests, so please open a new > > document, input two words, replace the blank with whatever character > > the word boundaries behavior is to be checked of, and search for one > > of the two words with the 'whole word' option enabled. If the result > > is none, the test character indicates the absence of word boundaries; > > if there is a result, the test character indicates the presence of > > word boundaries.

Yesterday (Tue, Jun 30, 2015) I already wondered how my text could have been altered, with line breaks needlessly removed and added. Now I would like everybody to note that, at least on this public list, I have *never* quoted anybody this way:
> At some time in June 2015, Richard Wordingham wrote:

This is why, to begin this reply, I replaced that line with the accurate one, which can be checked at http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0279.html (except for the e-mail address, which the list engine suppresses when archiving, as it will here again):

On Tue, Jun 30, 2015, Richard Wordingham wrote: _______ > I did my own tests in word 2010 with Windows 7. Although U+FEFF and > U+2060 displayed differently when I enabled the display of > 'non-printing' characters (spaces, inactive soft hyphens, non-breaking > hyphens, paragraph ends etc.), the behaved the same when embedded in > French l'eau and Thai ?? - they changed each word to two words, as > detected by ctrl/rt-arrow. However, this is wrong.

At the same time, Doug Ewell (to whom I'll reply soon, as well as to Khaled Hosny) was writing exactly what I see on display: a .notdef box. Personally, I have enabled the display of paragraph ends, manual line breaks, tab characters, and text boundaries. (Unfortunately I cannot enable the display of style separators on its own. To see them, I must enable everything, as Richard did for his test.) Ctrl + RIGHT skips over APOSTROPHEs and in-word single closing quotes, and therefore cannot be used to detect word boundaries. You might consider running the test as I did. It goes as follows:
1 Open a new document.
2 Input two words with a blank between them.
3 Replace the blank with the character whose word-boundary behavior is to be checked.
4 Search for one of the two words with the 'whole word' option enabled.
- If the result is 'No instance found', the test character indicates the absence of a word boundary.
- If the result is 'One instance found', the test character indicates the presence of a word boundary.
This way, you will be told by Microsoft Word that the word 'eau' is found, because you used U+0027. Same result with U+2019.
It is not until you use U+02BC that U+006C U+02BC U+0065 U+0061 U+0075 is considered a single word. With U+006C U+02BC U+FEFF U+0065 U+0061 U+0075, you will find the word 'eau' again. This is not wrong, given that a word joiner is expected to join words, so that no NBSP or other no-break space is needed to prevent line breaks between them. However, the words remain words. This is why Ctrl + RIGHT stops at U+FEFF, detecting a word boundary. Skipping over in-word punctuation when moving the cursor quickly is purely a word-processing convenience, in English as well as in French and other languages. In your example, when 'l'eau' (the water) is to be replaced with its counterpart 'la terre' (the land), placing the cursor at the end and pressing Ctrl + BACKSPACE deletes both words, so you can immediately retype the non-elided article and the new word. But, as I said, that is not a test for word boundaries.

> >> No, this doesn't work. > > Clarification: It doesn't work in correct software. Correct software > would have treated the modified words as single words.

As far as the French example is concerned, the elided article and the noun are *already* treated as two words in correct software. There are spell-checkers that don't recognize a word when it is preceded by an elided article with an apostrophe, but these are *not* correct software. And they are *not* from Microsoft. I have no knowledge of Thai, but I guess that ?? is a correct word, and therefore correct software will take notice of the U+FEFF or U+2060 you add between the two characters and assume that you mean *two* words that simply have no blank between them. This is not wrong, again, and it is consistent with the fact that correct software complies with the Standards, that the Standards are designed to be useful, and that correct software is useful software. Talking about software, what other use is there in being correct?

Marcel
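To make the apostrophe variants in Marcel's test easier to compare, here is a minimal Python sketch that prints each test character's Unicode name and General Category and then runs a naive 'whole word' search on "l" + separator + "eau". The helpers `words` and `whole_word_found` are hypothetical stand-ins: a word is assumed to be a run of characters matched by Python's `\w`, which is neither Microsoft Word's algorithm nor an implementation of the UAX #29 word-boundary rules.

```python
import re
import unicodedata

# Separators from the thread, each inserted between "l" and "eau".
SEPARATORS = {
    "U+0027": "\u0027",               # APOSTROPHE
    "U+2019": "\u2019",               # RIGHT SINGLE QUOTATION MARK
    "U+02BC": "\u02bc",               # MODIFIER LETTER APOSTROPHE
    "U+02BC U+FEFF": "\u02bc\ufeff",  # modifier apostrophe + ZWNBSP
    "U+2060": "\u2060",               # WORD JOINER
}


def words(text: str) -> list[str]:
    """Hypothetical naive tokenizer: a word is a maximal run of \\w characters.

    Python's \\w matches letters (including the modifier letter U+02BC,
    General Category Lm), digits and underscore, but not punctuation
    (Po/Pf) or format characters (Cf) such as U+FEFF and U+2060.
    """
    return re.findall(r"\w+", text)


def whole_word_found(text: str, word: str) -> bool:
    """Mimic a 'whole word' search: is `word` exactly one of the tokens?"""
    return word in words(text)


if __name__ == "__main__":
    for ch in ["\u0027", "\u2019", "\u02bc", "\ufeff", "\u2060"]:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
    for label, sep in SEPARATORS.items():
        text = "l" + sep + "eau"
        print(f"{label:15} whole-word 'eau' found: {whole_word_found(text, 'eau')}")
```

Under this naive rule only U+02BC, a letter rather than punctuation, keeps "lʼeau" as one token; U+0027, U+2019, U+FEFF and U+2060 all split it, so 'eau' is found. That lines up with the Word results reported above, but a UAX #29 implementation would not necessarily give the same answers, which is the point under dispute here.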
> Message of 30/06/15 23:40 > From: "Richard Wordingham" > To: "Unicode Mailing List" > Cc: > Subject: Re: WORD JOINER vs ZWNBSP > > On Tue, 30 Jun 2015 11:25:43 +0200 (CEST) > Marcel Schneider wrote: > > > At some time in June 2015, Richard Wordingham wrote: > > > I tested on Microsoft Word 2010 Starter running on Windows 7 Starter, > > on a netbook. This software being based on the full versions, the > > interpretation of U+FEFF must be the standard behavior. I tested in > > Latin script. You may wish to redo the tests, so please open a new > > document, input two words, replace the blank with whatever character > > the word boundaries behavior is to be checked of, and search for one > > of the two words with the 'whole word' option enabled. If the result > > is none, the test character indicates the absence of word boundaries; > > if there is a result, the test character indicates the presence of > > word boundaries. > > I did my own tests in word 2010 with Windows 7. Although U+FEFF and > U+2060 displayed differently when I enabled the display of > 'non-printing' characters (spaces, inactive soft hyphens, non-breaking > hyphens, paragraph ends etc.), the behaved the same when embedded in > French l'eau and Thai ?? - they changed each word to two words, as > detected by ctrl/rt-arrow. However, this is wrong. > > > >> No, this doesn't work. > > Clarification: It doesn't work in correct software. Correct software > would have treated the modified words as single words. > > Richard. > >

From verdy_p at wanadoo.fr Wed Jul 1 03:57:49 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Wed, 1 Jul 2015 10:57:49 +0200 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: References: Message-ID: I oppose this proposal for the simple reason that it assumes hyphen separators are not necessary.
That may be true today, but some future extensions will need more than 2 letters or 3 digits in the primary subtag. Even for ISO 3166-2, the regional subtags are very likely to change, and without separators the extensions will become ambiguous. 2015-07-01 10:15 GMT+02:00 gfb hjjhjh : > The UTC is > considering a proposal to extend the types of flags which can be reliably > represented by certain sequences of Unicode characters. In addition to the > current mechanism using pairs of regional indicator symbols -- already widely > implemented -- the proposal would use sequences of the TAG characters in the > range U+E0030..U+E005A to represent other types of flags. The proposal also > provides guidelines to specify valid sequences of TAG characters and how to > interpret them. Full details of the proposal are provided in the background > document > > . > > The UTC welcomes feedback on this proposed new mechanism. Feedback could > consist of an indication of support or opposition to the proposal, with > reasons why, or could consist of suggestions for improvement of the > proposal. > > For further information, please see the Public Review Issues > page. > > http://blog.unicode.org/2015/06/representing-additional-types-of-flags.html > >

From dzo at bisharat.net Wed Jul 1 08:50:17 2015 From: dzo at bisharat.net (dzo at bisharat.net) Date: Wed, 1 Jul 2015 13:50:17 +0000 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <55915B2C.3060809@att.net> References: <84968C090B5F47409EF2006CF5309985@DougEwell> <55915B2C.3060809@att.net> Message-ID: <78734928-1435758617-cardhu_decombobulator_blackberry.rim.net-1181414367-@b27.c4.bise6.blackberry> Whatever notation might be added to whatever decision is ultimately made on this should probably mention the historic use of the rainbow flag by the peace movement. See for example: https://en.wikipedia.org/wiki/Peace_flag#Rainbow_flag Sent via BlackBerry by AT&T -----Original Message----- From: Ken Whistler Sender: "Unicode" Date: Mon, 29 Jun 2015 07:50:20 To: Noah Slater Cc: Subject: Re: Adding RAINBOW FLAG to Unicode Noah, Additional information you should have is that the UTC is about to publish a new Public Review Issue on the topic of an extended mechanism for the representation of more flag emoji with sequences of tag characters. (Note: *not* representation as encoded single character symbols.) That PRI, when it is available (should be quite soon -- early this week), will be explicitly addressing concerns about state, regional, and international flags. I don't think it will explicitly address "or otherwise", but additional flag emoji that don't happen to be covered by the regional and sub-regional tag mechanisms in the PRI would certainly be in scope for discussion and feedback on the PRI. Other short notes on comments in this long thread: 1. The claim that Twitter is including a RAINBOW FLAG would be taken into consideration by the Emoji Subcommittee. Compatibility with existing systems in wide use is a strong factor in favor of additions: http://www.unicode.org/reports/tr51/#Selection_Factors_Compatibility 2.
But on the other hand the offhand note: "When I mentioned my email to a queer friend, they asked if I might propose other pride flags (*as there are many*)." (emphasis added) illustrates the fundamental problem here. There is no effective end to the "or otherwise" case for flags as symbols, and that is why they are "generally not amenable to representation by encoded characters". Any simple image search for "pride flag" or "pride flag list" illustrates the problem amply: https://s-media-cache-ak0.pinimg.com/236x/69/83/f3/6983f3b9a4f68468bb101383006aa565.jpg https://s-media-cache-ak0.pinimg.com/236x/61/88/95/618895059533cb5b52c55cecd641881d.jpg That is not the realm of *characters* -- it is the realm of graphic design of flags, emblems, and frankly, at this point, heraldry. ;-) So, to sum up, I suggest that this thread about the RAINBOW FLAG be directed to the soon-to-be-posted Public Review Issue about extending the generative mechanisms for representing emoji symbols for flags, but that that feedback carefully consider how such an addition would coexist with other mechanisms for extensions of flag representation *and* how it could be reasonably limited to one instead of 28 (... or 500) more flags. --Ken P.S. While I do think there might be a strong case made for the RAINBOW FLAG to be added to the list of emoji flags representable by *some* kind of extension mechanism in Unicode, there really, really is no end to the "or otherwise" case. I happen to live in the city of Oakland, California. Try an image search on "Oakland flag". You start with a more-or-less official City flag, which kind of fits in the city as sub-region of region paradigm, and which can be spotted flying at the Oakland City Hall, but this quickly tails off into a gazillion variants, and various flags as sports memorabilia. I'm quite certain that an Oakland A's flag emoji would be locally quite popular if it were available on people's phones, for example. 
On 6/28/2015 3:36 PM, Noah Slater wrote: > > I really wish they'd provided a justification for this statement! :) I > guess that this is the right list for a UTC officer to give some sort > of feedback. > > On Sun, 28 Jun 2015 at 21:23 Doug Ewell > wrote: > > > Additionally, the domain of flags is > generally not amenable to representation by encoded characters, > and the > UTC does not wish to entertain further proposals for encoding of > symbol > characters for flags, whether national, state, regional, > international, > or otherwise. References to UTC Minutes: [134-C2], January 28, 2013." > > The last clause is the relevant one here: "whether national, state, > regional, international, or otherwise." The words "or otherwise" could > be interpreted as saying that no *specific* flag of any kind will be > encoded in the future as a single character, partly because the domain > of flags is so open-ended. That would include flags associated with or > representing specific groups of individuals or social causes. >

From dzo at bisharat.net Wed Jul 1 08:55:02 2015 From: dzo at bisharat.net (dzo at bisharat.net) Date: Wed, 1 Jul 2015 13:55:02 +0000 Subject: Unencoded Latin capitals Message-ID: <239490429-1435758902-cardhu_decombobulator_blackberry.rim.net-451267535-@b27.c4.bise6.blackberry> Michael, Is there a list of lowercase Latin letters needing capital equivalents? TIA, Don ------Original Message------ From: Michael Everson Sender: Unicode To: Unicode Public Subject: Re: Adding RAINBOW FLAG to Unicode Sent: Jun 27, 2015 5:56 PM On 27 Jun 2015, at 22:46, Konstantin Ritt wrote: > > U+1F3F3, U+200D, U+2620 > WAVING WHITE FLAG, ZERO WIDTH JOINER, SKULL AND CROSSBONES And thus the slippery slope is well and truly discovered. Gosh, I wish we could add capital equivalents to all (or most of) the un-cased lower-case letters we've got for Latin. That at least would be practical.
Michael Everson * http://www.evertype.com/ Sent via BlackBerry by AT&T

From dzo at bisharat.net Wed Jul 1 09:02:08 2015 From: dzo at bisharat.net (dzo at bisharat.net) Date: Wed, 1 Jul 2015 14:02:08 +0000 Subject: Unicode & the architecture of ICT Message-ID: <1836500250-1435759328-cardhu_decombobulator_blackberry.rim.net-1230244883-@b27.c4.bise6.blackberry> FYI, here is a quick reflection on Unicode and on enabling the use of African languages in ICT, mainly addressed to people who are not experts on the subject: http://niamey.blogspot.com/2015/06/unicode-and-architecture-of-ict.html Sent via BlackBerry by AT&T

From nslater at tumbolia.org Wed Jul 1 11:20:08 2015 From: nslater at tumbolia.org (Noah Slater) Date: Wed, 01 Jul 2015 16:20:08 +0000 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: References: Message-ID: Can someone help me understand what this means for my rainbow flag proposal? On Wed, 1 Jul 2015 at 10:02 Philippe Verdy wrote: > I oppose this proposal for the simple reason that it thinks hyphen > separations are not necessary. Possibly true today but there will be > extensions in some future needing more than 2 letters or 3 digits in the > primary subtag. even for iso 3166-2 the regional subtags are very likely > to change and without separators the extension,s will become ambiguous > > 2015-07-01 10:15 GMT+02:00 gfb hjjhjh : > >> The UTC is >> considering a proposal to extend the types of flags which can be reliably >> represented by certain sequences of Unicode characters. In addition to the >> current mechanism using pairs of regional indicator symbols -- already widely >> implemented -- the proposal would use sequences of the TAG characters in the >> range U+E0030..U+E005A to represent other types of flags. The proposal also >> provides guidelines to specify valid sequences of TAG characters and how to >> interpret them. Full details of the proposal are provided in the background >> document >> >> .
>> >> The UTC welcomes feedback on this proposed new mechanism. Feedback could >> consist of an indication of support or opposition to the proposal, with >> reasons why, or could consist of suggestions for improvement of the >> proposal. >> >> For further information, please see the Public Review Issues >> page. >> >> http://blog.unicode.org/2015/06/representing-additional-types-of-flags.html >> >> >

From doug at ewellic.org Wed Jul 1 11:45:25 2015 From: doug at ewellic.org (Doug Ewell) Date: Wed, 01 Jul 2015 09:45:25 -0700 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) Message-ID: <20150701094525.665a7a7059d7ee80bb4d670165c8327d.036fd1878f.wbe@email03.secureserver.net> Noah Slater wrote: > Can someone help me understand what this means for my rainbow flag > proposal? You may want to go back and read Ken Whistler's suggestion from Monday: > I suggest that this thread about the RAINBOW FLAG be > directed to the soon-to-be-posted Public Review Issue about extending > the generative mechanisms for representing emoji symbols for flags, > but that that feedback carefully consider how such an addition would > coexist with other mechanisms for extensions of flag representation > *and* how it could be reasonably limited to one instead of 28 (... or > 500) more flags. I posted feedback yesterday on this PRI that was intended to be consistent with what Ken wrote: > Any proposal to extend the mechanism to cover the many other types of > flags -- for historical regions, NGOs, maritime, sports, or social or > political causes -- must be systematic and well-planned, not ad-hoc or > haphazard, to assure interoperability and extensibility.
In other words, to the extent you wish to pursue encoding the rainbow flag as a flag-tag sequence, I suggest this is part of a broader problem space (how to encode flags for non-geopolitical entities) and requires a broader solution that can apply to any arbitrary number of such flags. In other, other words, something like "[flag]LGBT" should be a non-starter. If you are still suggesting a single character, this thread doesn't affect that suggestion at all. -- Doug Ewell | http://ewellic.org | Thornton, CO

From shervinafshar at gmail.com Wed Jul 1 11:49:53 2015 From: shervinafshar at gmail.com (Shervin Afshar) Date: Wed, 1 Jul 2015 09:49:53 -0700 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: References: Message-ID: On Wed, Jul 1, 2015 at 9:20 AM, Noah Slater wrote: > Can someone help me understand what this means for my rainbow flag > proposal? > AFAIK, it's not going to have any effect on what you're proposing. This is a mechanism for flags of sub-regions with ISO 3166-2 codes; e.g. US States, countries and provinces of the UK, Tibet, etc.

From doug at ewellic.org Wed Jul 1 12:33:45 2015 From: doug at ewellic.org (Doug Ewell) Date: Wed, 01 Jul 2015 10:33:45 -0700 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) Message-ID: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Shervin Afshar wrote: > This is a mechanism for flags of sub-regions with ISO 3166-2 codes; > e.g. US States, countries and provinces of the UK, Tibet, etc. The Tibet Autonomous Region (CN-54), like other regions in China except Hong Kong and Macao, has no official flag. Although this is what some users might expect, implementing or interpreting "[flag]CN54" as the snow-lion flag, associated with the Free Tibet movement, could be controversial and problematic in the extreme.
You know how China is. -- Doug Ewell | http://ewellic.org | Thornton, CO

From doug at ewellic.org Wed Jul 1 12:38:18 2015 From: doug at ewellic.org (Doug Ewell) Date: Wed, 01 Jul 2015 10:38:18 -0700 Subject: Adding RAINBOW FLAG to Unicode Message-ID: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net> wrote: > Whatever notation that might be added to whatever decision is > ultimately made on this should probably mention historic use of the > rainbow flag by the peace movement. See for example: > > https://en.wikipedia.org/wiki/Peace_flag#Rainbow_flag The colors of the rainbow peace flag (purple on top) are often inverted with respect to the LGBT flag (red on top), making them essentially two different flags. -- Doug Ewell | http://ewellic.org | Thornton, CO

From shervinafshar at gmail.com Wed Jul 1 12:46:27 2015 From: shervinafshar at gmail.com (Shervin Afshar) Date: Wed, 1 Jul 2015 10:46:27 -0700 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Message-ID: On Wed, Jul 1, 2015 at 10:33 AM, Doug Ewell wrote: > > The Tibet Autonomous Region (CN-54), like other regions in China except > Hong Kong and Macao, has no official flag. > > Although this is what some users might expect, implementing or > interpreting "[flag]CN54" as the snow-lion flag, associated with the > Free Tibet movement, could be controversial and problematic in the > extreme. You know how China is. That's correct. I intentionally used that example because implementations can decide how they want to represent "[flag]CN54". Technically it would just be "flag for ISO 3166-2:CN-54". Shervin

From nslater at tumbolia.org Wed Jul 1 13:38:54 2015 From: nslater at tumbolia.org (Noah Slater) Date: Wed, 01 Jul 2015 18:38:54 +0000 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: <20150701094525.665a7a7059d7ee80bb4d670165c8327d.036fd1878f.wbe@email03.secureserver.net> References: <20150701094525.665a7a7059d7ee80bb4d670165c8327d.036fd1878f.wbe@email03.secureserver.net> Message-ID: Thanks Doug. On Wed, 1 Jul 2015 at 17:45 Doug Ewell wrote: > > In other, other words, something like "[flag]LGBT" should be a > non-starter. > Followed until this bit. Why would it be a non-starter? > If you are still suggesting a single character, this thread doesn't > affect that suggestion at all. > I don't know enough about how the Consortium functions to understand my best course of action. Looking for advisement on (a) what is most likely to pass UTC muster, and (b) what is most likely to result in rainbow flag emojis being available widely in the near future.

From doug at ewellic.org Wed Jul 1 14:26:44 2015 From: doug at ewellic.org (Doug Ewell) Date: Wed, 01 Jul 2015 12:26:44 -0700 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) Message-ID: <20150701122644.665a7a7059d7ee80bb4d670165c8327d.73a24430a0.wbe@email03.secureserver.net> Noah Slater wrote: >> In other, other words, something like "[flag]LGBT" should be a >> non-starter. > > Followed until this bit. Why would it be a non-starter? First, because under the proposal described in the PRI, it would unequivocally stand for "region LG, subdivision BT". As it happens, there is no region LG, so the sequence might simply be ignored as undefined. Second, and more generally, because it would not be part of any sort of structured extension to the geopolitical-entity encoding mechanism.
It would provide no orderly path to encoding additional, similar flags for other social groups or causes, including others also focusing on sexuality. It would be strictly ad-hoc. It would rely solely on "this combination of letters isn't in use right now, so let's snag it," which is poor standardization, as Michael pointed out on Monday. Using an ad-hoc "land grab" approach to registering flag tags, how would the following flags be represented?
1. The flag of Chicago
2. The flag of the U.S. Army
3. The flag of ASEAN
4. The Olympic flag
5. The flag of UNICEF
6. The Christian flag
7. The Esperanto flag
8. The Confederate battle flag
9. The Gadsden flag ("Don't Tread On Me")
10. The Jolly Roger (pirate flag of Edward England)
11. The flag of ISIS (ISIL, AQMI, Da'esh)
12. The flag of Germany from 1933 to 1945
(Hint: all of these would have to be eligible, once the doors are opened.) Simply coming up with a combination of letters and digits for each of these that happens to be unused in ISO 3166 won't do. There would have to be something with much better structure and organization. This is my suggestion, anyway. > I don't know enough about how the Consortium functions to understand > my best course of action. Looking for advisement on (a) what is most > likely to pass UTC muster, and (b) what is most likely to result in > rainbow flag emojis being available widely in the near future. Other list participants and/or UTC members will have to help you here. I'm the last one you want to ask about how to get a random emoji into the Unicode Standard. -- Doug Ewell | http://ewellic.org | Thornton, CO
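The tag mechanism under discussion relies on the TAG block mirroring ASCII: the tag form of a character is U+E0000 plus its code point, so 'C' (U+0043) becomes U+E0043, and digits and capital letters fall inside the cited range U+E0030..U+E005A. The Python sketch below encodes a code as tag characters and splits it the way Doug reads the proposal (a two-character region, with the remainder as the subdivision). The `[flag]` placeholder, the helper names, and the fixed split are assumptions for illustration, not the PRI's actual syntax. The `readings` helper illustrates Philippe's objection: once primary subtags may vary in length, an unseparated tag run admits more than one reading.

```python
TAG_OFFSET = 0xE0000                     # TAG characters mirror ASCII: U+E0000 + ord(c)
TAG_FIRST, TAG_LAST = 0xE0030, 0xE005A   # range cited in the PRI announcement
FLAG_BASE = "[flag]"                     # hypothetical stand-in for the base character


def to_tags(code: str) -> str:
    """Map an ASCII code such as "CN54" to the corresponding TAG characters."""
    out = []
    for ch in code:
        if not (ch.isascii() and (ch.isdigit() or ch.isupper())):
            raise ValueError(f"expected a digit or capital letter, got {ch!r}")
        out.append(chr(TAG_OFFSET + ord(ch)))
    return "".join(out)


def from_tags(tags: str) -> str:
    """Inverse of to_tags; rejects code points outside U+E0030..U+E005A."""
    out = []
    for t in tags:
        cp = ord(t)
        if not TAG_FIRST <= cp <= TAG_LAST:
            raise ValueError(f"U+{cp:04X} is outside the tag range")
        out.append(chr(cp - TAG_OFFSET))
    return "".join(out)


def parse(seq: str) -> tuple[str, str]:
    """Split "[flag]" + tags into (region, subdivision), assuming a fixed
    two-character region prefix (the reading given in this thread)."""
    if not seq.startswith(FLAG_BASE):
        raise ValueError("sequence does not start with the flag base")
    code = from_tags(seq[len(FLAG_BASE):])
    return code[:2], code[2:]


def readings(code: str, region_lengths: tuple[int, ...] = (2, 3)) -> list[tuple[str, str]]:
    """Every (region, subdivision) split if primary subtags could have any of
    the given lengths; with no separator, more than one split may fit."""
    return [(code[:n], code[n:]) for n in region_lengths if len(code) > n]


if __name__ == "__main__":
    for code in ("CN54", "LGBT"):
        seq = FLAG_BASE + to_tags(code)
        points = " ".join(f"U+{ord(t):X}" for t in seq[len(FLAG_BASE):])
        print(code, "->", points, "->", parse(seq))
    print(readings("LGBT"))
```

For example, `parse(FLAG_BASE + to_tags("LGBT"))` returns `('LG', 'BT')`, and `readings("LGBT")` returns `[('LG', 'BT'), ('LGB', 'T')]`. An explicit separator in the tag run would rule out the second reading.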
From verdy_p at wanadoo.fr Wed Jul 1 21:12:53 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 2 Jul 2015 04:12:53 +0200 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Message-ID: And today's Chinese province of Tibet is different from the historic Tibet, as China incorporated other surrounding areas, including some parts taken from Bhutan (a small part around Legaru, and a larger part to the north) and India (some parts to the west, from the states of Jammu and Kashmir, which is itself also claimed by Pakistan, and of Uttarakhand, and to the east from Arunachal Pradesh), as well as modifying the internal borders of the Chinese provinces of Xinjiang in the north-west and of Sichuan in the east. The whole new province is still named Tibet but is much larger than the historic country of Tibet before its annexation. The Chinese claims in India and Bhutan are contested and are still subject to very active military tensions with India. This question therefore goes beyond the Free Tibet movement, which claims nothing from India or Bhutan (in fact, these two countries host Tibetan refugees and the Free Tibet movement itself) and claims nothing in the Chinese areas previously part of Sichuan and Xinjiang provinces. China also has border conflicts with Tajikistan and a small part of Afghanistan, aimed at extending its current province of Xinjiang to the west. The international borders of China are therefore extremely fuzzy. With India and Bhutan, the claims exist in theory, but India has kept its presence.
The situation is much less clear, however, with Jammu and Kashmir (which has its own separatist movement in addition to Pakistan's claims), and is now becoming more critical with Tajikistan and in the troubled area bordering Afghanistan, both areas having autonomist Islamic movements in Xinjiang (some of them now allied with the Taliban, operating in Afghanistan and Tajikistan since the dissolution of the former USSR; before that dissolution, this was also a region of border conflicts between China and the USSR). China now also has maritime border conflicts in the South China Sea, from Vietnam to the Philippines, Malaysia and Brunei, as China wants to extend its maritime borders to the south to include various small islands. It also has conflicts with Taiwan to the north of that maritime area. Defining the borders of China is really complicated, and this also has consequences for the interpretation of the Chinese provincial subdivisions in ISO 3166-2. I would not associate flags with these official Chinese provinces, given that China itself does not claim any flag for them. And I would certainly not use these ISO 3166-2 Chinese subdivisions to represent historic regions annexed by China, or claimed by China from other countries (which are still a source of active conflicts and of military actions or political tensions by China against Vietnam, Taiwan, the Philippines, Malaysia, Brunei, as well as with South Korea and Japan). All countries around China have to protect their borders with China, whose power and influence is growing (even in the easternmost part of Russia, with an important Chinese community supporting China rather than Russia because of the historic conflicts with Japan). We have not seen any sign of stabilization; in fact, the number of territorial conflicts is growing, as is the Chinese military presence in all these bordering regions. Many of these countries have also had internal troubles for a long time (e.g.
Myanmar, and even Vietnam, due to the past wars and China's military support for North Vietnam against South Vietnam: Vietnam now has a significant Chinese community within its own borders, which could support the Chinese claims in the South China Sea). It seems that China wants to create a huge maritime area connecting the sea routes from Hong Kong to Singapore, and new conflicts could appear with Indonesia. 2015-07-01 19:33 GMT+02:00 Doug Ewell : > Shervin Afshar wrote: > > > This is a mechanism for flags of sub-regions with ISO 3166-2 codes; > > e.g. US States, countries and provinces of the UK, Tibet, etc. > > The Tibet Autonomous Region (CN-54), like other regions in China except > Hong Kong and Macao, has no official flag. > > Although this is what some users might expect, implementing or > interpreting "[flag]CN54" as the snow-lion flag, associated with the > Free Tibet movement, could be controversial and problematic in the > extreme. You know how China is. > > -- > Doug Ewell | http://ewellic.org | Thornton, CO > > >

From mark at macchiato.com Thu Jul 2 00:16:41 2015 From: mark at macchiato.com (Mark Davis ☕️) Date: Thu, 2 Jul 2015 07:16:41 +0200 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Message-ID: *Please take political discussions elsewhere; they do not belong on this list.* The point about the boundaries of regions changing over time, and about flags being associated with a former set of boundaries, could have been made in a few sentences. Not only would it have avoided politics, it would have been more likely that people would actually read it (the likelihood being inversely proportional to the length). Mark *Il meglio è l'inimico del bene* (the best is the enemy of the good) On Thu, Jul 2, 2015 at 4:12 AM, Philippe Verdy wrote: > And today's Chinese province ofTibet is different from the historic Tibet, > as China incorporated other surrounding areas, including some parts taken > from Bhutan (a small part around Legaru, and a larger part to the North) > and India (some parts to the West from states of Jammu and Kashmir, which > itself is also claimed by Pakistan, and of Uttarakhand, and to the East > from Arunachal Pradesh), as well as modifying the internal borders of > Chinese provinces of Xinjiang in the nort-west and of Sichuan on the east. > The whole new province is still named Tibet but much larger than the > historic country of Tibet before its annexion. > > The Chinese claims in India and Bhutan are contested and is still subject > to very active military tensions with India. This question is then more > important than only the Tibetan free movement that does not claim anything > to India and Bhutan (and in fact these two countries are hosting Tibetan > refugees and the Free Tibet movement itself) and do not claim anything in > Chinese parts previously part of Sichuan and Xinjiang provinces. > > China also has border conflicts with Tajiskistan and a small part of > Afghanistan to extend its current province of Xinjiang to the West. The > international borders of China are then extremely fuzzy. With India and > Bhutan, the claims are theorically existing but India has kept its > presence.
The situation is much less clear however with Jammu and Kashmir > (that has its own separatist movement in addition to the Pakistan claims) > and is now becoming more critical with Tajikistan and in the troubled area > bordering Afghanistan, both areas having autonomist islamic movements in > Xinjiang (including now some of them allied with Talebans operating in > Afghanistan and Tajikistan since the dissolution of the former USSR: before > that dissolution, this was also a region of border conflicts between China > and USSR). > > Now China has also maritime bordering conflicts in the South China Sea > from Vietnam to the Philippines, Malaysia and Brunei as China wants to > extend its maritime borders to the south to include various small islands. > It has also conflicts with Taiwan to the north of that maritime area. > > Defining the borders of China is really complicate. And this has > consequences also on the interpretation of Chinese subdivisions of > provinces in ISO 3166-2. I would not associate flags with these official > Chinese provinces given that even China does not claim any flag. But I > would certainly not use these ISO 3166-2 Chinese subdivisions to associate > them with historic regions annexed by China, or claimed by China over other > countries (which are still a source of active conflicts and military > actions or political tensions by China against Vietnam, Taiwan, the > Philippines, Malaysia, Brunei, as well with South Korea and Japan. All > countries around China have to protect their borders with China whose power > and influence is growing (even in the easternmost part of Russia with an > important Chinese community supporting China rather than Russia for the > historic conflicts with Japan). > > We've not seen any sign of stabilization and in fact the number of > territorial conflicts is growing, as well as the Chinese military presence > in all these bordering regions. 
Many of these existing countries also have > internal troubles since long (e.g. Myanmar, and even Vietnam due to the > past wars and military support of China for Northern Vietnam against > Southern Vietnam: now Vietnam has a significant Chinese community in its > own borders, which could support the Chinese claims in South China Sea). It > seems that China wants to create a huge matitime area connecting the > maritime roads from Hong Kong to Singapore and new conflicts could appear > with Indonesia. > > 2015-07-01 19:33 GMT+02:00 Doug Ewell : > >> Shervin Afshar wrote: >> >> > This is a mechanism for flags of sub-regions with ISO 3166-2 codes; >> > e.g. US States, countries and provinces of the UK, Tibet, etc. >> >> The Tibet Autonomous Region (CN-54), like other regions in China except >> Hong Kong and Macao, has no official flag. >> >> Although this is what some users might expect, implementing or >> interpreting "[flag]CN54" as the snow-lion flag, associated with the >> Free Tibet movement, could be controversial and problematic in the >> extreme. You know how China is. >> >> -- >> Doug Ewell | http://ewellic.org | Thornton, CO ???? >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Thu Jul 2 04:01:46 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 2 Jul 2015 11:01:46 +0200 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Message-ID: The political subject is immediately related to the designation of flags and their association to ISO 3166-1 and -2 encoded entities. 
Even if you don't like it, this is very political, and for a standard seeking stability, I wonder how any flag (directly bound to specific political entities at specific dates and within boundaries that may be contested) can be tied to ISO 3166 and its instability (not to mention that ISO 3166 entities have no defined borders either, so ISO 3166-2 merely reflects the political point of view of the current ruler of the current ISO 3166-1 entity).

This whole topic is political. In fact the real flags are not even encoded with RIS, not even for current nations (and there is still the problem of knowing what a recognized nation is, even when considering only the UN definition. Political entities are defined but with fuzzy borders; they really represent only some local governments, not necessarily their lands, people or cultures, and in some cases they are in exile or not even ruling: their seat at the UN is vacant and they exist only on paper, while even UN members disagree about which treaties they recognize).

Consider the case of Western Sahara, which no longer exists except on paper as a dependency of Spain (which has abandoned it completely), with two governments competing to control the territory (Morocco controlling most of it, another part claimed by Mauritania and then abandoned, another part left without infrastructure, and many refugees left de facto in Mauritania or Algeria). Neither of the two authorities designates that territory as "Western Sahara", so it no longer exists (and will likely never exist again).

The frozen status of Antarctica has not created any new country or territory, even though there is a sort of joint administration: that administration does not suppress the existing claims (or the new claims made since its creation). So this area has no well-defined flag: various flags are used informally, plus national flags for each claim and sometimes specific regional flags created ad hoc.
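For readers following the encoding side of this thread: a regional indicator (RIS) flag is a pair of symbols from U+1F1E6..U+1F1FF, one per letter of an ISO 3166-1 alpha-2 code, while the proposal under review spells other codes with TAG characters, which mirror printable ASCII at an offset of 0xE0000 (so the proposal's range U+E0030..U+E005A covers the digits and capital letters, as in Doug Ewell's "[flag]CN54" example). The Python sketch below illustrates both mappings. The function names are hypothetical, and because the 2015 background document is not reproduced here, `tag_flag` assumes the form later standardized for emoji tag sequences in UTS #51 (U+1F3F4 WAVING BLACK FLAG, lowercase tag characters, U+E007F CANCEL TAG); it does not model the proposal's exact syntax.

```python
# Sketch only: maps ASCII region codes to RIS pairs and TAG-character spellings.
# Function names are hypothetical; tag_flag() ASSUMES the later UTS #51 emoji
# tag-sequence form, not the exact syntax of the 2015 proposal.

RIS_A = 0x1F1E6                   # REGIONAL INDICATOR SYMBOL LETTER A
TAG_OFFSET = 0xE0000              # TAG character = printable ASCII code point + 0xE0000
WAVING_BLACK_FLAG = "\U0001F3F4"  # tag base used by UTS #51 flag sequences
CANCEL_TAG = "\U000E007F"         # tag terminator used by UTS #51


def ris_flag(alpha2: str) -> str:
    """Return the regional indicator pair for an ISO 3166-1 alpha-2 code, e.g. 'US'."""
    if len(alpha2) != 2 or not (alpha2.isascii() and alpha2.isalpha()):
        raise ValueError(f"not an alpha-2 code: {alpha2!r}")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in alpha2.upper())


def to_tags(text: str) -> str:
    """Spell printable ASCII (U+0020..U+007E) with TAG characters U+E0020..U+E007E."""
    if not all(0x20 <= ord(c) <= 0x7E for c in text):
        raise ValueError("TAG characters only mirror printable ASCII")
    return "".join(chr(TAG_OFFSET + ord(c)) for c in text)


def tag_flag(code: str) -> str:
    """Assumed UTS #51 form: base flag + lowercase tag spelling + CANCEL TAG."""
    return WAVING_BLACK_FLAG + to_tags(code.lower()) + CANCEL_TAG


if __name__ == "__main__":
    for seq in (ris_flag("US"), to_tags("CN54"), tag_flag("gbeng")):
        print(" ".join(f"U+{ord(ch):04X}" for ch in seq))
```

Note that nothing in these mappings says what a sequence depicts: the code only spells an identifier, and the association between identifier and flag image is left to implementations.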
The use of RIS for ISO 3166-1, and its limited (slightly modified) extension for ISO 3166-2, does not resolve the problem. In reality there is still no standard way to encode flags unambiguously and stably. We'd like FOTW (Flags of the World) contributors to propose their own scheme, but it will not be compatible with the current RIS solution or the proposed extension; if such a standard ever emerges, it will require encoding a new set of characters.

An alternative would be to embed a URN (not re-encoded) between a pair of controls (to embed an object by reference) and to use that sequence after a white flag symbol with a joiner. The URN scheme would be the best long-term solution (and preferable to URLs bound to specific servers), but we could in fact use a generic URI encapsulation (supporting both URNs and URLs). It could then be used to represent various kinds of entities and link them to specific forms: flags, banners, flying flags, flags over a person's face, mini location maps, "flag maps"... Programs not recognizing the encoded entities would have a very simple way to skip over the encapsulated URI representing some unspecified object, while other programs would recognize some specific URI schemes. RIS would then be a thing of the past, obsoleted because it was non-neutral, politically and culturally oriented, incomplete, and fundamentally unstable from the beginning...

For now we just have a set of flags promoted only for the immediate purpose of interconnecting proprietary messaging services, and all of this came without a proper review of what was really needed.

2015-07-02 7:16 GMT+02:00 Mark Davis ☕️ :

> *Please take political discussions elsewhere; they do not belong on this list.*
>
> The point about the boundaries of regions changing over time, and flags being associated with a former set of boundaries could have been made in a few sentences. Not only would it have avoided politics, it would have been more likely that people would actually read it (the likelihood being inversely proportional to the length).
>
> Mark
>
> *Il meglio è l’inimico del bene*

From charupdate at orange.fr Thu Jul 2 04:29:06 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 2 Jul 2015 11:29:06 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
In-Reply-To: <20150630194129.GA16879@khaled-laptop>
References: <552516479.6107.1435315719474.JavaMail.www@wwinf2229> <20150626110243.GB18139@ebed.etf.cuni.cz> <2104451852.9023.1435654939028.JavaMail.www@wwinf1m18> <20150630194129.GA16879@khaled-laptop>
Message-ID: <1413925467.11206.1435829346235.JavaMail.www@wwinf1m18>

On Tue, Jun 30, 2015, Khaled Hosny wrote:

> On Tue, Jun 30, 2015 at 11:02:18AM +0200, Marcel Schneider wrote:
> > On Sun, Jun 28, 2015, Peter Constable wrote:
> >
> > > Marcel: Can you please clarify in what way Windows 7 is not supporting U+2060.
> >
> > On my netbook, which is running Windows 7 Starter, U+2060 is not a
> > part of any of the shipped fonts.
>
> It is a control character, it does not need to have a glyph in the font
> to be properly supported.

As Doug explained to us, this is both true and false, because three fonts shipped with the full version of Windows include U+2060, and all other fonts mishandle U+2060. However, that too is only an application issue, and Hosny's advice holds for OpenOffice and LibreOffice, if my test results are accurate (please refer to the e-mail I sent just before).

The WORD JOINER vs ZWNBSP issue is resolved in conformance with the Unicode recommendations on condition that the preferred word processor is LibreOffice Writer or OpenOffice Writer, but not Microsoft Office Word. This results from three facts:

1. The WJ is displayed with zero width and with a visible mark (resembling that of NBSP) in OpenOffice/LibreOffice: [screenshot]
2. The WJ works with whatever font is selected (here, Aharoni).
3. No format character is destroyed by OpenOffice/LibreOffice on conversion to plain text (pasting into a text editor).
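Fact 3 is easier to check mechanically than by eye, since the characters involved are invisible. As a minimal sketch in Python (standard library only), one can count the format characters (general category Cf) and NO-BREAK SPACEs in the source text and compare them with what actually arrived in the editor. The helper names are hypothetical, and `pasted` merely stands in for whatever the clipboard delivered in a real test.

```python
import unicodedata
from collections import Counter


def invisible_chars(text: str) -> Counter:
    """Count characters that are easy to lose silently: format characters (Cf) and NBSP."""
    return Counter(ch for ch in text
                   if unicodedata.category(ch) == "Cf" or ch == "\u00a0")


def lost_in_conversion(before: str, after: str) -> dict:
    """Map 'U+XXXX' to how many occurrences disappeared between before and after."""
    missing = invisible_chars(before) - invisible_chars(after)  # keeps positive counts only
    return {f"U+{ord(ch):04X}": n for ch, n in missing.items()}


if __name__ == "__main__":
    source = "one\u2060two three\ufefffour five\u00a0six"  # WJ, ZWNBSP, NBSP
    pasted = "one\u2060two three four five six"            # hypothetical clipboard result
    print(lost_in_conversion(source, pasted))               # {'U+FEFF': 1, 'U+00A0': 1}
```

Comparing counts rather than whole strings keeps the check focused on the invisible characters; an NBSP turned into an ordinary space, for instance, is reported as a lost U+00A0.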
This is why, in practice, users must switch between applications depending on the task at hand and the characters used. Sticking with the application we are used to would then be a counter-productive error.

About the WJ being a control character, I would add that it is of general category Cf, which in actual terms is Other (Format), while control characters belong to Cc, named Other (Control). The difference may be slight, a mere matter of terminology, but given the poor handling of some format characters by the world's most used word processors, I guess something must change. Perhaps the WJ has been overlooked on the assumption that it is only a control. If the WJ has deliberately been implemented poorly in Word, that may be to discourage people from using Word for what they should use Publisher for. However, I believe that since WJs are part of plain text, they should be properly supported in all text-handling applications. And they should be on the keyboard.

The solution I suggest is therefore to put the word joiner (and the sequences containing it) on Ctrl+Alt or Kana, and the zero width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users working efficiently with good software can reach the preferred character a bit more easily than users who must use the deprecated character because their word processor does not properly support the preferred one.

I'm sorry to have asked Unicode to remove the recommendation for U+2060. I'm accustomed to Microsoft's word processor, where I've got my huge autoexpand list. (This is written *without* autoexpand.) And I hadn't yet tested this in OpenOffice/LibreOffice. Now that's done.

Regards,
Marcel

From charupdate at orange.fr Thu Jul 2 05:22:30 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 2 Jul 2015 12:22:30 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
Message-ID: <2143588033.13070.1435832550669.JavaMail.www@wwinf1m18>

I'm sorry for the name mistake in this mail (it's corrected below), and I became aware of a number of problems with sending screenshots. As I just learned that links are preferred for images, I posted them on Postimage.

On Tue, Jun 30, 2015, Khaled Hosny wrote:

> On Tue, Jun 30, 2015 at 11:02:18AM +0200, Marcel Schneider wrote:
> > On Sun, Jun 28, 2015, Peter Constable wrote:
> >
> > > Marcel: Can you please clarify in what way Windows 7 is not supporting U+2060.
> >
> > On my netbook, which is running Windows 7 Starter, U+2060 is not a
> > part of any of the shipped fonts.
>
> It is a control character, it does not need to have a glyph in the font
> to be properly supported.

As Doug explained to us, this is both true and false, because three fonts shipped with the full version of Windows include U+2060, and all other fonts mishandle U+2060. However, that too is only an application issue, and Khaled's advice holds for OpenOffice and LibreOffice, if my test results are accurate (please refer to the e-mail I sent just before).

The WORD JOINER vs ZWNBSP issue is resolved in conformance with the Unicode recommendations on condition that the preferred word processor is LibreOffice Writer or OpenOffice Writer, but not Microsoft Office Word. This results from three facts:

1. The WJ is displayed with zero width and with a visible mark (resembling that of NBSP) in OpenOffice/LibreOffice: http://s24.postimg.org/5ujkak28l/screen_m_2015_07_02_04_08.jpg
2. The WJ works with whatever font is selected (here, Aharoni).
3. No format character is destroyed by OpenOffice/LibreOffice on conversion to plain text (pasting into a text editor).
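The general categories contrasted in this thread (Cf for format characters, Cc for controls) can be looked up in the Unicode Character Database. A minimal Python sketch using the standard `unicodedata` module; `describe` is a hypothetical helper, and U+0009 is included only as an example of a genuine Cc control for contrast.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (category)'; Cc controls have no character name, so a placeholder is used."""
    name = unicodedata.name(ch, "<control>")
    return f"U+{ord(ch):04X} {name} ({unicodedata.category(ch)})"


if __name__ == "__main__":
    # WORD JOINER, ZERO WIDTH NO-BREAK SPACE, NO-BREAK SPACE, CHARACTER TABULATION
    for ch in ("\u2060", "\ufeff", "\u00a0", "\t"):
        print(describe(ch))
```

Both WJ and ZWNBSP report Cf, so the category alone does not tell the recommended word joiner apart from U+FEFF, whose use as a word joiner is deprecated in favor of U+2060 and which today serves mainly as the byte order mark.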
This is why, in practice, users must switch between applications depending on the task at hand and the characters used. Sticking with the application we are used to would then be a counter-productive error.

If you wish to view more screenshots, here are some (I switched the UI to English where possible, that is, in LibreOffice Writer):
http://s6.postimg.org/mfn27wthd/screen_m_2015_07_02_04_19.jpg
http://s6.postimg.org/6wpmasl6p/screen_m_2015_07_02_04_32.png
http://s6.postimg.org/bz6y5kugx/screen_m_2015_07_02_04_42.jpg

About the WJ being a control character, I would add that it is of general category Cf, which in actual terms is Other (Format), while control characters belong to Cc, named Other (Control). The difference may be slight, a mere matter of terminology, but given the poor handling of some format characters by the world's most used word processors, I guess something must change. Perhaps the WJ has been overlooked on the assumption that it is only a control. If the WJ has deliberately been implemented poorly in Word, that may be to discourage people from using Word for what they should use Publisher for. However, I believe that since WJs are part of plain text, they should be properly supported in all text-handling applications. And they should be on the keyboard.

The solution I suggest is therefore to put the word joiner (and the sequences containing it) on Ctrl+Alt or Kana, and the zero width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users working efficiently with good software can reach the preferred character a bit more easily than users who must use the deprecated character because their word processor does not properly support the preferred one.

I'm sorry to have asked Unicode to remove the recommendation for U+2060. I'm accustomed to Microsoft's word processor, where I've got my huge autoexpand list. (This is written *without* autoexpand.) And I hadn't yet tested this in OpenOffice/LibreOffice.
Now that's done.

Regards,
Marcel

From eik at iki.fi Thu Jul 2 06:57:11 2015
From: eik at iki.fi (Erkki I Kolehmainen)
Date: Thu, 2 Jul 2015 14:57:11 +0300
Subject: VS: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags)
In-Reply-To:
References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net>
Message-ID: <003401d0b4be$3af16970$b0d43c50$@fi>

I cannot but agree with Mark! Thus, please...

Sincerely,
Erkki

From charupdate at orange.fr Thu Jul 2 06:58:40 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 2 Jul 2015 13:58:40 +0200 (CEST)
Subject: WORD JOINER vs ZWNBSP
Message-ID: <687953989.15645.1435838320486.JavaMail.www@wwinf1m18>

This message contained a screenshot and originally had several attached screenshots, which prevented it from being forwarded to the List. I removed them all; for screenshots, readers may refer to the links I added to the e-mail I resent today to Khaled Hosny.

On Tue, Jun 30, 2015, Doug Ewell wrote:

> Khaled Hosny wrote:
>
> >> On my netbook, which is running Windows 7 Starter, U+2060 is not a
> >> part of any of the shipped fonts.
> >
> > It is a control character, it does not need to have a glyph in the
> > font to be properly supported.

Thank you Khaled, I will respond right after this.

> The problem is the word "supported." Marcel is seeing a visible glyph (a
> .notdef box) for what is supposed to be an invisible, zero-width
> character, and that is leading him to conclude that Windows doesn't
> "support" this character.

The .notdef box is exactly what I sometimes see in Notepad, and every time in the Word dialogs, when I use U+2060. But what I actually see in the document is a particular glyph: a tall, full-height empty box with a wide space to its right, even though the font is proportional; in Notepad the text shows the same box without the space. Only when I switch to the font you indicate below does the word joiner display correctly in my version of Microsoft Word. Please see the attached screenshots (I wanted to paste them into this e-mail).

> On my Win 7 machine at work, when I enter the string "one⁠two"
> ("one\u2060two") and click on either word, both words are selected. That
> is exactly what I would expect WJ to do. This works on the built-in
> Notepad as well as Notepad++ and BabelPad (but not on GoDaddy's
> Web-based email client).

Selection by double-click corresponds to what Richard did with the quick cursor move. These phenomena are text-processing features that give little evidence of the presence or absence of word boundaries. So I redid your test, but used the search tool with the "Whole words only" option enabled. This gives an idea of how the application perceives words as entities, or better said, how developers expect users to expect search results. Well, that isn't really a better expression... What I want to say is that what we see is normally what we are expected to expect. Personally I wouldn't like to get only part of a compound selected when I most probably want to mark it up as a whole, and neither would you, Doug.
This is why double-clicking anywhere on the sequence selects the sequence as a whole. By contrast, given that we took care to insert word joiners where normally we aren't expected to (because simply typing the words one after the other with nothing between them is enough to get them as *one* word), the software engineers expect us to want to join what must remain a sequence of separate words. Consequently, the built-in search engine recognizes each word as a word in its own right. This is where good software shows its benefits. Some software does not recognize the ZWNBSP or the NBSP (I don't know which one, or both) as marking a word boundary, and therefore does not work correctly. This also depends on the PDF conversion tool. Please check the screenshots (I switched the UIs to English wherever possible, that is, in LibreOffice).

[This e-mail has been blocked because it contained several attached screenshots, so I am resending it without attached images.]

> But out of more than 500 fonts on that machine, the only stock Microsoft
> fonts that show WJ with zero-width, instead of a .notdef glyph, are
> Javanese Text, Myanmar Text, and Segoe UI Symbol. So while it's
> inaccurate to extrapolate this to "Microsoft doesn't support WJ," the
> font support is definitely lacking.

I wish to thank you personally, Doug, for this very valuable hint. Indeed, in Microsoft Word 2010 Starter on Windows 7 Starter, the WJ is not displayed correctly unless the font is switched to Segoe UI Symbol (the one of the three that shipped with my OS). If the Segoe typeface is not appropriate for the document, we can ask Word to find and replace all instances of U+2060 with the same character formatted in Segoe UI Symbol. This may be what Word users are expected to do every time, even if that isn't really what we expect of a productivity suite.
Perhaps, or most probably, this problem does not occur in other high-end software, such as Microsoft Publisher (needs to be confirmed). But if somebody buys Microsoft Office Premium or Professional, he should be safe from that malfunction. As should everybody using Microsoft software, in fact. > The bit about characters being converted to other characters, of course, > has nothing to do with Windows and everything to do with particular > applications. Based on this hint, I did more tests and found out that for a proper conversion to plain text, any segment including U+00A0, U+FEFF, and other invisible characters, when copied from a document in Microsoft Word, must first be pasted into a LibreOffice document, then copied again, and finally pasted into the text editor. I should avoid venting further about that issue, and I'd better wait for official comments; I simply suppose that there is an algorithm (say, as part of Microsoft Word) detecting where the clipboard item goes to, and possibly destroying the format characters. Everybody may guess to what end... Thanks a lot! Marcel [originally one pasted screenshot] From mark at macchiato.com Thu Jul 2 07:05:44 2015 From: mark at macchiato.com (Mark Davis ☕️) Date: Thu, 2 Jul 2015 14:05:44 +0200 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Message-ID: Ok. I wasn't clear enough. Certainly boundaries are political and relevant, as is the fact that they change. What is not relevant is talking about a particular country's motivations and actions. Moreover, you insist on writing a tome about this. In other words, TL;DR. Mark *Il meglio è l'inimico del bene* On Thu, Jul 2, 2015 at 11:01 AM, Philippe Verdy wrote: > The political subject is immediately related to the designation of flags > and their association to ISO 3166-1 and -2 encoded entities. Even if you > don't like it, this is very political and for a standard seeking > stability, I wonder how any flag (directly bound to specific political > entities at specific dates and within some boundaries which may be > contested) can be related to ISO 3166 and its instability (and the fact > that ISO 3166 entities have in fact also no defined borders, so that ISO > 3166-2 is just a political point of view from the current ruler of the > current ISO 3166-1 entity). > > All this topic is political. In fact the real flags are not even encoded > with RIS, not even for current nations (and there's still a problem to know > what is a recognized nation, even when just considering the UN definition. > Political entities are defined but with fuzzy borders, they just represent > in fact some local governments, not necessarily their lands, people, or > cultures, and in some cases they are in exile or not even ruling: their seat > in the UN is vacant and they exist only on paper, but even UN members > disagree about which treaty they recognize). > ...
From verdy_p at wanadoo.fr Thu Jul 2 07:20:50 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 2 Jul 2015 14:20:50 +0200 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Message-ID: It was not just about it but on the fact that nothing is solved and for things that Unicode does not want to support, there should be a better way using existing standards to bind some object with semantics taken from a blind but easily parsable object (here a URI, without the need to reinvent a way to encode it, just a plain URI surrounded by a couple of controls). No need then to describe what will be in that URI, it will just need to be interpreted as a unique identifier within some namespace. With that it will be possible to create catalogs and standardize a few of them. The system will not be limited to geopolitical entities. And nobody will need to support all the namespaces or even to perform any external query to some rogue server delivering malicious content. The URI could still embed a small image using the "data:" URI scheme. Also I criticize the fact of using RIS to describe a "standard" feature in the UCS, when they will be bound to unstable ISO standards which are already politically biased. RIS was a bad choice the way it was specified, and even its specification does not fully conform to these ISO standards. 2015-07-02 14:05 GMT+02:00 Mark Davis ☕️: > Ok. I wasn't clear enough. Certainly boundaries are political and > relevant, as is the fact that they change. What is not relevant is talking > about a particular country's motivations and actions. > > Moreover, you insist on writing a tome about this. In other words, > TL;DR. > > Mark > > *Il meglio è l'inimico del bene* > > On Thu, Jul 2, 2015 at 11:01 AM, Philippe Verdy > wrote: > >> The political subject is immediately related to the designation of flags >> and their association to ISO 3166-1 and -2 encoded entities. Even if you >> don't like it, this is very political and for a standard seeking >> stability, I wonder how any flag (directly bound to specific political >> entities at specific dates and within some boundaries which may be >> contested) can be related to ISO 3166 and its instability (and the fact >> that ISO 3166 entities have in fact also no defined borders, so that ISO >> 3166-2 is just a political point of view from the current ruler of the >> current ISO 3166-1 entity). >> >> All this topic is political. In fact the real flags are not even encoded >> with RIS, not even for current nations (and there's still a problem to know >> what is a recognized nation, even when just considering the UN definition. >> Political entities are defined but with fuzzy borders, they just represent >> in fact some local governments, not necessarily their lands, people, or >> cultures, and in some cases they are in exile or not even ruling: their seat >> in the UN is vacant and they exist only on paper, but even UN members >> disagree about which treaty they recognize). >> > ... > > From nslater at tumbolia.org Thu Jul 2 07:33:03 2015 From: nslater at tumbolia.org (Noah Slater) Date: Thu, 02 Jul 2015 12:33:03 +0000 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Message-ID: Correct me if I'm wrong, but it seems like Philippe's core argument is that geopolitical entities and flags (as specific instances of a design, in the heraldic sense) are disjoint.
And that using geopolitical codes to refer to these designs is inherently unstable. On Thu, 2 Jul 2015 at 13:26 Philippe Verdy wrote: > It was not just about it but on the fact that nothing is solved and for > things that Unicode does not want to support, there should be a better way > using existing standards to bind some object with semantics taken from a > blind but easily parsable object (here a URI, without the need to reinvent > a way to encode it, just a plain URI surrounded by a couple of > controls). No need then to describe what will be in that URI, it will just > need to be interpreted as a unique identifier within some namespace. > With that it will be possible to create catalogs and standardize a few of > them. The system will not be limited to geopolitical entities. And nobody > will need to support all the namespaces or even to perform any external > query to some rogue server delivering malicious content. The URI could > still embed a small image using the "data:" URI scheme. > Also I criticize the fact of using RIS to describe a "standard" feature in > the UCS, when they will be bound to unstable ISO standards which are > already politically biased. RIS was a bad choice the way it was specified, > and even its specification does not fully conform to these ISO standards. > > 2015-07-02 14:05 GMT+02:00 Mark Davis ☕️: > >> Ok. I wasn't clear enough. Certainly boundaries are political and >> relevant, as is the fact that they change. What is not relevant is talking >> about a particular country's motivations and actions. >> >> Moreover, you insist on writing a tome about this. In other words, >> TL;DR. >> >> Mark >> >> *Il meglio è l'inimico del bene* >> >> On Thu, Jul 2, 2015 at 11:01 AM, Philippe Verdy >> wrote: >> >>> The political subject is immediately related to the designation of flags >>> and their association to ISO 3166-1 and -2 encoded entities.
Even if you >>> don't like it, this is very political and for a standard seeking >>> stability, I wonder how any flag (directly bound to specific political >>> entities at specific dates and within some boundaries which may be >>> contested) can be related to ISO 3166 and its instability (and the fact >>> that ISO 3166 entities have in fact also no defined borders, so that ISO >>> 3166-2 is just a political point of view from the current ruler of the >>> current ISO 3166-1 entity). >>> >>> All this topic is political. In fact the real flags are not even encoded >>> with RIS, not even for current nations (and there's still a problem to know >>> what is a recognized nation, even when just considering the UN definition. >>> Political entities are defined but with fuzzy borders, they just represent >>> in fact some local governments, not necessarily their lands, people, or >>> cultures, and in some cases they are in exile or not even ruling: their seat >>> in the UN is vacant and they exist only on paper, but even UN members >>> disagree about which treaty they recognize). >>> >> ... >> >> > > From doug at ewellic.org Thu Jul 2 08:04:13 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 2 Jul 2015 07:04:13 -0600 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Message-ID: <417C8958513D4476884721AB0881975A@DougEwell> Noah Slater wrote: > Correct me if I'm wrong, but it seems like Philippe's core argument is > that geopolitical entities and flags (as specific instances of a > design, in the heraldic sense) are disjoint. And that using > geopolitical codes to refer to these designs is inherently unstable.
But the only alternative is to encode about 200 discrete emoji for what we think of as "country" flags, plus somewhere between 0 and 5000 for flags of what we think of as "subdivisions." And in the end, when users see these emoji, they will still think "Oh, that's the US flag" or "the French flag" or "the Japanese flag" or whatever. They will still associate them with geopolitical entities. That's the whole purpose of such flags. (Either that or they will associate them with languages, which is far more unstable than anything else being discussed here.) -- Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸 From doug at ewellic.org Thu Jul 2 08:12:23 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 2 Jul 2015 07:12:23 -0600 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) Message-ID: I wrote: > But the only alternative is to encode about 200 discrete emoji [...] Here I am assuming that UTC will not shift gears and approve an "embedded URI" scheme, which sounds way too much like localizable you-know-whats. -- Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸 From charupdate at orange.fr Thu Jul 2 03:37:17 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 2 Jul 2015 10:37:17 +0200 (CEST) Subject: WORD JOINER vs ZWNBSP In-Reply-To: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net> References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net> Message-ID: <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18> On Tue, Jun 30, 2015, Doug Ewell wrote: > Khaled Hosny wrote: > > >> On my netbook, which is running Windows 7 Starter, U+2060 is not a > >> part of any of the shipped fonts. > > > > It is a control character, it does not need to have a glyph in the > > font to be properly supported. Thank you Khaled, I will respond soon after this. > The problem is the word "supported."
Marcel is seeing a visible glyph (a > .notdef box) for what is supposed to be an invisible, zero-width > character, and that is leading him to conclude that Windows doesn't > "support" this character. The .notdef box is exactly what I see sometimes in Notepad and every time in the Word dialogs when I use U+2060, but in fact, what I see in the document is a particular glyph, representing a tall, full-height empty box with a wide space to its right despite the font being proportional, and in the Notepad text the same box but without the space. Only when I switch the font to the one you indicate below does the word joiner display correctly in my version of Microsoft Word. Please see the attached screenshots (I wanted to paste them into this e-mail). > On my Win 7 machine at work, when I enter the string "one⁠two" > ("one\u2060two") and click on either word, both words are selected. That > is exactly what I would expect WJ to do. This works on the built-in > Notepad as well as Notepad++ and BabelPad (but not on GoDaddy's > Web-based email client). The selection with double-click corresponds to what Richard did with the quick cursor move. These phenomena are text-processing features which give little evidence of the presence or absence of word boundaries. So I redid your test but used the search tool, with the "Whole words only" option enabled. This gives an idea of how the application perceives the words as entities, or better said, how developers expect users to expect search results. Well, that isn't really a better expression... What I want to say is that what we see is normally what we are expected to expect. Personally I wouldn't like to have only part of a compound selected when I most probably want to mark it up as a whole, nor would you, Doug. This is why a double-click on any spot in the sequence selects the sequence as a whole.
By contrast, given that we took care to insert word joiners where normally we aren't expected to (because it is sufficient to simply type the words one after another, with nothing between them, to get them as *one* word), the software engineers expect us to wish to join what must remain a sequence of separate words. Consequently, the built-in search engine will recognize each word as a word in itself. This is where good software shows its benefits. Some software does not recognize the ZWNBSP or the NBSP (I don't know which one, or both) as indicating the presence of a word boundary, and therefore does not work correctly. That also depends on the PDF conversion tool. Please check the screenshots (I switched the UIs to English wherever possible, that is, in LibreOffice). > But out of more than 500 fonts on that machine, the only stock Microsoft > fonts that show WJ with zero-width, instead of a .notdef glyph, are > Javanese Text, Myanmar Text, and Segoe UI Symbol. So while it's > inaccurate to extrapolate this to "Microsoft doesn't support WJ," the > font support is definitely lacking. I wish to thank you personally, Doug, for this very valuable hint. Indeed, in Microsoft Word 2010 Starter on Windows 7 Starter, the WJ is not displayed correctly unless the font is switched to Segoe UI Symbol (the one of the three that shipped with my OS). If the Segoe typeface is not appropriate in the document, we can ask Word to find and replace all instances of U+2060 with the same character formatted in Segoe UI Symbol. This may be what Word users are expected to do every time, even if that isn't really what we expect of a productivity suite. Perhaps, or most probably, this problem does not occur in other high-end software, such as Microsoft Publisher (needs to be confirmed). But if somebody buys Microsoft Office Premium or Professional, he should be safe from that malfunction. As should everybody using Microsoft software, in fact.
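Rendering alone cannot show which invisible character a test sample actually contains, or whether it survived a copy and paste. A minimal Python sketch using only the standard `unicodedata` module reports each code point's name and General_Category (the helper names `describe` and `invisible_chars` are illustrative, not from any library). WJ (U+2060) and ZWNBSP (U+FEFF) come out as format characters (Cf), while NBSP (U+00A0) is a space separator (Zs).

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (General_Category)' for a single character."""
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name} ({unicodedata.category(ch)})"


def invisible_chars(text: str) -> list[tuple[int, str]]:
    """List (index, description) for every NBSP or format (Cf) character in text."""
    return [
        (i, describe(ch))
        for i, ch in enumerate(text)
        if ch == "\u00a0" or unicodedata.category(ch) == "Cf"
    ]


if __name__ == "__main__":
    for ch in ("\u00a0", "\u2060", "\ufeff"):  # NBSP, WJ, ZWNBSP
        print(describe(ch))
    print(invisible_chars("one\u2060two\u00a0three"))
```

Running `invisible_chars` on text pasted back from an editor shows at a glance whether U+2060 or U+FEFF was preserved, dropped, or replaced along the way.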
> The bit about characters being converted to other characters, of course, > has nothing to do with Windows and everything to do with particular > applications. Based on this hint, I did more tests and found out that for a proper conversion to plain text, any segment including U+00A0, U+FEFF, and other invisible characters, when copied from a document in Microsoft Word, must first be pasted into a LibreOffice document, then copied again, and finally pasted into the text editor. I should avoid venting further about that issue, and I'd better wait for official comments; I simply suppose that there is an algorithm (say, as part of Microsoft Word) detecting where the clipboard item goes to, and possibly destroying the format characters. Everybody may guess to what end... Thanks a lot! Marcel -------------- next part -------------- A non-text attachment was scrubbed... Name: screen m 2015-07-02 04.08.jpg Type: image/jpeg Size: 156419 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: screen m 2015-07-02 04.59.jpg Type: image/jpeg Size: 150875 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: screen m 2015-07-02 04.32.jpg Type: image/jpeg Size: 200880 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: screen m 2015-07-02 04.42.jpg Type: image/jpeg Size: 126705 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: screen m 2015-07-02 05.08.jpg Type: image/jpeg Size: 197542 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name:
Name: screen m 2015-07-02 05.21.png Type: image/png Size: 90615 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: screen m 2015-07-02 04.19.jpg Type: image/jpeg Size: 176376 bytes Desc: not available URL: From charupdate at orange.fr Thu Jul 2 04:39:54 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 2 Jul 2015 11:39:54 +0200 (CEST) Subject: WORD JOINER vs ZWNBSP Message-ID: <1396690673.11696.1435829994863.JavaMail.www@wwinf1m18> On Tue, Jun 30, 2015, Doug Ewell wrote: > Khaled Hosny wrote: > > >> On my netbook, which is running Windows 7 Starter, U+2060 is not a > >> part of any of the shipped fonts. > > > > It is a control character, it does not need to have a glyph in the > > font to be properly supported. Thank you Khaled, I will respond soon after this. > The problem is the word "supported." Marcel is seeing a visible glyph (a > .notdef box) for what is supposed to be an invisible, zero-width > character, and that is leading him to conclude that Windows doesn't > "support" this character. The .notdef box is exactly what I see sometimes on the Notepad and every time in the Word dialogs when I use U+2060, but in fact, what I see in the document is a particular glyph, representing a tall fullheight empty box with a wide space to its right despite of the font being proportional, and in the Notepad text the same box but without space. Only when I switch the font to the one you indicate below, the word joiner displays correctly on my version of Microsoft Word. Please see the attached screenshots (I wanted to paste them into this e-mail). > On my Win 7 machine at work, when I enter the string "one?two" > ("one\u2060two") and click on either word, both words are selected. That > is exactly what I would expect WJ to do. This works on the built-in > Notepad as well as Notepad++ and BabelPad (but not on GoDaddy's > Web-based email client). 
The selection with double-click corresponds to what Richard did with the quick cursor move. These phenomena are text-processing features which give little evidence of the presence or absence of word boundaries. So I redid your test but used the search tool, with the "Whole words only" option enabled. This gives an idea of how the application perceives the words as entities, or better said, how developers expect users to expect search results. Well, that isn't really a better expression... What I want to say is that what we see is normally what we are expected to expect. Personally I wouldn't like to have only part of a compound selected when I most probably want to mark it up as a whole, nor would you, Doug. This is why a double-click on any spot in the sequence selects the sequence as a whole. By contrast, given that we took care to insert word joiners where normally we aren't expected to (because it is sufficient to simply type the words one after another, with nothing between them, to get them as *one* word), the software engineers expect us to wish to join what must remain a sequence of separate words. Consequently, the built-in search engine will recognize each word as a word in itself. This is where good software shows its benefits. Some software does not recognize the ZWNBSP or the NBSP (I don't know which one, or both) as indicating the presence of a word boundary, and therefore does not work correctly. That also depends on the PDF conversion tool. Please check the screenshots (I switched the UIs to English wherever possible, that is, in LibreOffice). [This e-mail has been blocked because it contained several attached screenshots, so I am resending it without the attached images.] > But out of more than 500 fonts on that machine, the only stock Microsoft > fonts that show WJ with zero-width, instead of a .notdef glyph, are > Javanese Text, Myanmar Text, and Segoe UI Symbol.
So while it's > inaccurate to extrapolate this to "Microsoft doesn't support WJ," the > font support is definitely lacking. I wish to thank you personally, Doug, for this very valuable hint. Indeed, in Microsoft Word 2010 Starter on Windows 7 Starter, the WJ is not displayed correctly unless the font is switched to Segoe UI Symbol (the one of the three that shipped with my OS). If the Segoe typeface is not appropriate in the document, we can ask Word to find and replace all instances of U+2060 with the same character formatted in Segoe UI Symbol. This may be what Word users are expected to do every time, even if that isn't really what we expect of a productivity suite. Perhaps, or most probably, this problem does not occur in other high-end software, such as Microsoft Publisher (needs to be confirmed). But if somebody buys Microsoft Office Premium or Professional, he should be safe from that malfunction. As should everybody using Microsoft software, in fact. > The bit about characters being converted to other characters, of course, > has nothing to do with Windows and everything to do with particular > applications. Based on this hint, I did more tests and found out that for a proper conversion to plain text, any segment including U+00A0, U+FEFF, and other invisible characters, when copied from a document in Microsoft Word, must first be pasted into a LibreOffice document, then copied again, and finally pasted into the text editor. I should avoid venting further about that issue, and I'd better wait for official comments; I simply suppose that there is an algorithm (say, as part of Microsoft Word) detecting where the clipboard item goes to, and possibly destroying the format characters. Everybody may guess to what end... Thanks a lot! Marcel [one pasted screenshot]
From doug at ewellic.org Thu Jul 2 09:54:28 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 02 Jul 2015 07:54:28 -0700 Subject: Representing Additional Types of Flags Message-ID: <20150702075428.665a7a7059d7ee80bb4d670165c8327d.f848ab7c97.wbe@email03.secureserver.net> There must be a problem with my browser. When it displays the PRI #299 background document, there is text about using CLDR entities to define regions and subdivisions, to preclude stability problems in ISO 3166-1. Apparently that text doesn't appear on other people's browsers. -- Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸 From doug at ewellic.org Thu Jul 2 10:07:34 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 02 Jul 2015 08:07:34 -0700 Subject: Representing Additional Types of Flags Message-ID: <20150702080734.665a7a7059d7ee80bb4d670165c8327d.ee02e411cb.wbe@email03.secureserver.net> Also posted as formal feedback to the PRI: 6. What is the policy on generating flag tags with unicode_region_subtag values corresponding to private-use BCP 47 subtags, other than those given special semantics by CLDR? Are they invalid or merely discouraged? Should tools allow users to create such a tag? Is there any provision for a "private agreement," similar to that defined in Unicode for PUA usage? -- Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸 From mark at macchiato.com Thu Jul 2 11:10:50 2015 From: mark at macchiato.com (Mark Davis ☕️) Date: Thu, 2 Jul 2015 18:10:50 +0200 Subject: Representing Additional Types of Flags In-Reply-To: <20150630145719.665a7a7059d7ee80bb4d670165c8327d.06f042790e.wbe@email03.secureserver.net> References: <20150630145719.665a7a7059d7ee80bb4d670165c8327d.06f042790e.wbe@email03.secureserver.net> Message-ID: I'll try to answer a few of these. Mark *Il meglio è l'inimico del bene* On Tue, Jun 30, 2015 at 11:57 PM, Doug Ewell wrote: > Re-posting my comments and questions on this PRI to the list.
I've > already submitted them as formal feedback. > > . > > I support this proposal. I have the following questions: > > 1. The existing RIS-based flag mechanism is based on ISO 3166-1 (TUS 7.0 > §22.10). In this proposal, "valid" tag sequences would instead be > determined by CLDR data and LDML specification. Is there any precedent > for CLDR to define the validity of Unicode character sequences? > We already have, in tr51, the unicode_region_codes being used for validity testing of flags: http://unicode.org/reports/tr51/#Encoding http://unicode.org/reports/tr51/#Flags Those are typically the same as the ISO codes, but do add XK http://unicode.org/reports/tr35/#unicode_region_subtag > 2. What is the policy on generating flag tags with deprecated > unicode_region_subtag or unicode_subdivision_subtag values, such as > "[flag]UK"? How "discouraged" would such a tag be? Should tools allow > users to create such a tag? > CLDR treats UK as deprecated. When a code is deprecated, we strongly discourage its use in new data, but normally allow it for old data. But UK is somewhat different, since it really shouldn't ever be valid as it stands. The purpose for UK in CLDR metadata is so that locale ID canonicalization can map en-UK (which occurs quite often) to en-GB, and so on. (We do this also for overlong codes like eng-GB => en-GB.) But you're right; we need to be able to distinguish this case (and ones like it.) I filed http://unicode.org/cldr/trac/ticket/8736 > > 3. The subdivisions.xml file contains a "subtype" hierarchy, reflecting > the "parent subdivision" relationship in ISO 3166-2. So region 'FR' > contains subdivision 'J' (Île-de-France), which itself contains > subdivision '75' (Paris). Is there any significance to the "subtype" > hierarchy as far as flag tags are concerned, or are "[flag]FRJ" and > "[flag]FR75" equally valid? > No, there isn't. But see also E.5 in http://www.unicode.org/review/pri299/pri299-additional-flags-background.html > > 4.
The entry for "001" in subdivisions.xml contains each of the > two-letter codes for regions (countries) that have their own > subdivisions. This is less than the set of all regions; for example, > Anguilla (AI) does not have ISO 3166-2 subdivisions and so is not > listed. This implies that a tag like "[flag]001US" is valid (and > equivalent to "US" spelled with RIS, which is preferred) but > "[flag]001AI" is not valid. Is this intended? If not, can it be > clarified? > Good catch; the 001 shouldn't even exist in the subdivisionContainment. This is now fixed in trunk. (The subdivision addition will only be final in September, so feedback on it now would be great. People can file tickets at http://unicode.org/cldr/trac/newticket.) > > 5. Will any preliminary examples of CLDR 4-character subdivision codes > be made available before any such codes are actually assigned? > The only purpose for the 4-character subdivision codes is stability. So let's suppose that Colorado decides to join Canada (thereby deprecating CO in ISO 3166-2), and British Columbia decides to join the US (getting the code CO in ISO 3166-2). In that case, CLDR would keep the old code CO (but deprecated) and create a new 4-letter code for BC, such as XXCO. This is just for illustration, of course; I've heard no rumors about either political shift... > . > > The PRI #299 mechanism is clearly and intentionally oriented toward > representing flags of well-defined geopolitical entities. > > Any proposal to extend the mechanism to cover the many other types of > flags -- for historical regions, NGOs, maritime, sports, or social or > political causes -- must be systematic and well-planned, not ad-hoc or > haphazard, to assure interoperability and extensibility. > Firmly agreed. > > The documentation for the PRI #299 mechanism should state clearly that > (e.g.)
the Confederate battle flag, the Olympic flag, the Esperanto > flag, the LGBT rainbow flag, and the naval flags used to spell out > "ENGLAND EXPECTS" can be represented only via a proper extension to the > mechanism, not by ad-hoc means such as the use of unassigned or > private-use combinations. This is at least as important as ensuring the > stable coding of geopolitical flags. > Yes, again a good point. > 6. What is the policy on generating flag tags with unicode_region_subtag values corresponding to private-use BCP 47 subtags, other than those given special semantics by CLDR? Are they invalid or merely discouraged? Should tools allow users to create such a tag? Is there any provision for a "private agreement," similar to that defined in Unicode for PUA usage? We'll have to address that. My view is that they should not be valid: if someone wants a PU flag, from any source, they have over 130,000 Unicode PU characters to play with. > > > -- > Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸 > > > From kenwhistler at att.net Thu Jul 2 12:04:38 2015 From: kenwhistler at att.net (Ken Whistler) Date: Thu, 02 Jul 2015 10:04:38 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> Message-ID: <55956F26.9030607@att.net> On 7/2/2015 2:01 AM, Philippe Verdy wrote: > > The frozen status of Antarctica ... ... will be addressed separately by global warming. But be that as it may... > > In reality there's still no standard way to encode flags unambiguously > and in a stable way. We'd like to have FOTW (Flags of the World) > contributors to propose their own scheme. But it will not be > compatible with the current RIS solution or the proposed extension. If > ever such a standard emerges, it will require encoding a new set of > characters.
The UTC is neither responsible for nor interested in a "standard way to encode flags unambiguously". I suspect one of the reasons this discussion is tending to derail into political topics and too much detail about particular flags and their stability and the stability of geopolitical entities they represent and yadda yadda, is that people seem ineluctably drawn to the misapprehension that this is all about standard encoding of flags. It is not. Rather, it is about a standard way to represent recognizable and interchangeable emoji (colorful little pictographs) of flags, using defined sequences of Unicode characters. The existing mechanism using regional indicator symbol (RIS) pairs was originally aimed at solving the following problems: 1. Enabling the reliable interchange of the legacy 10 flag emoji from Japanese carrier sets. 2. Enabling the completion of the encoding of emoji to cover the rest of the Japanese carrier sets without all progress dragging to a complete halt as national bodies in SC2 would argue interminably over a "standard way to encode flags unambiguously" in an ISO standard. 3. Dealing with the inevitable hue and cry: "China and Japan and the US got their flag! Why can't I get my country's flag??!" And it appears that the RIS mechanism succeeded spectacularly well in addressing all of those design goals. In the middle of last year, for example, there was a major media and internet campaign to "encode the flag of India". Well, the RIS mechanism handled the real issue there just fine -- when the new phones started coming out with support for display and interchange of emoji for flags using the RIS sequences, there was the emoji for the flag of India for everybody to use. Problem solved. 
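For reference, the RIS mechanism itself is purely arithmetic: each letter A..Z maps to U+1F1E6..U+1F1FF, so "IN" becomes <U+1F1EE, U+1F1F3>. A minimal Python sketch (the names `to_ris` and `from_ris` are illustrative, not from any library) converts in both directions. It deliberately says nothing about whether a pair is a valid region code or will display as a flag.

```python
RIS_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A
RIS_Z = 0x1F1FF  # U+1F1FF REGIONAL INDICATOR SYMBOL LETTER Z


def to_ris(region: str) -> str:
    """Map a two-letter code such as "IN" to its pair of regional indicator symbols."""
    if len(region) != 2 or not region.isascii() or not region.isalpha():
        raise ValueError(f"expected two ASCII letters, got {region!r}")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in region.upper())


def from_ris(seq: str) -> str:
    """Inverse of to_ris: recover the two-letter code from a RIS pair."""
    if len(seq) != 2 or not all(RIS_A <= ord(c) <= RIS_Z for c in seq):
        raise ValueError("expected exactly two regional indicator symbols")
    return "".join(chr(ord("A") + ord(c) - RIS_A) for c in seq)


if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in to_ris("IN")])
```

The mapping is the same for every pair; which pairs count as valid flags is a separate, data-driven question.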
And the problem which was solved was /not/ the determination that the <1F1EE, 1F1F3> RIS sequence "IN" meant /precisely/ the current national flag of India, the saffron, white and green tricolor with the Ashoka Chakra, and *not* any other flag of India (the flag of the Indian army, the flag of the Mughal Empire, the flag of British India, etc.). The RIS sequence "IN" was just mapped to the colorful little emoji glyph for the Indian flag that everybody wanted to interchange. The Unicode Standard is not a vexillology standard -- nor will it ever be. It is a standard for the encoding and interchange of characters. The *character* problem we are faced with here is that people want to use and interchange colorful little emoji pictographs of various flags in text streams. The RIS mechanism addresses a significant part of that problem, but is not extensible to cover the full scope of the demand. And what is the scope of the additional demand? 1. The first part can be summed up as: *the flag of Scotland problem*. In other words, there are a number of high visibility, high demand, widely recognized /regional/ flags that would be interchanged as just more emoji pictographs, if a mechanism for that were available. People who want to use an emoji for the flag of Scotland just as easily as someone can use an emoji for the flag of Great Britain are not going to accept an argument that says, "Well, we can't do that on your phones because there is no 3166-1 country code registered, so we can't map a Scotland flag emoji glyph to a RIS pair." Hence the PRI #299 proposal: for an extension mechanism that would address the flag of Scotland problem in a generic and reasonably stable way. 2. The second part can be summed up as: *the rainbow flag problem*. In other words, there are a number of high visibility, high demand, widely recognized /non-governmental/ flags that would be interchanged as just more emoji pictographs, if a mechanism for that were available.
From the public's point of view, this is another no-brainer: if the flag of Japan and the flag of Scotland, why not the rainbow flag??! They aren't interested in the limitations of the underlying representation mechanisms, nor should they be, IMO.

The problem the UTC faces here is that there are a number of reasonable and popular candidates, which the rainbow flag amply exemplifies, for more colorful little emoji pictographs for flags that people would like to interchange -- but there is no obvious and extensible way to do so reliably in terms of sequences of Unicode characters in a plain text stream. The PRI #299 proposal does not extend into this realm, for many of the reasons pointed out by Doug Ewell.

There are a number of potential approaches to address the rainbow flag problem. For example:

a. use private-use characters
b. pursue one-by-one encoding of each newly desired flag pictograph as a symbol
c. extend the unicode_region_subtag and unicode_subdivision_subtag scheme in CLDR to add some new subtag addressing a separate, non-geopolitical hierarchy
d. create a separate extension using TAG characters but with a syntax not dependent on CLDR subtag definitions
e. create a registry of flag entities suitable for representation as emoji, together with a "c" or "d" style syntax
f. something else?
g. do nothing (and perhaps hope that stickers will solve the problem)

If we are to make any progress here in addressing the actual scope of "the rainbow flag problem", I suggest we focus on the details and pros and cons of suggestions like those of a through g above, rather than pursuing more discussion recapitulating the history of the borders of Tibet -- which truly are out of scope here.

--Ken

From doug at ewellic.org Thu Jul 2 12:33:30 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 02 Jul 2015 10:33:30 -0700 Subject: Representing Additional Types of Flags Message-ID: <20150702103330.665a7a7059d7ee80bb4d670165c8327d.b24129c345.wbe@email03.secureserver.net>

Mark Davis ☕️ wrote:

>> Is there any precedent for CLDR to define the validity of Unicode
>> character sequences?
>
> We already have, in tr51, the unicode_region_codes being used for
> validity testing of flags:
> http://unicode.org/reports/tr51/#Encoding
> http://unicode.org/reports/tr51/#Flags

the second of which (Annex B) says:

"The valid region sequences are specified by Unicode region subtags as defined in [CLDR], excluding those that are designated private-use or deprecated in [CLDR]."

In that case, the wording in TUS needs to be corrected, because TUS 7.0 §22.10 says:

"The regional indicator symbols in the range U+1F1E6..U+1F1FF can be used in pairs to represent an ISO 3166 region code."

It doesn't say anything about valid pairs being defined by CLDR instead of ISO. I wonder how many users actually know this.

> Those are typically the same as the ISO codes, but do add XK
> http://unicode.org/reports/tr35/#unicode_region_subtag

So QO, QU, and ZZ would be excluded, since those are private-use in BCP 47 and hence also in CLDR. But XK is included, even though it is also private-use. Is this correct? Can an application tell that XK is in and the others are out, just by looking at CLDR data?

Also, I assume all of the same include/exclude rules apply both to RIS combinations and to PRI #299-style flag tags. Please let me know if that's not true.

> CLDR treats UK as deprecated.
> [...]
> But you're right; we need to be able to distinguish this case (and
> ones like it.) I filed
> http://unicode.org/cldr/trac/ticket/8736

OK, so UK is not valid in RIS combinations or flag tags either. Glad to see that clarified.
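The Annex B rule quoted above (a pair is valid if it spells a CLDR region subtag that is neither private-use nor deprecated) can be sketched as a simple filter. The code sets below are not CLDR data: they are a handful of codes hard-coded from this thread (QO, QU and ZZ as private-use, UK as deprecated, XK as included), standing in for the region subtags a real check would load from CLDR. The function names are hypothetical.

```python
# Illustrative stand-ins for CLDR data, hard-coded from this thread only.
REGION_SUBTAGS = {"GB", "IN", "JP", "US", "XK", "QO", "QU", "ZZ", "UK"}
PRIVATE_USE = {"QO", "QU", "ZZ"}  # private-use, per the thread
DEPRECATED = {"UK"}               # deprecated in CLDR, per the thread
# XK is deliberately absent from PRIVATE_USE: the thread says CLDR includes it.

RIS_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def region_of(pair):
    """Return the letters spelled by a RIS pair, or None if it is not one."""
    if len(pair) != 2 or not all(RIS_A <= ord(c) <= RIS_A + 25 for c in pair):
        return None
    return "".join(chr(ord(c) - RIS_A + ord("A")) for c in pair)


def is_valid_flag_pair(pair):
    """Annex B-style check: a known region subtag, not private-use, not deprecated."""
    region = region_of(pair)
    return (
        region is not None
        and region in REGION_SUBTAGS
        and region not in PRIVATE_USE
        and region not in DEPRECATED
    )


def ris(code):
    """Build a RIS pair from two uppercase ASCII letters."""
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in code)


for code in ["IN", "XK", "QO", "UK", "AB"]:
    print(code, is_valid_flag_pair(ris(code)))
# IN True
# XK True
# QO False
# UK False
# AB False
```

If, as Doug assumes, the same include/exclude rules apply to PRI #299-style flag tags, only the spelling step would differ; the filter itself would be shared.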
>> Is there any significance to the "subtype" hierarchy as far as flag
>> tags are concerned, or are "[flag]FRJ" and "[flag]FR75" equally
>> valid?
>
> No, there isn't. But see also E.5 in
> http://www.unicode.org/review/pri299/pri299-additional-flags-background.html

Right, clearly flags don't exist for many of the subdivisions. But I'm not sure this is the same question as whether the three-level hierarchy is relevant. In my example, Île-de-France and Paris both have flags, and they aren't the same. (Wikipedia says the Île-de-France flag is "non-official and unused," but they do have a page for it, and in any case there are probably better examples.)

> The only purpose for the 4-character subdivision codes is stability.
> So let's suppose that Colorado decides to join Canada (thereby
> deprecating CO in ISO 3166-2), and British Columbia decides to join
> the US (getting the code CO in ISO 3166-2). In that case, CLDR would
> keep the old code CO (but deprecated) and create a new 4-letter code
> for BC, such as XXCO. This is just for illustration, of course, I've
> heard no rumors about either political shift...

Thanks for the 'XXCO' example; this is different from tending toward 'COXX' and was what I was looking for.

The exact scenario would not apply, of course, due to the agreement to keep subdivision codes unique across the US/Canada border. I'd suppose this would be preserved, and 3166-2 would assign US-BC to "British Columbia as US state," and there would be no coding conflict to resolve. But again, additional examples could easily be dreamed up: replace BC with the Central Abaco region of the Bahamas (currently BS-CO), which isn't that far away.

>> (private-use flag tags)
>
> We'll have to address that. My view is that they should not be valid:
> if someone wants a PU flag, of any source, they have over 130,000
> Unicode PU characters to play with.

I concur, and this is consistent with Annex B.

Thanks,

--
Doug Ewell | http://ewellic.org | Thornton, CO
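Doug's "[flag]FRJ" and "[flag]FR75" are shorthand for subdivision codes spelled with TAG characters. The PRI #299 announcement forwarded earlier in the thread says the tags come from U+E0030..U+E005A, and TAG characters mirror ASCII at U+E0000 plus the ASCII code point, so the spelling step can be sketched as below. The base character Doug writes as "[flag]" is not modeled, and restricting input to digits and uppercase letters is an assumption made for this sketch, not something taken from the proposal. The function name is hypothetical.

```python
TAG_OFFSET = 0xE0000  # a TAG character's code point is U+E0000 + the ASCII code point
TAG_FIRST, TAG_LAST = 0xE0030, 0xE005A  # range named in the PRI #299 announcement


def tag_spelling(subtag):
    """Spell a subtag such as "FRJ" or "FR75" with TAG characters.

    Assumption for this sketch: only ASCII digits and uppercase letters are
    accepted; those all map into TAG_FIRST..TAG_LAST.
    """
    out = []
    for c in subtag:
        if not ("0" <= c <= "9" or "A" <= c <= "Z"):
            raise ValueError(f"{c!r} is not a digit or uppercase ASCII letter")
        out.append(chr(TAG_OFFSET + ord(c)))
    return "".join(out)


for subtag in ["FRJ", "FR75"]:
    print(subtag, [f"U+{ord(c):05X}" for c in tag_spelling(subtag)])
# FRJ ['U+E0046', 'U+E0052', 'U+E004A']
# FR75 ['U+E0046', 'U+E0052', 'U+E0037', 'U+E0035']
```

Nothing in the spelling itself encodes the subdivision hierarchy, which fits Mark's answer that the "subtype" level has no significance for flag tags.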
From mark at macchiato.com Thu Jul 2 12:44:23 2015 From: mark at macchiato.com (Mark Davis ☕️) Date: Thu, 2 Jul 2015 19:44:23 +0200 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <55956F26.9030607@att.net> References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> <55956F26.9030607@att.net> Message-ID:

To add some information that people like Noah may not be aware of:

This email list is an open, public list for arbitrary discussions about Unicode and software internationalization. It is *not* an email list for consortium business; the vast majority of the people on it are *not* members of the Unicode Consortium, and are simply expressing their opinions on a particular topic, as individuals.

Members of the consortium are *not* necessarily active on this list. Those who are do *not* necessarily engage in every topic.

It can be a useful place to talk about possible proposals, but any opinions provided here (or that appear in random blogs, news articles, or change.org petitions) are *not* taken into account by the consortium. Anyone wanting a proposal to be considered *should* submit it via http://unicode.org/reporting.html. Those submissions *are* considered by the relevant technical body in the consortium.

People proposing new *emoji* characters *should* read http://unicode.org/reports/tr51/#Selection_Factors beforehand, and follow the guidelines there. Proposals about emoji are directed to the emoji subcommittee, which meets weekly. It makes recommendations to the UTC, which meets quarterly.

From leob at mailcom.com Thu Jul 2 12:46:47 2015 From: leob at mailcom.com (Leo Broukhis) Date: Thu, 2 Jul 2015 10:46:47 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <55956F26.9030607@att.net> References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> <55956F26.9030607@att.net> Message-ID:

Why not add another 26 A-Z characters, call them "regional supplementary symbols", and let carriers decide what to encode and how to encode what they want with sequences * to their hearts' content?

Leo

On Thu, Jul 2, 2015 at 10:04 AM, Ken Whistler wrote:
>
> On 7/2/2015 2:01 AM, Philippe Verdy wrote:
>
>
> The frozen status of Antarctica ...
>
>
> ... will be addressed separately by global warming. But be that as it may...
>
>
> In really there's still no standard way to encode flags unambiguously and in
> a stable way. We'd like to have FOTW (Flags of the World) contributors to
> propose their own scheme. But it will not be compatible with the current RIS
> solution or the proposed extension. If ever such standard emerges, it will
> require encoding a new set of characters.
>
>
> The UTC is neither responsible for nor interested in a "standard way
> to encode flags unambiguously". [...]
>
> --Ken

From mark at macchiato.com Thu Jul 2 12:55:53 2015 From: mark at macchiato.com (Mark Davis ☕️) Date: Thu, 2 Jul 2015 19:55:53 +0200 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> <55956F26.9030607@att.net> Message-ID:

Again, that has no advantage over PUA characters. Carriers/vendors can *already* add whatever PUA characters they want to fonts and keyboards. But of course, the problem is interoperability; you send a flag to a friend for your favorite vacation spot, Florida, and the friend sees a flag for New Jersey.

Mark

*Il meglio è l'inimico del bene* (The best is the enemy of the good.)

On Thu, Jul 2, 2015 at 7:46 PM, Leo Broukhis wrote:
> Why not add another 26 A-Z characters, call them "regional
> supplementary symbols", and let carriers decide what to encode and how
> to encode what they want with sequences * to their
> hearts' content?
>
> Leo
>
> On Thu, Jul 2, 2015 at 10:04 AM, Ken Whistler wrote:
> > [...]

From richard.wordingham at ntlworld.com Thu Jul 2 13:02:44 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 2 Jul 2015 19:02:44 +0100 Subject: WORD JOINER vs ZWNBSP In-Reply-To: <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18> References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net> <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18> Message-ID: <20150702190244.789e44af@JRWUBU2>

On Thu, 2 Jul 2015 10:37:17 +0200 (CEST)
Marcel Schneider wrote:

> (because it is
> sufficient to simply type the words one after each other without
> anything between, to get them as *one* word)

This only applies where it is traditional to separate words, a habit the Romans got out of and the Irish revived.

Unicode Word Boundary Rule WB4 (in UAX #29 'Unicode Text Segmentation') decrees that U+2060 and U+FEFF be ignored in word-boundary determination except that newline breaks before them and that inserting them between [...] and [...] creates an extra word boundary.

Richard.
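Richard's summary of WB4 can be illustrated with a deliberately tiny subset of the UAX #29 word-boundary rules. The Python sketch below is not an implementation of UAX #29. It absorbs Format characters (General_Category Cf, which covers U+2060 and U+FEFF) in the WB4 style and uses a simplified letter-letter rule in place of WB5, and every other position counts as a boundary. Digits, apostrophes, CR/LF pairing and the remaining rules are ignored. The function names are hypothetical.

```python
import unicodedata

# CR, LF and the UAX #29 "Newline" characters; WB4 does not apply after these.
NEWLINES = {"\r", "\n", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"}


def is_format(ch):
    """U+2060 WORD JOINER and U+FEFF ZWNBSP both have General_Category Cf."""
    return unicodedata.category(ch) == "Cf"


def toy_word_boundaries(text):
    """Boundary offsets under a toy subset of UAX #29, not the full algorithm.

    - WB4-style: a Format character is absorbed into the preceding character,
      unless that character is a newline (or it is at the start of text).
    - WB5-style (simplified): no break between two alphabetic characters.
    - Everything else breaks.
    """
    if not text:
        return [0]
    boundaries = [0]
    prev = text[0]  # last character that was not absorbed under WB4
    for i in range(1, len(text)):
        ch = text[i]
        if is_format(ch) and prev not in NEWLINES:
            continue  # transparent: no boundary, and prev stays unchanged
        if not (prev.isalpha() and ch.isalpha()):
            boundaries.append(i)
        prev = ch
    boundaries.append(len(text))
    return boundaries


def toy_words(text):
    """Segments between boundaries that contain at least one letter."""
    b = toy_word_boundaries(text)
    segments = [text[s:e] for s, e in zip(b, b[1:])]
    return [seg for seg in segments if any(c.isalpha() for c in seg)]


print(toy_words("two words"))             # ['two', 'words']
print(len(toy_words("two\u2060words")))   # 1: the WORD JOINER is absorbed
print(len(toy_words("two\ufeffwords")))   # 1: likewise for ZWNBSP
```

Under these toy rules either character placed between two words yields a single word, consistent with Richard's reading of WB4. It says nothing about how a particular application, such as the Word test earlier in the thread, actually behaves.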
From leob at mailcom.com Thu Jul 2 13:10:27 2015 From: leob at mailcom.com (Leo Broukhis) Date: Thu, 2 Jul 2015 11:10:27 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> <55956F26.9030607@att.net> Message-ID:

With extensible self-delimited regional indicator sequences the carriers will be able to come to an agreement and to petition Unicode to register them as named character sequences symbolizing flags not encoded by an ISO entity, like various rainbow flags, making sure that the format of such sequences is guaranteed not to clash with any existing ISO 3166 format.

Also, ISO 3166-2 can have 2 or 3 letters after the dash; it makes sense to have the letters after the dash self-delimited, if/when REGIONAL INDICATOR DASH is added to facilitate encoding of ISO 3166-2 codes.

Leo

On Thu, Jul 2, 2015 at 10:55 AM, Mark Davis ☕️ wrote:
> Again, that has no advantage over PUA characters. Carriers/vendors can
> *already* add whatever PUA characters they want to fonts and keyboards. But
> of course, the problem is interoperability; you send a flag to a friend for
> your favorite vacation spot, Florida, and the friend sees a flag for New
> Jersey.
>
>
> Mark
>
> Il meglio è l'inimico del bene (The best is the enemy of the good.)
>
> On Thu, Jul 2, 2015 at 7:46 PM, Leo Broukhis wrote:
>>
>> Why not add another 26 A-Z characters, call them "regional
>> supplementary symbols", and let carriers decide what to encode and how
>> to encode what they want with sequences * to their
>> hearts' content?
>>
>> Leo
>>
>> On Thu, Jul 2, 2015 at 10:04 AM, Ken Whistler wrote:
>> > [...]

From doug at ewellic.org Thu Jul 2 13:59:52 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 02 Jul 2015 11:59:52 -0700 Subject: Adding RAINBOW FLAG to Unicode Message-ID: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net>

Leo Broukhis wrote:

> With extensible self-delimited regional indicator sequences the
> carriers will be able to come to an agreement and to petition Unicode
> to register them as named character sequences symbolizing flags not
> encoded by an ISO entity, like various rainbow flags, making sure that
> the format of such sequences is guaranteed not to clash with any
> existing ISO 3166 format.

There are already plenty of ways for companies and groups and individuals to request new emoji. This way would have the disadvantage of conflating non-regional flags with a coding system for regions, which doesn't seem like a good idea.
> Also, ISO 3166-2 can have 2 or 3 letters > or 1, or digits or a combination > after the dash; it makes sense to have the letters after the dash > self-delimited, if/when REGIONAL INDICATOR DASH is added to > facilitate encoding of ISO 3166-2 codes. I don't understand the significance of this part. -- Doug Ewell | http://ewellic.org | Thornton, CO From doug at ewellic.org Thu Jul 2 14:09:15 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 02 Jul 2015 12:09:15 -0700 Subject: Adding RAINBOW FLAG to Unicode Message-ID: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net> Ken Whistler wrote: > The UTC is neither responsible for nor interested in a "standard way > to encode flags unambiguously". > > [...] > > The Unicode Standard is not a vexillology standard -- nor will it ever > be. It is a standard for the encoding and interchange of characters. Even though I continue to believe there *should* be a vexillology standard for encoding flags as unambiguously as practicable, I'm in strong agreement that this is not a Unicode problem, or a character problem, or even a CLDR problem. If there were such a standard today, it might make sense for Unicode and/or CLDR to adapt it for the emoji purposes we are discussing here. But there isn't. -- Doug Ewell | http://ewellic.org | Thornton, CO
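The exchange above turns on the shape of ISO 3166-2 codes: a two-letter country code, a hyphen, then one to three letters, digits, or a mix. As a minimal Python sketch, the two patterns here check only that shape as described in these messages (not membership in the real ISO list) and contrast it with what a single RIS pair can spell. The function names are illustrative, not from any existing library.

```python
import re

# Shape of an ISO 3166-2 code as described in this exchange: two capital
# letters, a hyphen, then one to three capital letters and/or digits.
# Only the form is checked, not membership in the real ISO 3166-2 list.
SUBDIVISION_SHAPE = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")

# A pair of regional indicator symbols can spell exactly two letters A-Z.
RIS_PAIR_SHAPE = re.compile(r"[A-Z]{2}")


def looks_like_subdivision(code: str) -> bool:
    return SUBDIVISION_SHAPE.fullmatch(code) is not None


def fits_ris_pair(code: str) -> bool:
    return RIS_PAIR_SHAPE.fullmatch(code) is not None


for code in ["GB", "GB-SCT", "NZ-HKB"]:
    print(code, looks_like_subdivision(code), fits_ris_pair(code))
```

Scotland's code, GB-SCT, matches the first pattern but not the second, which is the gap any extension of the RIS mechanism has to close.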
From c933103 at gmail.com Thu Jul 2 14:22:34 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Fri, 3 Jul 2015 03:22:34 +0800 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: <417C8958513D4476884721AB0881975A@DougEwell> References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> <417C8958513D4476884721AB0881975A@DougEwell> Message-ID: As I read, should those flags be versioned when being used? As the current implementation sounds like those flags would change all over the time, and if people using the emoticon with country X's flag on it to show support for its current government, once the government have been overthrown and the overthrown is internationally recognized with new flags and thus being accepted, then what appear on one's timeline of their social media would have their meaning shifted to the opposing side of their original intention by simply updating their device, and for those who haven't update their device they would see same effect from message written by those who have already updated their devices. A potential way to do it might be adding RIS for number and then append those numbers after alphabetical RIS to show year of start while retaining the unnumbered alphabetical RIS as they are today? On 2 July 2015 at 9:09, "Doug Ewell" wrote: > Noah Slater wrote: > > Correct me if I'm wrong, but it seems like Philippe's core argument is >> that geopolitical entities and flags (as a specific instances of a >> design, in the heraldic sense) are disjoint. And that using >> geopolitical codes to refer to these designs is inherently unstable. >> > > But the only alternative is to encode about 200 discrete emoji for what we > think of as "country" flags, plus somewhere between 0 and 5000 for flags of > what we think of as "subdivisions."
> > And in the end, when users see these emoji, they will still think "Oh, > that's the US flag" or "the French flag" or "the Japanese flag" or > whatever. They will still associate them with geopolitical entities. That's > the whole purpose of such flags. > > (Either that or they will associate them with languages, which is far more > unstable than anything else being discussed here.) > > -- > Doug Ewell | http://ewellic.org | Thornton, CO > > From leob at mailcom.com Thu Jul 2 14:33:31 2015 From: leob at mailcom.com (Leo Broukhis) Date: Thu, 2 Jul 2015 12:33:31 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net> References: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net> Message-ID: Currently a sequence of regional indicator symbols is parsed unambiguously by greedily taking pairs of RIS chars and interpreting them as ISO 3166-1 alpha 2 codes. If REGIONAL INDICATOR DASH and REGIONAL INDICATOR digits are added, along with regional supplementary symbols, then sequences * can be parsed unambiguously as ISO 3166-2, whereas + can be parsed as a named sequence signifying a flag of a non-governmental entity (or - as ISO 3166-1 alpha 3, and longer sequences as non-governmental). Leo On Thu, Jul 2, 2015 at 11:59 AM, Doug Ewell wrote: > Leo Broukhis wrote: > >> With extensible self-delimited regional indicator sequences the >> carriers will be able to come to an agreement and to petition Unicode >> to register them as named character sequences symbolizing flags not >> encoded by an ISO entity, like various rainbow flags, making sure that >> the format of such sequences is guaranteed not to clash with any >> existing ISO 3166 format. > > There are already plenty of ways for companies and groups and > individuals to request new emoji.
This way would have the disadvantage > of conflating non-regional flags with a coding system for regions, which > doesn't seem like a good idea. > >> Also, ISO 3166-2 can have 2 or 3 letters > > or 1, or digits or a combination > >> after the dash; it makes sense to have the letters after the dash >> self-delimited, if/when REGIONAL INDICATOR DASH is added to >> facilitate encoding of ISO 3166-2 codes. > > I don't understand the significance of this part. > > -- > Doug Ewell | http://ewellic.org | Thornton, CO > > From kenwhistler at att.net Thu Jul 2 14:57:23 2015 From: kenwhistler at att.net (Ken Whistler) Date: Thu, 02 Jul 2015 12:57:23 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: References: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net> Message-ID: <559597A3.8090205@att.net> On 7/2/2015 12:33 PM, Leo Broukhis wrote: > If REGIONAL INDICATOR DASH and REGIONAL INDICATOR digits are added, > along with regional supplementary symbols, then sequences > * can be parsed unambiguously as ISO 3166-2, > whereas + can be parsed as a named sequence signifying > a flag of a non-governmental entity (or - as ISO > 3166-1 alpha 3, and longer sequences as non-governmental). > > The point of switching to the TAG characters for an extension mechanism beyond what the RIS pairs can handle is that TAG characters for letters *and* digits *and* dash already exist and do not have to be encoded yet again before they could be used. Any proposal that depends on getting agreement to encode and publish some *further* set of meta-characters for representing letters, digits, and ASCII punctuation marks would at this point push out any possible solution to the time frame of Unicode 10.0 (June, 2017). And even that would depend on first coming to agreement that *more* sets of meta-characters for dealing with the same kind of function that TAG characters could already serve would be a good idea.
The potential for significant disagreement could push such a solution out even further. Remember that any solution involving encoding more characters with "funny behavior" would need not only to gain consensus in the UTC, but would also have to pass muster in SC2 and pass two formal ballots by the national bodies. You could create an equivalent proposal to what you are suggesting above by simply substituting and for your RID and RSS above -- and you could do it *now*, instead of in 2017. But once we look to TAG characters for an extension mechanism, why mess with the existing RIS pair syntax and break the existing implementations using them? Hence, the direction taken in PRI #399, which suggests an extension syntax based entirely on the TAG characters. --Ken From leob at mailcom.com Thu Jul 2 15:58:22 2015 From: leob at mailcom.com (Leo Broukhis) Date: Thu, 2 Jul 2015 13:58:22 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <559597A3.8090205@att.net> References: <20150702115952.665a7a7059d7ee80bb4d670165c8327d.532b92d6b9.wbe@email03.secureserver.net> <559597A3.8090205@att.net> Message-ID: What I don't like about PRI #399 is its proposing to use default-ignorable characters. On a non-vexillology-aware platform, I'd like to see something informative, albeit not resembling a flag, but indicative of the intention to display a flag, like RIS can be, as opposed to nondescript white flags. Leo On Thu, Jul 2, 2015 at 12:57 PM, Ken Whistler wrote: > > > On 7/2/2015 12:33 PM, Leo Broukhis wrote: >> >> If REGIONAL INDICATOR DASH and REGIONAL INDICATOR digits are added, >> along with regional supplementary symbols, then sequences >> * can be parsed unambiguously as ISO 3166-2, >> whereas + can be parsed as a named sequence signifying >> a flag of a non-governmental entity (or - as ISO >> 3166-1 alpha 3, and longer sequences as non-governmental). 
>> >> > > The point of switching to the TAG characters for an extension > mechanism beyond what the RIS pairs can handle is that > TAG characters for letters *and* digits *and* dash already exist > and do not have to be encoded yet again before they could be used. > > Any proposal that depends on getting agreement to encode and > publish some *further* set of meta-characters for representing > letters, digits, and ASCII punctuation marks would at this point > push out any possible solution to the time frame of Unicode 10.0 > (June, 2017). And even that would depend on first coming to > agreement that *more* sets of meta-characters for dealing with > the same kind of function that TAG characters could already serve > would be a good idea. The potential for significant disagreement could > push such a solution out even further. Remember that any > solution involving encoding more characters with "funny behavior" > would need not only to gain consensus in the UTC, but would > also have to pass muster in SC2 and pass two formal ballots by > the national bodies. > > You could create an equivalent proposal to what you are suggesting > above by simply substituting and for your > RID and RSS above -- and you could do it *now*, instead of in 2017. > > But once we look to TAG characters for an extension mechanism, > why mess with the existing RIS pair syntax and break the existing > implementations using them? Hence, the direction taken in > PRI #399, which suggests an extension syntax based entirely on > the TAG characters. 
> > --Ken From doug at ewellic.org Thu Jul 2 15:58:54 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 02 Jul 2015 13:58:54 -0700 Subject: [OT] Versioning flags (was: Re: Adding RAINBOW FLAG to Unicode) Message-ID: <20150702135854.665a7a7059d7ee80bb4d670165c8327d.0c4b4865bb.wbe@email03.secureserver.net> gfb hjjhjh wrote: > As I read, should those flags be versioned when being used? As the > current implementation sounds like those flags would change all over the > time, and if people using the emoticon with country X's flag on it to > show support for its current government, once the government have been > overthrown Or not: http://www.newfijiflag.com > and the overthrown is internationally recognized with new flags and > thus being accepted, then what appear on one's timeline of their > social media would have their meaning shifted to the opposing side of > their original intention by simply updating their device, and for > those who haven't update their device they would see same effect from > message written by those who have already updated their devices. A > potential way to do it might be adding RIS for number and then append > those numbers after alphabetical RIS to show year of start while > retaining the unnumbered alphabetical RIS as they are today? This would be a great reason to remind users of emoji flags not to try to use them to indicate an "intention" that can have an "opposing side," such as loyalty or support for a particular political party or government. They aren't for that. A proper coding standard for flags (NOT in scope for Unicode) might have this sort of versioning feature, but even then, I would think the default (unversioned) behavior should be to select the "current" flag, whatever that is. -- Doug Ewell | http://ewellic.org | Thornton, CO
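Leo's message describes the current rule as greedily taking pairs of RIS characters and reading each pair as an ISO 3166-1 alpha-2 code, and the versioning idea quoted here would append further RIS characters after such pairs. A minimal Python sketch of that greedy pairing, assuming only that REGIONAL INDICATOR SYMBOL LETTER A..Z occupy U+1F1E6..U+1F1FF; `to_ris` and `parse_ris_pairs` are hypothetical helpers, not a conforming segmentation algorithm.

```python
RIS_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RIS_Z = 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER Z


def to_ris(code: str) -> str:
    """Spell a code made only of the letters A-Z as regional indicators."""
    code = code.upper()
    if not all("A" <= c <= "Z" for c in code):
        raise ValueError("regional indicators only cover the letters A-Z")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in code)


def parse_ris_pairs(text: str) -> list[str]:
    """Greedily pair consecutive regional indicators, left to right.

    Each complete pair becomes a two-letter code. A regional indicator
    still waiting for a partner when its run ends is discarded.
    """
    codes: list[str] = []
    pending = ""  # letter of an unpaired regional indicator, if any
    for ch in text:
        cp = ord(ch)
        if RIS_A <= cp <= RIS_Z:
            letter = chr(ord("A") + cp - RIS_A)
            if pending:
                codes.append(pending + letter)
                pending = ""
            else:
                pending = letter
        else:
            pending = ""  # any other character ends the run
    return codes


print(parse_ris_pairs(to_ris("JP") + to_ris("FR")))  # ['JP', 'FR']
print(parse_ris_pairs(to_ris("USA")))                # ['US']
```

The second result shows why codes of other lengths are not self-delimiting under this rule: a three-letter code spelled in RIS reads as US followed by a stray A, which is dropped.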
From gwalla at gmail.com Thu Jul 2 16:59:37 2015 From: gwalla at gmail.com (Garth Wallace) Date: Thu, 2 Jul 2015 14:59:37 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net> References: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net> Message-ID: On Thu, Jul 2, 2015 at 12:09 PM, Doug Ewell wrote: > Ken Whistler wrote: > >> The UTC is neither responsible for nor interested in a "standard way >> to encode flags unambiguously". >> >> [...] >> >> The Unicode Standard is not a vexillology standard -- nor will it ever >> be. It is a standard for the encoding and interchange of characters. > > Even though I continue to believe there *should* be a vexillology > standard for encoding flags as unambiguously as practicable, I'm in > strong agreement that this is not a Unicode problem, or a character > problem, or even a CLDR problem. > > If there were such a standard today, it might make sense for Unicode > and/or CLDR to adapt it for the emoji purposes we are discussing here. > But there isn't. Tangentially, I recently ran across something called International Flag Identification Symbols. It's a symbolic notation for vexillology that describes their use of flags and some aspects of their design but not enough to reproduce them. They're described on the Flags of the World site and the usage symbols at least are used inline with text on that site, e.g. in the article on German flags and as a quotation from a reference in the article on Guinea-Bissau . The site uses small black & white GIFs but there are apparently a couple of TrueType fonts that put those symbols in the PUA. 
From petercon at microsoft.com Thu Jul 2 19:56:32 2015 From: petercon at microsoft.com (Peter Constable) Date: Fri, 3 Jul 2015 00:56:32 +0000 Subject: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) In-Reply-To: <003401d0b4be$3af16970$b0d43c50$@fi> References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> <003401d0b4be$3af16970$b0d43c50$@fi> Message-ID: Erkki, in this case, I think Philippe is making valid points. - For the proposal to be workable requires some means of ensuring stability of encoded representations. The way this would be done would be for CLDR to provide data with all valid sequences --- effectively becoming a registry. - The concepts being denoted are inherently political, often unstable, and sometimes highly sensitive. Sensitive issues aside, a better approach would be to have a URN tagging scheme --- which IMO begs the question why this is a Unicode topic as it clearly crosses outside the limits of plain text. Sensitive issues considered, though, it begs the question as to whether Unicode should be considering any of this at all, no matter what the scheme for encoded representation may be. Someone helpfully reminded us of this: >> [...] the UTC does not wish to entertain further proposals for >> encoding of symbol characters for flags, whether national, state, >> regional, international, or otherwise. References to UTC Minutes: >> [134-C2], January 28, 2013. Peter From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Erkki I Kolehmainen Sent: Thursday, July 2, 2015 5:42 PM To: verdy_p at wanadoo.fr; 'Mark Davis' Cc: 'Doug Ewell'; 'Unicode Mailing List' Subject: VS: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) I cannot but agree with Mark! Thus, please? Sincerely, Erkki From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Philippe Verdy Sent: 2 July 2015 12:02 To: Mark Davis
Cc: Doug Ewell; Unicode Mailing List Subject: Re: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags) The political subject is immediately related to the designation of flags and their association to ISO 3166-1 and -2 encoded entities. Even if you don't like it, this is very political and for a standard seeking for stability, I wonder how any flag (directly bound to specific political entities at specific dates and within some boundaries which may be contested) can be related to ISO 3166 and its instability (and the fact that ISO 3166 entities have in fact also no defined borders, so that ISO 3166-2 is just a political point of view from the current ruler of the current ISO 3166-1 entity). All this topic is political. In fact the real flags are not even encoded with RIS, not even for current nations (and there's still a problem to know what is a recognized nation, even when just considering the UN definition. Political entities are defined but with fuzzy borders, they just represent in fact some local governments, not necessarily their lands, people, or cultures, and in some cases they are in exile or not even ruling: their seat in the UN is vacant and they exist only on the paper, but even UN members disagree about which treaty they recognize). Consider the case of Western Sahara (which no longer exists except on the paper as a dependency of Spain that has abandoned it completely) and with two governments competing to control the territory (Morocco controlling most of it, another part claimed by Mauritania then abandoned, another part left without infrastructures, and many refugees left de facto in Mauritania or Algeria). None of the two authorities designate that territory as "Western Sahara". So it no longer exists (and will likely never exist again).
The frozen status of Antarctica has not created any new country or territory, even if there's a sort of joint administration: that administration does not suppress the existing claims (and new claims that have been made since its creation). So this area has no well defined flag and various flags are used informally plus national flags for each claim and sometimes specific regional flags created ad hoc. The use of RIS for ISO 3166-1 and its limited extension for ISO 3166-2 (slightly modified) does not resolve the problem. In reality there's still no standard way to encode flags unambiguously and in a stable way. We'd like to have FOTW (Flags of the World) contributors to propose their own scheme. But it will not be compatible with the current RIS solution or the proposed extension. If ever such standard emerges, it will require encoding a new set of characters. An alternative would be to embed a URN (not reencoded) between some pairs of controls (to embed an object by reference) and use that sequence after a White flag symbol with a joiner. The URN scheme being the best long term solution (and preferable to URLs bound to specific servers), but we could in fact use a generic URI encapsulation (supporting URNs and URLs). It could be used then for representing various kinds of entities, and then link them to specific forms: flags, banners, flying flag, flag over a person face, mini location maps, "flag maps"... Programs not recognizing the encoded entities would have a very simple way to scan over the encapsulated URI representing some unspecified objects. Other programs will recognize some specific URI schemes. RIS will then be something of the past, obsoleted because it was non-neutral, politically and culturally oriented, incomplete, and fundamentally unstable since the beginning... For now we just have some set of flags promoted only to support the immediate support for interconnecting proprietary messaging services.
But all this came without a correct review of what was really needed. 2015-07-02 7:16 GMT+02:00 Mark Davis: Please take political discussions elsewhere; they do not belong on this list. The point about the boundaries of regions changing over time, and flags being associated with a former set of boundaries could have been made in a few sentences. Not only would it have avoided politics, it would have been more likely that people would actually read it (the likelihood being inversely proportional to the length). Mark Il meglio è l’inimico del bene (the best is the enemy of the good) On Thu, Jul 2, 2015 at 4:12 AM, Philippe Verdy wrote: From charupdate at orange.fr Fri Jul 3 10:19:13 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 3 Jul 2015 17:19:13 +0200 (CEST) Subject: WORD JOINER vs ZWNBSP In-Reply-To: <20150702190244.789e44af@JRWUBU2> References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net> <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18> <20150702190244.789e44af@JRWUBU2> Message-ID: <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36> On Thu, Jul 02, 2015, Richard Wordingham wrote: > On Thu, 2 Jul 2015 10:37:17 +0200 (CEST) > Marcel Schneider wrote: > > > (because it is > > sufficient to simply type the words one after each other without > > anything between, to get them as *one* word) > > This only applies where it is traditional to separate words, a habit > the Romans got out of and the Irish revived. IMHO the case is a bit different in handwritten or engraved text vs word processing. > Unicode Word Boundary Rule WB4 (in UAX #29 'Unicode Text > Segmentation') decrees that U+2060 and U+FEFF be ignored in > word-boundary determination except that newline breaks before them and > that inserting them between between and creates an extra word > boundary.
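Under WB4 as Richard summarizes it, U+2060 WORD JOINER and U+FEFF ZERO WIDTH NO-BREAK SPACE are skipped during word-boundary determination, so letters on both sides of one stay in the same word. The Python sketch here models only that single effect, with a word simplified to a run of letters; it is not a UAX #29 implementation, and `simplified_words` and `whole_word_found` are hypothetical names.

```python
# Characters that rule WB4, as summarized above, says to skip over.
WB4_IGNORED = "\u2060\ufeff"  # WORD JOINER, ZERO WIDTH NO-BREAK SPACE
_STRIP_WB4 = str.maketrans("", "", WB4_IGNORED)


def simplified_words(text: str) -> list[str]:
    """Split text into words, where a word is a run of letters.

    A WB4-ignored character met inside a run of letters is kept in the run,
    so it never creates a word boundary. Any other non-letter ends the word.
    """
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalpha() or (ch in WB4_IGNORED and current):
            current.append(ch)
        else:
            if current:
                words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def whole_word_found(word: str, text: str) -> bool:
    """'Whole word' search: the word must match an entire segment."""
    return any(w.translate(_STRIP_WB4) == word for w in simplified_words(text))


print(whole_word_found("foo", "foo bar"))       # True
print(whole_word_found("foo", "foo\u2060bar"))  # False
```

Under this reading, a whole-word search for either half of two words joined by U+2060 or U+FEFF finds nothing; Marcel's reply argues that implementations treat these characters differently in practice.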
When we look up the set of existing format characters (Cf), the ZWSP, ZWNBSP and WJ fall out of the group in that they are used to detect word boundaries in cases like whole word search and spell checking. (They indicate word boundaries.) This is why, in reality, they are remapped to another category, a practice expressly allowed by UAX #29. So in fact, the WB4 rule scarcely ever (say, *never*) applies to them. This can be discovered by oneself following the hints given at the very beginning of the UAX #29 content. I believe that UAXes as well as the whole Standard are not here to decree, as Richard calls it, but to promote knowledge and to share a number of useful rules, given in accordance with practice and real needs. Perhaps some sentences are likely to be rewritten for clarification in order to stick even more with reality. Perhaps, too, we should reconsider what we are talking about when using the expression “word boundary”. This is a bit ambiguous because UIs are designed to meet different needs, and because in English, the apostrophe is often a part of the sequences it is between. If I'm right, U+2019 or U+02BC in _month’s_ is expected to indicate a word boundary, and a search for the whole word _month_ will succeed, while _won’t_ in the UAX #29 example is *one* word, and searching for a supposed _won_ word makes no sense (and will fail). However, both are selected as a whole by Shift+Ctrl+LEFT/RIGHT ARROW. [For the archive: Please refer to the last month’s thread _A new take on the English apostrophe in Unicode_. About the difference between quick cursor move and double-click select vs "whole word" search, please refer to my previous e-mails.] Definitely, word boundaries are found with a whole word search (see UAX #29, again). Marcel
From kenwhistler at att.net Fri Jul 3 13:23:42 2015 From: kenwhistler at att.net (Ken Whistler) Date: Fri, 03 Jul 2015 11:23:42 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> <003401d0b4be$3af16970$b0d43c50$@fi> Message-ID: <5596D32E.9030403@att.net> On 7/2/2015 5:56 PM, Peter Constable wrote: > > Erkki, in this case, I think Philippe is making valid points. > > -For the proposal to be workable requires some means of ensuring > stability of encoded representations. The way this would be done would > be for CLDR to provide data with all valid sequences --- effectively > becoming a registry. > I think that is wrong on a couple of grounds. First, detailed stability of reference to actual defined geopolitical entities or particular detailed flag designs is not *required* for a proposal to represent *pictographs* of flags by some sequence of Unicode characters to be "workable". Sure, more stability of reference is desirable. But the current RIS pair mechanism for representing flag pictographs for countries is already "workable" -- it works and is widely deployed and widely used -- without having guarantees that some particular country may not decide tomorrow to change its official flag and hence result in some particular pictographic display being obsolete in some sense, for example. Second, the horse is already out of the barn regarding the particular data that CLDR would be referring to. This works by reference to the ISO 3166-2 scheme of subdivisions: https://en.wikipedia.org/wiki/ISO_3166-2 and *that* becomes the registry required for stability of representations, plus whatever grandfathering stability-of-code mechanism BCP 47 adds on top of that. We don't require a further detailed level of registration, I think, to make this workable.
If the New Zealand Hawke's Bay Regional Council (NZ-HKB) decided it needed a district flag (or decided to change one it may already have), I'm not going to be overly concerned about the details there. As long as has a stable definition as a Unicode extended flag tag sequence, it is up to somebody else to decide if they want to actually map a Hawke's Bay flag *pictograph* in a font to that sequence -- or update the flag pictograph they may have been using. Yeah, this could be a giant headache for any vendor that felt they had to support *every possible* region/subdivision sequence and keep the exact representations of flag pictographs stable. But I predict this will very, very quickly result in people making a "let's cover the 99% case" set of decisions, and then issues like "Should we display a flag pictograph for the Hawke's Bay Regional Council?" will be dealt with by the normal methods of triage for feature requests. > -The concepts being denoted are inherently political, often unstable, and sometimes highly sensitive. > > Sensitive issues aside, a better approach would be to have a URN > tagging scheme --- which IMO begs the question why this is a Unicode > topic as it clearly crosses outside the limits of plain text. > A URN tagging scheme might make sense if what we were trying to do was delegating all identity concerns to external authority, and if we didn't care about efficiency of representation, either. I don't think that is what this is about, as I tried to make clear yesterday. I don't think we are encoding *flags* -- we are creating a mechanism for the reliable representation of a set of *pictographs (emoji) for flags*. And those pictographs for flags need an efficient representation that can coexist comfortably with the rest of plain text -- the way the RIS pairs already do.
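Ken's point is that flag tags need a representation as compact in plain text as RIS pairs. TAG characters U+E0020..U+E007E mirror printable ASCII at an offset of 0xE0000, so a subdivision code such as GB-SCT can ride along as a short run of invisible characters after a base pictograph. This Python sketch assumes U+1F3F3 WAVING WHITE FLAG as the base, as Doug's later message anticipates, and stays within the U+E0030..U+E005A range given in the UTC announcement by dropping the hyphen. Letter case and any terminating character in PRI #299 are not spelled out in this thread, so `flag_tag_sequence` is a hypothetical layout, not the proposal's syntax.

```python
WAVING_WHITE_FLAG = "\U0001F3F3"
TAG_OFFSET = 0xE0000  # TAG characters U+E0020..U+E007E mirror ASCII 0x20..0x7E
# Tag range named in the UTC announcement:
# U+E0030 TAG DIGIT ZERO .. U+E005A TAG LATIN CAPITAL LETTER Z
ANNOUNCED_RANGE = range(0xE0030, 0xE005A + 1)


def to_tags(text: str) -> str:
    """Map printable ASCII onto the corresponding TAG characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in text):
        raise ValueError("TAG characters only mirror printable ASCII")
    return "".join(chr(TAG_OFFSET + ord(c)) for c in text)


def flag_tag_sequence(code: str) -> str:
    """Hypothetical layout: base flag followed by the tagged code.

    Assumptions: the hyphen is dropped, letters are upper-cased, and no
    terminating character is appended. The real PRI #299 syntax may differ.
    """
    tags = to_tags(code.upper().replace("-", ""))
    if not all(ord(t) in ANNOUNCED_RANGE for t in tags):
        raise ValueError("code uses characters outside U+E0030..U+E005A")
    return WAVING_WHITE_FLAG + tags


seq = flag_tag_sequence("GB-SCT")
print([f"U+{ord(c):04X}" for c in seq])
# ['U+1F3F3', 'U+E0047', 'U+E0042', 'U+E0053', 'U+E0043', 'U+E0054']
print(len(seq.encode("utf-8")))  # 24
```

Every character in the result lies outside the BMP, so these six characters take 24 bytes of UTF-8, against 8 bytes for a RIS pair.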
> Sensitive issues considered, though, it begs the question as to > whether Unicode should be considering any of this at all, no matter > what the scheme for encoded representation may be. Someone helpfully > reminded us of this: > > >> [...] the UTC does not wish to entertain further proposals for > > >> encoding of symbol characters for flags, whether national, state, > > >> regional, international, or otherwise. References to UTC Minutes: > > >> [134-C2], January 28, 2013. > I believe that that statement (and the referenced decision) refer specifically to the unwillingness of the UTC to entertain proposals for encoding an indefinite number of pictographs for flags (of whatever variety) *as symbol characters* -- that is, one-by-one encodings as a single, gc=So code point in the standard. Heading that direction is clearly not an efficient way to deal with the concern, and would waste everybody's time in one-by-one proposals and ad hoc decisions for each individual flag pictograph to be added. The UTC has a long history of putting a stake in the ground when it encounters a character encoding problem which requires a *general* solution, rather than a dribbling in of one-off decisions an item at a time. And I think the tag proposal for dealing with the representation of flag pictographs for regional subdivisions shows precisely the kind of generality that we are looking for -- dealing with hundreds of potentially representable entities with a general mechanism, rather than trying to encode them all one-by-one. Incidentally, back to the ostensible topic of this thread -- I don't think the extended flag tag proposal currently addresses the issue of how to represent a pictograph for a rainbow flag. In that case I think a new registry mechanism might in fact make sense -- and I have spelled out details of how one could reasonably work in conjunction with the extended flag tag proposal in feedback submitted on PRI #299. 
--Ken > Peter > > From richard.wordingham at ntlworld.com Fri Jul 3 13:31:43 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 3 Jul 2015 19:31:43 +0100 Subject: WORD JOINER vs ZWNBSP In-Reply-To: <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36> References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net> <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18> <20150702190244.789e44af@JRWUBU2> <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36> Message-ID: <20150703193143.1fa823db@JRWUBU2> On Fri, 3 Jul 2015 17:19:13 +0200 (CEST) Marcel Schneider wrote: > On Thu, Jul 02, 2015, Richard Wordingham wrote: > > This only applies where it is traditional to separate words, a habit > > the Romans got out of and the Irish revived. > IMHO the case is a bit different in handwritten or engraved text vs > word processing. For your information, the Thais, Burmese and Cambodians use word processors. Look up line-breaking category SA for modern, mainstream examples of writing systems where words are not separated by spaces or any other character. Richard. From doug at ewellic.org Fri Jul 3 14:50:51 2015 From: doug at ewellic.org (Doug Ewell) Date: Fri, 3 Jul 2015 13:50:51 -0600 Subject: PRI #299 (was: Re: Adding RAINBOW FLAG to Unicode) Message-ID: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell> Leo Broukhis wrote: > What I don't like about PRI #399 is its proposing to use default- > ignorable characters. On a non-vexillology-aware platform, I'd like > to see something informative, albeit not resembling a flag, but > indicative of the intention to display a flag, like RIS can be, as > opposed to nondescript white flags. This is just a personal prediction, but I'd guess that once the PRI #299 mechanism hits the streets, U+1F3F3 WAVING WHITE FLAG will be used overwhelmingly for tag sequences and comparatively seldom on its own.
When a reader sees 🏳, it might be relatively safe to assume the writer intended to display a specific flag. I don't know what the original impetus for adding U+1F3F3 was. That might help us predict how popular U+1F3F3 will be on its own. Maybe one of the Emoji Gurus can help out here. -- Doug Ewell | http://ewellic.org | Thornton, CO From asmus-inc at ix.netcom.com Fri Jul 3 18:28:43 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Fri, 3 Jul 2015 16:28:43 -0700 Subject: PRI #299 (was: Re: Adding RAINBOW FLAG to Unicode) In-Reply-To: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell> References: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell> Message-ID: <55971AAB.4040203@ix.netcom.com> An HTML attachment was scrubbed... URL: From leob at mailcom.com Fri Jul 3 23:14:07 2015 From: leob at mailcom.com (Leo Broukhis) Date: Fri, 3 Jul 2015 21:14:07 -0700 Subject: PRI #299 (was: Re: Adding RAINBOW FLAG to Unicode) In-Reply-To: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell> References: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell> Message-ID: On Fri, Jul 3, 2015 at 12:50 PM, Doug Ewell wrote: > Leo Broukhis wrote: > >> What I don't like about PRI #399 is its proposing to use default- >> ignorable characters. On a non-vexillology-aware platform, I'd like >> to see something informative, albeit not resembling a flag, but >> indicative of the intention to display a flag, like RIS can be, as >> opposed to nondescript white flags. > > > This is just a personal prediction, but I'd guess that once the PRI #299 > mechanism hits the streets, U+1F3F3 WAVING WHITE FLAG will be used > overwhelmingly for tag sequences and comparatively seldom on its own. When a > reader sees 🏳, it might be relatively safe to assume the writer intended to > display a specific flag. But then a reader will have to look at the raw Unicode bytestream to find out *which* specific flag was intended. How convenient is that?
Leo From kenwhistler at att.net Fri Jul 3 23:38:16 2015 From: kenwhistler at att.net (Ken Whistler) Date: Fri, 03 Jul 2015 21:38:16 -0700 Subject: PRI #299 In-Reply-To: References: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell> Message-ID: <55976338.4010500@att.net> On 7/3/2015 9:14 PM, Leo Broukhis wrote: > On Fri, Jul 3, 2015 at 12:50 PM, Doug Ewell wrote: >> Leo Broukhis wrote: >> >>> What I don't like about PRI #399 is its proposing to use default- >>> ignorable characters. On a non-vexillology-aware platform, I'd like >>> to see something informative, albeit not resembling a flag, but >>> indicative of the intention to display a flag, like RIS can be, as >>> opposed to nondescript white flags. > But then a reader will have to look at the raw Unicode bytestream to > find out *which* specific flag was intended. > How convenient is that? > Ah, but on a "non-vexillology-aware platform", if it is just ignoring all this vexatious trouble of mapping the tag sequences to identifiable flag pictographs, you're just as likely that the fonts/renderers involved won't do anything comprehensible with any new non-default-ignorable metacharacter additions, either -- particularly as they would be Unicode 10.0+ additions to the standard. So the most likely display would end up looking more like: ? ? ? ? ? How convenient is that? --Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From leob at mailcom.com Fri Jul 3 23:52:52 2015 From: leob at mailcom.com (Leo Broukhis) Date: Fri, 3 Jul 2015 21:52:52 -0700 Subject: PRI #299 In-Reply-To: <55976338.4010500@att.net> References: <52DEAA64A5C54EB8A31C746477DCAAB7@DougEwell> <55976338.4010500@att.net> Message-ID: Most platforms display unknown printable characters as white rectangles with hex digits in them. In Doug's message, I saw a rectangle with 01F in the upper row, and 3F3 in the lower row. 
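The "01F over 3F3" box Leo describes is the hex-digit fallback some renderers draw for characters they have no glyph for: U+1F3F3 is zero-padded to six hex digits and split into two rows of three. A minimal sketch of that layout, assuming six-digit padding; the function name is hypothetical, and real renderers may lay out BMP characters differently.

```python
def hex_box_rows(code_point: int) -> tuple[str, str]:
    """Split a code point into two rows of three hex digits, as in a hex-box fallback glyph.

    Assumption: the value is zero-padded to six hex digits (enough for U+10FFFF).
    This illustrates the layout only; it is not any particular renderer's algorithm.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    digits = f"{code_point:06X}"  # 0x1F3F3 -> "01F3F3"
    return digits[:3], digits[3:]


if __name__ == "__main__":
    top, bottom = hex_box_rows(ord("\U0001F3F3"))  # WAVING WHITE FLAG
    print(top)     # 01F
    print(bottom)  # 3F3
```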
Moreover, on any platform when users see unknown characters, they search for a font, install it and are able to see in cleartext at least something they can make sense of. For a RIS or any other non-default-ignorable character on a non-vexillology-aware platform, a font with stylized letters would be sufficient to read the intent of the writer, and, as a free extra, to tell apart Liechtenstein and Haiti without squinting. On Fri, Jul 3, 2015 at 9:38 PM, Ken Whistler wrote: > > > On 7/3/2015 9:14 PM, Leo Broukhis wrote: > > On Fri, Jul 3, 2015 at 12:50 PM, Doug Ewell wrote: > > Leo Broukhis wrote: > > What I don't like about PRI #399 is its proposing to use default- > ignorable characters. On a non-vexillology-aware platform, I'd like > to see something informative, albeit not resembling a flag, but > indicative of the intention to display a flag, like RIS can be, as > opposed to nondescript white flags. > > But then a reader will have to look at the raw Unicode bytestream to > find out *which* specific flag was intended. > How convenient is that? > > > Ah, but on a "non-vexillology-aware platform", if it is just ignoring > all this vexatious trouble of mapping the tag sequences to identifiable > flag pictographs, you're just as likely that the fonts/renderers > involved won't do anything comprehensible with any new > non-default-ignorable metacharacter additions, either -- particularly as > they > would be Unicode 10.0+ additions to the standard. So the most > likely display would end up looking more like: ? ? ? ? ? > > How convenient is that? 
> > --Ken > > From charupdate at orange.fr Sat Jul 4 10:02:00 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 4 Jul 2015 17:02:00 +0200 (CEST) Subject: WORD JOINER vs ZWNBSP In-Reply-To: <20150703193143.1fa823db@JRWUBU2> References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net> <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18> <20150702190244.789e44af@JRWUBU2> <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36> <20150703193143.1fa823db@JRWUBU2> Message-ID: <486299455.13899.1436022120042.JavaMail.www@wwinf1j32> On Fri, Jul 03, 2015, Richard Wordingham wrote: > On Fri, 3 Jul 2015 17:19:13 +0200 (CEST) > Marcel Schneider wrote: > > > On Thu, Jul 02, 2015, Richard Wordingham wrote: > > > > This only applies where it is traditional to separate words, a habit > > > the Romans got out of and the Irish revived. > > > IMHO the case is a bit different in handwritten or engraved text vs > > word processing. > > For your information, the Thais, Burmese and Cambodians use word > processors. Look up line-breaking category SA for modern, mainstream > examples of writing systems where words are not separated by spaces or > any other character. I considered not replying any more in this unfaithful dialogue, where after bringing up some historic examples to make me think about them, Richard switches back to the present and makes people believe I could suppose that any country could prefer the use of other means than what's world standard. I already mentioned in this thread that I do not have any knowledge of Thai, and in another thread, that my scope is *latin* keyboard layouts. Now let's come to the core: Why on earth do we need word boundaries for whole word search in Latin script, while Thai, Burmese and Cambodian scripts Richard mentions as examples, use implementations that can find whole words without any need of "spaces or any other [separating] character"?
Best wishes, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Sat Jul 4 12:13:21 2015 From: doug at ewellic.org (Doug Ewell) Date: Sat, 4 Jul 2015 11:13:21 -0600 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: References: Message-ID: <4FC92E8938644B21AA3587A879E9378F@DougEwell> Ken Whistler wrote: > But the current RIS pair mechanism for representing flag pictographs > for countries is already "workable" -- it works and is widely deployed > and widely used -- without having guarantees that some particular > country may not decide tomorrow to change its official flag and hence > result in some particular pictographic display being obsolete in some > sense, for example. Which brings up a counterpoint to gfb hjjhjh's earlier point: Suppose a Twitter user wants to use "the emoticon with country X's flag on it to show support for its current government," then the government is overthrown by an enemy which KEEPS the existing flag, forcing the government-in-exile to adopt a different flag? Now, the user who put the existing flag in her tweets appears to be showing support for the enemy. This is what happened in France during World War II, except of course for the emoticon and Twitter and that. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? 
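For reference, the RIS pair mechanism Ken describes maps each letter of a two-letter region code to a regional indicator symbol, U+1F1E6 (letter A) through U+1F1FF (letter Z), which is why the letters stay recoverable even where no flag glyph is available. A minimal sketch; the function names are hypothetical, and it does not check whether a code is actually assigned to a region.

```python
RIS_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A; letters B..Z follow consecutively


def to_ris_pair(region: str) -> str:
    """Map a two-letter region code such as "FR" to its pair of regional indicator symbols."""
    if len(region) != 2 or not all("A" <= c <= "Z" for c in region):
        raise ValueError(f"expected two ASCII capital letters, got {region!r}")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in region)


def from_ris_pair(pair: str) -> str:
    """Inverse mapping: recover the letters that a flag-unaware display could still show."""
    letters = []
    for ch in pair:
        offset = ord(ch) - RIS_A
        if not 0 <= offset < 26:
            raise ValueError(f"not a regional indicator symbol: U+{ord(ch):04X}")
        letters.append(chr(ord("A") + offset))
    return "".join(letters)


if __name__ == "__main__":
    fr = to_ris_pair("FR")
    print(" ".join(f"U+{ord(c):X}" for c in fr))  # U+1F1EB U+1F1F7
    print(from_ris_pair(fr))  # FR
```

The same arithmetic explains the anchor in the emoji chart URL cited later in this thread: `1f1e6_1f1eb` is the pair for "AF".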
From richard.wordingham at ntlworld.com Sat Jul 4 13:20:05 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 4 Jul 2015 19:20:05 +0100 Subject: WORD JOINER vs ZWNBSP In-Reply-To: <486299455.13899.1436022120042.JavaMail.www@wwinf1j32> References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net> <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18> <20150702190244.789e44af@JRWUBU2> <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36> <20150703193143.1fa823db@JRWUBU2> <486299455.13899.1436022120042.JavaMail.www@wwinf1j32> Message-ID: <20150704192005.474faf4a@JRWUBU2> On Sat, 4 Jul 2015 17:02:00 +0200 (CEST) Marcel Schneider wrote: > On Fri, Jul 03, 2015, Richard Wordingham wrote: > > > On Fri, 3 Jul 2015 17:19:13 +0200 (CEST) > > Marcel Schneider wrote: > I considered not replying any more in this unfaithful dialogue, where > after bringing up some historic examples to make me think about them, > Richard switches back to the present and makes people believe I could > suppose that any country could prefer the use of other means than > what's world standard. I cannot work out what you think I am making people believe you might suppose. I was pointing out that not everyone uses visible word boundaries. I will also note that people are reluctant to type invisible characters if they don't have immediate benefits. > Now let's come to the core: Why on earth > do we need word boundaries for whole word search in Latin script, > while Thai, Burmese and Cambodian scripts Richard mentions as > examples, use implementations that can find whole words without any > need of "spaces or any other [separating] character"? The Thai and Cambodian implementations are far from perfect, even when applied to the Thai and Cambodian languages. Using a dictionary for the national languages on text of other languages naturally has even worse performance.
A quick experiment suggests that for whole word search in Thai, LibreOffice simply ignores any boundaries between Thai word characters. Double click and ctrl/arrow use different rules. It's quite possible that we are misinterpreting the results of whole word searches. One way of implementing whole word search is to do a general search and then check whether the word found is part of a larger word. To do that, one might simply ask whether the characters before and after the string found are permitted in words. One might easily set things up so that by omission U+2060 is not considered part of a word - the code could have been written before U+2060 was assigned and not updated since. Richard. From charupdate at orange.fr Mon Jul 6 06:36:31 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Mon, 6 Jul 2015 13:36:31 +0200 (CEST) Subject: WORD JOINER vs ZWNBSP In-Reply-To: <20150704192005.474faf4a@JRWUBU2> References: <20150630142826.665a7a7059d7ee80bb4d670165c8327d.c8a619afc7.wbe@email03.secureserver.net> <1766396455.9008.1435826237456.JavaMail.www@wwinf1m18> <20150702190244.789e44af@JRWUBU2> <1720403398.17524.1435936753200.JavaMail.www@wwinf1k36> <20150703193143.1fa823db@JRWUBU2> <486299455.13899.1436022120042.JavaMail.www@wwinf1j32> <20150704192005.474faf4a@JRWUBU2> Message-ID: <413346311.9685.1436182591794.JavaMail.www@wwinf1h21> On Sat, Jul 04, 2015, Richard Wordingham wrote: > I will also note that people are reluctant to type > invisible characters if they don't have immediate benefits.
?By contrast, later versions of word processing applications, no matter of which software house, would have experienced in-depth changes including text segmentation tailoring. > The Thai and Cambodian implementations are far from perfect, even when > applied to the Thai and Cambodian languages. ?Using a dictionary for > the national languages on text of other languages naturally has even > worse performance. ?A quick experiment suggest that for whole word > search in Thai, LibreOffice simply ignores any boundaries bwtween Thai > word characters. ?Double click and ctrl/arrow use different rules. When Doug Ewell wrote on Tue Jun 30, 2015 that clicking on either part of ?'one\u2060two' selects the whole, I didn't check on my version, taking that as a matter of fact. ?Now I've done and I'm astonished to see *one* part selected only. ?Consequently, between Word 97 (the full version on which Word 2010 Starter is based upon, if I remember well what I've read somewhere) and Word 2010, even the rules for double click and ctrl/arrow must have been changed, to better meet users' needs and expectations. ?From this and some among the bugs having been fixed prior to Word 2013 (I've been told on Microsoft Community), I extrapolate without hasty generalization that Word 2016 could eventually be the performative version I expect since I do word processing. > It's quite possible that we are misinterpreting the results of whole > word searches. ?One way of implementing whole word search is to do a > general search and then check whether the word found is part of a > larger word. ?To do that, one might simply ask whether the > characters before and after the string found are permitted in words. > One might easily set things up so that by omission U+2060 is not > considered part of a word - the code could have been written before > U+2060 was assigned and not updated since. Indeed, perhaps we are dealing with an obsolete behavior. 
I wonder whether Word 2010, which is already overriding U+2060 at word selecting and quick cursor move, does the same at whole word search. Personally I'd prefer it did not, because I believe that this isn't useful. So I agree with OpenOffice/LibreOffice (tested version of the latter: 4.2.4.2), which don't. Nor does Adobe Reader. By deduction, I'm now supposing that Microsoft Word actually doesn't either. Thank you for the information about the Thai and Cambodian implementations. I think that it would be correct to prioritize updates for those implementations which "are far from perfect", given that those still exist(!), in order that everybody on earth could come into the benefit of really performative work tools. Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Mon Jul 6 10:18:59 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 06 Jul 2015 08:18:59 -0700 Subject: PRI #299 Message-ID: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net> Leo Broukhis wrote: > Most platforms display unknown printable characters as white > rectangles with hex digits in them. > In Doug's message, I saw a rectangle with 01F in the upper row, and > 3F3 in the lower row. This is a handy feature, at least for character geeks like us, but "most platforms" might be a bit misleading here. There is a rather commonly used platform that starts with the letter W which does not do this. > Moreover, on any platform when users see unknown characters, they > search for a font, install it and are able to see in cleartext at > least something they can make sense of. For a RIS or any other > non-default-ignorable character on a non-vexillology-aware platform, a > font with stylized letters would be sufficient to read the intent of > the writer, and, as a free extra, to tell apart Liechtenstein and > Haiti without squinting.
I think a useful bit of feedback on PRI #299 would be to inquire whether it is, in fact, a design goal to handle this use case of transparency of the individual letters on platforms, rendering engines, and/or fonts that don't support flag-tag composition. (Please, not "non-vexillology-aware." None of these platforms studies or analyzes flags. They assemble multiple characters into a single image.) If transparency on flag-tag-unaware platforms is not a design goal, it might be difficult to make the case that default-ignorable tag characters are a poor choice because they don't support transparency. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From doug at ewellic.org Mon Jul 6 10:26:10 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 06 Jul 2015 08:26:10 -0700 Subject: Adding RAINBOW FLAG to Unicode Message-ID: <20150706082610.665a7a7059d7ee80bb4d670165c8327d.23bc5880f2.wbe@email03.secureserver.net> Ken Whistler wrote: > Incidentally, back to the ostensible topic of this thread -- I don't > think the extended flag tag proposal currently addresses the issue > of how to represent a pictograph for a rainbow flag. It doesn't. > In that case I think a new registry mechanism might in fact make sense > -- and I have spelled out details of how one could reasonably work in > conjunction with the extended flag tag proposal in feedback submitted > on PRI #299. Is this list the right place to discuss that proposal? -- Doug Ewell | http://ewellic.org | Thornton, CO ???? 
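To make the fallback question concrete, here is a rough sketch of how a PRI #299-style sequence would be assembled, and of what a flag-tag-unaware renderer that simply skips default-ignorable characters would leave on screen. It rests on several assumptions. The base character is U+1F3F3 WAVING WHITE FLAG, as discussed in this thread. Each allowed ASCII character maps to its TAG counterpart by adding 0xE0000, restricted to U+E0030..U+E005A as in the announcement. The identifier "AB12" is hypothetical. Any terminator or validity rules from the background document are deliberately left out. Treating the whole Tags block (U+E0000..U+E007F) as ignorable is a simplification of the Default_Ignorable_Code_Point property.

```python
BASE = "\U0001F3F3"                   # WAVING WHITE FLAG, the base character discussed in the thread
TAG_OFFSET = 0xE0000                  # TAG characters mirror ASCII: U+E0041 is TAG LATIN CAPITAL LETTER A
TAG_MIN, TAG_MAX = 0xE0030, 0xE005A   # range named in the PRI #299 announcement


def make_tag_sequence(identifier: str) -> str:
    """Build base + TAG characters for an ASCII identifier (no terminator; see lead-in)."""
    tags = []
    for ch in identifier:
        cp = ord(ch) + TAG_OFFSET
        if not TAG_MIN <= cp <= TAG_MAX:
            raise ValueError(f"{ch!r} has no TAG counterpart in U+E0030..U+E005A")
        tags.append(chr(cp))
    return BASE + "".join(tags)


def naive_render(text: str) -> str:
    """Mimic a tag-unaware renderer: drop Tags-block characters, keep everything else."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


if __name__ == "__main__":
    seq = make_tag_sequence("AB12")  # hypothetical identifier
    print(" ".join(f"U+{ord(c):X}" for c in seq))  # U+1F3F3 U+E0041 U+E0042 U+E0031 U+E0032
    print(naive_render(seq) == BASE)  # True: only the white flag is left
```

Unlike a RIS pair, nothing of the identifier survives in this fallback, which is the behaviour Leo objects to and the reason Doug asks whether letter transparency is a design goal at all.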
From leob at mailcom.com Mon Jul 6 10:53:27 2015 From: leob at mailcom.com (Leo Broukhis) Date: Mon, 6 Jul 2015 08:53:27 -0700 Subject: PRI #299 In-Reply-To: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net> References: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net> Message-ID: On Mon, Jul 6, 2015 at 8:18 AM, Doug Ewell wrote: > Leo Broukhis wrote: > >> Most platforms display unknown printable characters as white >> rectangles with hex digits in them. >> In Doug's message, I saw a rectangle with 01F in the upper row, and >> 3F3 in the lower row. > > This is a handy feature, at least for character geeks like us, but "most > platforms" might be a bit misleading here. There is a rather commonly > used platform that starts with the letter W which does not do this. I was a little surprised myself when I saw it in Firefox under W7 Enterprise, but here we are. >> Moreover, on any platform when users see unknown characters, they >> search for a font, install it and are able to see in cleartext at >> least something they can make sense of. For a RIS or any other >> non-default-ignorable character on a non-vexillology-aware platform, a >> font with stylized letters would be sufficient to read the intent of >> the writer, and, as a free extra, to tell apart Liechtenstein and >> Haiti without squinting. > > I think a useful bit of feedback on PRI #299 would be to inquire whether > it is, in fact, a design goal to handle this use case of transparency of Huh? What kind of a deliberate design goal would be to forgo semantics in favor of presentation, even as a fallback behavior? In an ideal world, where all platforms are actively maintained, and all maintainers rush to implement the cool new features, it could have been acceptable, but not in our world, I'm afraid. > the individual letters on platforms, rendering engines, and/or fonts > that don't support flag-tag composition. 
(Please, not > "non-vexillology-aware." None of these platforms studies or analyzes > flags. They assemble multiple characters into a single image.) "Vexillology awareness" was, of course, mostly in jest. > If transparency on flag-tag-unaware platforms is not a design goal, it > might be difficult to make the case that default-ignorable tag > characters are a poor choice because they don't support transparency. Right. Then the objection should be interpreted with regard to the design goal. Leo From kenwhistler at att.net Mon Jul 6 10:53:35 2015 From: kenwhistler at att.net (Ken Whistler) Date: Mon, 06 Jul 2015 08:53:35 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <20150706082610.665a7a7059d7ee80bb4d670165c8327d.23bc5880f2.wbe@email03.secureserver.net> References: <20150706082610.665a7a7059d7ee80bb4d670165c8327d.23bc5880f2.wbe@email03.secureserver.net> Message-ID: <559AA47F.5090906@att.net> On 7/6/2015 8:26 AM, Doug Ewell wrote: > Ken Whistler wrote: > > >> In that case I think a new registry mechanism might in fact make sense >> -- and I have spelled out details of how one could reasonably work in >> conjunction with the extended flag tag proposal in feedback submitted >> on PRI #299. > Is this list the right place to discuss that proposal? > > It is fair game for discussion on this list, of course. On the other hand, it might make sense to wait and see if it gains any traction when the UTC meets later this month and considers all of the feedback on the extended flag tag PRI #299 proposal together. If the concept of a Unicode flag pictograph registry garners no interest there, it is unlikely it would go further after that. 
--Ken From doug at ewellic.org Mon Jul 6 10:59:14 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 06 Jul 2015 08:59:14 -0700 Subject: Adding RAINBOW FLAG to Unicode Message-ID: <20150706085914.665a7a7059d7ee80bb4d670165c8327d.3ea7e67602.wbe@email03.secureserver.net> Ken Whistler wrote: > On the other hand, it might make sense to wait and see if it gains any > traction when the UTC meets later this month and considers all of the > feedback on the extended flag tag PRI #299 proposal together. If the > concept of a Unicode flag pictograph registry garners no interest > there, it is unlikely it would go further after that. I'll wait, since most of my comments are about details. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From doug at ewellic.org Mon Jul 6 11:15:57 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 06 Jul 2015 09:15:57 -0700 Subject: PRI #299 Message-ID: <20150706091557.665a7a7059d7ee80bb4d670165c8327d.a63c6e403b.wbe@email03.secureserver.net> Leo Broukhis wrote: >> This is a handy feature, at least for character geeks like us, but >> "most platforms" might be a bit misleading here. There is a rather >> commonly used platform that starts with the letter W which does not >> do this. > > I was a little surprised myself when I saw it in Firefox under W7 > Enterprise, but here we are. I'm surprised too; I hadn't tried using Firefox to view these sequences. Thanks for demonstrating this. We may once again be stumbling over different interpretations of the word "platform": does it refer to an operating system in general, a specific version thereof, or a specific editor, word processor, or browser under that OS and version? >> I think a useful bit of feedback on PRI #299 would be to inquire >> whether it is, in fact, a design goal to handle this use case of >> transparency of > > Huh? What kind of a deliberate design goal would be to forgo semantics > in favor of presentation, even as a fallback behavior? 
> In an ideal world, where all platforms are actively maintained, and > all maintainers rush to implement the cool new features, > it could have been acceptable, but not in our world, I'm afraid. I questioned whether it was a (positive) design goal to handle the fallback case in the way you described. I did not suggest that it was a (negative) design goal NOT to handle it, or to obscure the tag characters, and I would suggest there is a huge difference between the two. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From mark at macchiato.com Mon Jul 6 11:16:17 2015 From: mark at macchiato.com (Mark Davis ☕️) Date: Mon, 6 Jul 2015 18:16:17 +0200 Subject: PRI #299 In-Reply-To: References: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net> Message-ID: On Mon, Jul 6, 2015 at 5:53 PM, Leo Broukhis wrote: >> Most platforms display unknown printable characters as white >> rectangles with hex digits in them. >> In Doug's message, I saw a rectangle with 01F in the upper row, and >> 3F3 in the lower row. > > This is a handy feature, at least for character geeks like us, but "most > > platforms" might be a bit misleading here. There is a rather commonly > > used platform that starts with the letter W which does not do this. > > I was a little surprised myself when I saw it in Firefox under W7 > Enterprise, but here we are. "Most platforms" is quite misleading. Rather the converse: for the vast majority of people, the programs that they use on the devices they have will *not* show unknown printable characters in a format with readable hex digits. Mark -------------- next part -------------- An HTML attachment was scrubbed...
URL: From steve at swales.us Mon Jul 6 11:42:05 2015 From: steve at swales.us (Steve Swales) Date: Mon, 6 Jul 2015 09:42:05 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net> References: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net> Message-ID: <62291D50-840E-4451-A4EE-53A006CCBCD5@swales.us> Or a flag inversion modifier? recently I discovered that the Philippines flag, for example, has a special meaning (we are at war) when inverted. Just a thought. -steve > On Jul 1, 2015, at 10:38 AM, Doug Ewell wrote: > > wrote: > >> Whatever notation that might be added to whatever decision is >> ultimately made on this should probably mention historic use of the >> rainbow flag by the peace movement. See for example: >> >> https://en.wikipedia.org/wiki/Peace_flag#Rainbow_flag > > The colors of the rainbow peace flag (purple on top) are often inverted > with respect to the LGBT flag (red on top), making them essentially two > different flags. > > -- > Doug Ewell | http://ewellic.org | Thornton, CO ???? > > From doug at ewellic.org Mon Jul 6 11:53:48 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 06 Jul 2015 09:53:48 -0700 Subject: Adding RAINBOW FLAG to Unicode Message-ID: <20150706095348.665a7a7059d7ee80bb4d670165c8327d.9ec08d1f9c.wbe@email03.secureserver.net> Steve Swales wrote: > Or a flag inversion modifier? recently I discovered that the > Philippines flag, for example, has a special meaning (we are at war) > when inverted. Just a thought. An inverted ensign on a ship was formerly used as a distress signal: http://www.crwflags.com/fotw/flags/xf-flip.html I'd argue strongly against adding "modifiers" to Unicode flag tags to indicate inverted, waving, half-staff, folded, or any other transient state. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? 
From asmus-inc at ix.netcom.com Mon Jul 6 14:24:18 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Mon, 6 Jul 2015 12:24:18 -0700 Subject: PRI #299 In-Reply-To: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net> References: <20150706081859.665a7a7059d7ee80bb4d670165c8327d.1cacf1e31c.wbe@email03.secureserver.net> Message-ID: <559AD5E2.5020204@ix.netcom.com> An HTML attachment was scrubbed... URL: From asmus-inc at ix.netcom.com Mon Jul 6 14:52:04 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Mon, 6 Jul 2015 12:52:04 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <62291D50-840E-4451-A4EE-53A006CCBCD5@swales.us> References: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net> <62291D50-840E-4451-A4EE-53A006CCBCD5@swales.us> Message-ID: <559ADC64.80505@ix.netcom.com> An HTML attachment was scrubbed... URL: From doug at ewellic.org Mon Jul 6 15:11:35 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 06 Jul 2015 13:11:35 -0700 Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to Unicode) Message-ID: <20150706131135.665a7a7059d7ee80bb4d670165c8327d.b0fde2cbd7.wbe@email03.secureserver.net> Asmus Freytag (t) wrote: > Rather than modifiers, I think a more natural thing would be to have > different base characters that reflect whether it's a flag, a pennant, > waving, flying from a flag stock or whatever other variety. > > Base characters could be limited to an "approved" list, which could be > extended as needed to cater to actual demand. > > In this context, I dislike the current proposal to use a WAVING flag > as a base character for non-waving plain and rectangular images of > flags. Is it your belief that users who wish to display an emoji flag care whether the flag is shown stationary versus flapping in the wind? What would be the compatibility solution for the existing set of emoji flags supported by RIS?
Some carriers already show them rectangular, while others already show them waving: http://unicode.org/emoji/charts/full-emoji-list.html#1f1e6_1f1eb -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From leoboiko at namakajiri.net Mon Jul 6 15:20:58 2015 From: leoboiko at namakajiri.net (Leonardo Boiko) Date: Mon, 6 Jul 2015 17:20:58 -0300 Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to Unicode) In-Reply-To: <20150706131135.665a7a7059d7ee80bb4d670165c8327d.b0fde2cbd7.wbe@email03.secureserver.net> References: <20150706131135.665a7a7059d7ee80bb4d670165c8327d.b0fde2cbd7.wbe@email03.secureserver.net> Message-ID: 2015-07-06 17:11 GMT-03:00 Doug Ewell : > Is it your belief that users who wish to display an emoji flag care > whether the flag is shown stationary versus flapping in the wind? I think a waving white flag is an emoji symbol for "truce/surrender/come in peace", whereas a white rectangle doesn't easily transmit the same idea. From doug at ewellic.org Mon Jul 6 15:40:22 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 06 Jul 2015 13:40:22 -0700 Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to Unicode) Message-ID: <20150706134022.665a7a7059d7ee80bb4d670165c8327d.304da8751c.wbe@email03.secureserver.net> Leonardo Boiko wrote: >> Is it your belief that users who wish to display an emoji flag care >> whether the flag is shown stationary versus flapping in the wind? > > I think a waving white flag is an emoji symbol for > "truce/surrender/come in peace", whereas a white rectangle doesn't > easily transmit the same idea. I don't know how many other flags have different semantics depending on whether they are waving or not. I note that neither RIS pairs nor PRI #299 sequences can encode a plain white flag (but of course the user can simply choose between U+2690 and U+1F3F3 for that). I hear Asmus's concern about using WAVING WHITE FLAG as the base character for emoji flags which might not be depicted as waving. 
However, in that case the solution would be to choose a different, *single* base character. What Asmus wanted was > to have different base characters that reflect whether it's a flag, a > pennant, waving, flying from a flag stock or whatever other variety and this is the problem I don't think can be solved, either with RIS flags or with PRI #299 flags, regardless of the choice of base character. Different platforms already show (e.g.) the French flag as either waving or not waving. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From gwalla at gmail.com Mon Jul 6 15:55:20 2015 From: gwalla at gmail.com (Garth Wallace) Date: Mon, 6 Jul 2015 13:55:20 -0700 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <559ADC64.80505@ix.netcom.com> References: <20150701103818.665a7a7059d7ee80bb4d670165c8327d.d917fb1a04.wbe@email03.secureserver.net> <62291D50-840E-4451-A4EE-53A006CCBCD5@swales.us> <559ADC64.80505@ix.netcom.com> Message-ID: On Mon, Jul 6, 2015 at 12:52 PM, Asmus Freytag (t) wrote: > On 7/6/2015 9:42 AM, Steve Swales wrote: > > Or a flag inversion modifier? recently I discovered that the Philippines > flag, for example, has a special meaning (we are at war) when inverted. > Just a thought. > > > Rather than modifiers, I think a more natural thing would be to have > different base characters that reflect whether it's a flag, a pennant, > waving, flying from a flag stock or whatever other variety. > > Base characters could be limited to an "approved" list, which could be > extended as needed to cater to actual demand. > > In this context, I dislike the current proposal to use a WAVING flag as a > base character for non-waving plain and rectangular images of flags. > A./ I'm concerned that the proposed base is a white flag, which usually means "surrender". It seems like there's some potential for miscommunication there.
From doug at ewellic.org Mon Jul 6 16:31:07 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 06 Jul 2015 14:31:07 -0700 Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to Unicode) Message-ID: <20150706143107.665a7a7059d7ee80bb4d670165c8327d.c065cd7fa2.wbe@email03.secureserver.net>

I wrote:

> I hear Asmus's concern about using WAVING WHITE FLAG as the base character for emoji flags which might not be depicted as waving.

I suppose there's no particular reason why U+2690 can't be the base character instead.

But then Garth Wallace wrote:

> I'm concerned that the proposed base is a white flag, which usually means "surrender". It seems like there's some potential for miscommunication there.

If the intrinsic meaning of the base character in isolation is a problem -- people using flag-tag-unaware systems will see a white flag and assume it means "surrender" -- then there aren't any existing encoded flag characters that are any better.

Black flags have historically had a wide variety of meanings as well -- mourning, anarchy, Italian fascism, race car driver disqualified, etc. So substituting U+1F3F4 or U+2691 won't help. All of the other existing flag symbol characters have even more specific meanings, usually annotated in TUS.

Folks who consider this a problem are probably intrigued by item 2 under "Discussion" in the background document: encode an all-new base character. This would delay the rollout of the mechanism, and if the new character has a glyph that looks at all like a flag, it will likely face the same criticism (e.g. "looks too much like the Portuguese flag").

--
Doug Ewell | http://ewellic.org | Thornton, CO

From gwalla at gmail.com Mon Jul 6 17:35:43 2015 From: gwalla at gmail.com (Garth Wallace) Date: Mon, 6 Jul 2015 15:35:43 -0700 Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to Unicode) In-Reply-To: <20150706143107.665a7a7059d7ee80bb4d670165c8327d.c065cd7fa2.wbe@email03.secureserver.net> References: <20150706143107.665a7a7059d7ee80bb4d670165c8327d.c065cd7fa2.wbe@email03.secureserver.net> Message-ID:

On Mon, Jul 6, 2015 at 2:31 PM, Doug Ewell wrote:
> I wrote:
>
>> I hear Asmus's concern about using WAVING WHITE FLAG as the base character for emoji flags which might not be depicted as waving.
>
> I suppose there's no particular reason why U+2690 can't be the base character instead.

I suspect it's because WAVING WHITE FLAG is defined as having an emoji representation and WHITE FLAG isn't.

> But then Garth Wallace wrote:
>
>> I'm concerned that the proposed base is a white flag, which usually means "surrender". It seems like there's some potential for miscommunication there.
>
> If the intrinsic meaning of the base character in isolation is a problem -- people using flag-tag-unaware systems will see a white flag and assume it means "surrender" -- then there aren't any existing encoded flag characters that are any better.
>
> Black flags have historically had a wide variety of meanings as well -- mourning, anarchy, Italian fascism, race car driver disqualified, etc. So substituting U+1F3F4 or U+2691 won't help. All of the other existing flag symbol characters have even more specific meanings, usually annotated in TUS.

That's true, none of the existing flag characters are neutral.

> Folks who consider this a problem are probably intrigued by item 2 under "Discussion" in the background document: encode an all-new base character. This would delay the rollout of the mechanism, and if the new character has a glyph that looks at all like a flag, it will likely face the same criticism (e.g. "looks too much like the Portuguese flag").

I think crosshatching would be neutral. I'm not aware of any flags with a field of diagonal stripes; they usually only have one.
Although I suppose heraldry enthusiasts might interpret them as tinctures.

From nslater at tumbolia.org Mon Jul 6 18:34:20 2015 From: nslater at tumbolia.org (Noah Slater) Date: Mon, 06 Jul 2015 23:34:20 +0000 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <20150706095348.665a7a7059d7ee80bb4d670165c8327d.9ec08d1f9c.wbe@email03.secureserver.net> References: <20150706095348.665a7a7059d7ee80bb4d670165c8327d.9ec08d1f9c.wbe@email03.secureserver.net> Message-ID:

Previously in this thread, it was suggested that I make a formal proposal to the UTC. I have held back from doing this because it's not at all clear what implementation I should be proposing, or whether I can propose something WITHOUT an implementation. (Some advice there would be handy!)

Should I trust that the UTC will be aware of the informal proposal of the rainbow flag when they meet to discuss PRI #299, or should I do something to properly bring it to their attention?

On Mon, 6 Jul 2015 at 17:58 Doug Ewell wrote:
> Steve Swales wrote:
>
> > Or a flag inversion modifier? recently I discovered that the Philippines flag, for example, has a special meaning (we are at war) when inverted. Just a thought.
>
> An inverted ensign on a ship was formerly used as a distress signal:
> http://www.crwflags.com/fotw/flags/xf-flip.html
>
> I'd argue strongly against adding "modifiers" to Unicode flag tags to indicate inverted, waving, half-staff, folded, or any other transient state.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
From rscook at wenlin.com Tue Jul 7 09:53:03 2015 From: rscook at wenlin.com (Richard Cook) Date: Tue, 7 Jul 2015 07:53:03 -0700 Subject: vexillology, was: Adding RAINBOW FLAG to Unicode In-Reply-To: References: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net> Message-ID: <63B729C6-B57B-40F6-8852-6164F7F99361@wenlin.com>

Ken Whistler wrote:
>> vexillology

> Garth Wallace wrote:
>
> Tangentially, I recently ran across something called International Flag Identification Symbols. It's a symbolic notation for vexillology that describes their use of flags and some aspects of their design but not enough to reproduce them.

Ken,

Hasn't any vexillogist defined a full blown FDL (Flag Description Language) yet? That would be a sub-discipline of heraldic arms blazoning, I guess.

-Richard

From rscook at wenlin.com Tue Jul 7 09:56:26 2015 From: rscook at wenlin.com (Richard Cook) Date: Tue, 7 Jul 2015 07:56:26 -0700 Subject: vexillology, was: Adding RAINBOW FLAG to Unicode In-Reply-To: <63B729C6-B57B-40F6-8852-6164F7F99361@wenlin.com> References: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net> <63B729C6-B57B-40F6-8852-6164F7F99361@wenlin.com> Message-ID:

On Jul 7, 2015, at 7:53 AM, Richard Cook wrote:
>
> Ken Whistler wrote:
>>> vexillology
>
>> Garth Wallace wrote:
>>
>> Tangentially, I recently ran across something called International Flag Identification Symbols. It's a symbolic notation for vexillology that describes their use of flags and some aspects of their design but not enough to reproduce them.
>
> Ken,
>
> Hasn't any vexillogist

=> vexillologist

> defined a full blown FDL (Flag Description Language) yet? That would be a sub-discipline of heraldic arms blazoning, I guess.
>
> -Richard
From petercon at microsoft.com Tue Jul 7 10:11:37 2015 From: petercon at microsoft.com (Peter Constable) Date: Tue, 7 Jul 2015 15:11:37 +0000 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <5596D32E.9030403@att.net> References: <20150701103345.665a7a7059d7ee80bb4d670165c8327d.f4f3a553a3.wbe@email03.secureserver.net> <003401d0b4be$3af16970$b0d43c50$@fi> <5596D32E.9030403@att.net> Message-ID:

I never said anything about stability of geopolitical entities. I only mentioned stability of encoded character sequences.

Peter

From: Ken Whistler [mailto:kenwhistler at att.net]
Sent: Friday, July 3, 2015 11:24 AM
To: Peter Constable
Cc: unicode at unicode.org
Subject: Re: Adding RAINBOW FLAG to Unicode

On 7/2/2015 5:56 PM, Peter Constable wrote:
> Erkki, in this case, I think Philippe is making valid points.
> - For the proposal to be workable requires some means of ensuring stability of encoded representations. The way this would be done would be for CLDR to provide data with all valid sequences --- effectively becoming a registry.

I think that is wrong on a couple of grounds.

First, detailed stability of reference to actual defined geopolitical entities or particular detailed flag designs is not *required* for proposal to represent *pictographs* of flags by some sequence of Unicode characters to be "workable". Sure, more stability of reference is desirable. But the current RIS pair mechanism for representing flag pictographs for countries is already "workable" -- it works and is widely deployed and widely used -- without having guarantees that some particular country may not decide tomorrow to change its official flag and hence result in some particular pictographic display being obsolete in some sense, for example.

Second, the horse is already out of the barn regarding the particular data that CLDR would be referring to.
This works by reference to the ISO 3166-2 scheme of subdivisions:

https://en.wikipedia.org/wiki/ISO_3166-2

and *that* becomes the registry required for stability of representations, plus whatever grandfathering stability-of-code mechanism BCP 47 adds on top of that. We don't require a further detailed level of registration, I think, to make this workable. If the New Zealand Hawke's Bay Regional Council (NZ-HKB) decided it needed a district flag (or decided to change one it may already have), I'm not going to be overly concerned about the details there. As long as has a stable definition as a Unicode extended flag tag sequence, it is up to somebody else to decide if they want to actually map a Hawke's Bay flag pictograph in a font to that sequence -- or update the flag pictograph they may have been using.

Yeah, this could be a giant headache for any vendor that felt they had to support *every possible* region/subdivision sequence and keep the exact representations of flag pictographs stable. But I predict this will very, very quickly result in people making a "let's cover the 99% case" set of decisions, and then issues like "Should we display a flag pictograph for the Hawke's Bay Regional Council?" will be dealt with by the normal methods of triage for feature requests.

> - The concepts being denoted are inherently political, often unstable, and sometimes highly sensitive. Sensitive issues aside, a better approach would be to have a URN tagging scheme --- which IMO begs the question why this is a Unicode topic as it clearly crosses outside the limits of plain text.

A URN tagging scheme might make sense if what we were trying to do was delegating all identity concerns to external authority, and if we didn't care about efficiency of representation, either. I don't think that is what this is about, as I tried to make clear yesterday.
I don't think we are encoding *flags* -- we are creating a mechanism for the reliable representation of a set of *pictographs (emoji) for flags*. And those pictographs for flags need an efficient representation that can coexist comfortably with the rest of plain text -- the way the RIS pairs already do.

> Sensitive issues considered, though, it begs the question as to whether Unicode should be considering any of this at all, no matter what the scheme for encoded representation may be. Someone helpfully reminded us of this:
> >> [...] the UTC does not wish to entertain further proposals for encoding of symbol characters for flags, whether national, state, regional, international, or otherwise. References to UTC Minutes: [134-C2], January 28, 2013.

I believe that that statement (and the referenced decision) refer specifically to the unwillingness of the UTC to entertain proposals for encoding an indefinite number of pictographs for flags (of whatever variety) *as symbol characters* -- that is, one-by-one encodings as a single, gc=So code point in the standard. Heading that direction is clearly not an efficient way to deal with the concern, and would waste everybody's time in one-by-one proposals and ad hoc decisions for each individual flag pictograph to be added.

The UTC has a long history of putting a stake in the ground when it encounters a character encoding problem which requires a *general* solution, rather than a dribbling in of one-off decisions an item at a time. And I think the tag proposal for dealing with the representation of flag pictographs for regional subdivisions shows precisely the kind of generality that we are looking for -- dealing with hundreds of potentially representable entities with a general mechanism, rather than trying to encode them all one-by-one.
Incidentally, back to the ostensible topic of this thread -- I don't think the extended flag tag proposal currently addresses the issue of how to represent a pictograph for a rainbow flag. In that case I think a new registry mechanism might in fact make sense -- and I have spelled out details of how one could reasonably work in conjunction with the extended flag tag proposal in feedback submitted on PRI #299.

--Ken

Peter

From doug at ewellic.org Tue Jul 7 11:07:22 2015 From: doug at ewellic.org (Doug Ewell) Date: Tue, 07 Jul 2015 09:07:22 -0700 Subject: Adding RAINBOW FLAG to Unicode Message-ID: <20150707090722.665a7a7059d7ee80bb4d670165c8327d.90c2185143.wbe@email03.secureserver.net>

Disclaimer: These are only suggestions. I've never submitted a character proposal. You should prefer the advice of people who have, or of UTC members who evaluate proposals.

Noah Slater wrote:

> Previously in this thread, it was suggested that I make a formal proposal to the UTC. I have held back from doing this because it's not at all clear what implementation I should be proposing, or whether I can propose something WITHOUT an implementation. (Some advice there would be handy!)

If by "implementation" you mean a suggestion for how Unicode should encode this flag (single character, extension to the PRI #299 mechanism similar to what Ken proposed, or something else), it might be a good idea to summarize the options and choose at least one "preferred" option.

> Should I trust that the UTC will be aware of the informal proposal of the rainbow flag when they meet to discuss PRI #299, or should I do something to properly bring it to their attention?

As Mark Davis wrote [1], this list is not a venue for formally proposing anything, and it's not safe to assume that UTC members have read this list and have any background. If you want to state something, make sure you state it in the proposal.
You can quote and paraphrase list discussions, but don't just insert links to the list archive.

[1] http://www.unicode.org/mail-arch/unicode-ml/y2015-m07/0033.html

--
Doug Ewell | http://ewellic.org | Thornton, CO

From everson at evertype.com Tue Jul 7 11:09:48 2015 From: everson at evertype.com (Michael Everson) Date: Tue, 7 Jul 2015 17:09:48 +0100 Subject: vexillology, was: Adding RAINBOW FLAG to Unicode In-Reply-To: References: <20150702120915.665a7a7059d7ee80bb4d670165c8327d.afc9aa094b.wbe@email03.secureserver.net> <63B729C6-B57B-40F6-8852-6164F7F99361@wenlin.com> Message-ID:

As I recall, António Martins-Tuválkin and Anshuman Pandey both submitted proposals on this subject in 2007 or 2008 and in 2012 respectively.

Michael Everson * http://www.evertype.com/

From nslater at tumbolia.org Tue Jul 7 11:29:29 2015 From: nslater at tumbolia.org (Noah Slater) Date: Tue, 07 Jul 2015 16:29:29 +0000 Subject: Adding RAINBOW FLAG to Unicode In-Reply-To: <20150707090722.665a7a7059d7ee80bb4d670165c8327d.90c2185143.wbe@email03.secureserver.net> References: <20150707090722.665a7a7059d7ee80bb4d670165c8327d.90c2185143.wbe@email03.secureserver.net> Message-ID:

Thanks Doug. That's very helpful.

On Tue, 7 Jul 2015 at 17:07 Doug Ewell wrote:
> Disclaimer: These are only suggestions. I've never submitted a character proposal. You should prefer the advice of people who have, or of UTC members who evaluate proposals.
>
> Noah Slater wrote:
>
> > Previously in this thread, it was suggested that I make a formal proposal to the UTC. I have held back from doing this because it's not at all clear what implementation I should be proposing, or whether I can propose something WITHOUT an implementation. (Some advice there would be handy!)
> > If by "implementation" you mean a suggestion for how Unicode should > encode this flag (single character, extension to the PRI #299 mechanism > similar to what Ken proposed, or something else), it might be a good > idea to summarize the options and choose at least one "preferred" > option. > > > Should I trust that the UTC will be aware of the informal proposal of > > the rainbow flag when they meet to discuss PRI #299, or should I do > > something to properly bring it to their attention? > > As Mark Davis wrote [1], this list is not a venue for formally proposing > anything, and it's not safe to assume that UTC members have read this > list and have any background. If you want to state something, make sure > you state it in the proposal. You can quote and paraphrase list > discussions, but don't just insert links to the list archive. > > [1] http://www.unicode.org/mail-arch/unicode-ml/y2015-m07/0033.html > > -- > Doug Ewell | http://ewellic.org | Thornton, CO ???? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Thu Jul 9 10:53:26 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 09 Jul 2015 08:53:26 -0700 Subject: Precomposed Cyrillic letters Message-ID: <20150709085326.665a7a7059d7ee80bb4d670165c8327d.7125f102ec.wbe@email03.secureserver.net> From http://www.unicode.org/L2/L2015/15169-montenegro-cyrillic.pdf, "Addition of two letters from Montenegrin language, CYRILLIC script": > 9. Can any of the proposed characters be encoded using a composed > character sequence of either existing characters or other proposed > characters? > No Saying it doesn't make it so: > Annex 1: Character shapes (related to section B, item 4b) > Cyrillic small letter SJ > ?? <0441 0301> > Cyrillic capital letter SJ > ?? <0421 0301> > Cyrillic small letter ZJ > ?? <0437 0301> > Cyrillic capital letter ZJ > ?? 
<0417 0301> Quite a few fonts don't display these well (and quite a few do), but of course that's a font problem, not an encoding problem. Cf. http://www.unicode.org/faq/char_combmark.html#11 -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From doug at ewellic.org Thu Jul 9 10:58:08 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 09 Jul 2015 08:58:08 -0700 Subject: Tamil-Latin proposal Message-ID: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net> http://www.unicode.org/L2/L2015/15153-tamil-latin-proposal.pdf I suppose the response to this proposal won't be made public. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From markus.icu at gmail.com Thu Jul 9 11:37:21 2015 From: markus.icu at gmail.com (Markus Scherer) Date: Thu, 9 Jul 2015 09:37:21 -0700 Subject: Precomposed Cyrillic letters In-Reply-To: <20150709085326.665a7a7059d7ee80bb4d670165c8327d.7125f102ec.wbe@email03.secureserver.net> References: <20150709085326.665a7a7059d7ee80bb4d670165c8327d.7125f102ec.wbe@email03.secureserver.net> Message-ID: On Thu, Jul 9, 2015 at 8:53 AM, Doug Ewell wrote: > From http://www.unicode.org/L2/L2015/15169-montenegro-cyrillic.pdf, > "Addition of two letters from Montenegrin language, CYRILLIC script": > > > 9. Can any of the proposed characters be encoded using a composed > > character sequence of either existing characters or other proposed > > characters? > > No > > Saying it doesn't make it so: > Right, although I doubt that the proposers monitor this mailing list... In case an interested party is listening: If sr-ME needs different locale data than sr, then one could contribute such data to CLDR . See the current state: http://unicode.org/cldr/trac/browser/trunk/common/main/sr_Cyrl_ME.xml markus -------------- next part -------------- An HTML attachment was scrubbed... 
From richard.wordingham at ntlworld.com Thu Jul 9 13:25:05 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 9 Jul 2015 19:25:05 +0100 Subject: Tamil-Latin proposal In-Reply-To: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net> References: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net> Message-ID: <20150709192505.6ef5db34@JRWUBU2>

On Thu, 09 Jul 2015 08:58:08 -0700
"Doug Ewell" wrote:

> http://www.unicode.org/L2/L2015/15153-tamil-latin-proposal.pdf
>
> I suppose the response to this proposal won't be made public.

It's a shame there's no precedent for proposals being rejected for lying. However, it might be rejected for being a 'contemporary' script with no users - that much is admitted to!

Richard.

From verdy_p at wanadoo.fr Thu Jul 9 14:08:09 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 9 Jul 2015 21:08:09 +0200 Subject: Tamil-Latin proposal In-Reply-To: <20150709192505.6ef5db34@JRWUBU2> References: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net> <20150709192505.6ef5db34@JRWUBU2> Message-ID:

Also it's clearly not needed to duplicate Latin letters (or Cyrillic too) to borrow them into the Tamil script, just in order to add Tamil vowel diacritics on top of them. If that proposer wants to create a font allowing combining Latin/Cyrillic letters with Tamil vowel signs, there's no need to duplicate the encoding of these base letters. Nothing prohibits a font from mapping those combinations, even if it's not needed for other languages using the Latin and Cyrillic scripts: that could be done by extending an existing Tamil font (most of them already map Basic Latin, even if none of them currently map combinations with Tamil vowel signs). For the usage described, in fact a good font for Latin and IPA would work, with just a few additions for allowing the Tamil vowel signs.
And there's no need to create specific encodings for Latin + generic diacritics, even if the precombined letters are not encoded (why would those additional "base letters" be restricted to Tamil?)

Given there are no users of this extended script, the Unicode policy will require first experimenting and creating a user community, and demonstrating that for this usage, the existing encodings cannot work reliably. But for now there's no need for it, no compatibility issues to resolve, no dictionaries or old books for which this encoding would be useful. And it's definitely not a problem of chicken and egg: this is an attempt to bypass the UCS encoding policies specifically for a script that really does not need these duplicate extra base letters and combining vowels.

And it's definitely not a new script but a proposed "new" script whose characters are in fact badly named! There's no such thing as "Tamil-Latin" letters; the real standard is about transliterations of Tamil using standard Latin letters (romanizations), or IPA symbols, and for that there are already standards that do not need any of these additions, which would in fact add more complications and would solve no practical problems. Let's just focus on the Tamil romanization standards, and romanized IMEs for Tamil, which already work as is.

2015-07-09 20:25 GMT+02:00 Richard Wordingham <richard.wordingham at ntlworld.com>:
> On Thu, 09 Jul 2015 08:58:08 -0700
> "Doug Ewell" wrote:
>
> > http://www.unicode.org/L2/L2015/15153-tamil-latin-proposal.pdf
> >
> > I suppose the response to this proposal won't be made public.
>
> It's a shame there's no precedent for proposals being rejected for lying. However, it might be rejected for being a 'contemporary' script with no users - that much is admitted to!
>
> Richard.
From richard.wordingham at ntlworld.com Thu Jul 9 15:18:30 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 9 Jul 2015 21:18:30 +0100 Subject: Tamil-Latin proposal In-Reply-To: References: <20150709085808.665a7a7059d7ee80bb4d670165c8327d.7c3c5acc77.wbe@email03.secureserver.net> <20150709192505.6ef5db34@JRWUBU2> Message-ID: <20150709211830.736197d4@JRWUBU2>

I did wonder if part of the idea was to get consonant + pulli accepted as basic.

On Thu, 9 Jul 2015 21:08:09 +0200
Philippe Verdy wrote:

> Also it's clearly not needed to duplicate Latin letters (or Cyrillic too) to borrow them into the Tamil script, just in order to add Tamil vowel diacritics on top of them.

Actually, this touches on a very real issue. U+0BC0 TAMIL VOWEL SIGN II has a script property of Tamil, and there is a very strong tendency for to be split between two script runs and consequently to be rendered as containing a defective sequence - the cursed dotted circle of the literal grammar police appears. I confirmed this in LibreOffice using the Code2000 font, which I know supports Tamil.

Richard.

From everson at evertype.com Thu Jul 9 16:06:36 2015 From: everson at evertype.com (Michael Everson) Date: Thu, 9 Jul 2015 22:06:36 +0100 Subject: ISO 15924 Message-ID:

Please see http://www.unicode.org/iso15924/codechanges.html for today's updates.
Michael Everson
Registrar, ISO 15924

From richard.wordingham at ntlworld.com Thu Jul 9 16:59:29 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 9 Jul 2015 22:59:29 +0100 Subject: Precomposed Cyrillic letters In-Reply-To: References: <20150709085326.665a7a7059d7ee80bb4d670165c8327d.7125f102ec.wbe@email03.secureserver.net> Message-ID: <20150709225929.1f3b029a@JRWUBU2>

On Thu, 9 Jul 2015 09:37:21 -0700
Markus Scherer wrote:

> On Thu, Jul 9, 2015 at 8:53 AM, Doug Ewell wrote:
>
> > From http://www.unicode.org/L2/L2015/15169-montenegro-cyrillic.pdf, "Addition of two letters from Montenegrin language, CYRILLIC script":
> >
> > > 9. Can any of the proposed characters be encoded using a composed character sequence of either existing characters or other proposed characters?
> > > No
> >
> > Saying it doesn't make it so:

Is there a requirement to answer those questions truthfully?

> Right, although I doubt that the proposers monitor this mailing list...
>
> In case an interested party is listening: If sr-ME needs different locale data than sr, then one could contribute such data to CLDR.
> See the current state: http://unicode.org/cldr/trac/browser/trunk/common/main/sr_Cyrl_ME.xml

Presumably http://cldr.unicode.org/index/survey-tool/accounts is the most relevant page for someone with credibility. However, as Montenegro has an army and a navy, you have the wrong locale. It's still waiting for a language code. See the language family panels at https://en.wikipedia.org/wiki/Eastern_Herzegovinian_dialect and https://en.wikipedia.org/wiki/Montenegrin_language for the extreme Balkanisation.

But in short, yes we need the extra Cyrillic letters с́ and з́ and Latin letters ś and ź for the exemplar characters in sr_Cyrl_ME and sr_Latn_ME (or should that be sr_ME?). I can't work out the status of Montenegrin Latin {sj} and {zj}.

Richard.
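Doug's point that the Annex 1 letters already have a representation can be checked with Python's standard unicodedata module. The sketch below prints each sequence's character names and shows that NFC leaves each one as a base letter plus U+0301 COMBINING ACUTE ACCENT, because Unicode has no precomposed Cyrillic letter for them. The Latin counterparts ś and ź, by contrast, compose to the existing U+015B and U+017A. The PROPOSED table simply restates the Annex 1 code points from 15169-montenegro-cyrillic.pdf, and describe is a hypothetical helper. Whether a particular font renders the Cyrillic sequences well is, as Doug notes, a separate question that this check does not address.

```python
import unicodedata

# The four sequences listed in Annex 1 of 15169-montenegro-cyrillic.pdf.
PROPOSED = {
    "Cyrillic small letter SJ": "\u0441\u0301",
    "Cyrillic capital letter SJ": "\u0421\u0301",
    "Cyrillic small letter ZJ": "\u0437\u0301",
    "Cyrillic capital letter ZJ": "\u0417\u0301",
}


def describe(seq: str) -> str:
    """Hypothetical helper: list each code point with its Unicode character name."""
    return " + ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in seq)


for label, seq in PROPOSED.items():
    nfc = unicodedata.normalize("NFC", seq)
    # No precomposed Cyrillic letter exists for these, so NFC leaves two code points.
    print(f"{label}: {describe(nfc)} (length after NFC: {len(nfc)})")

# The Latin counterparts are already encoded precomposed, so NFC composes them.
for latin in ("s\u0301", "z\u0301"):
    print(describe(unicodedata.normalize("NFC", latin)))
    # -> U+015B LATIN SMALL LETTER S WITH ACUTE, then U+017A LATIN SMALL LETTER Z WITH ACUTE
```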
From doug at ewellic.org Thu Jul 9 17:23:50 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 09 Jul 2015 15:23:50 -0700 Subject: Precomposed Cyrillic letters Message-ID: <20150709152350.665a7a7059d7ee80bb4d670165c8327d.539fcf5c76.wbe@email03.secureserver.net>

Richard Wordingham wrote:

> Presumably http://cldr.unicode.org/index/survey-tool/accounts is the most relevant page for someone with credibility. However, as Montenegro has an army and a navy, you have the wrong locale. It's still waiting for a language code. See the language family panels at https://en.wikipedia.org/wiki/Eastern_Herzegovinian_dialect and https://en.wikipedia.org/wiki/Montenegrin_language for the extreme Balkanisation.

Montenegro could have all the military power in the world, but that doesn't make "Montenegrin" a distinct language. It's a dialect of Serbian.

--
Doug Ewell | http://ewellic.org | Thornton, CO

From richard.wordingham at ntlworld.com Thu Jul 9 18:03:20 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 10 Jul 2015 00:03:20 +0100 Subject: Precomposed Cyrillic letters In-Reply-To: <20150709152350.665a7a7059d7ee80bb4d670165c8327d.539fcf5c76.wbe@email03.secureserver.net> References: <20150709152350.665a7a7059d7ee80bb4d670165c8327d.539fcf5c76.wbe@email03.secureserver.net> Message-ID: <20150710000320.19415118@JRWUBU2>

On Thu, 09 Jul 2015 15:23:50 -0700
"Doug Ewell" wrote:

> Montenegro could have all the military power in the world, but that doesn't make "Montenegrin" a distinct language. It's a dialect of Serbian.

"A language is a dialect with an army and a navy." - Variously attributed, including to Antoine Meillet, who may not have required a navy.

Richard.

From markus.icu at gmail.com Thu Jul 9 21:18:15 2015 From: markus.icu at gmail.com (Markus Scherer) Date: Thu, 9 Jul 2015 19:18:15 -0700 Subject: ISO 15924 In-Reply-To: References: Message-ID:

Thanks!
markus

From jcb+unicode at inf.ed.ac.uk Sat Jul 11 08:48:05 2015 From: jcb+unicode at inf.ed.ac.uk (Julian Bradfield) Date: Sat, 11 Jul 2015 14:48:05 +0100 (BST) Subject: a mug Message-ID:

I feel the following mug says something about a popular topic of debate on this list...

http://www.redbubble.com/people/insider/works/15315362-i-3-unicode

(do look at the picture, don't just infer from the url)

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From charupdate at orange.fr Sat Jul 11 10:26:13 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 11 Jul 2015 17:26:13 +0200 (CEST) Subject: a mug In-Reply-To: References: Message-ID: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26>

On Sat, Jul 11, 2015, Julian Bradfield wrote:

> I feel the following mug says something about a popular topic of debate on this list...

As I feel concerned too, I'd like (I ?) to underscore that the designer of this mug seems to be insulting Unicode implementers and developers. Given the mass of popular characters that are already well rendered across platforms, and the huge sets of *new* items that are constantly being added, blaming people for not having done their job is doing no good. And above all, regardless of personal opinions and personality of mug designers, I think that the name of UNICODE should be left aside in such messages, because linking implementation issues with Unicode's corporate image is simply dishonest.

Thank you however for the information, it's always good to know what ideas are on stage out there...

Marcel Schneider

> Message of 11/07/15 15:58
> From: "Julian Bradfield"
> To: unicode at unicode.org
> Cc:
> Subject: a mug
>
> I feel the following mug says something about a popular topic of debate on this list...
>
> http://www.redbubble.com/people/insider/works/15315362-i-3-unicode
>
> (do look at the picture, don't just infer from the url)

From daniel.buenzli at erratique.ch Sat Jul 11 11:15:33 2015 From: daniel.buenzli at erratique.ch (Daniel Bünzli) Date: Sat, 11 Jul 2015 17:15:33 +0100 Subject: a mug In-Reply-To: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26> References: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26> Message-ID:

On Saturday, 11 July 2015 at 16:26, Marcel Schneider wrote:
> As I feel concerned too, I'd like (I ?) to underscore that the designer of this mug seems to be insulting Unicode implementers

Being one of these, I would like to tell you that I do not feel insulted at all by this mug. I find it rather funny as it actually reflects a reality you can expect to see more and more. Given the sheer volume of characters that are being added to the standard you can't expect font designers to cater for all of them. And this is actually due to the very definition of Unicode itself whether you like it or not.

Best,

Daniel

From johannes at bergerhausen.com Sat Jul 11 11:36:30 2015 From: johannes at bergerhausen.com (Johannes Bergerhausen) Date: Sat, 11 Jul 2015 18:36:30 +0200 Subject: a mug In-Reply-To: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26> References: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26> Message-ID: <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com>

Yes, the mug is funny.

It shows not a Unicode problem, it points at a general font problem of operating systems.

Dear Apple, Dear Google, Dear Microsoft: please give us *all* missing Unicode glyphs right inside your operating systems!

As I said at TEDx in Vienna:
www.youtube.com/watch?v=IRdupNXpm8k

So, better would be:

I [] Apple.
I [] Google.
I [] Microsoft.
All the best,
Johannes

From public at khwilliamson.com Sat Jul 11 12:33:54 2015 From: public at khwilliamson.com (Karl Williamson) Date: Sat, 11 Jul 2015 11:33:54 -0600 Subject: a mug In-Reply-To: <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com> References: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26> <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com> Message-ID: <55A15382.9000608@khwilliamson.com>

On 07/11/2015 10:36 AM, Johannes Bergerhausen wrote:
> Yes, the mug is funny.
>
> It shows not a Unicode problem, it points at a general font problem of operating systems.
>
> Dear Apple, Dear Google, Dear Microsoft: please give us *all* missing Unicode glyphs right inside your operating systems!
>
> As I said at TEDx in Vienna:
> www.youtube.com/watch?v=IRdupNXpm8k
>
> So, better would be:
>
> I [] Apple.
> I [] Google.
> I [] Microsoft.
>
> All the best,
> Johannes

http://i1.cpcache.com/product/27297813/utf8_value_tshirt.jpg

From shervinafshar at gmail.com Sat Jul 11 13:08:56 2015 From: shervinafshar at gmail.com (Shervin Afshar) Date: Sat, 11 Jul 2015 11:08:56 -0700 Subject: a mug In-Reply-To: References: Message-ID:

????????. ????, Unicode ??? ??? ????? ; ?? vs. ??. ????????. ????? ?? Unicode ????. ?????? ??????. ? ????? ????. ?
Shervin

On Sat, Jul 11, 2015 at 6:48 AM, Julian Bradfield wrote:
> I feel the following mug says something about a popular topic of debate on this list...
>
> http://www.redbubble.com/people/insider/works/15315362-i-3-unicode
>
> (do look at the picture, don't just infer from the url)
URL: From haberg-1 at telia.com Sat Jul 11 13:54:34 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Sat, 11 Jul 2015 20:54:34 +0200 Subject: a mug In-Reply-To: <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com> References: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26> <138DBFCD-FCE2-4575-9AB4-C5F5C51B2A96@bergerhausen.com> Message-ID: <6E678419-0F79-4DD0-BC05-2833BA23E66D@telia.com> > On 11 Jul 2015, at 18:36, Johannes Bergerhausen wrote: > > As I said at TEDx in Vienna: > [https://www.youtube.com/watch?v=IRdupNXpm8k] The keyboards for different languages are essentially the same nowadays: it sends a code indicating which button is acted on and whether it is depressed or released. The computer then translates using a key map. So for a Cherokee keyboard, as discussed in the video, one would need different images on the keys if one bothers, and a key map. One problem here is that it is very time-consuming to design such key maps. This is another shortcoming of Unicode usage: lack of input methods, in addition to the font issue. From petercon at microsoft.com Sun Jul 12 01:09:03 2015 From: petercon at microsoft.com (Peter Constable) Date: Sun, 12 Jul 2015 06:09:03 +0000 Subject: ISO 15924 In-Reply-To: References: Message-ID: Is there a significance to the colours in the table? Peter -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Michael Everson Sent: Thursday, July 9, 2015 2:07 PM To: unicode Unicode Discussion; UnicoRe Mailing List Subject: ISO 15924 Please see http://www.unicode.org/iso15924/codechanges.html for today's updates. Michael Everson Registrar, ISO 15924 From everson at evertype.com Sun Jul 12 06:19:57 2015 From: everson at evertype.com (Michael Everson) Date: Sun, 12 Jul 2015 12:19:57 +0100 Subject: ISO 15924 In-Reply-To: References: Message-ID: <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com> Yes, and this usage is explained on the page (as it has been since 2006).
> On 12 Jul 2015, at 07:09, Peter Constable wrote: > > Is there a significance to the colours in the table? > > Peter Michael Everson * http://www.evertype.com/ From umesh.p.nair at gmail.com Sat Jul 11 11:17:23 2015 From: umesh.p.nair at gmail.com (Umesh P N) Date: Sat, 11 Jul 2015 09:17:23 -0700 Subject: a mug In-Reply-To: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26> References: <2124139245.13765.1436628373232.JavaMail.www@wwinf1f26> Message-ID: On Sat, Jul 11, 2015 at 8:26 AM, Marcel Schneider wrote: > On Sat, Jul 11, 2015, Julian Bradfield wrote: > > > I feel the following mug says something about a popular topic of > > debate on this list... > > As I feel concerned too, I'd like (I ♥) to underscore that the designer of > this mug seems to be insulting Unicode implementers and developers. > Given the mass of popular characters that are already well rendered across > platforms, and the huge sets of *new* items that are constantly being added, > blaming people for not having done their job is doing no good. > Henri Bergson has observed: Laughter is purely cerebral: being able to laugh seems to require a detached attitude, an emotional distance to the object of laughter. (A well-known example is laughing when somebody falls down over a banana peel. We can't laugh if the fall was serious and causes the person some injury, thus making us emotionally attached to the person.) So, having a strong emotional attachment to Unicode can make this kind of joke offensive. I found it as funny as the CSS mug. (Some version of this mug has the pun overflow:hidden also specified.) I don't know whether the people who maintain the CSS standards and the developers of various browsers and tools get heavily offended by that mug. Satire and cartoons exaggerate minor things in ways that help make the object better and healthier. We are not dictators who cannot tolerate criticism and satire. - Umesh -------------- next part -------------- An HTML attachment was scrubbed...
URL: From charupdate at orange.fr Mon Jul 13 04:15:54 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Mon, 13 Jul 2015 11:15:54 +0200 (CEST) Subject: a mug Message-ID: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12> On Sat, Jul 11, 2015, 18:15, Daniel Bünzli wrote: > On Sat, Jul 11, 2015, 16:26, Marcel Schneider wrote: > > As I feel concerned too, I'd like (I ♥) to underscore that the designer of this mug seems to be insulting Unicode implementers > > Being one of these I would like to tell you that I feel absolutely not insulted by this mug. > > I find it rather funny as it actually reflects a reality you can expect to see more and more. Given the sheer volume of characters that are being added to the standard you can't expect font designers to cater for all of them. And this is actually due to the very definition of Unicode itself whether you like it or not. On Sat, Jul 11, 2015, 18:17, Umesh P N wrote: > Henri Bergson has observed: > Laughter is purely cerebral: being able to laugh seems to require a detached attitude, an emotional distance to the object of laughter. > > (A well-known example is laughing when somebody falls down over a banana peel. We can't laugh if the fall was serious and causes the person some injury, thus making us emotionally attached to the person.) > > So, having a strong emotional attachment to Unicode can make this kind of joke offensive. I found it as funny as the CSS mug. (Some version of this mug has the pun overflow:hidden also specified.) I don't know whether the people who maintain the CSS standards and the developers of various browsers and tools get heavily offended by that mug. > > Satire and cartoons exaggerate minor things in ways that help make the object better and healthier. We are not dictators who cannot tolerate criticism and satire. I now see that I was very wrong to take it seriously, and I thank everyone who answered in this thread for helping to put things into perspective.
Of course everybody may feel free to laugh. There are just two problems with that. First, as Umesh points out when quoting Bergson, laughter implies some lack of empathy. Abbé Pierre never laughed, as he revealed about himself in an interview. Personally, I do laugh, unfortunately even too much. However, and this is the second problem, one should not mix up responsibilities and then laugh at the wrong party, because that is where satire ends and injustice begins. As Johannes Bergerhausen pointed out a little later: On Sat, Jul 11, 2015, 18:44, Johannes Bergerhausen wrote: > Yes, the mug is funny. > > It shows not a Unicode problem, it points at a general font problem of operating systems. > > Dear Apple, Dear Google, Dear Microsoft: please give us *all* missing Unicode glyphs right inside your operating systems! > > As I said at TEDx in Vienna: > www.youtube.com/watch?v=IRdupNXpm8k > > So, better would be: > > I [] Apple. > I [] Google. > I [] Microsoft. If people (including me) took the trouble to install some complete fonts and set the fallback behavior of the app (where feasible), they would no longer experience the oddities this satirist seems to be laughing at while making (hateful?) insinuations. But they're too busy designing mugs... It's roughly the same problem with the CSS and UTF-8 malfunctions that are laughed at on the other merchandise items brought in by Umesh: http://www.zazzle.com/cheap_css_is_awesome_mug-168565401817501350 http://www.zazzle.com/css_is_awesome_with_java-script_mug-168685521846695550 and Karl Williamson (On Sat, Jul 11, 2015, 19:42): http://i1.cpcache.com/product/27297813/utf8_value_tshirt.jpg Personally, the only time CSS was "awesome" to me was when I'd written bad code. In truth, CSS is very smart and allows browsers to adapt the box width to the content, unless a fixed width prevents them from doing so. We can write bad code in any language, but then we should rather laugh at our own incapacity. The same goes for charsets.
The only time I saw UTF-8 rendered as it is on the T-shirt was when opening UTF-8 files that didn't specify charset=UTF-8. The fix was to add the charset to the file header. Of course one can make T-shirts about that, but people who wear them intending to laugh at the Unicode Transformation Format are more likely to get other people laughing at them for not knowing how to begin an HTML file, aren't they? I feel concerned because I recently published on this list (WORD JOINER vs ZWNBSP) some harsh criticism of a word processor that hadn't implemented U+2060 WORD JOINER, which displays as a kind of .notdef box unless the font is set to Segoe UI Symbol. I must mention that this very valuable workaround was provided on this List by Mr Doug Ewell (on Tue, Jun 30, 2015). I wouldn't have thought on my own of looking for U+2060 in Segoe UI Symbol. This also works for U+205D TRICOLON. When I insert the tricolon and the quadricolon U+205E side by side in Segoe UI Symbol, and then switch the font to Arial, the tricolon is replaced with a .notdef box in my version of Word. This time LibreOffice 4.2.4.2 behaves exactly the same, except for the .notdef box, which in that case is *not* displayed in LibreOffice, leaving me unaware of the missing tricolon! Well, I'm likely to start criticizing again, so that my first reply turns out to be a kind of lenification... As for why I bring up the tricolon-quadricolon (VERTICAL FOUR DOTS) issue: I wanted to use ⁝ as a representation of U+2060, and ⁞ for U+FEFF. Now I must use a common colon for this. (|, ? and ?? are already taken. Alternative ideas are welcome.) All those mischiefs, I fully agree, are clearly about implementation, and particularly about font support and fallback handling, and not about Unicode.
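WORD JOINER, ZWNBSP, TRICOLON and VERTICAL FOUR DOTS may be invisible, or may be silently dropped, depending on the font and the application. For that reason it can help to list the code points a text actually contains rather than trust the rendering. The following is a minimal Python sketch that uses only the standard `unicodedata` module. The helper name `describe` is hypothetical.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' entry per character, so that invisible
    characters such as WORD JOINER show up by name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# A sample mixing letters with U+2060, U+FEFF, U+205D and U+205E.
for entry in describe("a\u2060b\ufeffc\u205d\u205e"):
    print(entry)
```

Whether these characters then display correctly is still a matter of font support and fallback. The report only confirms what is actually stored.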
Best regards, Marcel P.S.: In case future readers stumble on this thread through a Google or Bing search (and because I hope such a mean mug won't find many buyers), I should have mentioned the topic: an "I ♥ UNICODE" mug where the heart symbol (U+2665) is replaced with a .notdef box. The product designation is "I <3 UNICODE!", insinuating that, for example, emojis still aren't converted to pictures. The message, as I decrypt it, is: "Unicode implementations are so incomplete that I can't use the Unicode characters I'd like to; consequently I cannot like/love Unicode." BTW, I find the expression rather clumsy, as the heart is inserted (and displayed!) by Alt+3 on every Windows numpad. And here are CSS and UTF-8: -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Mon Jul 13 05:53:25 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 13 Jul 2015 12:53:25 +0200 Subject: a mug In-Reply-To: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12> References: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12> Message-ID: 2015-07-13 11:15 GMT+02:00 Marcel Schneider : > It's roughly the same problem with the CSS and UTF-8 malfunctioning that > is laughed at with the other merchandising items brought in by Umesh: > http://www.zazzle.com/cheap_css_is_awesome_mug-168565401817501350 > > http://www.zazzle.com/css_is_awesome_with_java-script_mug-168685521846695550 > and Karl Williamson (On Sat, Jul 11, 2015, > 19:42): > http://i1.cpcache.com/product/27297813/utf8_value_tshirt.jpg > > Personally the only time CSS was awesome to me is when I'd written bad > code. In truth, CSS is very smart and allows browsers to adapt the box > width to the content, if not hindered in doing so by some fixed-width. We > can write bad code in any language, but then we should rather laugh at our > own incapacity. > > Idem with charsets.
The only time I saw UTF-8 like on the T-shirt, was > when opening UTF-8 files that didn't specify charset=UTF-8. The thing to do > was to add the charset in the file header. > Or simply add a leading BOM. All browsers will autodetect it. This only concerns HTML files (on a local filesystem). BOMs are not recommended for UTF-8-encoded JavaScript files: if your local HTML file references a local JavaScript file, it can specify the expected file type in addition to the local URL of the script file itself, using an attribute on the HTML "script" element. If your page needs to perform JSON requests, the JSON is normally served by a web server that delivers the MIME type and charset in metadata. Some JSON parsers can also be set to autodetect the BOM and then discard it from the visible content. That means checking just the first 3 bytes of the input stream before passing the stream to the parser, which can then be instantiated and initialized directly with the correct charset. For pages served by web servers, you specify it in the metadata of your shared folder to associate some files with MIME types. This can even be a global server setting if all your pages and scripts are UTF-8-encoded, or it can be set on the main folder and changed for specific folders containing files that should be sent with another charset rather than the UTF-8 MIME metadata. Or you can add the autodetection feature to Apache, which will detect the BOM in the file and then serve the UTF-8 file without this leading BOM, but with the corrected file size and the correct MIME type with its charset parameter. It is more complicated for files hosted on FTP, as there's no MIME metadata: for those, the BOM is still the easiest option (but it will be up to the FTP client to perform the autodetection).
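The BOM check described above amounts to comparing the first bytes of the input with a few known signatures before choosing a decoder. The following is a minimal Python sketch that uses the standard `codecs` constants. The function name `decode_with_bom` and the UTF-8 fallback are assumptions for illustration. UTF-32 is deliberately left out.

```python
import codecs

# Signatures checked in order. UTF-32 is omitted here because the
# UTF-32-LE BOM (FF FE 00 00) starts with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def decode_with_bom(data: bytes, fallback: str = "utf-8"):
    """Return (text, encoding). A leading BOM selects the decoder and is
    stripped; otherwise decode with the fallback (for instance a charset
    taken from HTTP metadata)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding), encoding
    return data.decode(fallback), fallback
```

Decoding UTF-8 bytes with the wrong charset produces T-shirt-style mojibake. For example, `b"caf\xc3\xa9".decode("latin-1")` yields `'cafÃ©'`.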
Autodetecting a BOM is much more efficient than autodetecting an HTML meta tag in the header (that requires aborting the current parse midway and restarting it, which uses more memory that will need to be garbage-collected, and takes some milliseconds and more CPU resources, as HTML parsers are very costly in terms of CPU processing). If you place the charset in a meta tag of the HTML page, make sure that this tag is near the beginning of the HTML header (it should be fully within the first 4KB, and even before the mandatory <title> element). In my opinion this meta tag should be the first child element of the <head> element, which is itself the first element of the <html> element that immediately follows the optional HTML doctype declaration. If your page is XHTML, you should put the charset indication in the leading XML declaration line. Putting the indication in the first 4KB allows some charset guessers to identify the charset faster, without actually starting to instantiate a parser and aborting it midway. 4KB is typically the size of a single memory page, so that page will remain in CPU/bus caches without using paging I/O. The CPU cost will be minimal if the charset can be autodetected very early, in a few nanoseconds, by just scanning the content of a single memory page. 4KB is large enough that any placement of the autodetected signatures will succeed without a long wait. Actually I even think that the tag should be in the first 1400 bytes (to match the maximum size of a single TCP packet with the smallest MTU). This will minimize the networking I/O delays: aborting a parser and restarting it takes significant processing time, which could further delay the processing of the next TCP packet, which could then be paged out by the OS if concurrent processes use concurrent network streams, such as large file downloads or an actively streamed video.
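The idea of finding the charset declaration by scanning only the start of the file can be sketched as follows. This is a deliberately loose approximation, not the HTML Standard's prescan algorithm: it ignores comments and the finer points of attribute parsing. The HTML Standard requires the element that contains the character encoding declaration to be serialized completely within the first 1024 bytes of the document. For that reason the sketch defaults to 1024 bytes rather than the 4KB or 1400 bytes suggested above. The names `META_CHARSET` and `sniff_meta_charset` are hypothetical.

```python
import re

# Loose pattern covering both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
# The lookahead requires a terminator after the value, so a value that is
# cut off by the byte limit is not reported.
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([\w.:-]+)(?=["\'\s;/>])',
    re.IGNORECASE,
)

def sniff_meta_charset(data: bytes, limit: int = 1024):
    """Return the declared charset (lower-cased) if the declaration lies
    entirely within the first `limit` bytes, else None."""
    match = META_CHARSET.search(data[:limit])
    return match.group(1).decode("ascii").lower() if match else None
```

In a real decoder this would run only after the BOM check and the transport-level charset had both failed to give an answer.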
I just wonder why HTML5 did not deprecate the old meta tag of HTML4 in favor of an attribute directly on the <html> root element, or even in its recommended DOCTYPE declaration. But if you use the abbreviated HTML5 doctype line, its default should be UTF-8 and no indication is necessary (charset guessers should not be used with HTML5, except after a parsing failure, as a possible recovery solution, in which case the meta tag may be processed. If there's no parsing error in the main document, excluding all other referenced documents such as scripts or inner frames, the meta tag should rather be ignored, even if it's present and specifies something else). Maybe some day there will be an HTML6 that enforces the use of a single charset and possibly a more compact encoding. We've seen similarly radical changes, even in core protocols such as HTTP(S) itself. This could become a single unified protocol that combines next-generation HTTP and HTML capabilities, with further capabilities such as dynamic parallel streams, encryption, authentication, simplified and more efficient data signatures, real-time constraints and QoS management of streams for web applications, and more efficient support for encapsulated binary data (notably audio/video/images, or even nearly native executable scripts, precompiled by the server for the target client when its processing capabilities are constrained, notably smartphones, to save battery energy).
That future HTML will focus much more on its API; the effective encoding may be auto-adapted or negotiated and cached (given that we now need security everywhere on the web, negotiation protocols are already in use: for now this is just for authenticating and exchanging encryption keys, but the same round trip could also negotiate presentation formats such as the MIME type and charset encoding, compression levels, and the binary compatibility of clients for receiving precompiled executable content, for sharing tasks, CPU/GPU resources or local/remote storage, or for synchronizing cached data) --- In the future we'll quickly need a true "network-centered OS" where applications can run on one or more devices in parallel, owned by the client or by the service provider, allowing on-demand allocation and sharing of processing resources available locally or remotely. On that OS there will no longer be the concept of a host (or it will just be a virtual delocalized host); the concept of "local" may be replaced by the concept of a personal user environment that will auto-adapt to the capabilities of the devices around the user and to the available network bandwidth. By then, this virtualized OS will certainly be 128-bit (not 64-bit as it is today), and it will manage many terabytes of virtual memory, including the environments of other users located anywhere. Clients and servers will dynamically share or request resources from that network, the core job of this OS will be managing caches, automatic synchronization, and bandwidth allocation, and nobody will know "where" the code is actually running physically. All devices will then exchange code or data indifferently, or will perform computing tasks delegated to them by other members of the network (including transformation codecs). The network OS will provide the necessary isolation for security, and the architecture will be more peer-to-peer, working as a collaborative grid computing architecture.
It will also be failure-resistant, with implicit backup/replication. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150713/2baf72c4/attachment.html> From richard.wordingham at ntlworld.com Wed Jul 15 02:49:13 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 15 Jul 2015 08:49:13 +0100 Subject: Mark-up to Indicate Words Message-ID: <20150715084913.2b66392e@JRWUBU2> What mark-up schemes exist to show that a sequence of letters and combining marks constitutes a single word? Such mark-up would be useful when using spell checkers. At present, I use U+2060 WORD JOINER (WJ) to indicate the absence of a word boundary. (Systematic marking of boundaries using ZWSP is not popular with users, and is normally not used in Thai; it's not supported in their national or Windows 8-bit encodings.) However, it seems likely that when Unicode 8.00 is defined in August, WJ will suppress line breaks but not word breaks. There would still be the limitation that mark-up is not available in plain text. It appears that, for example, Open Document Format has no mark-up to indicate word boundaries, relying instead on the overrides of the word boundary detection algorithms being stored at character level. Richard. From charupdate at orange.fr Wed Jul 15 04:06:41 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 15 Jul 2015 11:06:41 +0200 (CEST) Subject: Input methods at the age of Unicode (was: a mug) Message-ID: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> On Sat, Jul 11, 2015, at 20:54, Hans Aberg wrote: > On 11 Jul 2015, at 18:36, Johannes Bergerhausen wrote: >> >> As I said at TEDx in Vienna: >> [https://www.youtube.com/watch?v=IRdupNXpm8k] > The keyboards for different languages are essentially the same nowadays: it sends a code indicating which button is acted on and whether it is depressed or released. The computer then translates using a key map.
So for a Cherokee keyboard, as discussed in the video, one would need different images on the keys if one bothers, and a key map. > One problem here is that it is very time consuming to design such key maps. This is another shortcoming of Unicode usage: lack of input methods, in addition to the font issue. I fully agree. These keyboard updates are consistent with Microsoft's new corporate ambition, which consists in empowering people to achieve more, as Microsoft's CEO Satya Nadella wrote to All Employees on July 10, 2014 at 6:00 a.m. PT: http://bit.ly/1wRIBqD If we understand the goal as a relative one, users will be able to do more than during the past few decades. Obviously, better keyboard UIs are essential in this process. Today we still mainly use inherited ANSI keyboards, even though we use Unicode characters. Overcoming this discrepancy is urgent, and I believe that at the development level this is very easy (though it may be time-consuming, as Hans warns us). Whether it is also easy at the users' level depends on the amount of novelty packed into the keymap. In Cherokee, users would now probably be learning to use casing, due to the script's new extension to bicamerality. By contrast, to convert all US American Standard keyboards to Unicode keyboards, nothing more is needed than replacing the spacing Grave with the Letter Apostrophe, and the right-hand Alt key with a Compose key operated by the right-hand thumb. The need for U+02BC in English is supported by the evidence gathered in last month's thread "A new take on the English apostrophe in Unicode". For example, users who want to input smart quotes without an algorithm may then type Compose, {, ", for an opening double quotation mark, or Compose, ], ', for a closing single quote. Compose, Letter Apostrophe, a, brings ?. This principle extends to all Latin letters and punctuation marks (about two thousand, if my estimate is correct). There will then be no need for a separate US-International keyboard layout.
That layout seems to be determined not by efficiency but by its creation environment (which apparently excluded dead key chaining), as well as by IBM's choice not to copy Digital's Compose key (copying only the inverted-T arrow keys and the six miscellaneous keys). The US-Intl layout is so bad that it cannot be kept in everyday use, as Mark Davis explained on Sun Jul 18 1999 - 13:47:47 EDT (http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML017/0558.html). The set of all Latin letters is thus made available through the chained dead key implementation of the Compose functionality. On the other hand, designing key maps for any alphabetical language on earth appears to be rather easy: much easier, and in any case probably far less time-consuming, than writing other kinds of software. Writing keyboard drivers is essentially a matter of editing key defines, allocation tables, and deadtrans function lists. The latter two are best done with spreadsheet software. Provided that spreadsheet software (e.g. Excel 2010 Starter) is used, the job is much less complicated than its reputation suggests, since good keyboard layouts have long dead-key lists, and these are not efficiently edited with the UIs of ordinary keyboard editing software. Keyboard layout sources in software format can also be edited in spreadsheets and lead to good results if the dead-key chaining flag is accessible. On Windows this is the case in KbdEdit, but the object modules (drivers) compiled by this software are proprietary and therefore cannot be effectively shared. Editing keyboard layouts is a job that anybody can tackle who is willing to spend some time on useful work (as opposed to leisure activities like gaming, hunting and the like). Nothing is needed that is not publicly available. There's nothing to wait for.
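The chained dead keys described above can be modelled as a small state machine. Each table entry maps the pending dead-key state and the next key either to a further dead-key state or to a character to emit. The following is a minimal Python sketch, assuming a hypothetical table built from Compose sequences mentioned in this thread: { then " for an opening double quote, ] then ' for a closing single quote, and s followed by g, a or c for spacing diacritics. It is not the Windows DEADTRANS format, KbdEdit output, or any real driver API.

```python
COMPOSE = "<Compose>"  # placeholder token for the Compose (right Alt) key

# Hypothetical chained dead-key table: (current state, key) -> result.
# A result starting with "dead:" moves to a new dead-key state (chaining);
# any other result is the character to emit.
DEADTRANS = {
    ("compose", "{"): "dead:quote-open",
    ("quote-open", '"'): "\u201c",   # LEFT DOUBLE QUOTATION MARK
    ("compose", "]"): "dead:quote-close",
    ("quote-close", "'"): "\u2019",  # RIGHT SINGLE QUOTATION MARK
    ("compose", "s"): "dead:spacing",
    ("spacing", "g"): "\u0060",      # GRAVE ACCENT (spacing)
    ("spacing", "a"): "\u00b4",      # ACUTE ACCENT (spacing)
    ("spacing", "c"): "\u00b8",      # CEDILLA (spacing)
}

def type_keys(keys):
    """Turn a sequence of key presses into text, resolving Compose chains."""
    out = []
    state = None  # None means no dead key is pending
    for key in keys:
        if state is None:
            if key == COMPOSE:
                state = "compose"
            else:
                out.append(key)
            continue
        result = DEADTRANS.get((state, key))
        if result is None:
            state = None  # unknown sequence: discard it (one possible policy)
        elif result.startswith("dead:"):
            state = result[len("dead:"):]
        else:
            out.append(result)
            state = None
    return "".join(out)

print(type_keys([COMPOSE, "{", '"', "H", "i", COMPOSE, "]", "'"]))
```

Discarding an unrecognised sequence is only one choice. A real layout might instead emit the typed keys or a fallback character.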
Good luck, Marcel P.S.: There's a new version of the Compose Key article in Wikipedia: https://en.wikipedia.org/wiki/Compose_key To quickly summarize the advantages of the new US English Unicode keyboard layout and the similar UK English Unicode keyboard layout: - Backward compatibility: Simply consider that the engraved Grave now stands for a curly apostrophe (which is very approximate but avoids keycap stickers). - Application compatibility: The smart quotes algorithm keeps working for what it is made for, and is no longer called upon for what it isn't made for: simulating apostrophes in all positions, including leading apostrophes. - Adaptability: The user recovers full autonomy and can now decide for himself whether he wants an apostrophe or a quotation mark. No more workarounds are needed. - Efficiency: The reintroduced Compose key, on right Alt, is a super dead key that allows typing huge sets of characters without much memorization, while the nearly useless** Grave accent key position suddenly becomes useful again. - Efficacy: No more spaces are needed to type apostrophes and quotes, and no key is hijacked for a dead key any longer, except the otherwise rather useless right Alt key (a duplicate of the left one, and on the wrong side of the space bar for Alt+NumPad). There is no more confusion with Ctrl+Alt application shortcuts, which AltGr used to cause on Windows, while AltGr can still be made available in a safe emulation through a Shift + Right Alt dead key. - Quality: The resulting text files are much more useful than versions that mix up apostrophes and closing single quotes. For computer processing, paired and unpaired punctuation must be clearly distinct, regardless of any glyphic resemblance, all the more so because in real English the apostrophe has letter status, not punctuation status. **I know that, because the Grave is on the keyboard, it is used in markup and perhaps in programming (seemingly not in C/C++).
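The letter-versus-punctuation distinction made in the last point is recorded in the Unicode Character Database. The following small Python sketch uses the standard `unicodedata` module. The helper `char_report` is hypothetical. It prints the general category of each candidate character and shows how a simple word test such as `str.isalpha()` treats them differently.

```python
import unicodedata

def char_report(ch):
    """Return (code point, character name, general category) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)

# U+02BC is a modifier letter (Lm); U+2019 is final punctuation (Pf);
# U+0027 is other punctuation (Po).
for ch in ("\u02bc", "\u2019", "'"):
    print(*char_report(ch))

# Word-oriented checks see the difference: str.isalpha() accepts Lm.
print("don\u02bct".isalpha(), "don\u2019t".isalpha())  # True False
```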
On a Unicode keyboard, a Space following a diacritic dead key chain inserts the combining diacritic (which goes against the inherited rule, dating from before combining diacritics were encoded). Since Shift+Space on a Unicode keyboard should be NBSP, spacing diacritics are inserted when the diacritic is followed by NBSP. Both behaviors are already implemented for Mac OS X: http://uscustom.sourceforge.net/. In current writing, spacing diacritics are generally much less useful than combining ones. To speed up the insertion of the spacing Grave, we might use Compose, s (for Spacing), g (for Grave). Likewise we would have spacing Acute (sa), Cedilla (sc) and Little Tilde ("st" or "slt", not "lt", which is already taken). Along with this, word processor updates must extend the smart quotes algorithms to support the correct handling of the apostrophe. This too is rather easy to implement: * Extended autocorrect settings will allow users to specify whether the most used squiggle is the apostrophe or the single quotation mark, and whether the apostrophe should be U+02BC or U+2019. These toggles should be actionable by customizable keyboard shortcuts, and an info bubble and/or a flag will show which setting is on. * Following Ted Clancy's proposal, a new Option setting will empower users to dedicate the Apostrophe key to the apostrophe *exclusively*, and to use the Quotation mark key for *all* quotation marks, whether double or single. This is indeed feasible in English (contrary to what I thought when replying in the thread "A new take on the English apostrophe in Unicode", and unlike good French and German usage, where angle quotation marks are used for quotations and comma quotation marks for scare quotes [using angle quotes as scare quotes is bad practice]). * Automatic quote pairing will therefore insert matching characters at input, and check pairing at revision.
* Multiple keystrokes with circular output will insert the most used quotation mark when the Quotation mark key is hit once, and the other one when it is hit twice. The most used one is set in the options. For example, in American English, the user may choose to get single quotes first because he's a scientist and needs to mark many words, while he may switch to double quotes first when writing literary text. The same should be available for the Apostrophe key: either the leading apostrophe or the quotation mark after one stroke, the other one after two strokes, and an appropriate sequence of both after three keystrokes. Hitting the key again will restart the cycle, and so forth. An info bubble, or a colored display as suggested by William Overington on Fri, Jun 05, 2015, 11:48, could disambiguate apostrophe and quote. Alternatively, the letter apostrophe may be displayed in the customizable "field" color, as NBSP and WJ are in LibreOffice. * New Help sections may be invoked for ready information about the usefulness of the Letter Apostrophe and the features that facilitate its usage. We must abandon the comfortable idea that users are unwilling to give any thought to why and how to distinguish two characters that look identical. This idea should be considered disrespectful (contemptuous, I would say), and IMO it is probably just a mean pretext for reducing production costs by lowering product quality. (The product being the word processor, e.g. Microsoft Word.) * An optional dialog will be displayed every time there is an ambiguity, that is, when a leading apostrophe is typed, and also when a trailing apostrophe is typed while a marked quotation is open (after an opening single quotation mark). This dialog may ask "Do you wish to type an apostrophe?" or alternatively "Is this a quotation mark?". The choice may be set with Tab and validated with Space. * Users who wish to keep mixing them up will be welcome to do so ("Don't ask me again").
This choice may be cancelled in the Settings ("Distinguish apostrophes and quotation marks"; "Display the apostrophe dialog"). For subscribers who have read this far and agree to read on, I must note that any criticism is rather easily uttered as long as the fault seems to lie with Unicode. That would explain why Unicode bashing is supposedly so popular that we find it even on mugs (see the parent thread of this one), as if we were supposed to enjoy telling ourselves every morning at breakfast that our universal charset is still useless and won't work for a long time. By contrast, as soon as the responsibility ends up being shifted from the Consortium to its most powerful members, such as Apple, Google, and Microsoft, especially the latter, only very few people carry on with the criticism. In this paragraph I would like to vent more and try to debrief the Apostrophe thread, but I fear that would be too long and tiresome. I will just mention that many people monitoring this Mailing List know exactly why Unicode decided to recommend U+02BC for the English apostrophe, and exactly how things happened when U+02BC was discarded in favor of U+2019. Yet nobody agreed to disclose this information, neither when the information written up by Ted Clancy was submitted by a Mailing List subscriber, nor when I shared the results of my decrypting early NamesList versions. Consequently, I ended up being blamed for knowing little about it. Now I am trying again to learn more by submitting the following three questions: 1. Why had the UTC recommended U+02BC as the apostrophe? 2. Why has the UTC withdrawn its recommendation? 3. At whose request did the UTC move the information about the preferred character for the apostrophe from U+02BC to U+2019?
Answering these three questions is essential for a thorough understanding of the history, which will strengthen the basis for keyboard reengineering as it must be carried out at this juncture, with the Windows 10 release imminent. Best regards, Marcel From duerst at it.aoyama.ac.jp Wed Jul 15 06:18:09 2015 From: duerst at it.aoyama.ac.jp (Martin J. Dürst) Date: Wed, 15 Jul 2015 20:18:09 +0900 Subject: Mark-up to Indicate Words In-Reply-To: <20150715084913.2b66392e@JRWUBU2> References: <20150715084913.2b66392e@JRWUBU2> Message-ID: <55A64171.9070600@it.aoyama.ac.jp> Hello Richard, On 2015/07/15 16:49, Richard Wordingham wrote: > What mark-up schemes exist to show that a sequence of letters and > combining marks constitutes a single word? > > Such mark-up would be useful when using spell checkers. At present, I > use U+2060 WORD JOINER (WJ) to indicate the absence of a word boundary. > (Systematic marking of boundaries using ZWSP is not popular with > users, and is normally not used in Thai - it's not supported in > their national or Windows 8-bit encodings.) However, it seems likely > that when Unicode 8.00 is defined in August, WJ will suppress line > breaks but not word breaks. There would still be the limitation that > mark-up is not available in plain text. > > It appears that, for example, Open Document Format has no mark-up to > indicate word boundaries, relying instead on the overrides of > the word boundary detection algorithms being stored at character level. I'd suggest looking at higher-end formats such as DITA or TEI (Text Encoding Initiative). Regards, Martin. > Richard. > .
> From haberg-1 at telia.com Wed Jul 15 09:07:12 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Wed, 15 Jul 2015 16:07:12 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> Message-ID: <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> > On 15 Jul 2015, at 11:06, Marcel Schneider <charupdate at orange.fr> wrote: > Editing keyboard layouts is a job anybody can tackle who is willing to spend some time for a useful work (as opposed to a set of leisures like gaming, chasing and the like). In mathematics, there are a couple of thousand characters, including Latin and Greek styles, for which it would take some time to develop a key map. From petercon at microsoft.com Wed Jul 15 16:03:08 2015 From: petercon at microsoft.com (Peter Constable) Date: Wed, 15 Jul 2015 21:03:08 +0000 Subject: ISO 15924 In-Reply-To: <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com> References: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com> <BL2PR03MB114B8CD1566A476DC804C2CD59D0@BL2PR03MB114.namprd03.prod.outlook.com> <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com> Message-ID: <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com> I don't see an explanation of the pale yellow or pale green shading. Also, re this: "All changes are displayed in color and italics..." Every row is a change record, yet not every row (in fact no row) is entirely coloured and in italics. If what is meant is "All changed values are displayed in color and italics...", then that is still not the case: there are lots of coloured cells that do not have italics text. To me, it's all rather unclear.
Peter -----Original Message----- From: Unicore [mailto:unicore-bounces at unicode.org] On Behalf Of Michael Everson Sent: Sunday, July 12, 2015 4:20 AM To: unicode Unicode Discussion; UnicoRe Mailing List Subject: Re: ISO 15924 Yes, and this usage is explained on the page (as it has been since 2006). > On 12 Jul 2015, at 07:09, Peter Constable <petercon at microsoft.com> wrote: > > Is there a significance to the colours in the table? > > Peter Michael Everson * http://www.evertype.com/ From verdy_p at wanadoo.fr Wed Jul 15 16:31:09 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Wed, 15 Jul 2015 23:31:09 +0200 Subject: ISO 15924 In-Reply-To: <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com> References: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com> <BL2PR03MB114B8CD1566A476DC804C2CD59D0@BL2PR03MB114.namprd03.prod.outlook.com> <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com> <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com> Message-ID: <CAGa7JC33NZjufTWY964O1Lp_uRd96XJ+w-wjg1u-tjDZhce=oA@mail.gmail.com> Pale yellow cells are those that have changed since the first publication (most of them to replace names with better, less ambiguous ones, to change the order of names when there are synonyms so that the most common one comes first, or to fix minor typos where the first publication was an approximate translation that did not match the most common name). They have a history you can look at; the date indicated is the date of the last modification, which differs from their first release. The history is not in the table itself. 2015-07-15 23:03 GMT+02:00 Peter Constable <petercon at microsoft.com>: > I don't see an explanation of the pale yellow or pale green shading. > > Also, re this: > > "All changes are displayed in color and italics..." > > Every row is a change record, yet not every row (in fact no row) is > entirely coloured and in italics.
If what is meant is "All changed values > are displayed in color and italics...", then that is still not the case: > there are lots of coloured cells that do not have italics text. > > To me, it's all rather unclear. > > > Peter > > -----Original Message----- > From: Unicore [mailto:unicore-bounces at unicode.org] On Behalf Of Michael > Everson > Sent: Sunday, July 12, 2015 4:20 AM > To: unicode Unicode Discussion; UnicoRe Mailing List > Subject: Re: ISO 15924 > > Yes, and this usage is explained on the page (as it has been since 2006). > > > On 12 Jul 2015, at 07:09, Peter Constable <petercon at microsoft.com> > wrote: > > > > Is there a significance to the colours in the table? > > > > Peter > > Michael Everson * http://www.evertype.com/ > > > From everson at evertype.com Wed Jul 15 17:47:06 2015 From: everson at evertype.com (Michael Everson) Date: Wed, 15 Jul 2015 23:47:06 +0100 Subject: ISO 15924 In-Reply-To: <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com> References: <B56ECD1A-F1E9-4081-BFD3-CC00116F27A9@evertype.com> <BL2PR03MB114B8CD1566A476DC804C2CD59D0@BL2PR03MB114.namprd03.prod.outlook.com> <95DC744F-63A0-4C3B-A45C-DF746FFDB063@evertype.com> <BLUPR03MB1207BC0324C68FDC0C46759D59A0@BLUPR03MB120.namprd03.prod.outlook.com> Message-ID: <6A8D873D-8812-4098-B3B5-ED5C130DBF01@evertype.com> > On 15 Jul 2015, at 22:03, Peter Constable <petercon at microsoft.com> wrote: > > I don't see an explanation of the pale yellow or pale green shading. > > Also, re this: > > "All changes are displayed in color and italics..." Please read the next clause: "entry additions are not given in italics." The Category of Change Key is found at the bottom of the page. > Every row is a change record, yet not every row (in fact no row) is entirely coloured and in italics. Nor should they be.
A full row is an addition. Only changes are in italics. > If what is meant is "All changed values are displayed in color and italics...", then that is still not the case: there are lots of coloured cells that do not have italics text. Are any of those cells in a row marked "Add"? Michael Everson * http://www.evertype.com/ From charupdate at orange.fr Thu Jul 16 03:29:11 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 16 Jul 2015 10:29:11 +0200 (CEST) Subject: Input methods at the age of Unicode In-Reply-To: <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> Message-ID: <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> On Sat, Jul 11, 2015, at 20:54, Hans Aberg wrote: > So for a Cherokee keyboard, as discussed in the video, one would need different images on the keys if one bothers, and a key map. > One problem here is [...] that it is very time consuming to design such key maps. On Wed, Jul 15, 2015, at 16:07, Hans Aberg wrote: > > On 15 Jul 2015, at 11:06, Marcel Schneider wrote: > > > Editing keyboard layouts is a job anybody can tackle who is willing to spend some time for a useful work (as opposed to a set of leisures like gaming, chasing and the like). > > In mathematics, there are a couple of thousands of characters, including Latin and Greek styles, which would take some time to develop a key map for. That is of course a hard piece of work. For mathematical symbols, rather than a keymap, I'd prefer a Compose tree. For natural languages like Cherokee, Spanish, Welsh, English, or all languages together that use a given script, like Cyrillic, Greek, or Latin, developing keymaps is a very rewarding job, regardless of the time we finally spend on it, because the results will be useful to many people, on condition that the results are good.
Now, the better a keymap, the more time and personal investment it is likely to need (that is, we need to spend extra thinking time in addition to the working time). Obviously we can't rely on Apple, Google and Microsoft to do this job; they simply *cannot* afford to spend so much time, which in this case is money, developing entirely free products that will never pay back all that money. By "pay back all that money" I mean that e.g. Microsoft would sell more Windows licenses thanks to all the high-performance keyboard layouts the OS would ship with. I don't believe things could happen this way. First, Windows will now be distributed as a free update; second, OEMs *cannot* afford to raise computer prices for the sake of keyboard layouts either; third, these keyboard drivers are so transparent by nature that they are de facto open source; fourth, since the goal is that *everybody* benefits from those keyboard layouts, they *must* be shared for free; and last but not least, a keyboard driver is not a good spot to place ads. This is why *everybody* is invited to tackle this job. The idea is that once we agree to do some good with our personal time (as opposed to gaming or chasing, which are just two examples of time-consuming activities that I personally consider as doing no good), time will no longer be the main concern when talking about key maps and Compose trees. Best, Marcel
From haberg-1 at telia.com Thu Jul 16 03:35:35 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Thu, 16 Jul 2015 10:35:35 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> Message-ID: <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> On 16 Jul 2015, at 10:29, Marcel Schneider <charupdate at orange.fr> wrote: > > On Sat, Jul 11, 2015, at 20:54, Hans Aberg <haberg-1 at telia.com> wrote: > > > On 15 Jul 2015, at 11:06, Marcel Schneider <charupdate at orange.fr> wrote: > > > > > Editing keyboard layouts is a job anybody can tackle who is willing to spend some time for a useful work (as opposed to a set of leisures like gaming, chasing and the like). > > > > In mathematics, there are a couple of thousands of characters, including Latin and Greek styles, which would take some time to develop a key map for. > > That is of course a hard piece of work. For mathematical symbols, rather than a keymap, I'd prefer a Compose tree. One still has to figure out a good map. Using Unicode helps the readability of the input file, though. One can use for example ConTeXt with LuaLaTeX, which comes with the TeX Live installation.
From charupdate at orange.fr Thu Jul 16 04:21:23 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 16 Jul 2015 11:21:23 +0200 (CEST) Subject: Input methods at the age of Unicode In-Reply-To: <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> Message-ID: <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> On 16 Jul 2015, at 10:35, Hans Aberg wrote: > One still has to figure out a good map. > > Using Unicode helps the readability of the input file, though. One can use for example ConTeXt with LuaLaTeX, which comes with the TeX live installation. Thank you very much for these hints, I'll try to apply them. Currently I stick with a rather common set of characters on the key map, except that I've added U+2610, which is very useful, even more so when it is part of the dead-key lists as a base character, and several additional exotic currency symbols as a mark of respect. Backward compatibility limits the number of key positions. From eight per key I've come back down to four, and from a dozen or more dead keys (and a maximum of about twenty-five or thirty) back to five plus the Compose key (one key with four dead key positions: Compose, AltGr, Greek, and Secondary group in the sense of ISO 9995). But with one Compose key we potentially have as many dead keys as there are key positions on the rest of the keyboard, and each one of them can give access to as many again. I believe that the future of keyboards lies as much in the Compose tree as in the key map, or even more. The file format of my source files is UTF-8; however, the compiler admits clear characters only up to U+008F. From U+00A0 upwards, we must use code points.
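Since only printable ASCII (up to U+007E, as corrected in the follow-up message) can be written literally in the source, every other character has to appear as a code point, with the character itself and its name kept in a trailing comment for readability. A minimal sketch of that conversion in Python, assuming a C-like "token // comment" layout; the layout is hypothetical, not the actual format of the keyboard compiler. Character names come from the standard `unicodedata` module:

```python
import unicodedata


def source_token(ch: str) -> str:
    """Format one character as a source token plus a readable comment.

    Printable ASCII (U+0020..U+007E) is written as a C-style character
    literal; anything else is written as a hexadecimal code point.
    The "token // char NAME" layout is a hypothetical example format.
    """
    cp = ord(ch)
    if 0x20 <= cp <= 0x7E:
        # Quote and backslash need escaping inside a C character literal.
        token = f"'\\{ch}'" if ch in "'\\" else f"'{ch}'"
    else:
        token = f"0x{cp:04x}"
    name = unicodedata.name(ch, "<unnamed>")
    return f"{token:<8} // {ch} {name}"


for ch in "a\u00e9\u02bc\u2610":
    print(source_token(ch))
```

Generating the names automatically, as the message describes, avoids typos in identifiers such as MODIFIER LETTER APOSTROPHE or BALLOT BOX.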
For readability I add Unicode characters in the trailing comments, as well as automatically added Unicode character identifiers (names), along with as many comments as we want. Doing everything in spreadsheets makes it possible to derive HTML tables semi-automatically without needing any software other than a text editor. Now I've just downloaded the two versions of ConTEXT, which might well be the enhanced text editor I have been looking for for a while. LuaLaTeX will be very interesting too if I can edit source files with it (however, the bulk of the job is done in spreadsheet software, which is Unicode-aware; current versions even include the UNICHAR and UNICODE functions). I'll try whether ConTeXt recognizes the Kana shift states (Gedit seemingly doesn't). Have a great day, Marcel From charupdate at orange.fr Thu Jul 16 04:53:54 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 16 Jul 2015 11:53:54 +0200 (CEST) Subject: Input methods at the age of Unicode In-Reply-To: <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> Message-ID: <105482380.7936.1437040434492.JavaMail.www@wwinf1f21> On 16 Jul 2015, at 11:30, I wrote: > the compiler admits clear characters only up to U+008F. Up to U+007E, of course. On 16 Jul 2015, at 10:35, Hans Aberg wrote: > One still has to figure out a good map. Yes, this is the primary issue for every newly encoded script, and it remains important with respect to ergonomics. I just wanted to say that I'm focussing on the Compose tree of a Latin keyboard layout.
Do you mean that the US American English keymap should be thoroughly reengineered too, in addition to the solutions of ANSI, ISO, and August Dvorak? I think that on the ANSI/ISO keyboards it would be sufficient to remove the dead keys, change T29/E00 from 0x0060 to 0x02bc, and replace VK_RMENU with a Compose key. It's a bit more complicated, however, to get a simple *and* complete keymap for France, and surely for a number of other countries using letters with diacritics. Marcel From haberg-1 at telia.com Thu Jul 16 06:12:48 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Thu, 16 Jul 2015 13:12:48 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> Message-ID: <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> > On 16 Jul 2015, at 11:21, Marcel Schneider <charupdate at orange.fr> wrote: > > On 16 Jul 2015, at 10:35, Hans Aberg <haberg-1 at telia.com> wrote: > > > Using Unicode helps the readability of the input file, though. One can use for example ConTeXt with LuaLaTeX, which comes with the TeX live installation. > > Thank you very much for these hints, I'll try to apply them. > Now I've just downloaded the two versions of ConTEXT, which might well be the enhanced text editor I'm looking for since a while. LuaLaTeX will be very interesting too if I can edit source files with (however the bulk job is done in spreadsheet software which is Unicode; actual versions include even the UNICAR and UNICODE functions).
It is simplest to just download the whole TeX Live: https://www.tug.org/texlive/ There is a special package for OS X. Though large, the main distribution lives in a single directory, so it is easy to throw away. > I'll try if ConTeXt recognizes the Kana shift states (Gedit seemingly doesn't). It seems to depend on the font: when trying an OS X system Arabic font, the ligatures were broken. However, when trying Khaled Hosny's Amiri font <http://www.amirifont.org/>, it seemed to work. There is a ConTeXt users list <http://www.ntg.nl/mailman/listinfo/ntg-context>, as well as support pages <http://wiki.contextgarden.net/> From haberg-1 at telia.com Thu Jul 16 08:20:11 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Thu, 16 Jul 2015 15:20:11 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <105482380.7936.1437040434492.JavaMail.www@wwinf1f21> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <105482380.7936.1437040434492.JavaMail.www@wwinf1f21> Message-ID: <D5B9129B-544D-4F6B-AA7F-4BE19E60C6BF@telia.com> > On 16 Jul 2015, at 11:53, Marcel Schneider <charupdate at orange.fr> wrote: > On 16 Jul 2015, at 10:35, Hans Aberg <haberg-1 at telia.com> wrote: > > > One still has to figure out a good map. > > Yes this is the primary issue for every newly encoded script, and it remains important with respect to ergonomics. > > I just wanted to say that I'm focussing on the Compose tree of a Latin keyboard layout. > > Do you mean that the US American English keymap should be thoroughly reengineered too, additionally to the solutions of ANSI, ISO, and August Dvorak? A logical layout, with the letters in alphabetical order, may suffice.
The traditional layouts were designed for speed typing on mechanical typewriters, specifically with fixed finger positioning, so as not to have to look at the keyboard while typing. Speed typing is not so important these days, as it is mostly for secretaries who transcribe material from another format. And the computer keyboard does not have the physical limitations of mechanical typewriters. It is also considerably faster with moving finger positioning, which can be done if one does not have to look at some text while typing. From haberg-1 at telia.com Thu Jul 16 08:26:19 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Thu, 16 Jul 2015 15:26:19 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <5596365.26347.1437045209046.JavaMail.defaultUser@defaultHost> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <5596365.26347.1437045209046.JavaMail.defaultUser@defaultHost> Message-ID: <68383FA8-F314-4439-862D-59E03710FE2F@telia.com> > On 16 Jul 2015, at 13:13, William_J_G Overington <wjgo_10009 at btinternet.com> wrote: > I do not know if it is of interest, but some time ago I produced some pdf files that can each be used as a typecase so as to copy a character from the pdf, then paste into a Unicode-aware wordprocessor or desktop publishing program and then formatted to the desired font and font size. On OS X there is a "Character Viewer", which has a similar purpose. One has access to all of Unicode, and can click on characters to get them pasted into the text. One can use special categories and also make one's own table. But it is slow.
From charupdate at orange.fr Thu Jul 16 09:44:20 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 16 Jul 2015 16:44:20 +0200 (CEST) Subject: Input methods at the age of Unicode In-Reply-To: <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> Message-ID: <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> On 16 Jul 2015, at 13:21, Hans Aberg wrote: > On 16 Jul 2015, at 11:21, Marcel Schneider wrote: >> >> Now I've just downloaded the two versions of ConTEXT, which might well be the enhanced text editor I'm looking for since a while. LuaLaTeX will be very interesting too if I can edit source files with (however the bulk job is done in spreadsheet software which is Unicode; actual versions include even the UNIC[H]AR and UNICODE functions). Knowing nothing about it, I mixed up the ConTeXt you referred to with ConTEXT, and ended up downloading and installing a new text editor. At least, this time, that is very useful to me, as ConTEXT will replace Gedit for me, because ConTEXT correctly handles the Kana shift states (about half of my keyboard layout). However, as it is new, support for characters like U+2610, or even simply for precomposed letters with macron or double acute, is not yet ensured. When I have some time left I'll write to them, because the project is very promising. > > It is simplest to just download the whole Tex Live: > https://www.tug.org/texlive/ > There is special package for OS X. Unfortunately I have no OS X machine, at home or anywhere else, nor do I have Linux at home. Where I use Ubuntu, I cannot install this. I'll check if there is a Windows version, but it seems to take me away from my urgent goal, so it'll be for a bit later.
> > Though large, the main distribution lives in a single directory, so it is easy to throw away. Nor would I throw this software away, if I could install it. > >> I'll try if ConTeXt recognizes the Kana shift states (Gedit seemingly doesn't). > > It seems to depending on the font: > > When trying a OS X systems Arabic font, the ligatures where broken. However, when trying Khaled Hosny's , it seemed working. First I'll have to learn the language. This is a very worthwhile goal, but it needs time I don't have at the moment. > > There is a ConTeXt users list , as well as support pages I'll save these, thank you. On 16 Jul 2015, at 15:20, Hans Aberg wrote: > On 16 Jul 2015, at 11:53, Marcel Schneider wrote: >> >> Do you mean that the US American English keymap should be thoroughly reengineered too, additionally to the solutions of ANSI, ISO, and August Dvorak? > It may suffice with a logical layout, letters in alphabetical order. The traditional layouts were designed for speed typing on physical typing machines, specifically, with fixed finger positioning, in order not having look at the keyboard while typing. This is an important point: not having to look at the keyboard. Even with alphabetical order, one *must* learn to type. Often suggested for computers, alphabetical order is also often rejected, because it requires much more finger movement than its counterpart, the ergonomic order as proposed by August Dvorak, and very actively promoted in a French version by the association ERGODIS [http://bepo.fr/]. > Speed typing is not so important in these days, as it is mostly for secretaries that write down material in other format. And the computer keyboard does not have the physical limitation of mechanical typewriters. Yes for the hardware, but no for the need for speed typing. At that time, secretaries were almost the only people who typed.
Today, more and more managers handle their own correspondence, without dictating to a secretary, while their employees manage much more than writing (as they already did back then). Personally, I don't want to look at my keyboard when typing text, and neither do you, nor does anybody. > It is also considerably faster with moving finger positioning, which can be done if one does not have too look at some text while typing. I don't quite understand how to speed up with moving fingers, except towards the dedicated keys (the little fingers having many more of these), with the thumbs operating the central modifiers, if any, and/or the central Compose key, in addition to the space bar. Central means on the Alt keys. Alt itself at this favorite position is counter-productive; it should be moved to Left Windows, and Left Windows to Apps (Menu), unless a netbook manufacturer has dropped that key, in which case use the mouse/trackpad. I believe that at this juncture of imminent climate change and global destruction, we should stick with the existing hardware. For France, I am not going to propose a completely *new* layout either; I will bring something you can use simply by keeping in mind a small set of useful modifications, even without needing keyboard stickers. A reuse-what-you've-got concept. Best, Marcel
From wjgo_10009 at btinternet.com Thu Jul 16 06:13:29 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Thu, 16 Jul 2015 12:13:29 +0100 (BST) Subject: Input methods at the age of Unicode In-Reply-To: <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> Message-ID: <5596365.26347.1437045209046.JavaMail.defaultUser@defaultHost> Hi I do not know if it is of interest, but some time ago I produced some pdf files that can each be used as a typecase so as to copy a character from the pdf, then paste into a Unicode-aware wordprocessor or desktop publishing program and then formatted to the desired font and font size. The following might be of particular interest. http://www.users.globalnet.co.uk/~ngo/typecase_accented_characters_for_Latvian.pdf http://www.users.globalnet.co.uk/~ngo/typecase_esperanto.pdf http://www.users.globalnet.co.uk/~ngo/typecase_hot_beverage.pdf http://www.users.globalnet.co.uk/~ngo/typecase_maltese.pdf http://www.users.globalnet.co.uk/~ngo/typecase_quotation_marks.pdf http://www.users.globalnet.co.uk/~ngo/typecase_spaces.pdf http://www.users.globalnet.co.uk/~ngo/typecase_welsh_accented_characters.pdf These and some others are linked from the following web page. http://www.users.globalnet.co.uk/~ngo/outlinks.htm That page is linked from another web page. http://www.users.globalnet.co.uk/~ngo/library.htm Best regards, William Overington 16 July 2015
From charupdate at orange.fr Thu Jul 16 10:49:45 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 16 Jul 2015 17:49:45 +0200 (CEST) Subject: Input methods at the age of Unicode Message-ID: <1804867121.15811.1437061785195.JavaMail.www@wwinf1n11> On 16 Jul 2015, at 13:12, William_J_G Overington wrote: > Hi > I do not know if it is of interest, but some time ago I produced some pdf files that can each be used as a typecase so as to copy a character from the pdf, then paste into a Unicode-aware wordprocessor or desktop publishing program and then formatted to the desired font and font size. This is a nice piece of work. If you use these characters very often, a solution using a Compose tree may be interesting too. It lets you type a sequence of characters available on the keyboard to insert precomposed characters, punctuation and symbols. I'll insert some suggestions in between, and I'm curious to know whether you like them. > The following might be of particular interest. > http://www.users.globalnet.co.uk/~ngo/typecase_accented_characters_for_Latvian.pdf To input a letter with macron, it is common to type 'Compose, _' and then the letter. For hacek, there is 'Compose, v' or 'Compose, <', but the latter is taken for "subscript", so I prefer 'v' and 'V'. You may also find 'Compose, c', because of the ISO name of this diacritic (caron), which prevailed at the merger (Unicode called it HACEK, which is the true name). So it is better to choose 'v', a mnemonic derived from the shape. For comma below, take 'Compose, <, Comma', and for turned comma above, 'Compose, >, #, Comma' (I'm not quite sure, because I've not yet implemented these ones). But in fact, AFAIK the turned comma above is the preferred glyphic variant of the cedilla on the lowercase g (U+0123 LATIN SMALL LETTER G WITH CEDILLA). > http://www.users.globalnet.co.uk/~ngo/typecase_esperanto.pdf These are easy, you need 'Compose, ^' and 'Compose, v'.
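The Compose sequences suggested above form a tree: each keystroke after Compose either leads to a further state or produces the final character. The two DEADTRANS rows given further down in this message encode exactly that, with the flag 0x0001 marking an intermediate (dead) result and 0x0000 a final one. A minimal Python model of such chained lookups, using the macron and hacek examples; the table layout, the placeholder character standing for the Compose key (U+2384 COMPOSITION SYMBOL) and the handling of unknown sequences are assumptions for illustration, not the behaviour of the Windows keyboard driver:

```python
# Hypothetical model of chained dead keys, mirroring rows of the form
# DEADTRANS(typed_char, current_dead_key, result, flags).
DKF_DEAD = 0x0001     # result is a further dead key, not output
COMPOSE = "\u2384"    # placeholder standing for the Compose key (assumption)

# (typed character, current dead-key state) -> (result, flags)
TABLE = {
    ("_", COMPOSE): ("_", DKF_DEAD),   # Compose, _  -> macron state
    ("a", "_"): ("\u0101", 0x0000),    # ... then a -> a with macron
    ("e", "_"): ("\u0113", 0x0000),    # ... then e -> e with macron
    ("v", COMPOSE): ("v", DKF_DEAD),   # Compose, v  -> hacek state
    ("c", "v"): ("\u010d", 0x0000),    # ... then c -> c with caron
    ("s", "v"): ("\u0161", 0x0000),    # ... then s -> s with caron
}


def type_keys(keys):
    """Return the text produced by a sequence of keystrokes."""
    out = []
    state = None  # None means no dead key is pending
    for key in keys:
        if state is None:
            if key == COMPOSE:
                state = COMPOSE
            else:
                out.append(key)
            continue
        result, flags = TABLE.get((key, state), (None, 0))
        if result is None:
            # Unknown sequence: this sketch drops the pending state and
            # emits the key as typed; a real driver may behave differently.
            out.append(key)
            state = None
        elif flags & DKF_DEAD:
            state = result          # chain to the next dead-key state
        else:
            out.append(result)      # final character reached
            state = None
    return "".join(out)


print(type_keys([COMPOSE, "_", "a", "b", COMPOSE, "v", "s"]))
```

Each extra level of a longer sequence (such as 'Compose, >, #, Comma') is just one more row flagged as dead, which is why a single Compose key can reach so many characters.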
> http://www.users.globalnet.co.uk/~ngo/typecase_hot_beverage.pdf This may be obtained by typing 'Compose, h, o, t' or 'Compose, h, b'. > http://www.users.globalnet.co.uk/~ngo/typecase_maltese.pdf Dot above is usually 'Compose, Full stop', and the Latin letter h with stroke is 'Compose, -, h'. > http://www.users.globalnet.co.uk/~ngo/typecase_quotation_marks.pdf You may type 'Compose, Grave' as a grave accent dead key, then go on with 'Apostrophe' or 'Quotation mark' for either the single or the double opening quotation mark. Or 'Compose, Apostrophe' for the acute, then likewise for the closing ones. That matches old ASCII practice, hence the mnemonics. For the low ones, type 'Compose, <', and for the reversed ones, 'Compose, \'. > http://www.users.globalnet.co.uk/~ngo/typecase_spaces.pdf There is a very efficient way to get *all* Unicode spaces (perhaps without the two doubles) with 'Compose, Space' followed by a mnemonic letter, a digit (1; 2; 3; 4; 6), or even < or > for the unpaired directional marks (very useful to correct the display when RTL characters are used in an LTR context and vice versa). > http://www.users.globalnet.co.uk/~ngo/typecase_welsh_accented_characters.pdf For the letters with diaeresis one can use the usual 'Compose, "', or the alternative 'Compose, :'. The latter helps disambiguate the use of quotation marks, because 'Compose, Apostrophe, Quotation mark' is already used for the closing double quote, so "diaeresis and acute" may interfere. For acute, grave and circumflex, we use 'Compose, '/`/^'. (Alternatively, if the apostrophe might interfere, one can use the vertical bar instead, a solution that should have been implemented on the US International keyboard to prevent it from "messing up" the apostrophe, the single quotes, and the acute dead key. Instead of the quotation mark for diaeresis, IMO one could have chosen the number sign or some other less frequently used character.
I know that ASCII used ' and " after Backspace to add diacritics to letters, hence the choice of dead keys on the US International.) > These and some others are linked from the following web page. > http://www.users.globalnet.co.uk/~ngo/outlinks.htm > That page is linked from another web page. > http://www.users.globalnet.co.uk/~ngo/library.htm I'm confident in extrapolating that for each of the other PDF typecases there will be Compose solutions too. To implement a two-character Compose sequence, program the following: DEADTRANS(first character, compose, first character, 0x0001), DEADTRANS(second character, first character, target character, 0x0000) Best, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150716/0a26a74c/attachment.html> From haberg-1 at telia.com Thu Jul 16 11:21:59 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Thu, 16 Jul 2015 18:21:59 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> Message-ID: <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> > On 16 Jul 2015, at 16:44, Marcel Schneider <charupdate at orange.fr> wrote: > > On 16 Jul 2015, at 13:21, Hans Aberg <haberg-1 at telia.com> wrote: > Knowing nothing about it, I mixed up the ConTeXt you referred to with ConTEXT, and ended up downloading and installing a new text editor. At least, this time, that is very useful to me, as ConTEXT will replace Gedit for me, because ConTEXT handles the Kana shift states correctly (about half of my keyboard layout).
However, as it is new, support for characters like U+2610 or even simple precomposed letters with macron or double acute is not yet ensured. When I have some time I'll write to them, because the project is very promising. One needs a good UTF-8 text editor as well. > > It is simplest to just download the whole TeX Live: > > https://www.tug.org/texlive/ > > There is a special package for OS X. > > Unfortunately I have no OS X machine, at home or elsewhere, nor do I have Linux at home. Where I use Ubuntu I cannot install this. I'll check if there is a Windows version, but it seems to take me away from my urgent goal, so it will be for a bit later. The link above has an entry for that, too. > > Though large, the main distribution lives in a single directory, so it is easy to throw away. > > Nor would I throw away this software, if I could install it. It is updated yearly, and there is usually no need to keep the old version, but one can - they end up in different directories. > > There is a ConTeXt users list <http://www.ntg.nl/mailman/listinfo/ntg-context>, as well as support pages <http://wiki.contextgarden.net/> > > I'll save these, thank you. It is hard to figure out from the documentation, so it might be better to ask there.
From eliz at gnu.org Thu Jul 16 11:33:34 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Thu, 16 Jul 2015 19:33:34 +0300 Subject: Input methods at the age of Unicode In-Reply-To: <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> Message-ID: <838uag6p0h.fsf@gnu.org> > From: Hans Aberg <haberg-1 at telia.com> > Date: Thu, 16 Jul 2015 18:21:59 +0200 > Cc: Unicode Mailing List <unicode at unicode.org> > > One needs a good UTF-8 text editor as well. Emacs is one possibility, of course. From haberg-1 at telia.com Thu Jul 16 11:35:49 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Thu, 16 Jul 2015 18:35:49 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> Message-ID: <A2B794E6-03BA-4AB4-A9C4-5570D2C21F8A@telia.com> > On 16 Jul 2015, at 16:44, Marcel Schneider <charupdate at orange.fr> wrote: > On 16 Jul 2015, at 15:20, Hans Aberg <haberg-1 at telia.com> wrote: > > A logical layout, with letters in alphabetical order, may suffice. The traditional layouts were designed for speed typing on physical typewriters, specifically with fixed finger positioning, so as not to have to look at the keyboard while typing.
> > This is an important point, not looking at the keyboard. Even with alphabetical order, one *must* learn to type. Though often suggested for computers, alphabetical order is also often rejected, because it needs much more finger movement than its counterpart, the ergonomic order as proposed by August Dvorak, and very actively promoted in a French version by the association ERGODIS [http://bepo.fr/]. It depends on the objective. Languages may have a number of layouts, which may be efficient for just that. But if one wanted a single layout for the Latin scripts, it would be hard to have special letter orders. > > It is also considerably faster with moving finger positioning, which can be done if one does not have to look at some text while typing. > > I don't quite understand how to speed up with moving fingers except towards the dedicated keys, the little fingers having many more of these, and the thumbs acting on the central modifiers, if any, and/or the central Compose key, in addition to the space bar. Central means on the Alt keys. Alt itself at this favorite position is counter-productive; it should be moved to Left Windows, and Left Windows to Apps (Menu), a key that some netbook manufacturers omit. If it is missing, then use the mouse/trackpad. It is used on music keyboards. For example, one can use more than one finger on the same key if it should be pressed rapidly in succession. If the hand needs to move, one shifts the fingers, which will avoid the stretching that would occur in fixed hand positioning. > I believe that at this juncture of imminent climate change and global destruction, we should stick with the existing hardware. For France, I am not going to propose a completely *new* layout either; I will bring something you can use by simply keeping in mind the small set of useful modifications, even without needing keyboard stickers. A reuse-what-you've-got concept.
There are physical keyboards with displays on the keys that can be changed, e.g., [1], which can thus display different key layouts, but currently they are expensive, and the keys require more force when depressed. 1. http://www.artlebedev.com/everything/optimus/ From haberg-1 at telia.com Thu Jul 16 11:36:39 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Thu, 16 Jul 2015 18:36:39 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <838uag6p0h.fsf@gnu.org> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org> Message-ID: <7A8C61A0-C4AC-4D08-BA1D-BF46850D0BB8@telia.com> > On 16 Jul 2015, at 18:33, Eli Zaretskii <eliz at gnu.org> wrote: > >> From: Hans Aberg <haberg-1 at telia.com> >> Date: Thu, 16 Jul 2015 18:21:59 +0200 >> Cc: Unicode Mailing List <unicode at unicode.org> >> >> One needs a good UTF-8 text editor as well. > > Emacs is one possibility, of course. And on OS X, Xcode has a good text editor as well.
From richard.wordingham at ntlworld.com Thu Jul 16 17:59:24 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 16 Jul 2015 23:59:24 +0100 Subject: Input methods at the age of Unicode In-Reply-To: <838uag6p0h.fsf@gnu.org> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org> Message-ID: <20150716235924.2dfc406b@JRWUBU2> On Thu, 16 Jul 2015 19:33:34 +0300 Eli Zaretskii <eliz at gnu.org> wrote: > > One needs a good UTF-8 text editor as well. > Emacs is one possibility, of course. If you're prepared to cut and paste, it's easy to extend its own keyboards. (Creating the first one was a bit stressful - the ones that come with Emacs were almost all set up using ISO-2022, before Emacs adopted Unicode.) Richard. From jsbien at mimuw.edu.pl Thu Jul 16 22:41:11 2015 From: jsbien at mimuw.edu.pl (Janusz S.
Bien) Date: Fri, 17 Jul 2015 05:41:11 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <20150716235924.2dfc406b@JRWUBU2> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org> <20150716235924.2dfc406b@JRWUBU2> Message-ID: <20150717054111.129625mbnpjlz5uf@mail.mimuw.edu.pl> Quote/Cytat - Richard Wordingham <richard.wordingham at ntlworld.com> (Fri 17 Jul 2015 12:59:24 AM CEST): > On Thu, 16 Jul 2015 19:33:34 +0300 > Eli Zaretskii <eliz at gnu.org> wrote: > >> > One needs a good UTF-8 text editor as well. > >> Emacs is one possibility, of course. > > If you're prepared to cut and paste, Why is it relevant? > it's easy to extend its own > keyboards. (Creating the first one was a bit stressful It is not clear to me what you mean by "own keyboards" - the ones > that come with Emacs were almost all set up using ISO-2022, before > Emacs adopted Unicode.) In my opinion creating a new Emacs input method is extremely easy, and I solve my problems by modifying 'polish-slash'. In a file you can associate an input method with it using an appropriate Emacs local variable. Best regards Janusz -- Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej) Prof. Janusz S. Bień
- University of Warsaw (Formal Linguistics Department) jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/ From richard.wordingham at ntlworld.com Fri Jul 17 01:39:57 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 17 Jul 2015 07:39:57 +0100 Subject: Input methods at the age of Unicode In-Reply-To: <20150717054111.129625mbnpjlz5uf@mail.mimuw.edu.pl> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org> <20150716235924.2dfc406b@JRWUBU2> <20150717054111.129625mbnpjlz5uf@mail.mimuw.edu.pl> Message-ID: <20150717073957.1290cd32@JRWUBU2> On Fri, 17 Jul 2015 05:41:11 +0200 "Janusz S. Bien" <jsbien at mimuw.edu.pl> wrote: > Quote/Cytat - Richard Wordingham <richard.wordingham at ntlworld.com> > (Fri 17 Jul 2015 12:59:24 AM CEST): Perhaps I'm missing a trick. My conception was that to use an Emacs keyboard for, say, word processor input, one would have to type into an Emacs buffer and then copy the text to the word processor application. > > it's easy to extend its own > > keyboards. (Creating the first one was a bit stressful > It is not clear to me what you mean by "own keyboards" Except possibly for Windows (last time I looked into it, Emacs there was built as an ANSI application rather than as a Unicode application), Emacs can use the user-specified system keyboards (and general-purpose user keyboards) as well as the Emacs-specific keyboards. By "own keyboards" I meant the ones defined for Emacs, specifically the ones set up by quail-define-package and quail-define-rules.
There was a period when, due to an external error, Emacs launched with an English locale couldn't use keyboards made available by ibus. > - the ones > > that come with Emacs were almost all set up using ISO-2022, before > > Emacs adopted Unicode.) > In my opinion creating a new Emacs input method is extremely easy, and > I solve my problems by modifying 'polish-slash'. I see that latin-pre.el and latin-post.el in particular are now defined in UTF-8, which simplifies adaptation. My exemplar was thai.el, which at the time was in ISO-2022. > In a file you can associate an input method with it using an > appropriate Emacs local variable. Another example of the first keyboard being difficult and the rest easy. Once one starts using that trick it is easy to modify it for other keyboards. Richard. From eliz at gnu.org Fri Jul 17 01:57:46 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Fri, 17 Jul 2015 09:57:46 +0300 Subject: Input methods at the age of Unicode In-Reply-To: <20150716235924.2dfc406b@JRWUBU2> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org> <20150716235924.2dfc406b@JRWUBU2> Message-ID: <831tg76zkl.fsf@gnu.org> > Date: Thu, 16 Jul 2015 23:59:24 +0100 > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > On Thu, 16 Jul 2015 19:33:34 +0300 > Eli Zaretskii <eliz at gnu.org> wrote: > > > > One needs a good UTF-8 text editor as well. > > > Emacs is one possibility, of course. > > If you're prepared to cut and paste, it's easy to extend its own > keyboards.
FWIW, I do that a lot, because the number of convenient input methods in Emacs far outnumbers what I have on MS-Windows. For example, if I have to type Russian with no Russian keyboard available, the cyrillic-translit input method is a life savior. From marc at keyman.com Fri Jul 17 03:01:46 2015 From: marc at keyman.com (Marc Durdin) Date: Fri, 17 Jul 2015 08:01:46 +0000 Subject: Input methods at the age of Unicode In-Reply-To: <831tg76zkl.fsf@gnu.org> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org> <20150716235924.2dfc406b@JRWUBU2>,<831tg76zkl.fsf@gnu.org> Message-ID: <D2057BA2-75F0-4328-AAEB-4B078A448875@keyman.com> On Windows, you can always use Keyman and Keyman Developer to create very flexible input methods that work across pretty much any app, FWIW. Both of these are available free these days at least in basic editions (www.keyman.com/desktop and www.keyman.com/developer). Just providing another alternative. Marc -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Eli Zaretskii Sent: Friday, 17 July 2015 4:58 PM To: Richard Wordingham Cc: unicode at unicode.org Subject: Re: Input methods at the age of Unicode Date: Thu, 16 Jul 2015 23:59:24 +0100 From: Richard Wordingham <richard.wordingham at ntlworld.com> On Thu, 16 Jul 2015 19:33:34 +0300 Eli Zaretskii <eliz at gnu.org> wrote: One needs a good UTF-8 text editor as well. Emacs is one possibility, of course.
If you're prepared to cut and paste, it's easy to extend its own keyboards. FWIW, I do that a lot, because the number of convenient input methods in Emacs far outnumbers what I have on MS-Windows. For example, if I have to type Russian with no Russian keyboard available, the cyrillic-translit input method is a life savior. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150717/b3bd1bfd/attachment.html> From eliz at gnu.org Fri Jul 17 03:28:10 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Fri, 17 Jul 2015 11:28:10 +0300 Subject: Input methods at the age of Unicode In-Reply-To: <D2057BA2-75F0-4328-AAEB-4B078A448875@keyman.com> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> <838uag6p0h.fsf@gnu.org> <20150716235924.2dfc406b@JRWUBU2> <831tg76zkl.fsf@gnu.org> <D2057BA2-75F0-4328-AAEB-4B078A448875@keyman.com> Message-ID: <83zj2v5gth.fsf@gnu.org> > From: Marc Durdin <marc at keyman.com> > CC: Richard Wordingham <richard.wordingham at ntlworld.com>, > "unicode at unicode.org" <unicode at unicode.org> > Date: Fri, 17 Jul 2015 08:01:46 +0000 > > On Windows, you can always use Keyman and Keyman Developer to create very > flexible input methods that work across pretty much any app, FWIW. Both of > these are available free these days at least in basic editions > (www.keyman.com/desktop and www.keyman.com/developer). Just providing another > alternative. I'm surprised there isn't such an input method already. I think it's available only with some East Asia packs, or something.
From charupdate at orange.fr Fri Jul 17 04:33:12 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 17 Jul 2015 11:33:12 +0200 (CEST) Subject: Input methods at the age of Unicode In-Reply-To: <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> References: <563809849.6137.1436951201748.JavaMail.www@wwinf1h12> <7F1744A8-5596-4347-AAF9-6EDB52A05309@telia.com> <784036309.5510.1437035352057.JavaMail.www@wwinf1f21> <BF62BC3B-0366-416A-89A2-15DD599F2B02@telia.com> <1200823275.6951.1437038483265.JavaMail.www@wwinf1f21> <860A971A-C656-499D-BB2C-CC8AB11E6688@telia.com> <748934299.14990.1437057860253.JavaMail.www@wwinf1e21> <F3206D2D-79D3-4A62-9D42-DF2DABC4B2D7@telia.com> Message-ID: <1139160524.7966.1437125592262.JavaMail.www@wwinf1f21> On 16 Jul 2015, at 18:22, Hans Aberg wrote: > One needs a good UTF-8 text editor as well. ConTEXT displays "UTF-8" in the status bar. I'm pretty confident that it has the potential of becoming the world's best text editor. It's not yet 1.0, still 0.98.6, and many users are already enthusiastic. > The link above has an entry for that, too. Thank you; I just can't work with TeX right now, as I know it needs some skill. > It is updated yearly, and there is usually no need to keep the old version, but one can - they end up in different directories. > It is hard to figure out from the documentation, so it might be better to ask there. Thank you. On 16 Jul 2015, at 18:35, Hans Aberg wrote: > It depends on the objective. Languages may have a number of layouts, which may be efficient for just that. > But if one wanted a single layout for the Latin scripts, it would be hard to have special letter orders. My goal is not a single Latin layout, just a universal Latin layout adapted to each locale: now French for France, then fr-BE, de-... en-... and so on, implementing some principles in different locales. > It is used on music keyboards. For example, one can use more than one finger on the same key if it should be pressed rapidly in succession.
If the hand needs to move, one shifts the fingers, which will avoid the stretching that would occur in fixed hand positioning. I have little idea of music keyboards, as I primarily learned other instruments, but AFAIK the keystroke dynamics are quite different from those of a classical computer keyboard, be it ergonomic or standard. > There are physical keyboards with displays on the keys that can be changed, e.g., [1], which can thus display different key layouts, but currently they are expensive, and the keys require more force when depressed. > 1. http://www.artlebedev.com/everything/optimus/ I think that is an idea for users who have to toggle between many locales and lack the time to learn them all. Very heavy, a lot of technology. Alternatively, an on-screen keyboard with visual real-time feedback may give the same effect without looking at the fingers on the keycaps. This is much cheaper, as we already have HD screens if needed (I don't, nor do I need any). On 16 Jul 2015, at 18:36, Hans Aberg wrote: >> On 16 Jul 2015, at 18:33, Eli Zaretskii wrote: >> >>> From: Hans Aberg >>> Date: Thu, 16 Jul 2015 18:21:59 +0200 >>> Cc: Unicode Mailing List >>> >>> One needs a good UTF-8 text editor as well. >> >> Emacs is one possibility, of course. Almost everybody, including me, has heard of Emacs, and heard that it is very hard to use. > And on OS X, Xcode has a good text editor as well. And on Xfce we have Mousepad. Now I'll try Notepad++, which reduces the environmental impact of text editing. Marcel -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150717/df7ffeb2/attachment.html> From doug at ewellic.org Fri Jul 17 09:31:37 2015 From: doug at ewellic.org (Doug Ewell) Date: Fri, 17 Jul 2015 07:31:37 -0700 Subject: Keyman Developer for free? (was: Re: Input methods at the age of Unicode) Message-ID: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net> Marc Durdin <marc at keyman dot com> wrote: > On Windows, you can always use Keyman and Keyman Developer to create > very flexible input methods that work across pretty much any app, > FWIW. Both of these are available free these days at least in basic > editions (www.keyman.com/desktop and www.keyman.com/developer). Just > providing another alternative. Can you provide a specific link to a freely available version? I hadn't heard before that there was such a thing, and the links above don't say anything about free. Limited-time evaluation versions don't count, of course. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From doug at ewellic.org Fri Jul 17 09:36:51 2015 From: doug at ewellic.org (Doug Ewell) Date: Fri, 17 Jul 2015 07:36:51 -0700 Subject: Keyman Developer for free? (was: Re: Input methods at the age of Unicode) Message-ID: <20150717073651.665a7a7059d7ee80bb4d670165c8327d.faa2fb2b36.wbe@email03.secureserver.net> I wrote: >> (www.keyman.com/desktop and www.keyman.com/developer) > > the links above don't say anything about free s/links/link/ The first link does offer a free version of Desktop, but that's for end users only. Creating a keyboard requires Developer. -- Doug Ewell | http://ewellic.org | Thornton, CO ????
From charupdate at orange.fr Fri Jul 17 10:26:26 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 17 Jul 2015 17:26:26 +0200 (CEST) Subject: Input methods at the age of Unicode Message-ID: <1146739442.16626.1437146786363.JavaMail.www@wwinf1k33> On 30 Jun 2015, at 23:28, Doug Ewell wrote: > This works on the built-in Notepad as well as Notepad++ and BabelPad Notepad++ is great software. It supports Kana shift states and all of Unicode, I infer from what I've tested. The remark about process garbage on its homepage presumably targets other text editors that are not streamlined for efficiency, I suppose. As a text editor, I recommend Notepad++. Thank you for this information. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150717/6c857324/attachment.html> From charupdate at orange.fr Fri Jul 17 10:38:22 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 17 Jul 2015 17:38:22 +0200 (CEST) Subject: Input methods at the age of Unicode In-Reply-To: <1146739442.16626.1437146786363.JavaMail.www@wwinf1k33> References: <1146739442.16626.1437146786363.JavaMail.www@wwinf1k33> Message-ID: <2010183755.15134.1437147502497.JavaMail.www@wwinf1g19> On 30 Jun 2015, at 23:28, Doug Ewell wrote: > This works on the built-in Notepad as well as Notepad++ and BabelPad Notepad++ is great software. It supports Kana shift states and all of Unicode, I infer from what I've tested. The remark about process garbage on its homepage presumably targets other text editors that are not streamlined for efficiency, I suppose. As a text editor, I recommend Notepad++. Thank you for this information. Marcel -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150717/b2822955/attachment.html> From marc at keyman.com Fri Jul 17 17:55:27 2015 From: marc at keyman.com (Marc Durdin) Date: Fri, 17 Jul 2015 22:55:27 +0000 Subject: Keyman Developer for free? (was: Re: Input methods at the age of Unicode) In-Reply-To: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net> References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net> Message-ID: <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com> > On 18 Jul 2015, at 12:32 am, Doug Ewell <doug at ewellic.org> wrote: > > Marc Durdin <marc at keyman dot com> wrote: > >> On Windows, you can always use Keyman and Keyman Developer to create >> very flexible input methods that work across pretty much any app, >> FWIW. Both of these are available free these days at least in basic >> editions (www.keyman.com/desktop and www.keyman.com/developer). Just >> providing another alternative. > > Can you provide a specific link to a freely available version? I hadn't > heard before that there was such a thing, and the links above don't say > anything about free. Limited-time evaluation versions don't count, of > course. > http://tavultesoft.com/beta has the free download of Developer 9. The beta has the license key requirement, but you can obtain a free perpetual license key on that page as well. While Keyman Developer 9 is still in beta, it is stable and we are finalising the documentation and a few loose ends. The release version will continue to be free. Version 9 includes support for building keyboards for Windows, web, mobile web, iOS and Android, with Mac OS X coming shortly. The web and mobile web versions run with KeymanWeb 2.0, which is open source at http://www.keyman.com/developer/keymanweb. Keyman apps for mobile platforms can be found at keyman.com as well.
Sorry if this sounds a bit like a commercial, but I wanted to clear up some uncertainty about where Keyman is at today. Marc From charupdate at orange.fr Sat Jul 18 09:33:23 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST) Subject: Input methods at the age of Unicode Message-ID: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> On 16 Jul 2015, at 23:59:24 +0100, Eli Zaretskii wrote: > FWIW, I do that a lot, because the number of convenient input methods > in Emacs far outnumbers what I have on MS-Windows. For example, if I > have to type Russian with no Russian keyboard available, the > cyrillic-translit input method is a life savior. You might also wish to use the Windows on-screen keyboard, which lets you see exactly what is on each key while typing on any physical keyboard, with no need for the keycap labels to match the layout. This on-screen keyboard is built in; only, it does not support Kana shift states. Likewise, Windows came to me with all that is needed to type ?? ???? ?? ? ?????, so I can't really believe that users need Emacs as a savior. If process garbage is an environmental issue, one might consider that our real savior is Notepad++, thanks to its energy-saving algorithms. Indeed I do not think that we should get supplemental input facilities at any price. This is also why the goal should be to pack a reasonably large subset of Unicode into the very core of the keyboard driver of every locale, and make it accessible right there with a Compose tree. Every time we open charmap dialogs or even go on the internet to pick a character, we're consuming some energy, and if it's a routine task that could be done with a memorized Compose sequence, that energy is wasted. I don't know if it's a real issue, but I'm inclined to believe it is.
Of course we need some software as a savior, but that software is Zotero, which helps us save and manage our research results ("Search, not re-search!" https://www.zotero.org). Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150718/c0e703da/attachment.html> From charupdate at orange.fr Sat Jul 18 09:47:09 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 18 Jul 2015 16:47:09 +0200 (CEST) Subject: Keyman Developer for free? (was: Re: Input methods at the age of Unicode) In-Reply-To: <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com> References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net> <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com> Message-ID: <279207082.12069.1437230829393.JavaMail.www@wwinf1k02> On 18 Jul 2015, at 00:55:27, Marc Durdin wrote: > http://tavultesoft.com/beta has the free download of Developer 9. The beta has the license key requirement, but you can obtain a free perpetual license key on that page as well. > While Keyman Developer 9 is still in beta, it is stable and we are finalising the documentation and a few loose ends. The release version will continue to be free. > Version 9 includes support for building keyboards for Windows, web, mobile web, iOS and Android, with Mac OS X coming shortly. The web and mobile web versions run with KeymanWeb 2.0, which is open source at http://www.keyman.com/developer/keymanweb. Keyman apps for mobile platforms can be found at keyman.com as well. Faced with this very elaborate keyboard mapping solution, which I knew nothing about, I'm very impressed. If it helps make the missing layouts available and, incidentally, improve a number of Windows keyboard layouts in which I found some oddities, I welcome it and am considering trying it. In the meantime, however, I would like to ask a couple of questions: 1. Does Keyman allow placing a Kana toggle?
This feature, available at least on Windows, is useful for locales like Czech and French that use so many precomposed characters that the upper row is partly filled with them. When the Kana toggle is on, the digits are on the Base (Kana) level there. The preferred place for this toggle is E00 (ISO 9995-1). 2. Does Keyman support extended Compose trees? An extended Compose tree allows 'Compose' to be used as part of Compose sequences. In fact, 'Compose' can turn *any* key on the keyboard into a dead key, including the Compose key itself (even though it is already a dead key). This makes sequences more user-friendly. For example, the háček dead key may be 'Compose, v', while ʒ may be 'Compose, z, h'. With an extended Compose tree, users may input ǯ by typing 'Compose, v, Compose, z, h'. Otherwise it must be typed 'Compose, z, v, h', because 'Compose, v, z' is already ž. With 'Compose' pressed by the right thumb, the first option may be appealing: one keystroke more, but one thing less to memorize. However, I know that the second order matches the principle of double combining marks as stated in TUS §7.9. It would be interesting to know user preferences for these Compose sequences, as implementing both is pointless if one of them is disliked. 3. Does Keyman offer a spreadsheet-like UI? Using spreadsheets for keyboard layout programming helps streamline the development process. 4. Are Keyman layouts programmable in C? Windows drivers are (at least; I know little about other OSes). The syntax of C and C++ allows developers to use spreadsheets, from which allocation tables, deadtrans lists, and ligature tables (that is, in keyboard driver language, tables of Unicode character [WCHAR] sequences) are copied and pasted into the source. 5. Does Keyman allow such ligatures (sequences) to be accessed through dead keys? On Windows I don't see this possibility, and I never knew how to program it. But Unicode recommends that implementations provide this facility.
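Question 2 can be made concrete with a minimal C sketch of one possible model of an extended Compose tree, assuming that pressing Compose while a sequence is pending opens a nested sequence whose result is then applied to the enclosing dead key. The table mirrors the examples in the question ('Compose, v' for the háček, 'Compose, z, h' for ʒ, 'Compose, v, z' for ž); the COMPOSE code, the MAX_DEPTH limit and all function names are hypothetical, and this is not how Keyman or Windows actually implement dead keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPOSE 0x2384u   /* hypothetical code for the Compose key */
#define FLAG_DEAD 0x0001u /* result is itself a dead key (chained) */
#define MAX_DEPTH 8       /* arbitrary nesting limit for this sketch */

typedef struct {
    uint32_t ch, accent, result; /* key, pending dead key, output */
    uint16_t flags;
} DeadTrans;

static const DeadTrans table[] = {
    { 'v', COMPOSE, 'v', FLAG_DEAD }, /* Compose, v  -> caron dead key */
    { 'z', 'v', 0x017Eu, 0 },         /* caron + z   -> U+017E z with caron */
    { 'z', COMPOSE, 'z', FLAG_DEAD }, /* Compose, z  -> dead key 'z' */
    { 'h', 'z', 0x0292u, 0 },         /* z, h        -> U+0292 ezh */
    { 0x0292u, 'v', 0x01EFu, 0 },     /* caron + ezh -> U+01EF ezh with caron */
};

static const DeadTrans *lookup(uint32_t ch, uint32_t accent) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].ch == ch && table[i].accent == accent)
            return &table[i];
    return NULL;
}

/* Compose pressed inside a pending sequence opens a nested sequence; the
   nested result is then fed to the enclosing dead key. Returns 0 for an
   undefined or incomplete sequence. */
static uint32_t compose_ext(const uint32_t *keys, size_t n) {
    assert(keys != NULL || n == 0);
    uint32_t stack[MAX_DEPTH]; /* pending dead keys, innermost last */
    size_t depth = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t ch = keys[i];
        if (ch == COMPOSE) {
            if (depth == MAX_DEPTH)
                return 0;          /* nesting too deep */
            stack[depth++] = COMPOSE;
            continue;
        }
        if (depth == 0)
            return ch;             /* ordinary key, no sequence */
        for (;;) {
            const DeadTrans *t = lookup(ch, stack[depth - 1]);
            if (t == NULL)
                return 0;          /* undefined sequence */
            if (t->flags & FLAG_DEAD) {
                stack[depth - 1] = t->result; /* advance this level */
                break;
            }
            depth--;               /* this level is complete */
            if (depth == 0)
                return t->result;
            ch = t->result;        /* feed result to the outer dead key */
        }
    }
    return 0;                      /* incomplete sequence */
}
```

In this model, 'Compose, v, Compose, z, h' yields U+01EF (ǯ) by composing ʒ first and then applying the pending caron to it. The alternative order 'Compose, z, v, h' would need its own intermediate dead key in the table, which is one way to see the memorization trade-off raised in the question.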
Regards, Marcel Schneider -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150718/86ab442a/attachment.html> From jsbien at mimuw.edu.pl Sat Jul 18 09:51:24 2015 From: jsbien at mimuw.edu.pl (Janusz S. Bien) Date: Sat, 18 Jul 2015 16:51:24 +0200 Subject: Input methods at the age of Unicode In-Reply-To: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> Message-ID: <20150718165124.24201ih6cra1y224@mail.mimuw.edu.pl> Quote - Marcel Schneider <charupdate at orange.fr> (Sat 18 Jul 2015 04:33:23 PM CEST): > On 16 Jul 2015, at 23:59:24 +0100, Eli Zaretskii wrote: > >> FWIW, I do that a lot, because the number of convenient input methods >> in Emacs far outnumbers what I have on MS-Windows. For example, if I >> have to type Russian with no Russian keyboard available, the >> cyrillic-translit input method is a life savior. > > You might wish also to use the Windows on-screen keyboard which > allows to see what's exactly on each key while typing on whatever > physical keyboard, without any need to have the keycap labels match > the layout. This on-screen keyboard is built-in, only it does not > support Kana shift states. > Likewise Windows came to me along with all that is needed to type ?? > ???? ?? ? ?????, so I can't really believe that users need Emacs as > a savior. cyrillic-translit and most other Emacs input methods are more convenient than on-screen keyboard, especially if you don't like to use mouse and your goal is to get the text into Emacs :-) > > When process garbage is an environmental issue, one might consider > that our real savior is Notepad++, thanks to its energy saving > algorithms. Indeed I do not think that we should get supplemental > input facilities at any price.
This is why, too, the goal should be > to pack a reasonably large subset of Unicode into the very core of > the keyboard driver of every locale, and make it accessible right > there with a Compose tree. I don't think it would be practical. > Every time we open charmap dialogs or even go on the internet to > pick a character, we're consuming some energy, Agreed. > and if it's a routine task that could be done with a memorized Memorizing also requires some effort and energy. > Compose sequence, that energy is wasted. I don't know if it's a real > issue, but I'm likely to believe it is. > > Of course we need some software as a savior, but this software is > consequently called Zotero and helps us save and manage our research > results (“Search, not re-search!” https://www.zotero.org). I have nothing against Zotero, but its mention here seems completely irrelevant. Best regards Janusz -- Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej) Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department) jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/ From eliz at gnu.org Sat Jul 18 10:31:02 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Sat, 18 Jul 2015 18:31:02 +0300 Subject: Input methods at the age of Unicode In-Reply-To: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> Message-ID: <83bnf95vpl.fsf@gnu.org> > Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST) > From: Marcel Schneider <charupdate at orange.fr> > Cc: UnicodeMailingList <unicode at unicode.org> > > > FWIW, I do that a lot, because the number of convenient input methods > > in Emacs far outnumbers what I have on MS-Windows. For example, if I > > have to type Russian with no Russian keyboard available, the > > cyrillic-translit input method is a life savior.
> > You might wish also to use the Windows on-screen keyboard which allows to see > what's exactly on each key while typing on whatever physical keyboard, without > any need to have the keycap labels match the layout. This on-screen keyboard is > built-in, only it does not support Kana shift states. That makes typing much more slow, unless you already know, at least approximately, where the keys are. you are talking to someone who is almost touch typist in English, but cannot remember for the life of me the Russian keyboard. Transliteration is the way to go in such cases, and it's strange that transliteration-based input methods are not readily available on Windows out of the box. From doug at ewellic.org Sat Jul 18 12:14:48 2015 From: doug at ewellic.org (Doug Ewell) Date: Sat, 18 Jul 2015 11:14:48 -0600 Subject: Keyman Developer for free? (was: Re: Input methods at the age of Unicode) In-Reply-To: <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com> References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net> <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com> Message-ID: <B0ADB1717C6244B2A16364FEBF8CF5A1@DougEwell> Marc Durdin wrote: > http://tavultesoft.com/beta has the free download of Developer 9. The > beta has the license key requirement but you can obtain a free > perpetual license key on that page as well. Thanks for the additional link. I'll try this. -- Doug Ewell | http://ewellic.org | Thornton, CO
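Transliteration-based input of the kind Eli describes can be sketched as a greedy longest-match conversion. This is a minimal Python sketch under loud assumptions: the table is a tiny hypothetical one, not the mapping actually used by Emacs's cyrillic-translit, and `transliterate` converts a finished string, whereas a real input method converts incrementally as each key is typed.

```python
# Hypothetical, deliberately tiny Latin -> Cyrillic table for illustration;
# it is NOT the mapping used by Emacs's cyrillic-translit input method.
SINGLE = dict(zip("abvgdezijklmnoprstufy", "абвгдезийклмнопрстуфы"))
TABLE = {
    **SINGLE,
    "shch": "щ", "sh": "ш", "ch": "ч", "zh": "ж", "kh": "х", "ts": "ц",
    "ya": "я", "yu": "ю", "yo": "ё",
}
LONGEST = max(len(key) for key in TABLE)


def transliterate(text: str) -> str:
    """Convert Latin input to Cyrillic, always preferring the longest match."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(LONGEST, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += n
                break
        else:
            # No mapping (space, digit, punctuation): pass it through as is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Always preferring the longest match is what lets `shch` become щ rather than ш followed by ч; the trade-off is that a user who really wants с followed by х needs some escape mechanism, which real input methods provide and this sketch does not.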
From charupdate at orange.fr Sat Jul 18 15:34:37 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 18 Jul 2015 22:34:37 +0200 (CEST) Subject: Input methods at the age of Unicode In-Reply-To: <83bnf95vpl.fsf@gnu.org> References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> <83bnf95vpl.fsf@gnu.org> Message-ID: <1182315476.15127.1437251677051.JavaMail.www@wwinf1g36> On 18 Jul 2015, at 17:30, Eli Zaretskii wrote: > > Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST) > > From: Marcel Schneider > > You might wish also to use the Windows on-screen keyboard which allows to see > > what's exactly on each key while typing on whatever physical keyboard, without > > any need to have the keycap labels match the layout. This on-screen keyboard is > > built-in, only it does not support Kana shift states. > > That makes typing much more slow, unless you already know, at least > approximately, where the keys are. you are talking to someone who is > almost touch typist in English, but cannot remember for the life of me > the Russian keyboard. Transliteration is the way to go in such cases, > and it's strange that transliteration-based input methods are not > readily available on Windows out of the box. The new-style Chinese IME is a very smart tool based on transliteration. You just type the syllables as they sound in English, and you get plenty of suggestions from which to choose. The old-style Chinese IME is still shipped, too. I don't know Chinese, so I can't say more, but from what I can see I believe these tools are very efficient. Perhaps no transliteration-based input tool was built for Russian on Windows because we are meant to use the keyboard layout directly. Now, osk.exe should probably show on each key the letter that is on the current physical keyboard. That is what I have often missed on such UIs: you cannot relate the keys to the base layout as the user knows it.
I would add, too, that when the OS is in Russian, the OSK should display Cyrillic letters following the Russian keyboard when it displays a QWERTY keyboard layout. As you can keep the OSK always on top, you just look at it and see the keys you're striking. There is also the old solution of a keymap on paper. You can open the Russian layout in MSKLC, choose a nice font, font size, and window size (to get square keys; don't keep the default rectangles), and nice background colors. Then save it as a picture via the File menu > Save as image. Open this in Paint or GIMP and add the Latin letters. Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150718/404cb932/attachment.html> From charupdate at orange.fr Sat Jul 18 15:44:49 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 18 Jul 2015 22:44:49 +0200 (CEST) Subject: Input methods at the age of Unicode In-Reply-To: <20150718165124.24201ih6cra1y224@mail.mimuw.edu.pl> References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> <20150718165124.24201ih6cra1y224@mail.mimuw.edu.pl> Message-ID: <1769733270.15174.1437252289729.JavaMail.www@wwinf1g36> On 18 Jul 2015, at 16:58, Janusz S. Bien wrote: > cyrillic-translit and most other Emacs input methods are more > convenient than on-screen keyboard, especially if you don't like to > use mouse and your goal is to get the text into Emacs :-) The OSK, while it also works by mouse click, does not require the use of the mouse/trackpad. > > This is why, too, the goal should be > > to pack a reasonably large subset of Unicode into the very core of > > the keyboard driver of every locale, and make it accessible right > > there with a Compose tree. > > I don't think it would be practical. Could you please explain why a Compose key, or a huge Compose tree, wouldn't be practical? I'm interested in knowing more about this issue.
> > Every time we open charmap dialogs or even go on the internet to > > pick a character, we're consuming some energy, > > Agreed. > > > and if it's a routine task that could be done with a memorized > > Memorizing also requires some effort and energy. Like using the bicycle instead of the car... > > Of course we need some software as a savior, but this software is > > consequently called Zotero and helps us save and manage our research > > results (“Search, not re-search!” https://www.zotero.org). > > I have nothing against Zotero, but its mention here seems completely > irrelevant. We were just talking about saviors. Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150718/f43db1b6/attachment.html> From marc at keyman.com Sun Jul 19 01:16:44 2015 From: marc at keyman.com (Marc Durdin) Date: Sun, 19 Jul 2015 06:16:44 +0000 Subject: Keyman Developer for free? (was: Re: Input methods at the age of Unicode) In-Reply-To: <279207082.12069.1437230829393.JavaMail.www@wwinf1k02> References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net> <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com> <279207082.12069.1437230829393.JavaMail.www@wwinf1k02> Message-ID: <1CEDD746887FFF4B834688E7AF5FDA5A82164B69@federation.tavultesoft.local> From: Marcel Schneider [mailto:charupdate at orange.fr] Sent: Sunday, 19 July 2015 12:47 AM Subject: Re: Keyman Developer for free? (was: Re: Input methods at the age of Unicode) 1. Does Keyman allow to place a Kana toggle? This feature, available at least on Windows, is useful for locales like Czech and French that use so many precomposed characters that the upper row is, to some extent, filled up with them. When the Kana toggle is on, the digits are in the Base (Kana) shift state there. The preferred place for this toggle is E00 (ISO 9995-1). Yes.
See http://help.keyman.com/developer/9.0/docs/guide/guide_lang_options.php for one way to implement this. Note: URLs I refer to are from the beta and so are subject to change shortly, but the details will still be found on http://help.keyman.com/developer/ after the site is updated. 2. Does Keyman support extended Compose trees? An extended Compose tree allows “Compose” to be used as a part of Compose sequences. In fact, “Compose” can turn *any* key on the keyboard into a dead key, including the Compose key itself (even though it is already a dead key). This makes it possible to design more user-friendly sequences. For example, the háček dead key may be “Compose, v”, while ʒ may be “Compose, z, h”. With an extended Compose tree, users may input ǯ by typing “Compose, v, Compose, z, h”. Otherwise it must be typed “Compose, z, v, h”, because “Compose, v, z” is already ž. With “Compose” pressed by the right thumb, the first option may be appealing: one keystroke more, but one thing less to memorize. However, I know that the second order matches the principle of double combining marks as stated in TUS §7.9. It would be interesting to know user preferences about these Compose sequences, as implementing both is needless if one is disliked. Yes, although not in the way you understand Compose trees. Keyman uses a more powerful context-based mechanism. See http://help.keyman.com/developer/9.0/docs/tutorial/tutorial_keyboard.php for a starter on how the Keyman keyboard language works. 3. Does Keyman offer a spreadsheet-like UI? Using spreadsheets for keyboard layout programming helps streamline the development process. Not really. Table-based setups tend to constrain the design of keyboards. Keyman uses a rule-based model: see the tutorial link above for more detail. 4. Are Keyman layouts programmable in C? At least Windows drivers are (I know little about other OSes).
The syntax of C and C++ allows developers to use spreadsheets, from which allocation tables, deadtrans lists, and ligature tables (that is, in keyboard driver language, tables of Unicode character [WCHAR] sequences) are copied and pasted into the source. No, this would not be cross-platform. Keyman layouts compile down to JavaScript (web, mobile web, Android, iOS) or a proprietary binary format (Windows, Mac OS X). Keyman layouts can be extended with C/C++ (Windows) or JavaScript (other platforms) to add more complex behaviours that cannot be represented in the Keyman keyboard language. 5. Does Keyman allow such ligatures (sequences) to be accessed by dead keys? On Windows I don't see this possibility, and I never knew how to program it. But Unicode recommends that implementations provide this facility. Yes, although dead keys are typically not the best choice for the majority of the world's languages. See the tutorial again, e.g. step 8. The help site for Keyman has a stack of documentation and examples and is the best place to start, but if you don't find answers to your queries there, I am happy to answer additional questions about the specifics of Keyman off-list, or you can simply download and try the development tools yourself from http://tavultesoft.com/beta/ Regards, Marc -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150719/630407a5/attachment.html> From charupdate at orange.fr Sun Jul 19 07:37:55 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 19 Jul 2015 14:37:55 +0200 (CEST) Subject: Keyman Developer for free?
(was: Re: Input methods at the age of Unicode) In-Reply-To: <1CEDD746887FFF4B834688E7AF5FDA5A82164B69@federation.tavultesoft.local> References: <20150717073137.665a7a7059d7ee80bb4d670165c8327d.f6b27513fe.wbe@email03.secureserver.net> <71DC5CC4-35DE-42BA-8093-5F1218E129A2@keyman.com> <279207082.12069.1437230829393.JavaMail.www@wwinf1k02> <1CEDD746887FFF4B834688E7AF5FDA5A82164B69@federation.tavultesoft.local> Message-ID: <730296831.6627.1437309475345.JavaMail.www@wwinf2221> On 19 Jul 2015, 08:17, Marc Durdin wrote: >> 1. Does Keyman allow to place a Kana toggle? … > Yes. See http://help.keyman.com/developer/9.0/docs/guide/guide_lang_options.php for one way to implement this. [...] > The help site for Keyman has a stack of documentation and examples and is the best place to start, but if you don't find answers to your queries there, I am happy to answer additional questions about the specifics of Keyman off-list, or you can simply download and try the development tools yourself from http://tavultesoft.com/beta/ Thank you for answering my questions. It's a new universe for me. I understand that end-users cannot install and use the layouts like a Windows keyboard driver. I confess that I don't feel ready to go down this road, even though I see that it is a very powerful one. Thank you for the information. Best regards, Marcel -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150719/8a295eb4/attachment.html> From c933103 at gmail.com Sun Jul 19 07:52:57 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Sun, 19 Jul 2015 20:52:57 +0800 Subject: Input methods at the age of Unicode In-Reply-To: <CAGHjPP+_pjvAi9GxhqBqHXy=J7Xhph3cf-=4aPoqoxq0piUAfg@mail.gmail.com> References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> <83bnf95vpl.fsf@gnu.org> <1182315476.15127.1437251677051.JavaMail.www@wwinf1g36> <CAGHjPP+_pjvAi9GxhqBqHXy=J7Xhph3cf-=4aPoqoxq0piUAfg@mail.gmail.com> Message-ID: <CAGHjPPLJTbJks4g7Hg3+oPv1RmMZbR+sZk5+19SoEeX1RVncCw@mail.gmail.com> Forgot to add the Unicode mailing list to the reply address in my previous mail... adding it back and resending. ---------- Forwarded message ---------- From: "gfb hjjhjh" <c933103 at gmail.com> Date: 19 Jul 2015, 9:38 Subject: Re: Input methods at the age of Unicode To: "Marcel Schneider" <charupdate at orange.fr> Input methods where you type in the sound and pick the corresponding characters have been developed for more than 20 years by many Chinese companies. Features include: prioritizing the characters offered for selection according to usage frequency; when multiple syllables are input together without selection, offering a selection of best-fit vocabulary, with the database constantly updated from a network database and the wordbank analyzed and personalized from social applications, the contact list, email, SMS and what you type; and, if you input even more syllables together, offering candidates that fit natural sentence structure. For the more commonly used characters or vocabulary, entering the first Latin letter of each character's romanization is already enough for the input method to provide a list of best-fit words, which saves typing time, as the romanization of a single Chinese character can run to six or seven letters.
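The abbreviated input described here, typing only the first Latin letter of each syllable, amounts to a prefix match against a lexicon ranked by frequency. A minimal Python sketch, assuming a hypothetical five-word lexicon with invented frequencies (real IMEs use large, personalized dictionaries and treat initials such as zh, ch and sh specially):

```python
from typing import NamedTuple


class Entry(NamedTuple):
    word: str
    syllables: tuple  # pinyin syllables, without tone marks
    freq: int         # invented usage count, used only for ranking


# Hypothetical toy lexicon; frequencies are made up for illustration.
LEXICON = [
    Entry("中国", ("zhong", "guo"), 100),
    Entry("北京", ("bei", "jing"), 80),
    Entry("背景", ("bei", "jing"), 60),
    Entry("祖国", ("zu", "guo"), 50),
    Entry("正规", ("zheng", "gui"), 20),
]


def candidates(abbrev: str) -> list:
    """Words whose syllables start with the typed letters, most frequent first."""
    matches = [
        e for e in LEXICON
        if len(e.syllables) == len(abbrev)
        and all(s.startswith(c) for s, c in zip(e.syllables, abbrev))
    ]
    return [e.word for e in sorted(matches, key=lambda e: e.freq, reverse=True)]
```

Here `candidates("zg")` returns 中国, 祖国, 正规 in that order, and `candidates("bj")` returns 北京 before 背景; the user then picks one, typically by its position number.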
Some input methods have also gained auto-correction, so that even if you have not mastered Mandarin Chinese pronunciation and make a common mistake during romanization, the program can still understand what you want to type. On the other hand, to increase speed, since typing each Chinese character directly by its romanization often involves typing up to 6 letters, people map each initial and each final (vowel part) of a syllable to a single key, so that only 2 keystrokes are needed before they start selecting the characters they want. However, as all the above-mentioned methods involve hand-eye coordination to select the word you want, those who really emphasize speed stick to some older input methods that decompose characters based on their glyph shapes and convert them into a key string which, if designed properly, is unique for most characters; and even where codes do repeat, or when using a scheme with shorter codes that yields a higher repeat rate, people memorize the candidate number as part of the string, so that they can type without looking at the screen. Typing speeds of more than 220 characters per minute have been recorded with such methods (on a regular keyboard), which already exceeds the Chinese national standard for stenographers using specialized stenotype machines. On the other hand, it appears that some Chinese stenotype machines [in mainland China] use the sound of characters, just like the methods mentioned at the beginning, and some of them even use an application compatible with the one used in desktop environments... So it's hard to say whether letting the typist rely on visual hints helps or hinders typing speed... On 19 Jul 2015 at 4:39, "Marcel Schneider" <charupdate at orange.fr> wrote:
> On 18 Jul 2015, at 17:30, Eli Zaretskii <eliz at gnu.org> wrote: > > > > Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST) > > > From: Marcel Schneider <charupdate at orange.fr> > > > > You might wish also to use the Windows on-screen keyboard which allows > to see > > > what's exactly on each key while typing on whatever physical keyboard, > without > > > any need to have the keycap labels match the layout. This on-screen > keyboard is > > > built-in, only it does not support Kana shift states. > > > > That makes typing much more slow, unless you already know, at least > > approximately, where the keys are. you are talking to someone who is > > almost touch typist in English, but cannot remember for the life of me > > the Russian keyboard. Transliteration is the way to go in such cases, > > and it's strange that transliteration-based input methods are not > > readily available on Windows out of the box. > > The new-style Chinese IME is a very smart tool based on transliteration. > You just type the syllables as they sound in English, and you get plenty > of suggestions from which to choose. The old-style Chinese IME is still > shipped, too. I don't know Chinese, so I can't say more, but from what I > can see I believe these tools are very efficient. Perhaps no > transliteration-based input tool was built for Russian on Windows because > we are meant to use the keyboard layout directly. Now, osk.exe should > probably show on each key the letter that is on the current physical > keyboard. That is what I have often missed on such UIs: you cannot relate > the keys to the base layout as the user knows it. I would add, too, that > when the OS is in Russian, the OSK should display Cyrillic letters > following the Russian keyboard when it displays a QWERTY keyboard layout. > As you can keep the OSK always on top, you just look at it and see the > keys you're striking. > > There is also the old solution of a keymap on paper.
You can open the > Russian layout in MSKLC, choose a nice font, font size, and window size (to > get square keys; don't keep the default rectangles), and nice background colors. > Then save it as a picture via the File menu > Save as image. Open this in > Paint or GIMP and add the Latin letters. > > Marcel > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150719/4a855372/attachment.html> From c933103 at gmail.com Sun Jul 19 08:04:10 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Sun, 19 Jul 2015 21:04:10 +0800 Subject: UDHR in Unicode: 400 translations in text form! In-Reply-To: <CAGHjPPJ_s_n0yLCBTPdsfuD+We4h=F1mZHH0YxgyiR-z3nisvA@mail.gmail.com> References: <55903CBC.9050900@efele.net> <CAGHjPPJ_s_n0yLCBTPdsfuD+We4h=F1mZHH0YxgyiR-z3nisvA@mail.gmail.com> Message-ID: <CAGHjPPLztb3U_E+Q7iw-xnGwp=miz0PQ=M+9KrCFZqRZ9RgVkw@mail.gmail.com> Resending a previously sent mail in which I forgot to add the mailing list as a recipient. ---------- Forwarded message ---------- From: "gfb hjjhjh" <c933103 at gmail.com> Date: 29 Jun 2015, 4:35 Subject: Re: UDHR in Unicode: 400 translations in text form! To: "Eric Muller" <eric.muller at efele.net> I've just used the web report form to report the discovery of its translation (or partial translation) into Classical Chinese, Yue Chinese, and Min Nan Chinese (ISO 639-3 codes: lzh, yue, nan), all of them from Wikipedia. Please try to dig into Wikipedia to see if you can find more translations. On 29 Jun 2015 at 2:30, "Eric Muller" <eric.muller at efele.net> wrote:
> > There is still plenty of work: most translations would benefit from a > review, and there are 55 translations for which we have PDFs or images, but > not yet the text form (look for stage 2 translations). > > The site has also been revamped a bit, with a more functional map, and a > more functional table of the translations. The mapping to ISO 639-3 and BCP > 47 have been updated to take into account the evolution of those standards. > > Again, thanks to all the contributors, past, present and future, > > Eric. > > PS: I believe I have taken care of all the backlog of contributions and > comments. If I missed something, sorry, and please ping me again. > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150719/93608cad/attachment.html> From c933103 at gmail.com Sun Jul 19 08:05:43 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Sun, 19 Jul 2015 21:05:43 +0800 Subject: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to Unicode) In-Reply-To: <CAGHjPPJWU08vEiEunGvkjGE0LrmYvwaa+R37-2NtS7TkEcj02Q@mail.gmail.com> References: <20150706131135.665a7a7059d7ee80bb4d670165c8327d.b0fde2cbd7.wbe@email03.secureserver.net> <CAJ6uix6hLxdnBYCiujViCqu2Rs-KjqF7GZ95fXphiDyGGC8Fbg@mail.gmail.com> <CAGHjPPJWU08vEiEunGvkjGE0LrmYvwaa+R37-2NtS7TkEcj02Q@mail.gmail.com> Message-ID: <CAGHjPPLW+vFm8Je_4S_M0mgfMCYBzqg0=LEt7ZgS5NOL4QqW2Q@mail.gmail.com> Resending mails that were not sent correctly. ---------- Forwarded message ---------- From: "gfb hjjhjh" <c933103 at gmail.com> Date: 7 Jul 2015, 4:30 Subject: Re: Stationary vs. waving flags (was: Re: Adding RAINBOW FLAG to Unicode) To: <unicode at unicode.prg> How about a transparent flag? On 7 Jul 2015 at 4:24, "Leonardo Boiko" <leoboiko at namakajiri.net> wrote: > 2015-07-06 17:11 GMT-03:00 Doug Ewell <doug at ewellic.org>: > > Is it your belief that users who wish to display an emoji flag care > > whether the flag is shown stationary versus flapping in the wind?
> > I think a waving white flag is an emoji symbol for > "truce/surrender/come in peace", whereas a white rectangle doesn't > easily transmit the same idea. > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150719/6ec229e6/attachment.html> From charupdate at orange.fr Sun Jul 19 08:10:40 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 19 Jul 2015 15:10:40 +0200 (CEST) Subject: On-screen keyboards (was: Re: Input methods at the age of Unicode) In-Reply-To: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> Message-ID: <228083042.7008.1437311440268.JavaMail.www@wwinf2221> On 18 Jul 2015, at 16:44, I wrote: > You might wish also to use the Windows on-screen keyboard which allows to see what's exactly on each key while typing on whatever physical keyboard, without any need to have the keycap labels match the layout. This on-screen keyboard is built-in, only it does not support Kana shift states. Although the Windows OSK's support of Kana shift states is not complete, it is *not* completely missing. What's more, testing the OSK again today, I found out that it works fully if the Kana modifier is on Left Alt (as on my current French delta layout). My opinion had been formed when testing the OSK with a Windows keyboard layout where the Kana modifier is implemented on Right Control. Hitting or clicking Right Ctrl, you see nothing happen except the Ctrl key turning white. However, when hitting or clicking the letter key, you do get the Kana layer character. Now on my delta layout, even the key labels are updated with Kana characters when Kana (Left Alt) is pressed. Please do not take the following as mere criticism. I think that suggestions on Microsoft products are most useful because of the widespread use of those products. So I would add some suggestions that, if agreed upon, may help improve the user experience.
+ The dead keys are currently not highlighted on the OSK. Perhaps it would be useful to make them look somewhat different. + When hitting a letter key, no visual feedback is provided. I suggest that the feedback be the same when pressing the key as when clicking it. + A few option settings should be provided, among them the additional display of the physical keycap labels (see my e-mail of 18 Jul 2015 at 22:34), highlighting of the homing keys (F and J), display of the middle line, and other features allowing users to see which finger to use for a given key. There may well be other suggestions. As this is a UI issue, however, it might not be followed up on this List. Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150719/f1283500/attachment.html> From charupdate at orange.fr Sun Jul 19 08:15:59 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sun, 19 Jul 2015 15:15:59 +0200 (CEST) Subject: Input methods at the age of Unicode In-Reply-To: <CAGHjPPLJTbJks4g7Hg3+oPv1RmMZbR+sZk5+19SoEeX1RVncCw@mail.gmail.com> References: <208789398.11783.1437230003340.JavaMail.www@wwinf1k02> <83bnf95vpl.fsf@gnu.org> <1182315476.15127.1437251677051.JavaMail.www@wwinf1g36> <CAGHjPP+_pjvAi9GxhqBqHXy=J7Xhph3cf-=4aPoqoxq0piUAfg@mail.gmail.com> <CAGHjPPLJTbJks4g7Hg3+oPv1RmMZbR+sZk5+19SoEeX1RVncCw@mail.gmail.com> Message-ID: <1700203382.7082.1437311759766.JavaMail.www@wwinf2221> Hello, thank you very much for all this information, which I didn't know and which is very useful for putting into perspective the Windows Chinese IME new experience I referred to on the Mailing List. Best regards, Marcel > Message of 19/07/15 15:01 > From: "gfb hjjhjh" > To: unicode at unicode.org > Cc: > Subject: Re: Input methods at the age of Unicode > > Forgot to add the Unicode mailing list to the reply address in my previous mail... adding it back and resending. ---------- Forwarded message ---------- > From: "gfb hjjhjh" > Date: 19 Jul 2015, 9:38 > Subject: Re: Input methods at the age of Unicode > To: "Marcel Schneider" > Input methods where you type in the sound and pick the corresponding characters have been developed for more than 20 years by many Chinese companies. Features include: prioritizing the characters offered for selection according to usage frequency; when multiple syllables are input together without selection, offering a selection of best-fit vocabulary, with the database constantly updated from a network database and the wordbank analyzed and personalized from social applications, the contact list, email, SMS and what you type; and, if you input even more syllables together, offering candidates that fit natural sentence structure. For the more commonly used characters or vocabulary, entering the first Latin letter of each character's romanization is already enough for the input method to provide a list of best-fit words, which saves typing time, as the romanization of a single Chinese character can run to six or seven letters. Some input methods have also gained auto-correction, so that even if you have not mastered Mandarin Chinese pronunciation and make a common mistake during romanization, the program can still understand what you want to type. On the other hand, to increase speed, since typing each Chinese character directly by its romanization often involves typing up to 6 letters, people map each initial and each final (vowel part) of a syllable to a single key, so that only 2 keystrokes are needed before they start selecting the characters they want.
However, as all the above-mentioned methods involve hand-eye coordination to select the word you want, those who really emphasize speed stick to some older input methods that decompose characters based on their glyph shapes and convert them into a key string which, if designed properly, is unique for most characters; and even where codes do repeat, or when using a scheme with shorter codes that yields a higher repeat rate, people memorize the candidate number as part of the string, so that they can type without looking at the screen. Typing speeds of more than 220 characters per minute have been recorded with such methods (on a regular keyboard), which already exceeds the Chinese national standard for stenographers using specialized stenotype machines. On the other hand, it appears that some Chinese stenotype machines [in mainland China] use the sound of characters, just like the methods mentioned at the beginning, and some of them even use an application compatible with the one used in desktop environments... So it's hard to say whether letting the typist rely on visual hints helps or hinders typing speed... On 19 Jul 2015 at 4:39, "Marcel Schneider" wrote: > On 18 Jul 2015, at 17:30, Eli Zaretskii wrote: > > > > Date: Sat, 18 Jul 2015 16:33:23 +0200 (CEST) > > > From: Marcel Schneider > > > > You might wish also to use the Windows on-screen keyboard which allows to see > > > what's exactly on each key while typing on whatever physical keyboard, without > > > any need to have the keycap labels match the layout. This on-screen keyboard is > > > built-in, only it does not support Kana shift states. > > > > That makes typing much more slow, unless you already know, at least > > approximately, where the keys are. you are talking to someone who is > > almost touch typist in English, but cannot remember for the life of me > > the Russian keyboard.
Transliteration is the way to go in such cases, > > and it's strange that transliteration-based input methods are not > > readily available on Windows out of the box. > > The new-style Chinese IME is a very smart tool based on transliteration. You just type the syllables as they sound in English, and you get plenty of suggestions to choose from. The old-style Chinese IME is still shipped, too. I don't know Chinese, so I can't tell more, but visually I believe these tools are very efficient. Perhaps no transliteration-based input tool was built for Russian on Windows because we are meant to use the keyboard directly. Now, osk.exe should probably show on each key the letter that is on the current physical keyboard. That is what I have often missed in such UIs: you cannot make the link with the base layout as the user knows it. I would also say that when the OS is in Russian, the OSK should display Cyrillic letters following the Russian keyboard when it displays a QWERTY keyboard layout. As you can keep the OSK always on top, you just look at it and see the keys you're striking. > > There is also the old solution of a keymap on paper. You can open the Russian layout in MSKLC, choose a nice font, font size, and window size (to get square keys; don't keep the default rectangles), and nice background colors. Then save it as a picture, in the File menu > Save as image. Open this in Paint or GIMP and add the Latin letters. > > Marcel From doug at ewellic.org Sun Jul 19 13:39:23 2015 From: doug at ewellic.org (Doug Ewell) Date: Sun, 19 Jul 2015 12:39:23 -0600 Subject: Stationary vs.
waving flags (was: Re: Adding RAINBOW FLAG to Unicode) In-Reply-To: <mailman.1.1437325201.8488.unicode@unicode.org> References: <mailman.1.1437325201.8488.unicode@unicode.org> Message-ID: <DBBB8D665ED6452B89F82E55460FBB8A@DougEwell> gfb hjjhjh <c933103 at gmail dot com> wrote: >> I think a waving white flag is an emoji symbol for >> "truce/surrender/come in peace", whereas a white rectangle doesn't >> easily transmit the same idea. > > How about transparent flag? I'm still not convinced this is a problem that needs to be solved. "A flag goes here which your system couldn't display" is all that the base character's glyph is trying to convey. Proposing a new base character will ensure that this solution gets delayed by at least another year. Is it really worth it? -- Doug Ewell | http://ewellic.org | Thornton, CO From asmus-inc at ix.netcom.com Sun Jul 19 18:26:52 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Sun, 19 Jul 2015 16:26:52 -0700 Subject: Stationary vs. waving flags In-Reply-To: <DBBB8D665ED6452B89F82E55460FBB8A@DougEwell> References: <mailman.1.1437325201.8488.unicode@unicode.org> <DBBB8D665ED6452B89F82E55460FBB8A@DougEwell> Message-ID: <55AC323C.7050105@ix.netcom.com> An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150719/a74be55a/attachment.html> From charupdate at orange.fr Mon Jul 20 02:39:30 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Mon, 20 Jul 2015 09:39:30 +0200 (CEST) Subject: Plain text custom fraction input (Child thread of: Input methods at the age of Unicode) Message-ID: <1598696058.2090.1437377970835.JavaMail.www@wwinf1f21> Hello, I've got a concern about entering customized (vulgar) fractions in plain text, using a sequence of superscript and subscript digits separated by U+2044 FRACTION SLASH. I submitted it in PRI #297.
As I need to clear up this point for future keyboard layout usage recommendations, I would like to bring it to the attention of the Unicode Mailing List for advice and discussion. A demo file that opens in a word processor, set in the Arial Unicode MS typeface, is available at http://bit.ly/1DNPtf0 To view it as a PDF, there is another file at http://bit.ly/1JutBGK The following is based on http://www.unicode.org/review/pri297/feedback.html Date/Time: Mon Apr 13 10:07:49 CDT 2015 There is some additional information about U+2044 FRACTION SLASH that I would suggest adding to the "Fraction Slash" paragraphs in the "Other Punctuation" subsection of §6.2, page 273 of the Standard, as well as to the Code Charts' Fractions subheader before U+2150. That U+2044 FRACTION SLASH works together with superscripts and subscripts is so obvious that no discussion is needed. [Note: This proved to be wrong. I'm sorry not to have e-mailed this to the List.] On the other hand, as fraction formatting requires at least desktop publishing software, it is usually not part of office automation. It therefore seems useful to show the plain text input method for (so-called vulgar) fractions. The "Number Forms" block's "Fractions" subhead may therefore be followed by a NOTICE_LINE like this one: "@+" [TAB] [TAB] "Fractions may be composed in plain text on a [superscripts] 2044 [subscripts] pattern." Furthermore, the Fraction Slash notice in the Standard might contain the information below (including what is already provided in the Standard). ___________________________ Fraction Slash. U+2044 FRACTION SLASH is used between digits to form numeric fractions. It kerns when used with superscripts and subscripts to compose plain text fractions such as ???
and ???. The pattern of a plain text fraction built using the fraction slash is defined as follows: any sequence of one or more superscript digits (U+00B9, U+00B2, U+00B3, U+2074..U+2079, U+2070), followed by the fraction slash, followed by any sequence of one or more subscript digits (U+2080..U+2089). U+2044 FRACTION SLASH may also act as a formatting command for use with decimal digits, and it may be used instead of U+002F SOLIDUS prior to applying fraction formatting. The standard form of a fraction designed for formatting is defined as follows: any sequence of one or more decimal digits (General Category = Nd), followed by the fraction slash, followed by any sequence of one or more decimal digits. If the fraction is to be separated from a previous number, then a space can be used, choosing the appropriate width (normal, thin, zero width, and so on). For example, 1 + thin space + 3 + fraction slash + 4 can be displayed as 1¾. Whether they are plain text or formatted, fractions should be displayed as a unit, such as ¾ or {unavailable glyph}. The precise choice of display can depend on additional formatting information. If the displaying software is incapable of mapping the fraction to a unit, then it can also be displayed as a simple linear sequence as a fallback (for example, 3/4). For fallback display, U+002F SOLIDUS is preferred, because the fraction slash kerns. ___________________________ Date/Time: Wed Apr 22 11:26:44 CDT 2015 Opt Subject: PRI #297 Fraction slash 2044 FRACTION SLASH In addition to my previous feedback, I would suggest adding the hint about how to compose arbitrary fractions in plain text in another place as well. This could be the entry of the fraction slash U+2044, more precisely the end of the existing COMMENT_LINE, after a comma: 2044 FRACTION SLASH = solidus (in typography) * for composing arbitrary fractions, in plain text with superscripts and subscripts. Thank you for your feedback.
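Here is a minimal Python sketch of the superscript + U+2044 + subscript pattern proposed above. `compose_fraction` and `PLAIN_TEXT_FRACTION` are illustrative names and are not part of any standard API. Whether the result looks like a precomposed fraction such as U+2157 still depends on the font.

```python
import re

# Digits 0-9 mapped to their superscript and subscript presentation forms.
SUPERSCRIPTS = str.maketrans(
    "0123456789", "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
)
SUBSCRIPTS = str.maketrans(
    "0123456789", "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
)
FRACTION_SLASH = "\u2044"

# Pattern from the feedback: superscript digit(s), U+2044, subscript digit(s).
PLAIN_TEXT_FRACTION = re.compile(
    "[\u2070\u00b9\u00b2\u00b3\u2074-\u2079]+\u2044[\u2080-\u2089]+"
)


def compose_fraction(numerator: int, denominator: int) -> str:
    """Build a plain text fraction, e.g. 3, 5 -> U+00B3 U+2044 U+2085."""
    if numerator < 0 or denominator <= 0:
        raise ValueError("expects a non-negative numerator and a positive denominator")
    return (str(numerator).translate(SUPERSCRIPTS)
            + FRACTION_SLASH
            + str(denominator).translate(SUBSCRIPTS))


three_fifths = compose_fraction(3, 5)
print([f"U+{ord(c):04X}" for c in three_fifths])        # ['U+00B3', 'U+2044', 'U+2085']
print(bool(PLAIN_TEXT_FRACTION.fullmatch(three_fifths)))  # True
```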
Best regards, Marcel From doug at ewellic.org Mon Jul 20 10:30:48 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 20 Jul 2015 08:30:48 -0700 Subject: Stationary vs. waving flags Message-ID: <20150720083048.665a7a7059d7ee80bb4d670165c8327d.7dd2ebc26a.wbe@email03.secureserver.net> Asmus Freytag (t) <asmus dash inc at ix dot netcom dot com> wrote: >> Proposing a new base character will ensure that this solution gets >> delayed by at least another year. Is it really worth it? > > Sometimes haste is a poor guide. This is ironic, considering that all of this flag stuff belongs to the emoji wing of Unicode, where fast-tracking of "urgently needed" cheese wedges and hockey sticks is the norm. -- Doug Ewell | http://ewellic.org | Thornton, CO From jknappen at web.de Mon Jul 20 10:46:42 2015 From: jknappen at web.de (Jörg Knappen) Date: Mon, 20 Jul 2015 17:46:42 +0200 Subject: Security concerns: OGHAM SPACE MARK Message-ID: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150720/e2b4c2a1/attachment.html> From verdy_p at wanadoo.fr Mon Jul 20 11:40:51 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 20 Jul 2015 18:40:51 +0200 Subject: Security concerns: OGHAM SPACE MARK In-Reply-To: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> Message-ID: <CAGa7JC0cmC+CSGv3Z205__oAyB5qzdqvhE+LJnxDVF2VA7yxeQ@mail.gmail.com> Bank transactions do not send amounts that contain operations to compute in the same field. They also limit the kinds of digits they accept for interchange.
A change of sign is a different kind of transaction with different responsibilities, so signs are prohibited; they are replaced by a separate code for the transaction type. So the risk may only exist when presenting a signed number to a user and asking them to accept the transaction. There are similar issues when amounts use grouping separators and the decimal separator becomes ambiguous because the precision has as many digits as a group (in most locales, groups have 3 digits, so prices always avoid formats with 3 decimals, and most currencies have 0 or 2 decimals of precision). This could be a problem in locales that group digits in pairs. If group separators are used to show a price to a user in a UI, it is strongly recommended to use nothing other than a (narrow) space. If the document will be printed, you may omit all separators and replace the decimal separator with the currency symbol, or use modified typography to render the decimals (e.g. in superscript or a smaller font size). But the most common confusion when presenting prices to users is failing to state clearly whether taxes and additional fees will be applied, have been included, or will have to be paid after the purchase when receiving the product. Take buying a product in Australia from Europe as an example. You accept the price in AUD, knowing that there will be bank fees for the currency exchange, and you pay the price to the seller. Later your bank performs the exchange, applying a new exchange rate plus fees, and a second payment line appears in your bank account. Then a week later the product arrives, but to get it you must first pay the import duties and VAT to customs (via the postal or delivery service, sometimes plus an extra fee to the delivery service, which had to advance the customs duties and acts as an intermediary). The total price is much higher than what was advertised.
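One way to follow the advice about grouping separators is sketched in Python under two assumptions. First, the narrow space is taken to be U+202F NARROW NO-BREAK SPACE. Second, digits are grouped by three, so locales that group by two would need different logic. `format_amount` is a hypothetical helper.

```python
from decimal import Decimal

NARROW_NO_BREAK_SPACE = "\u202f"


def format_amount(amount: Decimal, decimal_separator: str = ".") -> str:
    """Group integer digits by three with U+202F and use an explicit decimal separator."""
    text = format(amount, ",.2f")  # e.g. '1,234,567.89' (two decimals, comma grouping)
    integer_part, _, fraction_part = text.partition(".")
    return integer_part.replace(",", NARROW_NO_BREAK_SPACE) + decimal_separator + fraction_part


print(repr(format_amount(Decimal("1234567.89"))))
print(repr(format_amount(Decimal("1234.5"), ",")))
```

Because the group separator is a space and never a period or comma, a value such as 1 234,50 cannot be misread as having three decimals.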
Some sellers (notably on the Internet) do not explain clearly that these products will cost more and what to expect, even when they target customers in other countries in their own language, as if they had a local branch in that country. Banks are protected from these errors, but customers are not. 2015-07-20 17:46 GMT+02:00 "Jörg Knappen" <jknappen at web.de>: > I stumbled over a very strange snippet of javascript code, where an > apparent > minus sign is interpreted as a space here: > > http://stackoverflow.com/questions/31507143/why-does-2-40-equal-42 > > Imagine such kind of behaviour in bank transactions ... > > --Jörg Knappen > From prosfilaes at gmail.com Tue Jul 21 00:05:11 2015 From: prosfilaes at gmail.com (David Starner) Date: Tue, 21 Jul 2015 05:05:11 +0000 Subject: Security concerns: OGHAM SPACE MARK In-Reply-To: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> Message-ID: <CAMZ=zj4Hn1aHfj+nh3COY57RyQ8T3TpA1b+bQRxTWRhXNL4dqg@mail.gmail.com> It's a confusable. There are a lot of them in Unicode. Auditing source code is hard, and if it's a concern, I suggest filtering out all non-ASCII characters. If you really think it's a concern, let's be specific: what do you mean by this kind of behavior in bank transactions? If you're worried about the bank's JavaScript, you already have to trust code written for OS/360 that the bank considers proprietary and keeps deeply hidden, as if you could read GOTO-laden PL/I anyway.
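Filtering, or at least flagging, non-ASCII characters as suggested above is easy to automate. The following Python sketch (the function name is hypothetical) reports each non-ASCII character with its name and General Category. The test input is a made-up line modeled on the report. On that line, the sketch exposes U+1680 OGHAM SPACE MARK, which has General Category Zs and is therefore whitespace to ECMAScript (and to Python's `str.isspace`).

```python
import unicodedata


def non_ascii_report(source: str) -> list:
    """List (line, column, code point, name, General Category) for each non-ASCII character."""
    report = []
    # Split on "\n" only: str.splitlines() would silently consume U+2028/U+2029/U+0085.
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                report.append((line_no, col, f"U+{ord(ch):04X}",
                               unicodedata.name(ch, "<unnamed>"),
                               unicodedata.category(ch)))
    return report


# Hypothetical snippet: U+1680 sits where a reader might see a minus sign.
snippet = "alert(2+\u168040);"
for entry in non_ascii_report(snippet):
    print(entry)  # (1, 9, 'U+1680', 'OGHAM SPACE MARK', 'Zs')
```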
On Mon, Jul 20, 2015 at 8:49 AM "Jörg Knappen" <jknappen at web.de> wrote: > I stumbled over a very strange snippet of javascript code, where an > apparent > minus sign is interpreted as a space here: > > http://stackoverflow.com/questions/31507143/why-does-2-40-equal-42 > > Imagine such kind of behaviour in bank transactions ... > > --Jörg Knappen > From charupdate at orange.fr Tue Jul 21 01:45:19 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Tue, 21 Jul 2015 08:45:19 +0200 (CEST) Subject: Plain text custom fraction input (Part of: Input methods at the age of Unicode) In-Reply-To: <1598696058.2090.1437377970835.JavaMail.www@wwinf1f21> References: <1598696058.2090.1437377970835.JavaMail.www@wwinf1f21> Message-ID: <2061348317.2626.1437461119877.JavaMail.www@wwinf1f21> Entering fractions in plain text is consistent with the very core of Unicode's purpose, which (please check whether I'm right) is to enable all people on earth to convey as much information as possible in readable plain text. Fractions, which ISO wanted to keep calling "vulgar", are part of this information. Accordingly, the designer of Arial Unicode MS matched the precomposed fractions, the superscript and subscript digits, and the fraction slash, so that wherever an equivalent precomposed fraction exists, [superscript digit(s)] U+2044 [subscript digit(s)] looks exactly like [precomposed fraction]. I really can't see any difference. The example in the demo files shows that in Arial Unicode MS, U+00B3 U+2044 U+2085 ³⁄₅ is congruent with U+2157 ⅗. DejaVu Sans and DejaVu Serif and their Condensed variants are some other fonts that work.
Well, a lot of other fonts don't, because they are incomplete or for other reasons, but I cannot really draw conclusions from what I see on my machine, because my versions are incomplete. You may test it yourself, and you are still welcome to download the samples: .docx: http://bit.ly/1DNPtf0 .pdf: http://bit.ly/1JutBGK The lesson I learned from this is that proportionally spaced fonts that comply fully with the Standard allow users to get nice fractions without formatting. Obviously that does not work with monospaced fonts. Nor does it look nice when the Latin-1 superscripts (¹²³) and the other superscripts and subscripts are not from the same font, as may occur in browsers but also in word processing. To run this (well, call it a trick), we must make sure to use a suitable font. But on that condition it works, and I see no reason not to do it. What's more, I do not consider it a mere trick, but normal usage. The problem we now have to deal with is why this usage is hidden in the Standard. And I'd like to offer my answer to the question right away, an answer inherent in what I wrote yesterday: the plain text custom fraction input method is not recommended in TUS *because* fraction formatting is part of desktop publishing software but not of office automation software. That may be wrong, and I didn't check whether at some point in its history Unicode removed plain text custom fractions from TUS. Nor can I know whether Unicode has been urged to remove this information or not to publish it. However, a number of facts lead me to suppose that software marketing reasons are involved. I probably need to underscore that I'm not here to disrupt business, but to help improve user experience, the usefulness of work tools, and overall productivity. Regards, Marcel From richard.wordingham at ntlworld.com Tue Jul 21 01:56:33 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 21 Jul 2015 07:56:33 +0100 Subject: Chinese Word Breaking Message-ID: <20150721075633.2e76fcab@JRWUBU2> I'm puzzled by a statement in UAX #29 Unicode Text Segmentation: "In particular, the characters with the Line_Break property values of Contingent_Break (CB), Complex_Context (SA/Southeast Asian), and Unknown (XX) are assigned word boundary property values based on criteria outside of the scope of this annex. That means that satisfactory treatment of languages like Chinese or Thai requires special handling." Is 'Contingent_Break (CB)' an error for 'Ideographic (ID)'? That would make sense for Chinese, for some applications need to group ideographs into words. While I am on the topic, does anyone know of character-level mechanisms used to advise algorithms of the word boundaries (or lack of boundaries) in Chinese text? Richard. From charupdate at orange.fr Tue Jul 21 03:46:33 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Tue, 21 Jul 2015 10:46:33 +0200 (CEST) Subject: UTF-8 display (was: Re: a mug) In-Reply-To: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12> References: <1005356845.3994.1436778954130.JavaMail.www@wwinf1h12> Message-ID: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21> On 13 Jul 2015, at 11:28, I wrote: > The only time I saw UTF-8 like on the T-shirt was when opening UTF-8 files that didn't specify charset=UTF-8. The thing to do was to add the charset in the file header. Now I see that this issue is much trickier. I've just stumbled over a no-display page instead of (or at the URL of) http://www-01.ibm.com/software/globalization/topics/keyboards/physical.jsp where I read: Our apologiesâ€¦
while the source as displayed by Firefox shows: charset=utf-8 Our apologies (The markup comes from the header 1 tags.) The trick is that the real HTML file as saved by Zotero contains: Our apologies… (with a U+2026) and is encoded in... charset=windows-1252 Once I changed this to utf-8, the page displayed correctly: Our apologies… This may be why people are as puzzled by UTF-8 as we've seen. So I would like to present my apologies to the List, and ask if anyone could help us understand the real problem (browsers, web editors, or something else) and how to fix it. I don't think it's a mere HTML issue, as it concerns the Unicode Transformation Format. Best regards, Marcel From albrecht.dreiheller at siemens.com Tue Jul 21 04:12:00 2015 From: albrecht.dreiheller at siemens.com (Dreiheller, Albrecht) Date: Tue, 21 Jul 2015 09:12:00 +0000 Subject: AW: Security concerns: OGHAM SPACE MARK In-Reply-To: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> Message-ID: <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net> Allowing arbitrary non-ASCII characters in programming languages will make it more difficult to detect malicious code. If the author really intends to deceive potential readers, he will succeed. Programming languages like JS should at least implement exclusion rules from the "Unicode Confusables Characters" list. Otherwise such programming languages ought to be black-listed. Albrecht. From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of "Jörg Knappen" Sent: Monday, 20 July 2015 17:47 To: Unicode Public Subject: Security concerns: OGHAM SPACE MARK I stumbled over a very strange snippet of javascript code, where an apparent minus sign is interpreted as a space here: http://stackoverflow.com/questions/31507143/why-does-2-40-equal-42 Imagine such kind of behaviour in bank transactions ... --Jörg Knappen From c933103 at gmail.com Tue Jul 21 05:10:14 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Tue, 21 Jul 2015 18:10:14 +0800 Subject: Chinese Word Breaking In-Reply-To: <20150721075633.2e76fcab@JRWUBU2> References: <20150721075633.2e76fcab@JRWUBU2> Message-ID: <CAGHjPP+MH6KYwLcC99X5+cxL+S0TzUgt42yMF0ONwNVCk1D7SQ@mail.gmail.com> When you write text in modern Chinese, there are no breaks between words. So if you segment the text according to the ideographic characters, what gets grouped together would be either a clause or a sentence, or even a whole paragraph if you are handling older text without punctuation. Also, that group of characters is not used solely by modern Standard Chinese. For example, Japanese has expressions like ???? whose four characters are generally treated as one word, but, as you can see, it is a mix of ideographs and hiragana. Similarly, Taiwanese (nan) users also write the Latin alphabet together with these ideographs to form words. In these cases, if you change it to ID, what you select would be just part of the word. And at the character level you can't even tell what language a character is written in, let alone tell which characters form a word. In fact, in literary Chinese (lzh), most of these characters can be considered a word by themselves. On 21 Jul 2015 at 2:59, "Richard Wordingham" <richard.wordingham at ntlworld.com> wrote:
> I'm puzzled by a statement in UAX #29 Unicode Text Segmentation: > > "In particular, the characters with the Line_Break property values of > Contingent_Break (CB), Complex_Context (SA/Southeast Asian), and > Unknown (XX) are assigned word boundary property values based on > criteria outside of the scope of this annex. That means that > satisfactory treatment of languages like Chinese or Thai requires > special handling." > > Is 'Contingent_Break (CB)' an error for 'Ideographic (ID)'? That would > make sense for Chinese, for some applications need to group ideographs > into words. > > While I am on the topic, does anyone know of character-level > mechanisms used to advise algorithms of the word boundaries (or lack > of boundaries) in Chinese text? > > Richard. > From prosfilaes at gmail.com Tue Jul 21 05:45:40 2015 From: prosfilaes at gmail.com (David Starner) Date: Tue, 21 Jul 2015 10:45:40 +0000 Subject: Security concerns: OGHAM SPACE MARK In-Reply-To: <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net> References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net> Message-ID: <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com> On Tue, Jul 21, 2015 at 2:14 AM Dreiheller, Albrecht <albrecht.dreiheller at siemens.com> wrote: > If the author really intends to deceive potential readers he will succeed. > Possibly. Code is hard. But the Ogham space is not a real threat; it's easy to search for and obviously a deliberate attempt to confuse. > Programming languages like JS should at least implement exclusion rules > from the "Unicode Confusables Characters" list. > Have you looked at that list?
1 and l is one pair of confusables in that list, and while that is an incredibly classic confusable pair, it's not one that's implementable in a programming language. а and a is another pair; but if you ban а, you've practically banned Cyrillic identifiers completely. > > Otherwise such programming languages ought to be black-listed. > Black-listed? By whom? If you wish to make sure a set of code you control does not use non-ASCII characters, most source-control systems will let you reject such files at check-in. If you want to reject JavaScript altogether, that is also your freedom. But of all the attacks leveled against JavaScript, I seriously doubt that this is the one that will bring it down. As a note on confusable code, let me point out this code that someone tried to illicitly push into the Linux CVS back in 2003: if ((options == (__WCLONE|__WALL)) && (current->uid = 0)) retval = -EINVAL; the all-ASCII trick being that current->uid is being set to zero, not checked. It would be much easier to find any sort of Unicode trick than a backdoor like that in a sufficiently large body of code. From philip_chastney at yahoo.com Tue Jul 21 07:49:52 2015 From: philip_chastney at yahoo.com (philip chastney) Date: Tue, 21 Jul 2015 05:49:52 -0700 Subject: UTF-8 display (was: Re: a mug) In-Reply-To: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21> Message-ID: <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com> so the webmaster put up the page, declaring the charset to be UTF-8... but what charset was being used by the guy who knocked out the HTML?
it could be more complicated than that: maybe the page was produced using UTF-8, somebody reads the page using, say, Windows-1252, and "converts" it to UTF-8. I'm sure, with a little effort, ever more complicated scenarii could be constructed -- it's amazing what can be achieved when arrogance and ignorance are combined /phil -------------------------------------------- On Tue, 21/7/15, Marcel Schneider <charupdate at orange.fr> wrote: Subject: UTF-8 display (was: Re: a mug) To: "UmeshPN" <umesh.p.nair at gmail.com>, "Daniel Bünzli" <daniel.buenzli at erratique.ch> Cc: "Unicode Mailing List" <unicode at unicode.org> Date: Tuesday, 21 July, 2015, 8:46 AM On 13 Jul 2015, at 11:28, I wrote: > The only time I saw UTF-8 like on the T-shirt was when opening UTF-8 files that didn't specify charset=UTF-8. The thing to do was to add the charset in the file header. Now I see that this issue is much trickier. I've just stumbled over a no-display page instead of (or at the URL of) http://www-01.ibm.com/software/globalization/topics/keyboards/physical.jsp where I read: Our apologiesâ€¦ while the source as displayed by Firefox shows: charset=utf-8 Our apologies (The markup comes from the header 1 tags.) The trick is that the real HTML file as saved by Zotero contains: Our apologies… (with a U+2026) and is encoded in... charset=windows-1252 Once I changed this to utf-8, the page displayed correctly: Our apologies… This may be why people are as puzzled by UTF-8 as we've seen. So I would like to present my apologies to the List, and ask if anyone could help us understand the real problem (browsers, web editors, or something else) and how to fix it. I don't think it's a mere HTML issue, as it concerns the Unicode Transformation Format.
Best regards, Marcel From charupdate at orange.fr Tue Jul 21 08:45:24 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Tue, 21 Jul 2015 15:45:24 +0200 (CEST) Subject: UTF-8 display (was: Re: a mug) In-Reply-To: <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com> References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21> <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com> Message-ID: <457130915.11820.1437486324730.JavaMail.www@wwinf1n18> On 21 Jul 2015, at 14:49, philip chastney wrote: > so the webmaster put up the page, declaring the charset to be UTF-8... > > but what charset was being used by the guy who knocked out the HTML? > > it could be more complicated than that: maybe the page was produced using UTF-8, > somebody reads the page using, say, Windows-1252, and "converts" it to UTF-8 > > I'm sure, with a little effort, ever more complicated scenarii could be constructed > -- it's amazing what can be achieved when arrogance and ignorance are combined I fear things have gotten somewhat turned upside down, so I'll try to outline the real scenario: 1 - I open the page; the horizontal ellipsis is displayed as â€¦ (of course I don't know yet that it's a horizontal ellipsis...). 2 - I remember my comment about the T-shirt and decide to check whether it's accurate. Firefox shows me the page is in UTF-8 and that there is nothing after "Our apologies". 3 - After some trial and error, I save the page in Zotero and open the folder. The only HTML file inside is declared as Windows-1252, and there is the horizontal ellipsis. 4 - I back up the original file, change the charset value to utf-8 and refresh the page: the â€¦ turns into a horizontal ellipsis. To answer your questions, I figure that the page was written on a Windows-1252 template but without sticking to this charset. U+2026 was probably an autocorrection. So it was "produced using UTF-8" but "the webmaster" must have published it under the old charset.
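The byte-level mechanics of the IBM page can be reproduced in Python. This sketch assumes that a page labeled iso-8859-1 is actually decoded as windows-1252 (Python's `cp1252`), which is what browsers do under the WHATWG Encoding Standard. Decoding with Python's strict `latin-1` would instead yield the C1 control U+0080 in the middle.

```python
ellipsis = "\u2026"                      # HORIZONTAL ELLIPSIS
utf8_bytes = ellipsis.encode("utf-8")    # b'\xe2\x80\xa6'
mojibake = utf8_bytes.decode("cp1252")   # 'â€¦' (U+00E2 U+20AC U+00A6)

# As long as no bytes were lost, the damage can be reversed.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(utf8_bytes, mojibake, repaired)
```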
The puzzling point is that Firefox tried UTF-8 and reported it in all seriousness, but "ate" the U+2026, while it used the native Windows-1252 to "display" it... I hope that some macro could enable the "webmasters" to update websites quickly, because resolving this "funny" "scenario" has cost me some "effort" today! Marcel From tom at bluesky.org Tue Jul 21 09:00:45 2015 From: tom at bluesky.org (Tom Gewecke) Date: Tue, 21 Jul 2015 10:00:45 -0400 Subject: UTF-8 display (was: Re: a mug) In-Reply-To: <457130915.11820.1437486324730.JavaMail.www@wwinf1n18> References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21> <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com> <457130915.11820.1437486324730.JavaMail.www@wwinf1n18> Message-ID: <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org> The IBM page seems to have an ellipsis character in UTF-8, with bytes E2 80 A6. The web server is set to force all browsers to use the encoding iso-8859-1 regardless of what charset is stipulated in the HTML code. The browser uses the Win 1252 equivalents and displays â€¦ To see what a web server is forcing, if anything, you can use http://web-sniffer.net/ On Jul 21, 2015, at 9:45 AM, Marcel Schneider wrote: > > I fear things have gotten somewhat turned upside down, so I'll try to outline the real scenario: > > 1 - I open the page; the horizontal ellipsis is displayed as â€¦ (of course I don't know yet that it's a horizontal ellipsis...). > 2 - I remember my comment about the T-shirt and decide to check whether it's accurate. Firefox shows me the page is in UTF-8 and that there is nothing after "Our apologies". > 3 - After some trial and error, I save the page in Zotero and open the folder. The only HTML file inside is declared as Windows-1252, and there is the horizontal ellipsis.
> 4 - I back up the original file, change the charset value to utf-8 and refresh the page: the â€¦ turns into a horizontal ellipsis. > > To answer your questions, I figure that the page was written on a Windows-1252 template but without sticking to this charset. U+2026 was probably an autocorrection. So it was "produced using UTF-8" but "the webmaster" must have published it under the old charset. > > The puzzling point is that Firefox tried UTF-8 and reported it in all seriousness, but "ate" the U+2026, while it used the native Windows-1252 to "display" it... > > I hope that some macro could enable the "webmasters" to update websites quickly, because resolving this "funny" "scenario" has cost me some "effort" today! > > Marcel > From doug at ewellic.org Tue Jul 21 11:33:17 2015 From: doug at ewellic.org (Doug Ewell) Date: Tue, 21 Jul 2015 09:33:17 -0700 Subject: Plain text custom fraction input Message-ID: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> As explained in TUS 7.0, §6.2 ("General Punctuation"), p. 273, U+2044 FRACTION SLASH is intended for use with Basic Latin digits, or other digits with General Category = Nd. The superscript and subscript presentation forms have General Category = No. -- Doug Ewell | http://ewellic.org | Thornton, CO
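The General Category distinction above can be checked with Python's `unicodedata` module. The helper `is_standard_fraction` is a hypothetical illustration of the "Nd digits, fraction slash, Nd digits" form. It is not a conformance test.

```python
import unicodedata

FRACTION_SLASH = "\u2044"


def is_standard_fraction(text: str) -> bool:
    """True if text is one or more Nd digits, U+2044, then one or more Nd digits."""
    numerator, slash, denominator = text.partition(FRACTION_SLASH)
    return (slash == FRACTION_SLASH and numerator != "" and denominator != ""
            and all(unicodedata.category(c) == "Nd" for c in numerator + denominator))


# DIGIT THREE and ARABIC-INDIC DIGIT THREE are Nd; SUPERSCRIPT THREE and SUBSCRIPT FIVE are No.
for ch in "3\u00b3\u2085\u0663":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
```

Under this reading, "3⁄4" (ASCII digits around U+2044) qualifies, while the superscript/subscript form discussed earlier in the thread does not.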
From gwalla at gmail.com Tue Jul 21 15:54:39 2015 From: gwalla at gmail.com (Garth Wallace) Date: Tue, 21 Jul 2015 13:54:39 -0700 Subject: Emoji: The Movie Message-ID: <CA+p4_H2OGVxzyrU=aKo6PdixhNfgyoL1BRRoSN3fvpTLA=umrg@mail.gmail.com> I'm not sure if this is a joke or not: http://deadline.com/2015/07/emoji-movie-sony-pictures-animation-anthony-leondis-kung-fu-panda-secrets-of-the-masters-1201482768/ From doug at ewellic.org Tue Jul 21 16:05:20 2015 From: doug at ewellic.org (Doug Ewell) Date: Tue, 21 Jul 2015 14:05:20 -0700 Subject: Emoji: The Movie Message-ID: <20150721140520.665a7a7059d7ee80bb4d670165c8327d.75b6c5f170.wbe@email03.secureserver.net> Garth Wallace <gwalla at gmail dot com> wrote: > I'm not sure if this is a joke or not: Yes. -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From albrecht.dreiheller at siemens.com Tue Jul 21 16:55:05 2015 From: albrecht.dreiheller at siemens.com (Dreiheller, Albrecht) Date: Tue, 21 Jul 2015 21:55:05 +0000 Subject: AW: Security concerns: OGHAM SPACE MARK In-Reply-To: <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com> References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net> <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com> Message-ID: <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net> On Tue, Jul 21, 2015 at 12:46 David Starner [mailto:prosfilaes at gmail.com] wrote: On Tue, Jul 21, 2015 at 2:14 AM Dreiheller, Albrecht <albrecht.dreiheller at siemens.com> wrote: If the author really intends to deceive potential readers he will succeed. Possibly. Code is hard. But the Ogham space is not a real threat; it's easy to search for and obviously a deliberate attempt to confuse. My concern is not about the Ogham space, but about the free usage of non-Ascii in programming languages in general. 
Just imagine, when you decide to open a door for public traffic in a busy city with a security checkpoint, you wouldn't consider only how to check a single person; instead, you have to consider how you would check thousands of people within one hour, if you don't plan to close the door again. Therefore, consider a huge software system developed in, let's say, Serbia or Russia using Cyrillic names throughout for classes and variables. int ?????? = ???????(?????????); return ??????; It might be a valuable system with some unique features and you want to evaluate the source code before you buy it. Or the community wants to adopt it for Open Source because it has some nice features. Looking for a deliberate attempt to confuse within this code would be like looking for a needle in a haystack, since every line has non-Ascii in it. Programming languages like JS should at least implement exclusion rules from the "Unicode Confusables Characters" list. Have you looked at that list? 1 and l is one pair of confusables in that list, and while that is an incredibly classic confusable pair, it's not one that's implementable in a programming language. а and a is another pair; but if you ban а, you've practically banned Cyrillic identifiers completely. Of course, there are confusables within the Ascii range, but they have been well known for years, and thus are more likely to be detected. Regarding your other example, some compilers warn if you have an assignment within an if-clause. I used a term "exclusion rules", meaning a ruleset based on the confusables list. For example the following code sequence int a; { int а; a = 5; } (N.B. the second "а" is Cyrillic) could be banned by a rule saying "It's not allowed to declare a variable that is DISTINCT from others (thus not hiding them) but which is CONFUSABLY SIMILAR to another variable in the same scope." Another rule could demand "It's not allowed to mix two alphabets within one name".
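The two exclusion rules proposed here can be sketched as a small lint-style check. This is a minimal Python illustration under loud assumptions: `LOOKALIKES` is a hypothetical six-entry table standing in for the real confusables data of UTS #39, and the script of a letter is guessed crudely from the first word of its character name rather than from the Unicode Script property. Under those assumptions, all-Cyrillic names pass, while a Cyrillic name that collides with a Latin one in the same scope, or a name that mixes alphabets, is flagged.

```python
import unicodedata

# A tiny, hand-picked sample of Cyrillic letters that look like Latin ones.
# A real checker would load the full confusables data published with UTS #39
# instead of this hypothetical table.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def scripts_of(identifier: str) -> set:
    """Crude script guess: first word of each letter's character name.

    'LATIN SMALL LETTER A' -> 'LATIN', 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'.
    Digits and underscores are ignored. The real Script property is better.
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in identifier if ch.isalpha()}


def is_mixed_script(identifier: str) -> bool:
    """Rule 2: a single name must not mix two alphabets."""
    return len(scripts_of(identifier)) > 1


def skeleton(identifier: str) -> str:
    """Map each known lookalike to its Latin counterpart."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in identifier)


def confusable_pairs(names_in_scope: list) -> list:
    """Rule 1: distinct names in one scope whose skeletons coincide."""
    first_by_skeleton = {}
    pairs = []
    for name in names_in_scope:
        first = first_by_skeleton.setdefault(skeleton(name), name)
        if first != name:
            pairs.append((first, name))
    return pairs


if __name__ == "__main__":
    # int a; { int <Cyrillic a>; a = 5; }: the two names share a skeleton.
    print(confusable_pairs(["a", "\u0430"]))
    print(is_mixed_script("myb\u0430nk"))  # Latin and Cyrillic in one name
    print(is_mixed_script("\u0441\u0442\u0440\u043e\u043a\u0430"))  # all Cyrillic
```

The name-based script guess is deliberately naive: it would, for instance, flag ordinary Japanese identifiers that mix kanji and kana, a combination UTS #39 treats as a single allowed script set.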
This would not ban Cyrillic identifiers in general. Otherwise such programming languages ought to be black-listed. Black-listed? By whom? If you wish to make sure a set of code you control does not use non-ASCII characters, most source-control systems.will let you reject such files from being checked in. If you want to reject JavaScript altogether, that is also your freedom. But of all the attacks weighed against JavaScript, I seriously doubt that this is the one that will bring it down. With "black-listed" I meant "known to be unsafe" in some way. Just the same way as domain-registration authorities would be "known to be unsafe" if they accept or allow domain names like myb?nk.com beside mybank.com where one has a Latin "a" and the other has a Cyrillic "?" in it, thus ignoring the confusables list. BTW, I don't want to attack JavaScript. It's pretty. The fathers of ALGOL and other early languages racked their brain to avoid ambigous semantics caused by poor syntax rules. Today when Unicode supersedes Ascii in some contexts the challenges are different, but not less important. Albrecht. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150721/bcc562f3/attachment.html> From asmus-inc at ix.netcom.com Tue Jul 21 18:06:45 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Tue, 21 Jul 2015 16:06:45 -0700 Subject: AW: Security concerns: OGHAM SPACE MARK In-Reply-To: <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net> References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net> <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com> <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net> Message-ID: <55AED085.2040109@ix.netcom.com> An HTML attachment was scrubbed... 
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/3945cc16/attachment.html> From prosfilaes at gmail.com Tue Jul 21 18:29:33 2015 From: prosfilaes at gmail.com (David Starner) Date: Tue, 21 Jul 2015 23:29:33 +0000 Subject: Security concerns: OGHAM SPACE MARK In-Reply-To: <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net> References: <trinity-dd06ed7d-3dd0-4b5c-826d-0a9b88529134-1437407202845@3capp-webde-bs44> <3E10480FE4510343914E4312AB46E74212B1879D@DEFTHW99EH5MSX.ww902.siemens.net> <CAMZ=zj4qLjGzUkp1HOEeq4UUNmrLgOXME9vHvjQf0zmwUy-E6g@mail.gmail.com> <3E10480FE4510343914E4312AB46E74212B189DC@DEFTHW99EH5MSX.ww902.siemens.net> Message-ID: <CAMZ=zj7mUn911+Xavqr_Tj4HVsfByaXh+tTB+zmEnW9=eoBAfg@mail.gmail.com> On Tue, Jul 21, 2015 at 2:55 PM Dreiheller, Albrecht < albrecht.dreiheller at siemens.com> wrote: > My concern is not about the Ogham space, but about the free usage of non-Ascii in programming languages in general. > Just imagine, when you decide to open a door for public traffic in a busy city with a security checkpoint, you wouldn't consider only how to check a single person; instead, you have to consider how you would check thousands of people within one hour, if you don't plan to close the door again. There is no way to check thousands of people in an hour through a door that's a security checkpoint. That's why few places have security checkpoints. That's comparable; it's very hard to check any significant body of code at any speed, so it's a rare issue. > Therefore, consider a huge software system developed in, let's say, Serbia or Russia using Cyrillic names throughout for classes and variables. > int ?????? = ???????(?????????); return ??????; Then do what you need to do. Transliterate the Serbian characters, see if it works any differently. The language (in any character set) is going to be a large barrier for a lot of audiences, but that's what it is.
> Looking for a deliberate attempt to confuse within this code would be like looking for a needle in a haystack, since every line has non-Ascii in it. Looking for a deliberate attempt to confuse in code is like looking for a needle in a haystack. If those two lines shown in my last post had been hidden in a million-line kernel, they would have been rather hard to find, particularly if the kernel wasn't warning-clean. > I used a term "exclusion rules", meaning a ruleset based on the confusables list. The first step is probably to implement it as a lint-type program. Then discuss it with the compiler writers of the languages you're worried about. As I've said above, I don't see this as a huge concern for most real-life programs, since the attack surface is huge. > With "black-listed" I meant "known to be unsafe" in some way. I.e. JavaScript. C. C++. A huge amount of existing and still-in-use code is written in C, whose buffer overruns are a notorious source of security holes. It seems like a much better candidate to be black-listed, if anyone were capable of such. > The fathers of ALGOL and other early languages racked their brains to avoid ambiguous semantics caused by poor syntax rules. Published examples of ALGOL 60 are unreadable, and very hard to verify for correctness; a modern reader will generally have to start by reformatting the code, and then replacing GOTOs with loops and ifs, and finding better variable names, if they want to know what's going on. We've increased code clarity hugely, but reading large amounts of code is still hard, hard enough that I see stressing about deliberate deception as a narrow market. This is not something that really needs language support; it can be done in compilers and editors and lint-type programs without that support. > -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150721/514472b4/attachment.html> From richard.wordingham at ntlworld.com Tue Jul 21 18:33:34 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 22 Jul 2015 00:33:34 +0100 Subject: Chinese Word Breaking In-Reply-To: <CAGHjPP+MH6KYwLcC99X5+cxL+S0TzUgt42yMF0ONwNVCk1D7SQ@mail.gmail.com> References: <20150721075633.2e76fcab@JRWUBU2> <CAGHjPP+MH6KYwLcC99X5+cxL+S0TzUgt42yMF0ONwNVCk1D7SQ@mail.gmail.com> Message-ID: <20150722003334.2b5e6b94@JRWUBU2> On Tue, 21 Jul 2015 18:10:14 +0800 gfb hjjhjh <c933103 at gmail.com> wrote: > When you write text in modern Chinese, there will not be any break > between different words, and thus if you segment characters according > to the ideographic characters, what is grouped together would > either be a clause or a sentence, or even a whole paragraph if you > are handling some older text without punctuation. I had another look at Chinese word breaking algorithms today and saw that their practical purposes were mostly indexing and machine translation. Consequently, I suspect that authors have little incentive to mark word boundaries in the texts they originate. This differs from the Thai situation where marking word boundaries improves layout and spell-checking. Richard. From charupdate at orange.fr Wed Jul 22 01:38:42 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 22 Jul 2015 08:38:42 +0200 (CEST) Subject: UTF-8 display (was: Re: a mug) In-Reply-To: <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org> References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21> <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com> <457130915.11820.1437486324730.JavaMail.www@wwinf1n18> <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org> Message-ID: <899256744.2177.1437547122576.JavaMail.www@wwinf1f21> On 21 Jul 2015, at 16:00, Tom Gewecke wrote: > The IBM page seems to have an ellipsis character in UTF-8, with bytes E2 80 A6.
The web server is set to force all browsers to use the encoding ISO-8859-1 regardless of what charset is stipulated in the HTML code. The browser uses the Windows-1252 equivalents and displays "â€¦". > > To see what a web server is forcing, if anything, you can use > > http://web-sniffer.net/ Thank you. So the file I get when saving the page is a modified one. The workaround is then, if I understand correctly, to let web-sniffer check whether the server is forcing an inconsistent encoding: | Content-Type: text/html;charset=ISO-8859-1 Then save the page... | meta http-equiv="Content-Type" content="text/html; charset=windows-1252" ...and reset the charset to the value shown in the source code: | meta http-equiv="Content-Type" content="text/html; charset=utf-8" Then open this. That's very useful! Have a great day, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150722/7d46b03f/attachment.html> From c933103 at gmail.com Wed Jul 22 01:46:57 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Wed, 22 Jul 2015 14:46:57 +0800 Subject: Chinese Word Breaking In-Reply-To: <20150722003334.2b5e6b94@JRWUBU2> References: <20150721075633.2e76fcab@JRWUBU2> <CAGHjPP+MH6KYwLcC99X5+cxL+S0TzUgt42yMF0ONwNVCk1D7SQ@mail.gmail.com> <20150722003334.2b5e6b94@JRWUBU2> Message-ID: <CAGHjPP+956vDSxuhYBH=UYtkrUj0RGgHFtw0U7JHvszOWLkuYw@mail.gmail.com> Pretty much so, and IMO it is actually quite unnatural to write Chinese with word boundaries marked. Even in cases like machine translation, people would expect the translation engine to figure out on its own how characters should be grouped into words, without any markup for word boundaries; just as when you type a sentence into a machine translator, you would not expect it to ask you or show you which part is the subject and which part is the verb, etc.
btw, you might want to look up the GB/T 13715 standard from mainland China (PRC) or the CNS 14366 standard from Taiwan (ROC), which discuss how to handle word segmentation when processing Chinese by computer. On 22 July 2015 at 7:37 AM, "Richard Wordingham" <richard.wordingham at ntlworld.com> wrote: > On Tue, 21 Jul 2015 18:10:14 +0800 > gfb hjjhjh <c933103 at gmail.com> wrote: > > > When you write text in modern Chinese, there will not be any break > > between different words, and thus if you segment characters according > > to the ideographic characters, what is grouped together would > > either be a clause or a sentence, or even a whole paragraph if you > > are handling some older text without punctuation. > > I had another look at Chinese word breaking algorithms today and saw > that their practical purposes were mostly indexing and machine > translation. Consequently, I suspect that authors have little > incentive to mark word boundaries in the texts they originate. This > differs from the Thai situation where marking word boundaries improves > layout and spell-checking. > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150722/a1a771df/attachment.html> From charupdate at orange.fr Wed Jul 22 02:00:38 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 22 Jul 2015 09:00:38 +0200 (CEST) Subject: Plain text custom fraction input In-Reply-To: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> Message-ID: <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> On 21 Jul 2015, at 18:42, Doug Ewell wrote: > As explained in TUS 7.0, §6.2 ("General Punctuation"), p. 273, U+2044 > FRACTION SLASH is intended for use with Basic Latin digits, or other > digits with General Category = Nd. The superscript and subscript
The superscript and subscript > presentation forms have General Category = No. That is was bugs me, that this kerning fraction slash is presented to us as to be used with plain digits, that overlap the fraction slash in proportional fonts. That recommendation is inconsistent with plain text encoding. Following TUS, anybody who uses U+2044 must use a fraction formatting feature. I?know this from the time I'd got the valid demo version of some Desktop Publishing software. The feature wasn't flagged by the fraction slash, and on the other hand, the feature worked with the common slash U+002F too. It's a formatting command like superscript or underline. Might anybody explain to us why the font designers of Arial Unicode MS and DejaVu Serif / DejaVu Sans defined the matching glyphs that allow users to compose professionally looking fractions in plain text, without any need of the high-end formatting as specified in TUS? I'm most likely to believe that any proportional font that complies fully to TUS, works the same way. But this fact is hidden in the Standard. I can't believe that Unicode didn't think about this usage. If really it didn't, the invention of the fully operational fraction slash is wholly the merit of the innovative font designers. Why is this invention not being welcomed? This is why I?suggested completing right this section of the Standard. This is also why I finally decided to bring it to the attention of the Mailing List. I hope that a huge majority will allow Unicode to complete this point. Thank you for your feedback. Have a nice day, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150722/8bb37020/attachment.html> From charupdate at orange.fr Wed Jul 22 02:22:58 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 22 Jul 2015 09:22:58 +0200 (CEST) Subject: Global apostrophe solution? 
(Part of: A new take on the English apostrophe in Unicode; Keyman Developer for free?; Input methods at the age of Unicode) Message-ID: <113034332.3118.1437549778585.JavaMail.www@wwinf1f21> On Mon, Jun 15, 2015 at 10:19 AM, Mark Davis ?? wrote: > More seriously, it is not all so black and white. This applies to apostrophe recommendations too. The thread about the English apostrophe was biased because it (I) ended up discussing Unicode's general apostrophe recommendation, while the scope of the thread was originally limited to one language. Above all, the discussion was somewhat biased by not taking into consideration the following TUS statement (§ 6.2, Punctuation Apostrophe): | The semantics of U+2019 are therefore context dependent. For example, if surrounded by | letters or digits on both sides, it behaves as an in-text punctuation character and does not | separate words or lines. I may be wrong, of course, but I think that U+02BC is not needed to prevent word separation. As U+02BC is missing from most fonts and from all native Latin Windows keyboards, it cannot be used, even as a letter, until we have resolved some problems. Please see the advice of User:Gholton in the very last paragraph of https://en.wikipedia.org/wiki/Talk:Gwich%27in_people Moreover, if it exists in a given font, U+02BC mostly looks like U+2019, slanted if that is slanted (as in Tahoma, Segoe UI, Open Sans, Sakkal Majalla), and thus does not match some expectations as stated on a web page I already cited: http://www.languagegeek.com/typography/apostrophes.html The only fonts I found where U+02BC is a bit smaller than U+2019 are Linux Biolinum G, Gentium Basic, Gentium Book Basic. If this difference in size matches the preferences of native English readers, U+02BC could be preferred in English typography. Another bias of the apostrophe thread was that it focused on disambiguation for text processing only, whereas disambiguation is more generally a human readers'
issue, which needs to be resolved at the glyph level, and one that goes back a long, long way. See again http://www.languagegeek.com/typography/apostrophes.html#Anchor-Potentia-61409 (the last section, where Potential Problems are resolved). Along with adding some missing information to the Standard about disambiguating quotation quotes and scare quotes, we'll end up with language-specific recommendations for the apostrophe, as for the quotation marks. About the mixup between scare quotes and quotation quotes: my last sentence yesterday contained a lot of quotes that looked like scare quotes but marked quotations. Let's take this handy example: > I hope that some macro could enable "webmasters" to rapidly update websites, because resolving this "funny" "scenario" has cost me some "effort" today! I'm not going to put webmasters between scare quotes! The quotes in _"webmasters"_ indicate that I'm quoting somebody who started talking about webmasters. The same goes for "funny", a word that is often scare-quoted, but here it is simply a quotation from "Re: a mug", where such phenomena looked rather funny (on a mug), I was told. Again, "scenario" and "effort" are two more quotations from the e-mail I was responding to. Put simply: in English we should follow the example of the French and German people, who distinguish quotations and scare quotes by using angle quotation marks for the former and comma quotation marks for the latter, even though these are considered «English» (I'm quoting) in France, so French typographers in particular are reluctant to use them, thus generating exactly the same irritating mixup where one is often unsure whether the author is serious or not. But serious journalism systematically differentiates «quotations» and “scares”. This is common usage in print and web news media from nearly all publishers.
In actual French and German usage, single quotes are nearly nonexistent, despite U+2019 being unambiguously an apostrophe in German. Primary quotations are always in ?double quotes? (or ?this way?), and a nested close-quote (? or ?) never looks like an apostrophe. When the goal is to help text reading and text handling, would using angle quotation marks for quotations not be a good idea? I would add that I personally consider these marks more respectful towards the authors who are quoted, as well as towards readers, who need to understand unambiguously how it's meant. Possibly there could be different recommendations: for example, in German, U+2019 is preferred for the apostrophe; in French it is, too, and the use of U+2018 should be strongly discouraged, as it should be in English too when U+2019 is preferred for the apostrophe; otherwise, following user preferences, U+02BC can be preferred for this; and U+00AB and U+00BB would be preferred for quotations, U+2039 and U+203A for nested quotations, and U+201C..U+201D for markup that does not denote a quotation. The same should be recommended for all languages that don't already visually distinguish the two meanings of quotation marks because they don't already use angle quotes or comma quotes. For input, rather than (as I intended) a layout with U+02BC on E00 (because this key is too peripheral for a frequently used character, and the grave accent is used in TeX), a smart keyboard layout is needed, with an *apostrophe toggle* that produces U+0027, U+2019, or U+02BC alternately on the same apostrophe key, and another independent or related toggle that makes the < and > keys produce the « and » quotes. Such keyboards can be programmed using Keyman Developer. Keyman uses a powerful language to define flexible layouts including an unlimited number of toggles, which may have more than two states.
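The toggle behavior described above can be modeled as a small state machine. The following is a hypothetical Python model, not Keyman's keyboard language and not any real layout file; the class names `ToggleKey` and `SmartLayout` are invented for illustration. Each toggle of the apostrophe key advances its output through U+0027, U+2019, U+02BC and back, and a second, independent toggle switches the < and > keys to « (U+00AB) and » (U+00BB).

```python
class ToggleKey:
    """A hypothetical key whose output cycles through several states."""

    def __init__(self, outputs):
        self.outputs = list(outputs)
        self.state = 0

    def toggle(self):
        # Advance to the next state, wrapping around (toggles may have >2 states).
        self.state = (self.state + 1) % len(self.outputs)

    def current(self):
        return self.outputs[self.state]


class SmartLayout:
    """Toy model of the proposed layout; not Keyman code."""

    def __init__(self):
        # Apostrophe key: U+0027, then U+2019, then U+02BC, then back.
        self.apostrophe = ToggleKey(["\u0027", "\u2019", "\u02bc"])
        # Second toggle: plain < and >, or angle quotation marks.
        self.angle_quotes = ToggleKey([("<", ">"), ("\u00ab", "\u00bb")])

    def key(self, pressed: str) -> str:
        if pressed == "'":
            return self.apostrophe.current()
        if pressed in ("<", ">"):
            left, right = self.angle_quotes.current()
            return left if pressed == "<" else right
        return pressed

    def type_text(self, keystrokes: str) -> str:
        return "".join(self.key(k) for k in keystrokes)


if __name__ == "__main__":
    layout = SmartLayout()
    for _ in range(3):
        print(layout.type_text("don't"))
        layout.apostrophe.toggle()
    layout.angle_quotes.toggle()
    print(layout.type_text("<oui>"))
```

Keeping the two toggles as separate objects mirrors the "independent or related" choice: relating them would only require toggling both from one handler.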
See http://www.unicode.org/mail-arch/unicode-ml/y2015-m07/0146.html Keyman does what I expected a keyboard layout to do, which is very hard (or even impossible) to achieve with the OS keyboard drivers I am programming for Windows. As a keyboard layout framework, I recommend Keyman. Best regards, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150722/63aad7a6/attachment.html> From richard.wordingham at ntlworld.com Wed Jul 22 02:52:40 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 22 Jul 2015 08:52:40 +0100 Subject: Plain text custom fraction input In-Reply-To: <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> Message-ID: <20150722085240.00f61ba2@JRWUBU2> On Wed, 22 Jul 2015 09:00:38 +0200 (CEST) Marcel Schneider <charupdate at orange.fr> wrote: > On 21 Jul 2015, at 18:42, Doug Ewell wrote: > > > As explained in TUS 7.0, §6.2 ("General Punctuation"), p. 273, > > U+2044 FRACTION SLASH is intended for use with Basic Latin digits, > > or other digits with General Category = Nd. The superscript and > > subscript presentation forms have General Category = No. > > That is what bugs me, that this kerning fraction slash is presented to > us as to be used with plain digits, that overlap the fraction slash > in proportional fonts. That recommendation is inconsistent with plain > text encoding. Following TUS, anybody who uses U+2044 must use a > fraction formatting feature. I know this from the time I'd got the > valid demo version of some Desktop Publishing software. The feature > wasn't flagged by the fraction slash, and on the other hand, the > feature worked with the common slash U+002F too. It's a formatting > command like superscript or underline.
Implementing FRACTION SLASH is fiddly, and formally it is impossible in OpenType - the lookup tables can only cope with a finite range of numerator and denominator lengths. The next problem is what feature to put it under. Microsoft Word is notorious for preventing users from using ligatures in Latin script text, though that restriction has been relaxed. One of the touted capabilities of Microsoft's Universal Shaping Engine is the rendering of cartouches for Egyptian hieroglyphs. However, the interface specification makes no mention of special handling for them - I can only assume that the capability arises through the enabling of certain features. Egyptian hieroglyphs are currently a simple script - it lacks essential support for writing the script seen on Egyptian monuments. (I'm not entirely sure of the correct bidi classification of the original hieroglyphs - they should probably be weakly right-to-left, not strongly left-to-right. Strong left-to-right may, however, be appropriate for most printed hieroglyphs - I've even seen plain text hieroglyphs running left to right on a page whose primary script is Arabic.) Richard. From tom at bluesky.org Wed Jul 22 04:57:49 2015 From: tom at bluesky.org (Tom Gewecke) Date: Wed, 22 Jul 2015 05:57:49 -0400 Subject: UTF-8 display (was: Re: a mug) In-Reply-To: <899256744.2177.1437547122576.JavaMail.www@wwinf1f21> References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21> <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com> <457130915.11820.1437486324730.JavaMail.www@wwinf1n18> <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org> <899256744.2177.1437547122576.JavaMail.www@wwinf1f21> Message-ID: <3AEE94B6-3D82-477A-8B81-75A184AD1F1E@bluesky.org> Normally you should be able to get correct display in a case like this by just going to the View > Encoding menu of your browser and switching to Unicode UTF-8.
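The mechanism Tom describes can be reproduced in a few lines of Python. This is a sketch, assuming only what the thread states: the page holds the UTF-8 bytes E2 80 A6 (U+2026), and the server's ISO-8859-1 label is interpreted by the browser as Windows-1252 (which is also how the WHATWG Encoding Standard maps that label). The helper `undo_cp1252_mojibake` is a hypothetical name for illustration, not a library function.

```python
ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS

# The bytes actually stored on the server: UTF-8 for U+2026.
raw = ELLIPSIS.encode("utf-8")
assert raw == b"\xe2\x80\xa6"

# The server's header says ISO-8859-1; browsers treat that label as
# windows-1252, so each of the three bytes becomes one character.
shown_forced = raw.decode("cp1252")

# A strict ISO-8859-1 decoder would turn 0x80 into the invisible C1
# control U+0080 instead of the euro sign.
shown_strict_latin1 = raw.decode("latin-1")

# View > Encoding > Unicode (UTF-8): the bytes are read as intended.
shown_utf8 = raw.decode("utf-8")


def undo_cp1252_mojibake(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes read as windows-1252'.

    Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined,
    so this only works when the mojibake avoids those bytes, as here.
    """
    return text.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    print(shown_forced)  # the three characters seen on the page
    print(repr(shown_strict_latin1))
    print(shown_utf8)
    print(undo_cp1252_mojibake(shown_forced))
```

Switching the browser's encoding to UTF-8, as suggested, corresponds to the `shown_utf8` line: nothing in the bytes changes, only the decoder.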
On Jul 22, 2015, at 2:38 AM, Marcel Schneider wrote: > The workaround is then, if I understand correctly, to let web-sniffer check whether the server is forcing an inconsistent encoding: > | Content-Type: text/html;charset=ISO-8859-1 > Then save the page... > | meta http-equiv="Content-Type" content="text/html; charset=windows-1252" > ...and reset the charset to the value shown in the source code: > | meta http-equiv="Content-Type" content="text/html; charset=utf-8" > Then open this. > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150722/62138fca/attachment.html> From charupdate at orange.fr Wed Jul 22 05:21:32 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 22 Jul 2015 12:21:32 +0200 (CEST) Subject: Plain text custom fraction input In-Reply-To: <20150722085240.00f61ba2@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> Message-ID: <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> On 22 Jul 2015, at 09:52, Richard Wordingham wrote: > Implementing FRACTION SLASH is fiddly, and formally it is impossible in > OpenType - the lookup tables can only cope with a finite range > of numerator and denominator lengths. The next problem is what feature > to put it under. Microsoft Word is notorious for preventing users from > using ligatures in Latin script text, though that restriction has been > relaxed. > > One of the touted capabilities of Microsoft's Universal > Shaping Engine is the rendering of cartouches for Egyptian hieroglyphs. > However, the interface specification makes no mention of special > handling for them - I can only assume that the capability arises > through the enabling of certain features.
Egyptian hieroglyphs are > currently a simple script - it lacks essential support for writing the > script seen on Egyptian monuments. (I'm not entirely sure of the > correct bidi classification of the original hieroglyphs - they should > probably be weakly right-to-left, not strongly left-to-right. Strong > left-to-right may, however, be appropriate for most printed hieroglyphs > - I've even seen plain text hieroglyphs running left to right on a page > whose primary script is Arabic.) We never thought of ordinary hieroglyphs as running other than LTR, while on monuments the great freedom of the script allows it to run in almost all directions. IMO monumental transcription is always difficult to deal with whenever exact rendering is expected. However, since Unicode's purpose is plain text encoding, we must stick with what I consider a convention in Egyptology... ...which brings us back to plain text fractions, which by an apparent but tacit convention we can input as an *unlimited* string of superscript digits, followed by U+2044, followed by an *unlimited* string of subscript digits. What are you referring to when talking about implementing the fraction slash? The fonts I've tested successfully are OpenType, at least Arial Unicode MS is. The way the fraction slash is actually implemented was purely a font design issue, which has been brilliantly resolved: 1 - Superscript digits match numerators as they appear in precomposed fractions. 2 - Subscript digits match denominators. 3 - The fraction slash kerns accordingly. If this input method is not encouraged, what's the use of U+215F FRACTION NUMERATOR ONE? About ligatures: replacing ff, fl, ffl with ligatures is typically a rendering engine task, but for backwards compatibility the precomposed ligatures of Alphabetic Presentation Forms (U+FB00..U+FB4F) have been encoded in Unicode. What is the relation with plain text fractions, and why are you looking for a feature?
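The compatibility characters mentioned here carry decompositions in the Unicode Character Database, and normalizing them shows how the two fraction styles relate. The Python sketch below uses only `unicodedata.normalize`: under NFKC, U+215F becomes digit one followed by U+2044, the ff ligature becomes two letters, and a superscript/subscript fraction folds to ASCII digits around U+2044, the form TUS describes. It illustrates what software applying NFKC (a search index, for instance; that use is an assumption here) would see. It says nothing about how fonts render either form.

```python
import unicodedata

SAMPLES = [
    ("\u215f", "U+215F FRACTION NUMERATOR ONE"),
    ("\u00bd", "U+00BD VULGAR FRACTION ONE HALF"),
    ("\ufb00", "U+FB00 LATIN SMALL LIGATURE FF"),
    ("\u00b2\u00b2\u2044\u2087", "superscript 22 + U+2044 + subscript 7"),
]


def fold(text: str) -> str:
    """Apply NFKC, replacing compatibility characters by their decompositions."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for text, label in SAMPLES:
        print(f"{label}: {text!r} -> {fold(text)!r}")
```

The folding is lossy: after NFKC, the superscript/subscript distinction that the plain-text convention relies on for display is gone, and only U+2044 still marks the fraction.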
The fraction formatting feature I mentioned becomes completely useless when users start typing custom fractions in plain text. That, I suspect, is the origin of the taboo that seems to surround this hint. If you were to ask me whether I know hieroglyphs: well, I have only just started learning a little. But I launched this thread only for the purpose of Latin plain text: no feature, no bidi mirroring, just plain text fractions. The skill, if there is any, is only about how to get superscripts, subscripts, and the fraction slash within reach on the keyboard. A good solution is to put them in AltGr on the numpad. So you press Left Ctrl and Left Alt together to get superscripts right on the numpad. Adding Shift, you get the subscripts. Ah, the fraction slash: just press the numpad Divide key after the last numerator digit. That works because we can program exactly the same shift states for the numpad as for the alphanumeric block. Don't trust the comment in the C source which prevents us from integrating the numpad into the general allocation table, urging us to "put this last" (quotation). I've run into no bug by not following it. Well, it's still "last", but at the bottom of the big table! Thank you for your feedback. Have a nice afternoon, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150722/8fbe44a7/attachment.html> From frederic.grosshans at gmail.com Wed Jul 22 05:21:43 2015 From: frederic.grosshans at gmail.com (Frédéric Grosshans) Date: Wed, 22 Jul 2015 12:21:43 +0200 Subject: Machine learning to find the meaning of emojis Message-ID: <55AF6EB7.9070204@gmail.com> The following post, by the Instagram engineering team, might be interesting for the people on this list who are interested in emoji use in the wild. It's an attempt to algorithmically define the meaning of emojis as they are used on Instagram.
http://instagram-engineering.tumblr.com/post/117889701472/emojineering-part-1-machine-learning-for-emoji I find the synonymity of ?? with #fingerscrossed quite funny but understandable. Frédéric PS: Found via "All Things Linguistic" aka ??? http://allthingslinguistic.com/post/124609017512/emojineering-part-1-machine-learning-for-emoji . By the way, this blog post contains the first emojis in italics I have ever seen. From charupdate at orange.fr Wed Jul 22 05:42:30 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 22 Jul 2015 12:42:30 +0200 (CEST) Subject: UTF-8 display (was: Re: a mug) In-Reply-To: <3AEE94B6-3D82-477A-8B81-75A184AD1F1E@bluesky.org> References: <1650585311.5804.1437468393388.JavaMail.www@wwinf1f21> <1437482992.20206.YahooMailBasic@web162601.mail.bf1.yahoo.com> <457130915.11820.1437486324730.JavaMail.www@wwinf1n18> <22EDF3DA-7E3A-4861-96E2-9BF43E81DD0B@bluesky.org> <899256744.2177.1437547122576.JavaMail.www@wwinf1f21> <3AEE94B6-3D82-477A-8B81-75A184AD1F1E@bluesky.org> Message-ID: <2083521746.6980.1437561751081.JavaMail.www@wwinf1d31> On 22 Jul 2015, at 11:58, Tom Gewecke wrote: > Normally you should be able to get correct display in a case like this by just going to the View > Encoding menu of your browser and switching to Unicode UTF-8. Indeed. And now Firefox saves the page as UTF-8. I have now found that this concern has already been dealt with at http://superuser.com/questions/765044/how-do-i-view-a-page-with-a-different-character-encoding-in-firefox To look back quickly at the T-shirt of the parent thread http://i1.cpcache.com/product/27297813/utf8_value_tshirt.jpg Perhaps, like me and a user on this forum page, people have been puzzled to find "utf-8" in the source of the page and concluded prematurely that it's buggy and hard to deal with... while it's so easy. Thanks a lot for your help! Best regards, Marcel > Message of 22/07/15 11:58 > From: "Tom Gewecke" > To: "Marcel Schneider" > Cc: "Unicode Public" > Subject: Re: UTF-8 display (was: Re: a mug) > >Normally you should be able to get correct display in a case like this by just going to the View > Encoding menu of your browser and switching to Unicode UTF-8. > > On Jul 22, 2015, at 2:38 AM, Marcel Schneider wrote: > The workaround, then, if I understand correctly, is to let web-sniffer check whether the server is forcing an inconsistent encoding: > | Content-Type: text/html;charset=ISO-8859-1 > Then save the page... > | meta http-equiv="Content-Type" content="text/html; charset=windows-1252" > ...and reset the charset to the value shown in the source code: > | meta http-equiv="Content-Type" content="text/html; charset=utf-8" > Then open this. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150722/6ada1071/attachment.html> From khaledhosny at eglug.org Wed Jul 22 08:01:48 2015 From: khaledhosny at eglug.org (Khaled Hosny) Date: Wed, 22 Jul 2015 15:01:48 +0200 Subject: Plain text custom fraction input In-Reply-To: <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> Message-ID: <20150722130143.GA29225@khaled-laptop> On Wed, Jul 22, 2015 at 09:00:38AM +0200, Marcel Schneider wrote: > On 21 Jul 2015, at 18:42, Doug Ewell wrote: > > > As explained in TUS 7.0, §6.2 ("General Punctuation"), p. 273, U+2044 > > FRACTION SLASH is intended for use with Basic Latin digits, or other > > digits with General Category = Nd. The superscript and subscript > > presentation forms have General Category = No. > > That is what bugs me: this kerning fraction slash is presented to > us as meant to be used with plain digits, which overlap the fraction slash in > proportional fonts. That recommendation is inconsistent with plain > text encoding.
Following TUS, anybody who uses U+2044 must use a > fraction formatting feature. I know this from the time I had the > valid demo version of some Desktop Publishing software. The feature > wasn't triggered by the fraction slash, and on the other hand, the > feature worked with the common slash U+002F too. It's a formatting > command like superscript or underline. Some layout engines, like HarfBuzz, automatically turn on the required OpenType features for proper fraction rendering when the fraction slash is used. If the font has 'numr' and 'dnom' features, HarfBuzz will turn them on for the <digits><fraction slash><digits> sequence. IMHO, that is the most Unicode-compliant approach and other engines should do the same. Regards, Khaled From richard.wordingham at ntlworld.com Wed Jul 22 17:54:02 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 22 Jul 2015 23:54:02 +0100 Subject: Plain text custom fraction input In-Reply-To: <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> Message-ID: <20150722235402.7770e30a@JRWUBU2> On Wed, 22 Jul 2015 12:21:32 +0200 (CEST) Marcel Schneider <charupdate at orange.fr> wrote: > On 22 Jul 2015, at 09:52, Richard Wordingham wrote: > We never thought of common hieroglyphs otherwise as running LTR, > while on monuments the great liberty of the script allows them to run in > almost all directions. IMO monumental transcription is always > difficult to deal with, whenever exact rendering is expected. > However, since Unicode's purpose is plain text encoding, we must > stick with what I consider as a convention in egyptology... Which means that Ancient Egyptian hieroglyphs are unencoded! Their default direction is right-to-left, but that's only the start of the trouble.
The encoded hieroglyphs aren't Bidi-mirrored, so if I embed them in a right-to-left override, I should get retrograde characters. Now these aren't totally useless, but at present we seem to need a duplicate set of right-to-left hieroglyphs for unstacked text. There is work in progress to allow normal Egyptological hieroglyphic text. There seems to have been a change in the notion of what the Egyptian scripts are. Hieratic texts are normally printed in hieroglyphs for general study, so it had seemed that it would be legitimate to use a font that rendered a hieratic style rather than a hieroglyphic style. (Some 'hieroglyphs' only occurred in the hieratic style.) The hieratic style is strictly right-to-left, so rendering the text in a hieratic style would not be compliant with Unicode. However, it seems that the hieratic style is now a separate script, so any such rendering would now be doubly non-compliant. > ...which brings us back to plain text fractions, which by an apparent > but tacit convention we can input as an *unlimited* string of > superscript digits, followed by U+2044, followed by an *unlimited* > string of subscript digits. What are you referring to when talking > about implementing the fraction slash? If you are happy with that style, I was wrong, I wasn't being clever enough. In a left to right context, the conversion of digits to the numerator and denominator forms can progress from right to left for the numerator by conditioning on the following character being a fraction slash or converted digit, and similarly from left to right for the denominator. I'm not sure what should happen in right to left contexts. I've a feeling the numerator should come before the denominator, but the bidi algorithm doesn't swap them - it keeps the first number on the left. Note that subscript and superscript digits are only available for those of us who use the Western Arabic digits.
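Richard's contextual rule can be sketched in Python as a labelling pass: starting from each U+2044, digits are marked as numerator while walking leftwards and as denominator while walking rightwards, which is the sequential equivalent of conditioning each digit on its already-converted neighbour. This is a hypothetical illustration, not the lookup code of any particular font or shaping engine; the labels are merely named after the OpenType 'numr' and 'dnom' features mentioned by Khaled Hosny. It accepts any digit of General Category Nd (so Arabic-Indic digits work too) and, like the rule as stated, only considers left-to-right context.

```python
import unicodedata

FRACTION_SLASH = "\u2044"


def is_decimal_digit(ch: str) -> bool:
    # TUS recommends U+2044 with digits of General Category Nd.
    return unicodedata.category(ch) == "Nd"


def fraction_roles(text: str) -> list:
    """Label each character 'numr', 'frac', 'dnom' or None (left-to-right text only)."""
    roles = [None] * len(text)
    for i, ch in enumerate(text):
        if ch != FRACTION_SLASH:
            continue
        # Numerator: move right to left. Each digit qualifies because the
        # character after it is the slash or a digit already marked 'numr'.
        start = i
        while start > 0 and is_decimal_digit(text[start - 1]):
            start -= 1
            roles[start] = "numr"
        # Denominator: the mirror image, moving left to right.
        end = i + 1
        while end < len(text) and is_decimal_digit(text[end]):
            roles[end] = "dnom"
            end += 1
        # Mark the slash only if it joins digits on at least one side.
        if start < i or end > i + 1:
            roles[i] = "frac"
    return roles


if __name__ == "__main__":
    sample = "x 22\u20447 y"
    for ch, role in zip(sample, fraction_roles(sample)):
        print(repr(ch), role)
```

Because the walk stops at the first non-digit, numerators and denominators of any length are covered, which matches the *unlimited* digit strings in the convention quoted above; the ASCII solidus U+002F is deliberately not treated as a trigger.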
However, I believe there is a real problem for the 'nut' style, where the numerator and denominator are separated by a horizontal line - in Western Asia westwards. I'm having trouble finding examples of fractions using Indic scripts - apparently they originally stacked the numerator above the denominator, but I don't know what happens nowadays. <snip> > If this input method is not encouraged, what's the use of U+215F > FRACTION NUMERATOR ONE? It's for temporarily storing a character defined in some other coding standard. Richard. From c933103 at gmail.com Thu Jul 23 00:54:45 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Thu, 23 Jul 2015 13:54:45 +0800 Subject: Plain text custom fraction input In-Reply-To: <20150722235402.7770e30a@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> Message-ID: <CAGHjPPJ2pBCM62eB7upaHX7HLesTY2b3FMURRG45sEG1XombTA@mail.gmail.com> 1. Isn't the 'nut' style you mentioned used in everyday English too? 2. Most of the fractions I have seen within Chinese text are in the 'nut' style. 3. I think standards should not be written in a way that prevents users or implementers from choosing their preferred style to represent fractions? On 23 Jul 2015, 6:58 AM, "Richard Wordingham" <richard.wordingham at ntlworld.com> wrote: > > ...which brings us back to plain text fractions, which by an apparent > > but tacit convention we can input as an *unlimited* string of > > superscript digits, followed by U+2044, followed by an *unlimited* > > string of subscript digits. What are you referring to when talking > > about implementing the fraction slash? > > > > If you are happy with that style, I was wrong, I wasn't being clever > > enough.
In a left to right context, the conversion of digits to the > numerator and denominator forms can progress from right to left for the > numerator by conditioning on the following character being a fraction > slash or converted digit, and similarly from left to right for the > denominator. I'm not sure what should happen in right to left > contexts. I've a feeling the numerator should come before the > denominator, but the bidi algorithm doesn't swap them - it keeps the > first number on the left. Note that subscript and superscript digits > are only available for those of us who use the Western Arabic digits. > > However, I believe there is a real problem for the 'nut' style, where > the numerator and denominator are separated by a horizontal line - in > Western Asia westwards. I'm having trouble finding examples of > fractions using Indic scripts - apparently they originally stacked the > numerator above the denominator, but I don't know what happens nowadays. > > <snip> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150723/0c42aaa9/attachment.html> From richard.wordingham at ntlworld.com Thu Jul 23 01:44:11 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 23 Jul 2015 07:44:11 +0100 Subject: Plain text custom fraction input In-Reply-To: <CAGHjPPJ2pBCM62eB7upaHX7HLesTY2b3FMURRG45sEG1XombTA@mail.gmail.com> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <CAGHjPPJ2pBCM62eB7upaHX7HLesTY2b3FMURRG45sEG1XombTA@mail.gmail.com> Message-ID: <20150723074411.5880dd99@JRWUBU2> On Thu, 23 Jul 2015 13:54:45 +0800 gfb hjjhjh <c933103 at gmail.com> wrote: > 1. Isn't the 'nut' style you mentioned used in everyday English too?
> 2. Most of the fractions I have seen within Chinese text are in the > 'nut' style. > 3. I think standards should not be written in a way that prevents users or > implementers from choosing their preferred style to represent > fractions? The style is left to the rendering system. The problem I see is that the usual shaping instructions in a font cannot handle arbitrarily long numerators and denominators for the nut style. Perhaps I am wrong again. Richard. From haberg-1 at telia.com Thu Jul 23 03:20:47 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Thu, 23 Jul 2015 10:20:47 +0200 Subject: Plain text custom fraction input In-Reply-To: <20150722235402.7770e30a@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> Message-ID: <51DE554B-3798-4252-ABE2-2CC73BA89433@telia.com> > On 23 Jul 2015, at 00:54, Richard Wordingham <richard.wordingham at ntlworld.com> wrote: > > On Wed, 22 Jul 2015 12:21:32 +0200 (CEST) > Marcel Schneider <charupdate at orange.fr> wrote: > >> On 22 Jul 2015, at 09:52, Richard Wordingham wrote: > >> We never thought of common hieroglyphs otherwise as running LTR, >> while on monuments the great liberty of the script allows them to run in >> almost all directions. IMO monumental transcription is always >> difficult to deal with, whenever exact rendering is expected. >> However, since Unicode's purpose is plain text encoding, we must >> stick with what I consider as a convention in egyptology... > > Which means that Ancient Egyptian hieroglyphs are unencoded! Their > default direction is right-to-left, but that's only the start of the > trouble. The encoded hieroglyphs aren't Bidi-mirrored, so if I embed > them in a right-to-left override, I should get retrograde characters.
> Now these aren't totally useless, but at present we seem to need a > duplicate set of right-to-left hieroglyphs for unstacked text. There > is work in progress to allow normal Egyptological hieroglyphic text. Egyptian hieroglyphs are read starting from the side the heads are facing. So you need more than an RTL mapping. From charupdate at orange.fr Thu Jul 23 03:25:22 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 23 Jul 2015 10:25:22 +0200 (CEST) Subject: Plain text custom fraction input In-Reply-To: <20150722130143.GA29225@khaled-laptop> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722130143.GA29225@khaled-laptop> Message-ID: <220949466.4439.1437639922889.JavaMail.www@wwinf1f21> On 22 Jul 2015, at 15:08, Khaled Hosny wrote: > Some layout engines, like HarfBuzz, automatically turn on the required > OpenType features for proper fraction rendering when the fraction slash is > used. If the font has 'numr' and 'dnom' features, HarfBuzz will turn > them on for the <digits><fraction slash><digits> sequence. IMHO, that is > the most Unicode-compliant approach and other engines should do the > same. I fully agree that every good rendering engine must implement the Unicode fraction scheme. I'm glad to learn that Firefox and LibreOffice use HarfBuzz. What is more, as Richard Wordingham wrote yesterday, this scheme should be transposable to Arabic-Indic digits, for which, as he writes, no superscripts or subscripts are available. Moreover, incomplete fonts, for example ornamental fonts, which sometimes lack superscripts and subscripts because the user is expected to use the formatting tool (consistently with the ornamental purpose of the font), can be used for fractions thanks to the formatting feature. Using the fraction slash as a formatting flag considerably lightens the work. Seen from this point of view, the handling of fractions as specified by Unicode is the most universal and most reliable way.
On the other hand, the harmonization inside the fonts between superscripts and subscripts and the numerators and denominators of the precomposed fractions they contain could be purely aesthetic, without any idea of using superscripts as numerators and subscripts as denominators. The remaining question would then be: What was the intent when, at the font design stage, the fraction slash was given left and right kerning, so that a preceding superscript digit takes exactly the place it has as part of a precomposed fraction, and a following subscript is placed as if it were the denominator of one of the precomposed fractions? If Unicode really never targeted such a usage and always thought of the fraction slash as a mere formatting flag with a glyph to make the user aware of its presence, this kerning idea was, as I outlined yesterday, the merit of a caring and innovative font designer. (We should get some testimony; surely a Latin font designer on this List would be glad to share his experience, given that, because of the lack of Arabic superscripts and subscripts in the UCS, IMHO you were not given this particular opportunity.) Then it would be ungrateful not to make use of his invention whenever the font supports this alternate scheme in addition to the standard scheme. Perhaps we should consider plain text rendering too, because many situations require that all the needed information be given in plain text. Especially in these cases, it could be useful to be able to enter fractions that look as if they were formatted. However, keyboard layout considerations may argue against officially recommending this input method, so as not to frustrate people who would complain that they do not have superscripts, subscripts, and the accompanying fraction slash right on their keyboard.
Yesterday I explained that this is very easy to enter, at least on Windows (but on Linux too we have AltGr layers on the Numpad, except that these are used for the single and double arrows as they are engraved, following the legacy implementation of the caret commands). With an appropriate Windows keyboard driver, it's enough to hold down the left Ctrl and Alt while typing the numerator on the numpad, followed by the numpad slash (a key that in AltGr will produce U+2044), and to add Shift while typing the denominator. As I outlined yesterday, the default Windows keyboard driver templates contain a warning to prevent developers from adding more characters on the numpad. More precisely, the allocation tables are split according to the number of shift states, and the numpad allocation table contains the fewest shift states among all these split allocation tables. Moreover, a comment says to "put this last", adding some explanation based on internal processes. But experience, at least on Windows 7, has shown that the numpad, like all other keys, can be unified in *one* table containing all shift states (including the Kana shift states, up to Shift + Ctrl + Alt + Kana). This is how I've got the arrows, too. I simply press *all* keys to the left of the spacebar, and I get single or double arrows (the latter with Shift). So I must hold down Shift with the left little finger, and Ctrl, Fn, Alt, Kana with the four other fingers, while typing on the Numpad. For fractions, it's roughly the same, except that Kana is not pressed. This may be somewhat complicated, but I do believe that using character tables for superscripts and subscripts is a less efficient input method. As already outlined yesterday, I fear that much is done to prevent users from getting fully started with the tool, in order to keep us prisoners of some high-end software.
I do not deny that this software is sometimes or often indispensable at work. But I do wish that everybody could benefit from *all* efficient input methods, including those which require no more than a complete keyboard layout. Thank you for your feedback. Best regards, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150723/7ae686b7/attachment.html> From moyogo at gmail.com Thu Jul 23 03:48:47 2015 From: moyogo at gmail.com (Denis Jacquerye) Date: Thu, 23 Jul 2015 09:48:47 +0100 Subject: Plain text custom fraction input In-Reply-To: <220949466.4439.1437639922889.JavaMail.www@wwinf1f21> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722130143.GA29225@khaled-laptop> <220949466.4439.1437639922889.JavaMail.www@wwinf1f21> Message-ID: <CAJKta0xVrZ5a7e1DDPf7ZM-s81Y9yo-K2RAh3ECqgP3nZ_OKyA@mail.gmail.com> On Thu, Jul 23, 2015 at 9:25 AM, Marcel Schneider <charupdate at orange.fr> wrote: > > > The remaining question would then be: What was the idea when at font > design, the fraction slash was given left and right kerning, so that a > preceding superscript digit will take exactly the place it has as a part of > a precomposed fraction, and a following subscript takes place like if it > were a denominator in one of the precomposed fractions? > Many font designers do not differentiate between superscript and numerator, or subscript and denominator, because it's easier to design the glyphs once, and this can work fine in some cases. In some fonts, the superscript and subscript figures are completely different from the numerators and denominators, or are at different heights, because this is better in some cases. In the end it's a design issue, but you cannot expect either behaviour in every font.
Using the recommended figures with the fraction slash will not work everywhere or with every font, but abusing the superscript and subscript will not either. -- Denis Moyogo Jacquerye -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150723/235a1af3/attachment.html> From charupdate at orange.fr Thu Jul 23 03:50:11 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 23 Jul 2015 10:50:11 +0200 (CEST) Subject: Global apostrophe solution? (Part of: A new take on the English apostrophe in Unicode; Keyman Developer for free?; Input methods at the age of Unicode) Message-ID: <973533516.5116.1437641411369.JavaMail.www@wwinf1f21> As I don't know whether the apostrophe issue** has been satisfactorily resolved, I'd like to check briefly, making a few statements to agree or disagree with: 1 - We are all allowed to use U+02BC for the English apostrophe. U+2019 is only a de facto preference, mainly with respect to end users and WYSIWYG word processing. Unicode is thus a user-oriented standard. However, we must also take into consideration the font-related issues: U+02BC missing, or varying in shape according to different expectations, as in these three sans-serif fonts (tested in LibreOffice): 2 - UAX #29 is not intended to work well for English, so English implementations need to be tailored. These two statements are inferred from the Notes at §4.1.1. This tailoring is, however, often not completed, as we can deduce from the behavior of word processors applying the UAX #29 recommendation: | A further complication is the use of the same character as an apostrophe | and as a quotation mark. Therefore leading or trailing apostrophes | are best excluded from the default definition of a word. 3 - As in English a leading U+2019 is never a quotation mark (unlike Scandinavian usage), leading apostrophes should be included in the word definition, at the same level as word-internal apostrophes. Only the possessive apostrophe would then be left out when trailing.
This, however, is inconsistent, so a complete tailoring of UAX #29 for English must include algorithms that treat a trailing U+2019 as a quotation mark only if it is preceded by U+2018 within a certain number of words... but this too is incomplete. 4 - Conversion of British single quotes to double quotes needs special processing to identify the closing quotes: applying a number of search rules and submitting each instance to the operator for validation. This routine task is very annoying but remains limited to technicians (editors, typesetters), while the disambiguation of the apostrophe would affect the public as a whole. As Mark Davis wrote on Mon, Jun 15, 2015 at 10:19 AM: > In practice, whenever characters are essentially identical (and by that I mean that the overlap between the acceptable glyphs for each character is very high), people will inevitably mix up the characters on entry. So any processing that depends on that distinction is forced to correct the data anyway. Consequently, the introduction of U+02BC in English usage would not produce reliable data. 5 - The use of angle quotation marks for quotations in English (both British and American) would eliminate the apostrophe problem and bring a number of substantial advantages: + Quotations, especially when consisting of single words, are better highlighted and can no longer be confused with scare quotes. + This may shift our psychological relationship to quotations and quoting, which could eventually improve the handling of intellectual property. A certain threat in this domain, due to word processing and the Internet, has been identified by the Roman linguist Raffaele Simone. + British and American English would use the same quotation mark convention, so no quote conversion would be necessary any longer. This process streamlining could facilitate exchanges, overcoming locale barriers, while the locales' "flavour"
(I'm quoting, not scaring; here's my source: http://babelstone.blogspot.fr/2006/03/unicode-character-names-part-2-name-is.html) will be preserved through word orthography. + Scare quotes would always have the same appearance, inside as well as outside of quotations. Their meaning is independent of quotation, so it seems consistent that they not be affected by their environment. 6 - Additionally, U+0027 could be preferred for highlighting words, a usage found in technical documents such as the Unicode documentation. (However, even the word-internal apostrophe is in most cases represented by U+0027.) As a result, U+2018 is no longer needed and should be strongly discouraged, at least in languages like English and French, to prevent U+2019 from being used as a quotation mark. This is far easier and more feasible than adding U+02BC to all fonts, urging users to deal with *two* different but identical-looking "squiggles" (quotation), and tracking incorrect use. Having an old and a new quotation mark convention visibly side by side would probably be less cumbersome than having two apostrophes that look identical in most complete fonts but behave differently. 7 - As an input method for angle quotation marks, we can use autocorrect until this and nested-quote management are implemented in word processors. To achieve this, six entries may be required: < → «; «< → ‹; ‹< → <; > → »; »> → ›; ›> → > In Microsoft Word (which supports punctuation and symbols as autocorrect triggers), this will result in getting the double quotes with one keystroke, the single quotes (less used) with two keystrokes, and finally the less-than/greater-than signs with three keystrokes. Following user preferences, the latter may be raised, and only four entries would be required: << → «; «< → ‹; >> → »; »> → ›
For a solution that works in *all* applications, we can program extended keyboard layouts, notably using Keyman Developer. I see this software as an important part of Unicode implementation thanks to its easy-to-understand and flexible layout programming, which matches expectations expressed soon after the first releases of the Unicode Standard.

8 - I (or even: we) still do not know why the apostrophe has not been disambiguated from one of the quotation marks, while the hyphen-minus (mentioned in the parent thread) has been (U+2010 vs U+2212). I'm not sure I buy the argument that «essential identity» (this is derived quotation, not scaring!) can be deduced from glyphic resemblance. And indeed this has rarely happened in Unicode history, given that the purpose is «to encode characters, not glyphs.» The following quotation from TUS does not have exactly this meaning (§ 1.3, p. 6): «the standard defines how characters are interpreted, not how glyphs are rendered». In the case of «that squiggle» '’', TUS doesn't fully define how it is interpreted. It only defines whether it's a letter (U+02BC) or punctuation (U+2019), but *not* whether it's an apostrophe or a single closing quote, even though the two are essentially different (not in appearance, but in what philosophers called «essentia», which is «the being»). They «are the same in outward form but different in essence.» To prove that to ourselves, we may look at German usage: single quotes are U+201A and U+2018, and the apostrophe is U+2019. If the same principles had been applied, U+201A should have been merged with the comma, because we can't tell the difference: ‚,‚,‚, (the 1st, 3rd and 5th are quotation marks). And here at least, the semantics would have been legible even to computers: a leading comma is a quote, a trailing comma is a comma. The current apostrophe convention in English has illegible semantics.
The curly apostrophe's misfortune might have been to be encoded at the same time as the curly quote, whereas the (curly) comma pre-existed its curly quote counterpart. Ultimately, the punctuation apostrophe has *not* been encoded in Unicode. Hence the *original* recommendation to use the letter apostrophe, which is very consistent with English usage. What is more, we have already learned that since 1983 the apostrophe may be considered the 27th letter of the Latin alphabet: http://unicode.org/pipermail/unicode/2015-June/001914.html

9 - By not encoding the punctuation apostrophe, Unicode could rely on typographical tradition, realizing some economies of scale and making the Standard more end-user friendly in some way. This, however, reflects a tendency that prioritizes appearance. In Unicode this tendency is far from omnipresent; it is surely very marginal. Its presence is due to the influence of the software industry, where that tendency is naturally more widespread for economic reasons. Mainly, demand on the users' side already has a component (among others) that treats appearance as a satisfactory good and asks for nothing more than that a given item looks fine, no matter what's behind it... Actually, as far as the English apostrophe is concerned, the burden is moved from input to processing. Users can enter text without bothering, while on the other side, other people must work hard to fix a number of recurrent problems...

Now the goal would be to find out whether part of the problem has been conveniently resolved, and whether there is agreement on some of the points listed above. Ted Clancy and all who launched and responded to the parent thread are invited to share their feelings and how they see the topic today.

Best regards,

Marcel

** Note for archive readers:
Please refer to Ted Clancy's blogpost and the subsequent discussion: http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0047.html

From charupdate at orange.fr Thu Jul 23 04:11:32 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 23 Jul 2015 11:11:32 +0200 (CEST)
Subject: Plain text custom fraction input
In-Reply-To: <CAJKta0xVrZ5a7e1DDPf7ZM-s81Y9yo-K2RAh3ECqgP3nZ_OKyA@mail.gmail.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722130143.GA29225@khaled-laptop> <220949466.4439.1437639922889.JavaMail.www@wwinf1f21> <CAJKta0xVrZ5a7e1DDPf7ZM-s81Y9yo-K2RAh3ECqgP3nZ_OKyA@mail.gmail.com>
Message-ID: <977808179.5607.1437642693041.JavaMail.www@wwinf1f21>

On 23 Jul 2015, at 10:48, Denis Jacquerye wrote:
> Many font designers do not differentiate between superscript and numerator, subscript and denominator because it's easier to design glyphs once and can work fine in some cases.
> In some fonts, the superscript and subscript figures are completely different from the numerators and denominators, or are at different heights, because this is better in some cases.
> In the end it's a design issue but you cannot expect either behaviour in every font.
> Using the recommended figures with the fraction slash will not work everywhere or with every font, but abusing the superscript and subscript will not either.

Is it really an abuse to use the kerning of the fraction slash? Perhaps we should ask from which point of view it is an abuse. As stated, the vast majority of designers who built complete fonts matched all the small digits together. Giving the fraction slash an appropriate kerning would then be a natural reflex.
Font designers who did that probably won't refer to this usage as an abuse. I'm still afraid that this qualification comes from vendors representing high-end layout software. I'm fully aware, however, that the plain text input method for fractions does not work with all fonts, and that it requires a font that supports this usage. This nevertheless seems to be the standard behavior of complete proportional fonts. I'm curious to see a font in which the superscripts differ from the numerators. I see it may be useful, and word processors let users choose the relative size of superscript and subscript formatted characters, as well as their position. Thank you for this hint.

Best regards,

Marcel

From charupdate at orange.fr Thu Jul 23 04:45:14 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 23 Jul 2015 11:45:14 +0200 (CEST)
Subject: Plain text custom fraction input
In-Reply-To: <20150722235402.7770e30a@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2>
Message-ID: <323648769.6397.1437644714380.JavaMail.www@wwinf1f21>

On 23 Jul 2015, at 01:06, Richard Wordingham wrote:
> On Wed, 22 Jul 2015 12:21:32 +0200 (CEST)
> Marcel Schneider wrote:
>
> > We never thought of common hieroglyphs otherwise as running LTR,
> > while on monuments the great liberty of the script allows to run in
> > almost all directions. IMO monumental transcription is always
> > difficult to deal with, whenever exact rendering is expected.
> > However, since Unicode's purpose is plain text encoding, we must
> > stick with what I consider as a convention in egyptology...
>
> Which means that Ancient Egyptian hieroglyphs are unencoded! Their
> default direction is right-to-left,

Sorry, I didn't know that; I must have forgotten. However, as Hans Aberg notes, they're facing the writing direction. I remember that by looking at the signs representing living creatures from the side, we can detect the writing direction. I don't remember, however, that we had to write ancient hieroglyphs from right to left. But one may do so without problems, except if...

> but that's only the start of the
> trouble. The encoded hieroglyphs aren't Bidi-mirrored,

That's really a pity. Hieroglyphs *must* be enabled for bidi mirroring to ensure the full usefulness of the encoded characters.

> so if I embed
> them in a right-to-left override, I should get retrograde characters.
> Now these aren't totally useless, but at present we seem to need a
> duplicate set of right-to-left hieroglyphs for unstacked text. There
> is work in progress to allow normal Egyptological hieroglyphic text.
>
> There seems to have been a change in the notion of what the Egyptian
> scripts are. Hieratic texts are normally printed in hieroglyphs for
> general study, so it had seemed that it would be legitimate to use a
> font that rendered a hieratic style rather than a hieroglyphic style.
> (Some 'hieroglyphs' only occurred in the hieratic style.) The
> hieratic style is strictly right-to-left, so rendering the text in a
> hieratic style would not be compliant with Unicode. However, it seems
> that the hieratic style is now a separate script, so any such
> rendering would now be doubly non-compliant.
>
> > ...which brings us back to plain text fractions, which by an apparent
> > but tacit convention we can input as an *unlimited* string of
> > superscript digits, followed by U+2044, followed by an *unlimited*
> > string of subscript digits. What are you referring to when talking
> > about implementing the fraction slash?
>
> If you are happy with that style, I was wrong, I wasn't being clever
> enough.

It's a matter of practice! I wouldn't bother typing super- and subscripts if I didn't have them on the keyboard layout :-)

> In a left to right context, the conversion of digits to the
> numerator and denominator forms can progress from right to left for the
> numerator by conditioning on the following character being a fraction
> slash or converted digit, and similarly from left to right for the
> denominator. I'm not sure what should happen in right to left
> contexts.

Sorry again, I wasn't really thinking about that, even when I rejected bidi mirroring yesterday (which I soon regretted), since the keyboard layout I'm programming is dedicated to the Latin script. But I believe the principles are portable to other scripts, ideally *all* scripts.

> I've a feeling the numerator should come before the
> denominator, but the bidi algorithm doesn't swap them - it keeps the
> first number on the left. Note that subscript and superscript digits
> are only available for those of us who use the Western Arabic digits.

As I wrote to Khaled Hosny a few moments ago, I understand that fraction formatting is indispensable with Arabic (read: actual Arabic) digits.

>
> However, I believe there is a real problem for the 'nut' style, where
> the numerator and denominator are separated by a horizontal line - in
> Western Asia westwards. I'm having trouble finding examples of
> fractions using Indic scripts - apparently they originally stacked the
> numerator above the denominator, but I don't know what happens nowadays.

IMHO it would be hard to input fractions in nut style while using plain text or normal formatting, to the extent that we need the special Maths applications we know, from LibreOffice as far as I am concerned. But that isn't plain text. With the font-supported plain text fraction input as suggested, we can never get nut style, unfortunately. This is unimaginable *in plain text*.
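The input convention discussed here is superscript digits, then U+2044 FRACTION SLASH, then subscript digits. Richard suggests converting the digit run before the slash to numerator forms and the run after it to denominator forms. The two ideas combined can be sketched in Python as follows.

The sketch makes several assumptions:
- the text is left-to-right and uses ASCII digits only;
- superscript/subscript characters stand in for numerator/denominator forms, which is the very substitution debated in this thread;
- right-to-left contexts are left aside.

By contrast, the Standard's own fraction representation keeps ordinary digits around U+2044 and leaves the formatting to the renderer. The function name superscript_fractions is hypothetical.

```python
import re

# ASCII digit -> superscript / subscript digit (U+00B9..U+00B3 are the odd ones out).
SUPERSCRIPT = str.maketrans("0123456789", "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079")
SUBSCRIPT = str.maketrans("0123456789", "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089")
FRACTION_SLASH = "\u2044"

# Maximal ASCII digit runs directly before and after U+2044 (left-to-right text only).
FRACTION_RE = re.compile("([0-9]+)" + FRACTION_SLASH + "([0-9]+)")


def superscript_fractions(text: str) -> str:
    """Rewrite digits, U+2044, digits as superscript digits, U+2044, subscript digits."""
    def convert(match):
        numerator = match.group(1).translate(SUPERSCRIPT)
        denominator = match.group(2).translate(SUBSCRIPT)
        return numerator + FRACTION_SLASH + denominator
    return FRACTION_RE.sub(convert, text)


print(superscript_fractions("1 3\u20444 cups"))  # 1 ³⁄₄ cups
print(superscript_fractions("1/2"))             # unchanged: ASCII solidus, not U+2044
```

Only U+2044 triggers the conversion in this sketch. A common slash is left alone, so its fraction formatting, if any, stays the renderer's business.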
>
> > If this input method is not encouraged, what's the use of U+215F
> > FRACTION NUMERATOR ONE?
>
> It's for temporarily storing a character defined in some other coding
> standard.

It would be interesting to know more about this standard, and what was the use of this character in that standard, which seems to be hard to retrieve. What do you mean by "temporarily", given that Unicode code point allocations are stable? I'm very puzzled. I'd rather think that the inverse value as a "vulgar" fraction is so important that an input facility is provided, intended to be completed with subscript digits.

Best regards,

Marcel

From frederic.grosshans at gmail.com Thu Jul 23 05:00:06 2015
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Thu, 23 Jul 2015 12:00:06 +0200
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input)
In-Reply-To: <20150722235402.7770e30a@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2>
Message-ID: <55B0BB26.5080601@gmail.com>

On 23/07/2015 00:54, Richard Wordingham wrote:
> Which means that Ancient Egyptian hieroglyphs are unencoded! Their
> default direction is right-to-left, but that's only the start of the
> trouble. The encoded hieroglyphs aren't Bidi-mirrored, so if I embed
> them in a right-to-left override, I should get retrograde characters.

The text of the standard says that they should be mirrored in this case. Version 7.0.0
has the following comment on Egyptian hieroglyphs (p. 424, p. 9 of the PDF):

“When left-to-right directionality is overridden to display Egyptian hieroglyphic text right to left, the glyphs should be mirrored from those shown in the code charts.”

Similar comments are present for other historic scripts (Italic, Runic), but also for Old North Arabian, which is encoded as RTL but “Glyphs may be mirrored in lines when they have left-to-right directionality”. This kind of implementation at the font level is perfectly possible and is indeed sometimes done (see e.g. Andrew West's Anglo-Saxon runic fonts http://babelstone.co.uk/Fonts/AngloSaxon.html).

The BidiMirrored property is not suited to this case, because it is meant for a few “characters such as parentheses” (Unicode 8.0.0, § 4.7, p. 180 = p. 23 of ch04.pdf), and it is designed with an LTR default in mind: it cannot handle the case of Old North Arabian at all.

Extending this property to whole scripts would be a lot of work, and it would have to be more than the current Y/N property, since it should account for cases where the glyphs are:
1. always mirrored (Egyptian, Italic, Runic, Greek...),
2. sometimes mirrored (I have examples of both cases in Latin; North Arabian seems to be in this case too),
3. never mirrored (Han),
4. not exactly mirrored (as for U+2232 CLOCKWISE CONTOUR INTEGRAL and U+221B CUBE ROOT),
5. and also when the behaviour under a change of direction is undefined (I have difficulty guessing what it means to have LTR Arabic or Syriac, or RTL Devanagari. Maybe there are some traditions for some complex scripts, but it makes no sense to invent a uniform behaviour for them).

Currently BidiMirrored=N can mean any of the above, and BidiMirrored=Y means (1. or 4.).

By the way, I think a comment should be added in § 4.7 of the standard to clarify that the BidiMirrored property is not intended for cases like hieroglyphs or italic.
Frédéric

From khaledhosny at eglug.org Thu Jul 23 07:50:59 2015
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Thu, 23 Jul 2015 14:50:59 +0200
Subject: Plain text custom fraction input
In-Reply-To: <220949466.4439.1437639922889.JavaMail.www@wwinf1f21>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722130143.GA29225@khaled-laptop> <220949466.4439.1437639922889.JavaMail.www@wwinf1f21>
Message-ID: <20150723125059.GA26732@khaled-laptop>

On Thu, Jul 23, 2015 at 10:25:22AM +0200, Marcel Schneider wrote:
> The remaining question would then be: What was the idea when at font
> design, the fraction slash was given left and right kerning, so that a
> preceding superscript digit will take exactly the place it has as a
> part of a precomposed fraction, and a following subscript takes place
> like if it were a denominator in one of the precomposed fractions?

What says that this kerning is there for super/subscript glyphs? It can equally (and more likely) be there for the numerator and denominator glyphs.
Regards,
Khaled

From khaledhosny at eglug.org Thu Jul 23 07:59:20 2015
From: khaledhosny at eglug.org (Khaled Hosny)
Date: Thu, 23 Jul 2015 14:59:20 +0200
Subject: Plain text custom fraction input
In-Reply-To: <20150722235402.7770e30a@JRWUBU2>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2>
Message-ID: <20150723125920.GC26732@khaled-laptop>

On Wed, Jul 22, 2015 at 11:54:02PM +0100, Richard Wordingham wrote:
> On Wed, 22 Jul 2015 12:21:32 +0200 (CEST)
> Marcel Schneider <charupdate at orange.fr> wrote:
>
> > On 22 Jul 2015, at 09:52, Richard Wordingham wrote:
>
> > We never thought of common hieroglyphs otherwise as running LTR,
> > while on monuments the great liberty of the script allows to run in
> > almost all directions. IMO monumental transcription is always
> > difficult to deal with, whenever exact rendering is expected.
> > However, since Unicode's purpose is plain text encoding, we must
> > stick with what I consider as a convention in egyptology...
>
> Which means that Ancient Egyptian hieroglyphs are unencoded! Their
> default direction is right-to-left, but that's only the start of the
> trouble. The encoded hieroglyphs aren't Bidi-mirrored, so if I embed
> them in a right-to-left override, I should get retrograde characters.

At least in OpenType, you can have mirrored glyphs in the font (which you will need in any case) and use an 'rtlm' feature, which should be applied when the text is being typeset right-to-left (naturally or forced).
Regards,
Khaled

From charupdate at orange.fr Thu Jul 23 09:47:58 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 23 Jul 2015 16:47:58 +0200 (CEST)
Subject: Plain text custom fraction input
In-Reply-To: <20150723125059.GA26732@khaled-laptop>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722130143.GA29225@khaled-laptop> <220949466.4439.1437639922889.JavaMail.www@wwinf1f21> <20150723125059.GA26732@khaled-laptop>
Message-ID: <1182631931.12000.1437662879081.JavaMail.www@wwinf1n18>

On 23 Jul 2015, at 14:57, Khaled Hosny wrote:
> On Thu, Jul 23, 2015 at 10:25:22AM +0200, Marcel Schneider wrote:
> > The remaining question would then be: What was the idea when at font
> > design, the fraction slash was given left and right kerning, so that a
> > preceding superscript digit will take exactly the place it has as a
> > part of a precomposed fraction, and a following subscript takes place
> > like if it were a denominator in one of the precomposed fractions?
>
> What says that this kerning is there for super/subscript glyphs? It can
> equally (and more likely) be there for the numerator and denominator
> glyphs.

You are right: the fraction slash's kerning helps the rendering engine when it's flagged to use the numerators and denominators. I would need to look inside a font with Western Arabic super- and subscripts and with numerator and denominator glyphs. That would show whether the numerator glyphs are mapped to the superscript glyphs, and the denominator glyphs to the subscript glyphs. As Denis Jacquerye wrote, this is not the case in all fonts, some of them having different glyphs for the two classes. Fraction formatting also works when the slash is a common slash rather than a fraction slash. Here too it would be interesting to know whether the slash is then mapped to U+2044, or whether the rendering engine does the whole job.
If the synergy between the fraction slash and the super- and subscripts is purely fortuitous, plain text fraction input would be categorized as a hack, a shortcut that works around the legal process. I would be glad if that weren't true, because I think that the shortest way, if correct, is the best. Again, this short way is practicable only under certain circumstances.

Regards,

Marcel

From doug at ewellic.org Thu Jul 23 11:00:46 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 23 Jul 2015 09:00:46 -0700
Subject: Plain text custom fraction input
Message-ID: <20150723090046.665a7a7059d7ee80bb4d670165c8327d.1347f3f300.wbe@email03.secureserver.net>

Sorry, everyone:

> On the other hand, the harmonization inside the fonts, between super-
> and subscripts and the numerators and denominators of the precomposed
> fractions they contain, could be purely esthetical without any idea of
> using superscripts as numerators, subscripts as denominators. [...]

> The fraction formatting works also when the slash is not a fraction
> slash but a common slash. [...]

What you have discovered is that under certain circumstances, with certain fonts, you can get the visual results you want by using characters other than those recommended in the Standard -- by using characters simply because they "look right."

This is not plain text encoding, and it is not a matter of Unicode failing to consider a particular usage scenario or failing to "complete" some part of the Standard. It is about having an incomplete understanding of the Unicode Standard.

Read, listen, learn.

--
Doug Ewell | http://ewellic.org | Thornton, CO
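The UCD records whether a character's decomposition mapping is canonical or carries a compatibility tag such as <fraction>, <super> or <sub>. Python's standard unicodedata module can check this. The small sketch below classifies characters that came up in this thread, including U+215F FRACTION NUMERATOR ONE. The helper name mapping_kind is hypothetical.

```python
import unicodedata


def mapping_kind(ch: str) -> str:
    """Classify the UCD decomposition mapping of a single character."""
    d = unicodedata.decomposition(ch)
    if not d:
        return "none"
    # Compatibility mappings start with a formatting tag such as <fraction>, <super> or <sub>.
    return "compatibility" if d.startswith("<") else "canonical"


for ch in ["\u00bd", "\u215f", "\u00b2", "\u2082", "\u00e9", "\u2044"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{mapping_kind(ch)} ({unicodedata.decomposition(ch)})")
```

Running it shows U+215F mapped as `<fraction> 0031 2044` and U+00E9 mapped canonically to `0065 0301`; U+2044 itself has no mapping. NFKC also folds the superscript and subscript digits back to ordinary ones. For example, ²⁄₃ becomes 2⁄3, which is plain digits around U+2044.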
From kenwhistler at att.net Thu Jul 23 11:23:23 2015
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 23 Jul 2015 09:23:23 -0700
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input)
In-Reply-To: <55B0BB26.5080601@gmail.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
Message-ID: <55B114FB.2000000@att.net>

On 7/23/2015 3:00 AM, Frédéric Grosshans wrote:
>
> By the way, I think a comment should be added in § 4.7 of the
> standard to clarify that the BidiMirrored property is not intended for
> cases like hieroglyphs or italic.
>
>

This eminently sensible suggestion has been passed along to the Unicode editorial committee for consideration.

--Ken

From richard.wordingham at ntlworld.com Thu Jul 23 13:42:50 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 23 Jul 2015 19:42:50 +0100
Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input)
In-Reply-To: <55B0BB26.5080601@gmail.com>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com>
Message-ID: <20150723194250.1cc05710@JRWUBU2>

On Thu, 23 Jul 2015 12:00:06 +0200
Frédéric Grosshans <frederic.grosshans at gmail.com> wrote:

> On 23/07/2015 00:54, Richard Wordingham wrote:
> > Which means that Ancient Egyptian hieroglyphs are unencoded! Their
> > default direction is right-to-left, but that's only the start of the
> > trouble.
> > The encoded hieroglyphs aren't Bidi-mirrored, so if I
> > embed them in a right-to-left override, I should get retrograde
> > characters.
> The text of the standard says that they should be mirrored in this
> case. Version 7.0.0 has the following comment on Egyptian
> hieroglyphs (p. 424, p. 9 of the PDF):
>
> “When left-to-right directionality is overridden to display
> Egyptian hieroglyphic text right to left, the glyphs should be
> mirrored from those shown in the code charts.”

The UCD may trump the core specification; I'm expecting to be advised not to trust anything in the core specification.

> Similar comments are present for other historic scripts (Italic,
> Runic), but also for Old North Arabian, which is encoded as RTL but
> “Glyphs may be mirrored in lines when they have left-to-right
> directionality”. This kind of implementation at the font level is
> perfectly possible and is indeed sometimes done (see e.g. Andrew
> West's Anglo-Saxon runic fonts
> http://babelstone.co.uk/Fonts/AngloSaxon.html).
> The BidiMirrored property is not suited to this case, because it is
> meant for a few “characters such as parentheses” (Unicode 8.0.0, § 4.7,
> p. 180 = p. 23 of ch04.pdf), and it is designed with an LTR default in
> mind: it cannot handle the case of Old North Arabian at all.

There had been hope until today.

> Extending this property to whole scripts would be a lot of work, and
> it would have to be more than the current Y/N property, since it should
> account for cases where the glyphs are:
>
> 1. always mirrored (Egyptian, Italic, Runic, Greek...),
> 2. sometimes mirrored (I have examples of both cases in Latin; North
>    Arabian seems to be in this case too),
> 3. never mirrored (Han),
> 4. not exactly mirrored (as for U+2232 CLOCKWISE CONTOUR INTEGRAL
>    and U+221B CUBE ROOT),
> 5. and also when the behaviour under a change of direction is undefined (I
>    have difficulty guessing what it means to have LTR Arabic or
>    Syriac, or RTL Devanagari.
>    Maybe there are some traditions for
>    some complex scripts, but it makes no sense to invent a uniform
>    behaviour for them).
> Currently BidiMirrored=N can mean any of the above, and
> BidiMirrored=Y means (1. or 4.).

To be precise, having reread the Bidi algorithm, in particular L4 and HL6:

1) If resolved directionality is R and Bidi_Mirrored=Yes, mirroring is mandatory.
2) If resolved directionality is L and bidirectional type is not R or AL, mirroring is prohibited.
3) Otherwise, mirroring is optional.

It's odd that a font that reverses all the Hebrew letters is compliant with the Unicode standard.

So, I was wrong. Not marking hieroglyphs as Bidi_Mirrored didn't stop them being used for Ancient Egyptian in marked-up text.

> By the way, I think a comment should be added in § 4.7 of the
> standard to clarify that the BidiMirrored property is not intended
> for cases like hieroglyphs or italic.

That is a stupid and dangerous remark. If the hieroglyphs had had the BidiMirrored property corrected to Yes, one could have had, in plain text, once fonts had caught up:

<U+132B9 EGYPTIAN HIEROGLYPH R008> for nṯr in normal left-to-right text
<U+202B RIGHT-TO-LEFT EMBEDDING, U+132B9, U+202C POP DIRECTIONAL FORMATTING> for nṯr in retrograde left-to-right text

and embed whole paragraphs in <U+202B>...<U+202C> for right-to-left text.

Once your remark has been adopted in the Unicode Standard, the only way to get consistently oriented Ancient Egyptian in plain text is to:

a) Add a complete set of right-to-left hieroglyphs.
b) Add the retrograde hieroglyphs to each set.

One hopes that Egyptian Hieroglyphs is the only script for which mirroring or not has meaning.

Richard.
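Richard's three-way reading of UAX #9 rules L4 and HL6 can be written as a small Python function over the UCD properties that the standard unicodedata module exposes. Bidi_Mirrored comes from unicodedata.mirrored and the bidirectional class from unicodedata.bidirectional. The resolved directionality would have to come from a full implementation of the Bidirectional Algorithm; that is assumed here, and the value is simply passed in. The function name mirroring_status is hypothetical.

```python
import unicodedata


def mirroring_status(ch: str, resolved: str) -> str:
    """Classify glyph mirroring for `ch` as 'mandatory', 'prohibited' or 'optional'.

    `resolved` is the character's resolved directionality ('L' or 'R'), assumed
    to have been computed elsewhere by a full Unicode Bidirectional Algorithm.
    """
    if resolved == "R" and unicodedata.mirrored(ch):
        return "mandatory"   # L4: resolved R and Bidi_Mirrored=Yes
    if resolved == "L" and unicodedata.bidirectional(ch) not in ("R", "AL"):
        return "prohibited"  # L4 rules it out and HL6 gives no exception
    return "optional"        # HL6 allows (but does not require) a mirrored glyph


HIEROGLYPH_R008 = "\U000132B9"  # EGYPTIAN HIEROGLYPH R008, Bidi_Mirrored=No
for ch, resolved in [("(", "R"), (HIEROGLYPH_R008, "R"), (HIEROGLYPH_R008, "L"), ("\u05d0", "R")]:
    print(f"U+{ord(ch):04X} resolved {resolved}: {mirroring_status(ch, resolved)}")
```

The last case, U+05D0 HEBREW LETTER ALEF resolved R, comes out as "optional". That is the situation Richard finds odd: a font may legitimately mirror Hebrew letters in right-to-left text.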
From richard.wordingham at ntlworld.com Thu Jul 23 15:25:40 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 23 Jul 2015 21:25:40 +0100
Subject: Plain text custom fraction input
In-Reply-To: <323648769.6397.1437644714380.JavaMail.www@wwinf1f21>
References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <323648769.6397.1437644714380.JavaMail.www@wwinf1f21>
Message-ID: <20150723212540.0d02f7f4@JRWUBU2>

On Thu, 23 Jul 2015 11:45:14 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> On 23 Jul 2015, at 01:06, Richard Wordingham wrote:

> IMHO it would be hard to input fractions in nut style while using
> plain text or normal formatting, to the extent that we need the
> special Maths applications we know, from LibreOffice as far as I am
> concerned. But that isn't plain text. With the font-supported plain
> text fraction input as suggested, we can never get nut style,
> unfortunately. This is unimaginable *in plain text*.

Unicode does not distinguish 'nut' style and the 'slash'-based style. The problem is entirely one of rendering. A renderer could support the 'nut' style, just as renderers typically support underlining and strike-out with just a few numeric parameters from the font. 'Plain text' just means no formatting commands associated with the text - it doesn't prevent immense quantities of information being taken from a font, but it does prevent specification of which font to use.

> > > If this input method is not encouraged, what's the use of U+215F
> > > FRACTION NUMERATOR ONE?
> > It's for temporarily storing a character defined in some other
> > coding standard.
> It would be interesting to know more about this standard, and what
> was the use of this character in that standard, which seems to be
> hard to retrieve. What do you mean by "temporarily", given that
> Unicode code point allocations are stable?

The idea is that data is read in from an old encoding, manipulated, and written out in the old encoding. For long-term use, it would be better to convert the data, though conversion may have to do more than just change the character sequence. You are correct in that the unconverted data may be held as such indefinitely.

> I'm very puzzled. I'd
> rather think that the inverse value as a "vulgar" fraction is so
> important that an input facility is provided, intended to be
> completed with subscript digits.

The standard answer is that in the Unicode scheme, that sort of capability should belong to the input mechanism. An example is the general refusal to encode new precomposed characters. Indeed, if renderers supported U+2044 (rather than just treating it as an ordinary character), input resources would be better employed supporting the input of U+2044.

Richard.

From charupdate at orange.fr Fri Jul 24 04:23:59 2015
From: charupdate at orange.fr (Marcel Schneider)
Date: Fri, 24 Jul 2015 11:23:59 +0200 (CEST)
Subject: Plain text custom fraction input
Message-ID: <303748076.8726.1437729839437.JavaMail.www@wwinf1f21>

On 23 Jul 2015, at 18:00, Doug Ewell wrote:
> What you have discovered

Alas, I should have searched the internet before soliciting new advice and feedback, out of respect for other people's time. Indeed I "discovered" (quotation) that for myself, but as I learned *after* my last reply yesterday, this "new" (scare quotes!) way of inputting fractions is already a more or less well-established practice. Please see the information, with my apologies, two e-mails later.
> is that under certain circumstances, with
> certain fonts, you can get the visual results you want by using
> characters other than those recommended in the Standard -- by using
> characters simply because they "look right."

This might be the case for the apostrophe too, where a quotation mark is used because it looks the same. Yesterday I criticized this practice when I wrote in the thread «Global apostrophe solution»:

>> This, however, reflects a tendency that prioritizes appearance. In Unicode this tendency is far from omnipresent; it is surely very marginal. Its presence is due to the influence of the software industry, where that tendency is naturally more widespread for economic reasons. Mainly, demand on the users' side already has a component (among others) that treats appearance as a satisfactory good and asks for nothing more than that a given item looks fine, no matter what's behind it...

Now I really understand that for fractions I am suggesting exactly the same thing: using characters intended as superscripts/subscripts to represent digits that are numerators/denominators, not superscripts/subscripts. From the beginning, my view was based solely on appearance, and the samples I provided use only a single font.

>
> This is not plain text encoding, and it is not a matter of Unicode
> failing to consider a particular usage scenario or failing to "complete"
> some part of the Standard. It is about having an incomplete
> understanding of the Unicode Standard.

I'm truly very far from knowing thoroughly even the smallest part of the Standard. I have often started mailing when the requested information would have been at hand by simply looking it up in TUS... About plain text, I simply know, from having read it somewhere, that this is the basic purpose of Unicode.
Representing fractions as U+2044 is known as a compatibility mapping, just like representing a superscript as , while (I go on checking my knowledge...) representing a precomposed diacriticized letter as is known as a decomposition mapping. The difference between the two ways of getting the same thing is in plain text. With decomposition we stay in plain text, while compatibility mappings need formatting, thus leaving the field of plain text. So in fact, what I'm suggesting for fractions is to use a decomposition rather than a compat mapping, and to use this decomposition scheme to compose arbitrary fractions without leaving plain text. The problem is, as you point out, that this is *not* defined in the Standard. Therefore a font can be compliant with the Standard without allowing this usage. That is the case for at least *all* monospaced fonts. By contrast, combining diacritics, for example, work in *all* Unicode-compliant fonts if the decomposition mapping is defined. Overlay combining diacritics, however, sometimes don't work well. Their usage is not defined in the Standard for decomposition (precomposed letters with overlay diacritics are not decomposed), *because* they don't always work well. From this we might infer that plain text custom fraction input is not a part of TUS because it doesn't always work well. > > Read, listen, learn. Thank you for your answer. I've been given the opportunity to learn a certain number of things by reading the replies so far and by doing some searches in the Archive. I confess, however, that I'm somewhat unprepared. It's very hard for me to work through all that's required within a useful time frame, unfortunately. Best regards, Marcel > Message du 23/07/15 18:10 > De : "Doug Ewell" > A : "Unicode Mailing List" > Copie à
: "Marcel Schneider" > Objet : RE: Plain text custom fraction input > > Sorry, everyone: > > > On the other hand, the harmonization inside the fonts, between super- > > and subscripts and the numerators and denominators of the precomposed > > fractions they contain, could be purely esthetical without any idea of > > using superscripts as numerators, subscripts as denominators. [...] > > > The fraction formatting works also when the slash is not a fraction > > slash but a common slash. [...] > > What you have discovered is that under certain circumstances, with > certain fonts, you can get the visual results you want by using > characters other than those recommended in the Standard -- by using > characters simply because they "look right." > > This is not plain text encoding, and it is not a matter of Unicode > failing to consider a particular usage scenario or failing to "complete" > some part of the Standard. It is about having an incomplete > understanding of the Unicode Standard. > > Read, listen, learn. > > -- > Doug Ewell | http://ewellic.org | Thornton, CO ???? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150724/fcd27956/attachment.html> From charupdate at orange.fr Fri Jul 24 04:28:07 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 24 Jul 2015 11:28:07 +0200 (CEST) Subject: Plain text custom fraction input Message-ID: <1672445198.8924.1437730087572.JavaMail.www@wwinf1f21> On 23 Jul 2015, at 22;35, Richard Wordingham wrote: > > IMHO it would be hard to input fractions in nut style while using > > plain text or normal formatting, at the extent that we need the > > special Maths applications we know, from LibreOffice as far as I am > > concerned. But that isn't plain text. With the font-supported plain > > text fraction input as suggested, we can never get nut style, > > unfortunately. This is inimaginable *in plain text*. 
> > The Unicode does not distinguish 'nut' style and the 'slash'-based > style. The problem is entirely one of rendering. A renderer could > support the 'nut' style, just as renderers typically support > underlining and strike-out with just a few numeric parameters from the > font. 'Plain text' just means no formatting commands associated with > the text - it doesn't prevent immense quantities of information being > taken from a font, but it does prevent specification of which font to > use. I fully agree, even without knowing precisely how a font works. > > > > > If this input method is not encouraged, what's the use of U+215F > > > > FRACTION NUMERATOR ONE? > > > > It's for temporarily storing a character defined in some other > > > coding standard. > > > It would be interesting to know more about this standard, and what > > was the use of this character in that standard, which seems to be > > hard to retrieve. What do you mean by "temporarily", given that > > Unicode code point allocations are stable? > > The idea is that data is read in from an old encoding, manipulated, and > written out in the old encoding. For long term use, it would be > better to convert the data, though conversion may have to do more than > just change the character sequence. You are correct in that the > unconverted data may be held as such indefinitely. Indeed, Unicode was forced to encode a number of characters for the sole reason that they are part of earlier standards with which backward compatibility must be ensured. That's the case, for example, of U+0149. This character looks a bit different when input as recommended and specified in the compatibility mapping, with the letter apostrophe U+02BC: the apostrophe then sits farther from the letter than in the all-in-one apostrophe-n glyph. But that's a font issue, not a Unicode concern. As for FRACTION NUMERATOR ONE, I'm still unsure how it was used in this old standard.
Perhaps subscripts were used to complete it, in which case the plain text custom fraction input we're discussing would be compliant with this legacy standard. That's very interesting. > > > I'm very puzzled. I'd > > rather think that the inverse value as a "vulgar" fraction is so > > important that an input facility is provided, intended to be > > completed with subscript digits. > > The standard answer is that in the Unicode scheme, that sort of > capability should belong to the input mechanism. An example is > the general refusal to encode new precomposed characters. Indeed, if > renderers supported U+2044 (rather than just treating it as an ordinary > character), input resources would be better employed supporting the > input of U+2044. I don't deny the usefulness of automated fraction formatting triggered by the presence of U+2044. Encoding any *new* precomposed characters, or *new* characters that can be obtained by formatting existing ones, is useless and wastes resources. This is why it is refused. By contrast, plain text custom fraction input relies wholly on existing and widely implemented characters, combining them in a "new" way. This time the quotes are scare quotes, because, as I learned *after* my last reply yesterday, this is already a more or less well-established practice. Please find the information, with my apologies, in the next e-mail. Best regards, Marcel > Message du 23/07/15 22:35 > De : "Richard Wordingham" > A : "Unicode Mailing List" > Copie à : > Objet : Re: Plain text custom fraction input > > On Thu, 23 Jul 2015 11:45:14 +0200 (CEST) > Marcel Schneider wrote: > > > On 23 Jul 2015, at 01:06, Richard Wordingham wrote: > > > IMHO it would be hard to input fractions in nut style while using > > plain text or normal formatting, at the extent that we need the > > special Maths applications we know, from LibreOffice as far as I am > > concerned. But that isn't plain text.
With the font-supported plain > > text fraction input as suggested, we can never get nut style, > > unfortunately. This is inimaginable *in plain text*. > > The Unicode does not distinguish 'nut' style and the 'slash'-based > style. The problem is entirely one of rendering. A renderer could > support the 'nut' style, just as renderers typically support > underlining and strike-out with just a few numeric parameters from the > font. 'Plain text' just means no formatting commands associated with > the text - it doesn't prevent immense quantities of information being > taken from a font, but it does prevent specification of which font to > use. > > > > > If this input method is not encouraged, what's the use of U+215F > > > > FRACTION NUMERATOR ONE? > > > > It's for temporarily storing a character defined in some other > > > coding standard. > > > It would be interesting to know more about this standard, and what > > was the use of this character in that standard, which seems to be > > hard to retrieve. What do you mean by "temporarily", given that > > Unicode code point allocations are stable? > > The idea is that data is read in from an old encoding, manipulated, and > written out in the old encoding. For long term use, it would be > better to convert the data, though conversion may have to do more than > just change the character sequence. You are correct in that the > unconverted data may be held as such indefinitely. > > > I'm very puzzled. I'd > > rather think that the inverse value as a "vulgar" fraction is so > > important that an input facility is provided, intended to be > > completed with subscript digits. > > The standard answer is that in the Unicode scheme, that sort of > capability should belong to the input mechanism. An example is > the general refusal to encode new precomposed characters. 
Indeed, if > renderers supported U+2044 (rather than just treating it as an ordinary > character), input resources would be better employed supporting the > input of U+2044. > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150724/b91ee933/attachment.html> From charupdate at orange.fr Fri Jul 24 04:33:46 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 24 Jul 2015 11:33:46 +0200 (CEST) Subject: Plain text custom fraction input Message-ID: <1326602911.9053.1437730426498.JavaMail.www@wwinf1f21> The Plain text custom fraction input issue has, IMHO, so far been resolved at a certain level and to some extent. It's a bit complicated for me to explain. As you already know, I still lack the reflex of searching the internet first. Only after my last e-mail yesterday did I do so, and I was given the link to a Microsoft Community wiki: http://answers.microsoft.com/en-us/office/wiki/office_2013_release-word/styled-fractions-in-windows/4a07d5fa-2484-4e39-b1f3-70bb3eb0c332 where we find some information written up for Microsoft Office users about the input of fractions using Unicode super- and subscripts along with the fraction slash. For practice, very detailed step-by-step instructions show how to use the Special Characters dialog for this purpose, as well as how to program in VBA the addition of a huge set of autocorrect entries, so that the user need do no more than type a digits-slash-digits sequence to get it converted to a plain text fraction. Macros are provided for download. From there on, fortunately, I understood that Microsoft must really be one of the most user-friendly IT companies, given that it allows people to publish on its websites very detailed information about how to get «styled fractions»
[I'm now using angle quotation marks instead of stating that this is a quotation, to make sure that nobody reads a hidden meaning into the quotes; please see my suggestion in the thread «Global apostrophe solution»], well, how to get «styled fractions» without using any formatting feature, just in Unicode-enriched plain text (by which I mean plain text using Unicode characters without any restriction), using fonts that wholly implement Unicode *and* are proportional (a point that does not seem to be specially mentioned). Through this search I also found another page, where a cheerful Lady presents to the users of a given software no fewer than five methods of formatting digit-slash-digit sequences as fractions, without mentioning the plain text input method by a single word. As my goal is not to blame marketing strategies (and even less to criticize the work of anyone who cares for the instruction and edification of users) but to enhance user experience, neither the URL, nor the product name, nor the keywords, nor the name of the search engine are disclosed here. I'm very sorry to bring this information so late, after (not before) soliciting feedback from the List Subscribers, whom I thank for their kind replies and the many pieces of information I would not have become aware of by just doing a search on the internet. But honestly it would have been correct to start the thread by bringing in *all* the information within my reach. My apologies... To complete this thread on fraction input, part of «Input methods at the age of Unicode», I'd like to mention one more way of using the keyboard. As far as I understand, smart keyboard frameworks, of which the only one I know is Keyman, make it possible to automate what in Windows keyboard drivers amounts to changing the shift state three times. Along with all the other useful toggles we can implement and figure out, Keyman lets us create what I'm calling a Fraction toggle.
Once the Fraction flag is set, the layout converts all digits to superscripts, and the slash to U+2044. The slash then sets the layout to another state where all digits are converted to subscripts, and typing a non-digit character then sets the keyboard back to its normal state. Note that this works in plain text, like this: ???, ???, ??????????????????????. The font must contain the complete range of super- and subscripts (which it normally does when the fraction slash is present). In fonts that have different glyphs for numerator/denominator and for superscript/subscript, using the precomposed fractions is discouraged, for harmony and consistency, if plain text custom fractions are input in the same document. Font designers who have created superscript and subscript digit glyphs in OpenType fonts are welcome to reveal the relationship between these and the numerator/denominator glyphs. Developers who have programmed a fraction formatting feature in a rendering engine are equally welcome to share how the common slash is given the slant of a fraction slash. Best regards, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150724/8bb55698/attachment.html> From charupdate at orange.fr Fri Jul 24 04:53:28 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 24 Jul 2015 11:53:28 +0200 (CEST) Subject: Plain text custom fraction input Message-ID: <2028339936.9524.1437731608194.JavaMail.www@wwinf1f21> Sorry, I'd forgotten to add two Addressees who had responded on this thread. The Plain text custom fraction input issue has, IMHO, so far been resolved at a certain level and to some extent. It's a bit complicated for me to explain. As you already know, I still lack the reflex of searching the internet first.
Only after my last e-mail yesterday did I do so, and I was given the link to a Microsoft Community wiki: http://answers.microsoft.com/en-us/office/wiki/office_2013_release-word/styled-fractions-in-windows/4a07d5fa-2484-4e39-b1f3-70bb3eb0c332 where we find some information written up for Microsoft Office users about the input of fractions using Unicode super- and subscripts along with the fraction slash. For practice, very detailed step-by-step instructions show how to use the Special Characters dialog for this purpose, as well as how to program in VBA the addition of a huge set of autocorrect entries, so that the user need do no more than type a digits-slash-digits sequence to get it converted to a plain text fraction. Macros are provided for download. From there on, fortunately, I understood that Microsoft must really be one of the most user-friendly IT companies, given that it allows people to publish on its websites very detailed information about how to get «styled fractions» [I'm now using angle quotation marks instead of stating that this is a quotation, to make sure that nobody reads a hidden meaning into the quotes; please see my suggestion in the thread «Global apostrophe solution»], well, how to get «styled fractions» without using any formatting feature, just in Unicode-enriched plain text (by which I mean plain text using Unicode characters without any restriction), using fonts that wholly implement Unicode *and* are proportional (a point that does not seem to be specially mentioned). Through this search I also found another page, where a cheerful Lady presents to the users of a given software no fewer than five methods of formatting digit-slash-digit sequences as fractions, without mentioning the plain text input method by a single word.
As my goal is not to blame marketing strategies (and even less to criticize the work of anyone who cares for the instruction and edification of users) but to enhance user experience, neither the URL, nor the product name, nor the keywords, nor the name of the search engine are disclosed here. I'm very sorry to bring this information so late, after (not before) soliciting feedback from the List Subscribers, whom I thank for their kind replies and the many pieces of information I would not have become aware of by just doing a search on the internet. But honestly it would have been correct to start the thread by bringing in *all* the information within my reach. My apologies... To complete this thread on fraction input, part of «Input methods at the age of Unicode», I'd like to mention one more way of using the keyboard. As far as I understand, smart keyboard frameworks, of which the only one I know is Keyman, make it possible to automate what in Windows keyboard drivers amounts to changing the shift state three times. Along with all the other useful toggles we can implement and figure out, Keyman lets us create what I'm calling a Fraction toggle. Once the Fraction flag is set, the layout converts all digits to superscripts, and the slash to U+2044. The slash then sets the layout to another state where all digits are converted to subscripts, and typing a non-digit character then sets the keyboard back to its normal state. Note that this works in plain text, like this: ???, ???, ??????????????????????. The font must contain the complete range of super- and subscripts (which it normally does when the fraction slash is present). In fonts that have different glyphs for numerator/denominator and for superscript/subscript, using the precomposed fractions is discouraged, for harmony and consistency, if plain text custom fractions are input in the same document.
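The Fraction toggle described above is essentially a three-state machine. The Python sketch below models it under stated assumptions. It is not Keyman keyboard source. The toggle key is represented by a hypothetical marker character `|`. Returning to the normal state when a key other than a digit or the slash is typed in the numerator state is also an assumption, because the message only specifies that behaviour for the denominator state.

```python
from enum import Enum, auto

DIGITS = frozenset("0123456789")
SUPERSCRIPT = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079")
SUBSCRIPT = str.maketrans(
    "0123456789", "".join(chr(0x2080 + d) for d in range(10)))


class State(Enum):
    NORMAL = auto()
    NUMERATOR = auto()    # toggle pressed: digits become superscripts
    DENOMINATOR = auto()  # after the slash: digits become subscripts


class FractionLayout:
    def __init__(self) -> None:
        self.state = State.NORMAL

    def toggle(self) -> None:
        """The (hypothetical) Fraction toggle key arms numerator mode."""
        self.state = State.NUMERATOR

    def key(self, ch: str) -> str:
        """Return the text emitted for one typed character."""
        if self.state is State.NUMERATOR:
            if ch in DIGITS:
                return ch.translate(SUPERSCRIPT)
            if ch == "/":
                self.state = State.DENOMINATOR
                return "\u2044"  # FRACTION SLASH
            self.state = State.NORMAL  # assumption: any other key ends the fraction
            return ch
        if self.state is State.DENOMINATOR:
            if ch in DIGITS:
                return ch.translate(SUBSCRIPT)
            self.state = State.NORMAL  # a non-digit restores the normal state
        return ch


def type_keys(keys: str, toggle_marker: str = "|") -> str:
    """Feed keystrokes through the layout; toggle_marker stands for the toggle key."""
    layout = FractionLayout()
    out = []
    for ch in keys:
        if ch == toggle_marker:
            layout.toggle()
        else:
            out.append(layout.key(ch))
    return "".join(out)


if __name__ == "__main__":
    print(type_keys("Use |3/4 cup, not 3/4."))
```

Only the sequence typed after the toggle is converted; the second "3/4" stays as typed because the layout has returned to its normal state.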
Font designers who have created superscript and subscript digit glyphs in OpenType fonts are welcome to reveal the relationship between these and the numerator/denominator glyphs. Developers who have programmed a fraction formatting feature in a rendering engine are equally welcome to share how the common slash is given the slant of a fraction slash. Best regards, Marcel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150724/2fcb9a1d/attachment.html> From frederic.grosshans at gmail.com Fri Jul 24 04:59:23 2015 From: frederic.grosshans at gmail.com (Frédéric Grosshans) Date: Fri, 24 Jul 2015 11:59:23 +0200 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <20150723194250.1cc05710@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> Message-ID: <55B20C7B.5020000@gmail.com> On 23/07/2015 20:42, Richard Wordingham wrote: > On Thu, 23 Jul 2015 12:00:06 +0200 > Frédéric Grosshans <frederic.grosshans at gmail.com> wrote: > >> On 23/07/2015 00:54, Richard Wordingham wrote: >>> Which means that Ancient Egyptian hieroglyphs are unencoded! Their >>> default direction is right-to-left, but that's only the start of the >>> trouble. The encoded hieroglyphs aren't Bidi-mirrored, so if I >>> embed then in a right-to-left override, I should get retrograde >>> characters. >> The text of the standard say that they should be mirrored in this >> case. The version 7.0.0.
has the following comment on Egyptian >> hieroglyphs : (p424, p9 of pdf) : >> >> "When left-to-right directionality is overridden to display >> Egyptian hieroglyphic text right to left, the glyphs should be >> mirrored from those shown in the code charts." > The UCD may trump the core specification; I'm expecting to be advised > not to trust anything in the core specification. Would I be wrong in saying that "which trumps which?" is a short-term question? However, in the long term, a disagreement between the UCD and the core specification is either a bug to be corrected or a misunderstanding to be clarified. >> Similar comments are present for other historic script (Italic, >> Runic), but also Old North Arabian, which is encoded as RTL but >> "Glyphs may be mirrored in lines when they have left-to-right >> directionality". This kind of implementation at the font level is >> perfectly possible and is indeed done sometimes (see e.g. Andrew >> West's anglo-saxon runic fonts >> http://babelstone.co.uk/Fonts/AngloSaxon.html). >> The BidiMirrored property is not adapted in this case because, it is >> for a few "characters such as parentheses" (Unicode8.0.0, §4.7 >> p180=pf 23 of ch04.pdf), and it is thought for a LTR default : it can >> in no way consider the case of Old North Arabian. > There had been hope until today. Well, there is still hope, if the BidiMirrored property is amended or supplemented with another mechanism. What I meant is: "The current Y/N values of BidiMirrored cannot be used for mirroring scripts which are RTL by default, and at least one such script exists in Unicode 7.0.0." >> Extending this property for whole scripts would be a lot of work, and >> should be more than a Y/N property as currently, [...] > >> Currently a BidiMirrorred=N can mean anything of the above, and >> BidiMirrored=Y means (1. or 4.).
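The core-specification guidance quoted here (Egyptian glyphs mirrored when the text is displayed right to left; Old North Arabian glyphs optionally mirrored in left-to-right lines) amounts to a glyph-selection decision made by a font or renderer. It is not driven by the Bidi_Mirrored property, which is No for all of these letters. The following is a minimal sketch of such a renderer-side policy. The table and function names are hypothetical, and the ranges are simply the Unicode block ranges of the scripts named in the thread.

```python
import unicodedata

# Hypothetical renderer policy table:
# (first code point, last code point, direction of the code-chart glyphs).
PALEOGRAPHIC_BLOCKS = [
    (0x13000, 0x1342F, "L"),  # Egyptian Hieroglyphs
    (0x10300, 0x1032F, "L"),  # Old Italic
    (0x016A0, 0x016FF, "L"),  # Runic
    (0x10A80, 0x10A9F, "R"),  # Old North Arabian
]


def chart_direction(ch: str):
    """Direction of the code-chart glyph if ch is in a listed block, else None."""
    cp = ord(ch)
    for first, last, direction in PALEOGRAPHIC_BLOCKS:
        if first <= cp <= last:
            return direction
    return None


def use_mirrored_glyph(ch: str, line_direction: str) -> bool:
    """Mirror the glyph when the line runs against the code-chart orientation."""
    direction = chart_direction(ch)
    return direction is not None and direction != line_direction


if __name__ == "__main__":
    ntr = "\U000132B9"  # EGYPTIAN HIEROGLYPH R008
    # The UCD offers no help here: hieroglyphs are Bidi_Mirrored=No.
    print(unicodedata.mirrored(ntr), use_mirrored_glyph(ntr, "R"))
```

Whether mirroring in such a context should be required or merely permitted is a further policy choice. The quoted text says "should" for Egyptian and "may" for Old North Arabian.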
> To be precise, having reread the Bidi algorithm, in particular L4 and > HL6: > > 1) If resolved directionality is R and Bidi_Mirrored=Yes, > mirroring is mandatory. > > 2) If resolved directionality is L and bidirectional type is not R > or AL, mirroring is prohibited. > > 3) Otherwise, mirroring is optional. Thanks for the check. > > It's odd that a font that reverses all the Hebrew letters is compliant > with the Unicode standard. Indeed! >> he way, I think a comment should be added in the §4.7 of the >> standard to clarify that the BidiMirrored property is not intended >> for cases like hieroglyphs or italic. > That is a stupid and dangerous remark. My remark was about the BidiMirrored property itself; it was not intended to mean "mirroring of ancient scripts is forbidden". I wanted to say: "Don't trust BidiMirrored=N for ancient scripts: it does not mean that they should not be mirrored." > If the hieroglyphs had had the BidiMirrored property corrected to Yes, > one could have had, in plain text, once fonts had caught up: [...] Agreed. But you don't need the BidiMirrored property to let the fonts catch up: Andrew West's Anglo-Saxon runic fonts behave correctly when mirrored, and are Unicode compliant. > Once your remark has been adopted in the Unicode Standard, the only > way to get consistently oriented Ancient Egyptian in plain text is to: > > a) Add a complete set of right-to-left hieroglyphs. > b) Add the retrograde hieroglyphs to each set. That would be a very bad idea! > One hopes that Egyptian Hieroglyphs is the only script for which > mirroring or not has meaning. You also have mirroring in Italic, Runic, Old North Arabian and probably many other scripts. Let me rephrase my remark in a less "stupid and dangerous" way. If a LTR character has the BidiMirrored=No property, it may either be mirrored or not when typeset in RTL, depending on other factors.
Specifically, the BidiMirrored property has not been specified for ancient LTR scripts which are mirrored when RTL or boustrophedon, like Italic, Runic, Archaic Greek, Archaic Latin, Egyptian Hieroglyphs. Note that some RTL scripts, like Old North Arabian, are mirrored when LTR. Is that better? Once again, I agree that forbidding ancient Egyptian from being mirrored would be "stupid and dangerous". I (maybe naively) thought that the BidiMirrored=No property for hieroglyphs, runes, etc. in the UCD was volunteer. If it was not, do you think that the Unicode Consortium would consider some (if not all) of the following actions: * accepting proposals to "BidiMirror" relevant ancient scripts with no modern usage * changing the BidiAlgorithm and BidiMirrored property (or BidiMirroredv2) to take into account Mirrored RTL scripts * Distinguish between "never mirrored" characters (Han), and "Sometimes mirrored, unknown mirrored" (Latin? Most Indic? Cyrillic?) * Look into the security implication of all this for modern scripts Of course, all that is non-negligible work.
Frédéric From richard.wordingham at ntlworld.com Fri Jul 24 11:17:28 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 24 Jul 2015 17:17:28 +0100 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <20150723194250.1cc05710@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> Message-ID: <20150724171728.287a5e32@JRWUBU2> On Thu, 23 Jul 2015 19:42:50 +0100 Richard Wordingham <richard.wordingham at ntlworld.com> wrote: > If the hieroglyphs had had the BidiMirrored property corrected to Yes, > one could have had, in plain text, once fonts had caught up: > > <U+132B9 EGYPTIAN HIEROGLYPH R008> for nṯr in normal left-to-right > text <U+202B RIGHT-TO-LEFT EMBEDDING, U+132B9, U+202C POP DIRECTIONAL > FORMATTING> for nṯr in retrograde left-to-right text > > and embed whole paragraphs in <U+202B>...<U+202C> for right-to-left > text. Correction: Use U+202E RIGHT-TO-LEFT OVERRIDE, not U+202B! Richard.
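Richard's three-case reading of rules L4 and HL6, quoted earlier in this thread, can be expressed as a small function over two Unicode Character Database properties exposed by Python's `unicodedata` module. This is a sketch that assumes each character's resolved direction has already been computed by a full Bidirectional Algorithm implementation, which the standard library does not provide. The function name is hypothetical. The constant at the end spells out the corrected sequence from the message above.

```python
import unicodedata


def mirroring_status(ch: str, resolved_direction: str) -> str:
    """Classify glyph mirroring per the three cases summarised in the thread.

    resolved_direction is "L" or "R" as produced by the UBA (assumed given).
    """
    if resolved_direction == "R" and unicodedata.mirrored(ch):
        return "mandatory"    # resolved R and Bidi_Mirrored=Yes
    if resolved_direction == "L" and unicodedata.bidirectional(ch) not in ("R", "AL"):
        return "prohibited"
    return "optional"         # left to higher-level protocols and fonts


# Corrected plain-text sequence for retrograde netjer (nṯr):
# RIGHT-TO-LEFT OVERRIDE, EGYPTIAN HIEROGLYPH R008, POP DIRECTIONAL FORMATTING.
RETROGRADE_NTR = "\u202E\U000132B9\u202C"

if __name__ == "__main__":
    for ch in ("(", "\u2208", "\u2192", "\U000132B9"):
        print(f"U+{ord(ch):04X}", mirroring_status(ch, "R"), mirroring_status(ch, "L"))
```

Under this reading, a hieroglyph forced right to left by U+202E falls into the "optional" case. That is where any font-level mirroring of such text would have to live.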
From kenwhistler at att.net Fri Jul 24 11:28:05 2015 From: kenwhistler at att.net (Ken Whistler) Date: Fri, 24 Jul 2015 09:28:05 -0700 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <55B20C7B.5020000@gmail.com> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> Message-ID: <55B26795.3020209@att.net> On 7/24/2015 2:59 AM, Fr?d?ric Grosshans wrote: > > > Is that better ? Once again, I agree that forbidding ancient Egyptian > to be mirrored when ?stupid and dangerous? I can see that this thread seems to have gone off the rails a bit. The Unicode Standard does not forbid Egyptian hieroglyphs from being "mirrored" in a RTL layout context. The Unicode Bidirectional Algorithm neither requires nor forbids that. It is simply out of scope. First there is a general issue of general mirroring of body text for some ancient scripts, which in paleographic contexts often followed conventions (no longer seen, except in rare edge cases) of having the direction of glyph orientation switch depending on line orientation. This is particularly noted in epigraphic contexts in ancient scripts of the greater Mediterranean area, but also occurs occasionally elsewhere. This general mirroring of body text is *not* part of Unicode plain text. There are no UCD properties defined for this, normative or informative, with either granularity at the per-character basis or the per-script basis. And there is no algorithm defined in the Unicode Standard to deal with this issue of paleography. Note that for the most part, this general mirroring is not a *bi*directional problem at all. 
It is a dextroverse versus sinistroverse layout issue, as nearly all of this kind of epigraphic text does not occur in *bi*directional contexts at all -- but rather in text where everything goes one direction. (Lest the nitpickers immediately cite boustrophedon -- boustrophedon is *also* not *bi*directional text -- it is a convention that alternates dextroverse lines with sinistroverse lines, but does not mix directions on single lines.) Then there is the *specific* issue of bidirectional mirroring. That is *different*. It is a normative part of the Unicode Bidirectional Algorithm, it is controlled in applicability by specific rules and by exact specification of the set of characters that have the Bidi_Mirrored=Y property in the UCD. That property applies to all paired brackets (except 2 Arabic ornate parentheses, for legacy reasons) and a set of non-symmetric mathematical operators (but not to arrow symbols). The applicability of bidirectional mirroring is mandatory and required by the Unicode Bidirectional Algorithm, and is essential in the layout of *modern* text, because of the very general problem of the interpretation of opening and closing for directionally oriented brackets occurring in pairs, in text where mixed directional runs may occur together on the same line of text. These two concerns are *not* the same and should not be confused. They are, however, commonly confused, because they both involve "mirroring" of glyphs and have something to do with line layout direction. > > I (maybe naively) thought that the BidiMirrored=No property for > hieroglyphs, runes, etc. in the UCD was volunteer. It is not "volunteer". It is out of scope. > If it was not, do you think that the unicode consortium would consider > some (if not all) of the following actions : > > * accepting proposals to ?BidiMirror? relevant ancient scripts with > no modern usage This will not happen. 
> * changing the BidiAlgorithm and BidiMirrored property (or > BidiMirroredv2) to take into account Mirrored RTL scripts This will not happen. > * Distinguish between ?never mirrored? caracters (Han), and ?Sometimes > mirrored, unknown mirrored? (Latin? Most Indic ? Cyrillic ?) That is an issue for how to deal with the paleographic issues of reversed direction body text. People can certainly head down that direction and create databases of information about which scripts do this, in which contexts and time periods. But it is completely out of scope for the UBA. Note that even in scripts that have this behavior paleographically, the occurrence of RTL versus LTR versions may differ statistically over time and eventually die out in favor of one direction or the other. See Old Italic. For that matter, see ancient Greek, which had RTL, LTR, and boustrophedon, but which eventually settled on strictly LTR layout. --Ken > > From asmusf at ix.netcom.com Fri Jul 24 12:09:18 2015 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Fri, 24 Jul 2015 10:09:18 -0700 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <55B20C7B.5020000@gmail.com> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> Message-ID: <55B2713E.4030006@ix.netcom.com> On 7/24/2015 2:59 AM, Fr?d?ric Grosshans wrote: > Let me rephrase my remark in a less ?stupid and dangerous? way. > > If a LTR character has the BidiMirrored=No property, it may either > be mirrored or not when typeset in RTL, depending on other factors. 
> Specifically, the BidiMirrored property has not been specified for > ancient LTR scripts which are mirrored when RTL or boustrephodon, > like Italic, Runic, Archaic Greek, Archaic Latin, Egyptian > Hieroglyphs. Note that some RTL script, like Old North Arabian, are > mirrored when LTR. We do want "BidiMirrorred=No" to be honored; for example for the arrows and the ornate parens. And we do not want that to be overridden The issue with the ancient scripts (or any script used to capture paleographic texts) seems to be primarily with letter shapes, not punctuation, and further would apply only to unpaired forms. A carefully written note would keep in scope all paired characters. It would be nice if there was a property that covered them, but I'm afraid that BidiMirroringGlyph does not cover the character pairs to use when BidiMirrored=No and code points need to be substituted to get the RTL layout correct. That kind of property would be useful for modern text, e.g. to allow support for automatic re-layout from RTL to LTR and vice versa for texts containing arrows. Declaring all unpaired code points overridable "in certain contexts" or "depending on other factors" might then work. A./ From doug at ewellic.org Fri Jul 24 14:13:28 2015 From: doug at ewellic.org (Doug Ewell) Date: Fri, 24 Jul 2015 12:13:28 -0700 Subject: Plain text custom fraction input Message-ID: <20150724121328.665a7a7059d7ee80bb4d670165c8327d.f4e5f76ddd.wbe@email03.secureserver.net> Marcel Schneider <charupdate at orange dot fr> wrote: > Representing fractions as U+2044 is known as a compatibility mapping, > equally like representing a superscript as , while (I go on checking > my knowledge...) representing a precomposed diacriticized letter as is > known as a decomposition mapping.
The difference between the two ways > of getting the same thing is in plain text. With decomposition we stay > in plain text, while compatibility mappings need formatting, thus > leaving the field of plain text. It's not a matter of one being plain text and the other not. Read Section 3.7, "Decomposition" [1] to learn about canonical and compatibility decomposition. In general, the Glossary [2] and FAQ [3] are useful resources. [1] http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G729 [2] http://www.unicode.org/glossary/ [3] http://www.unicode.org/faq/ -- Doug Ewell | http://ewellic.org | Thornton, CO From richard.wordingham at ntlworld.com Fri Jul 24 14:29:58 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 24 Jul 2015 20:29:58 +0100 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <55B26795.3020209@att.net> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B26795.3020209@att.net> Message-ID: <20150724202958.5bf3399f@JRWUBU2> On Fri, 24 Jul 2015 09:28:05 -0700 Ken Whistler <kenwhistler at att.net> wrote: > First there is a general issue of general mirroring of body text for > some ancient scripts, which in paleographic contexts often followed > conventions (no longer seen, except in rare edge cases) of having the > direction of glyph orientation switch depending on line orientation. Direction switching is commonplace in didactic text for Ancient Egyptian in modern texts. Right-to-left text is also natural when showing how to normalise hieratic or demotic to hieroglyphs.
> It is a dextroverse versus > sinistroverse layout issue, as nearly all of this kind of epigraphic > text does not occur in *bi*directional contexts at all -- but rather > in text where everything goes one direction. Remember that parentheses in pure Arabic or Hebrew text without numbers are also mirrored. The same would apply for N'ko, where numbers are also right-to-left. Please remind us of the purpose of RLO and LRO. Are you suggesting that their use may be 'out of scope' in some contexts? Recall Bidi rule L4: "A character is depicted by a mirrored glyph if and only if (a) the resolved directionality of that character is R, and (b) the Bidi_Mirrored property value of that character is Yes. The Bidi_Mirrored property is defined by Section 4.7, Bidi Mirrored of [Unicode]; the property values are specified in [UCD]. This rule can be overridden in certain cases; see HL6." The higher-level protocols are beyond the control of a supplier of plain text. It is not good that they may be kept secret from the user displaying the text, as would often be the case defined by a protocol that says that the font automatically selected defines the mirroring or not. > Note that even in scripts that have this > behavior paleographically, the occurrence of RTL versus LTR versions > may differ statistically over time and eventually die out in favor > of one direction or the other. See Old Italic. For that matter, > see ancient Greek, which had RTL, LTR, and boustrophedon, but > which eventually settled on strictly LTR layout. The question is about controlling mirroring when the 'abnormal' direction (largely as defined by the UCD) is used, not whether it is used. Richard. 
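For readers following the L4 discussion, the rule Richard quotes can be sketched in a few lines of Python. The standard unicodedata module's mirrored() function reports the UCD Bidi_Mirrored value. This is a minimal sketch. It assumes the resolved embedding level has already been computed by a full UBA implementation, which is not shown here, and it deliberately ignores HL6 overrides. The function name l4_mirrored is hypothetical. The sample characters illustrate Ken's description: paired brackets and some non-symmetric operators are Bidi_Mirrored=Y, while the Arabic ornate parentheses, arrows and Egyptian hieroglyphs are not.

```python
import unicodedata

def l4_mirrored(ch, resolved_level):
    """Sketch of UAX #9 rule L4 (hypothetical helper name).

    A character gets a mirrored glyph iff its resolved direction is R
    (an odd embedding level) and its Bidi_Mirrored property is Yes.
    HL6 overrides are deliberately not modelled here.
    """
    resolved_is_rtl = resolved_level % 2 == 1
    return resolved_is_rtl and unicodedata.mirrored(ch) == 1

# Every sample is assumed to sit at resolved level 1 (direction R).
samples = {
    "(": "LEFT PARENTHESIS",
    "\u2211": "N-ARY SUMMATION",
    "\ufd3e": "ORNATE LEFT PARENTHESIS",
    "\u2192": "RIGHTWARDS ARROW",
    "\U00013000": "EGYPTIAN HIEROGLYPH A001",
}

for ch, name in samples.items():
    print(f"U+{ord(ch):04X} {name}: mirrored at level 1 -> {l4_mirrored(ch, 1)}")
```

Results follow the Unicode version bundled with the Python build in use. The same character at an even level (direction L) is never mirrored by L4.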
From richard.wordingham at ntlworld.com Fri Jul 24 15:23:52 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 24 Jul 2015 21:23:52 +0100 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <55B2713E.4030006@ix.netcom.com> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> Message-ID: <20150724212352.101e5030@JRWUBU2> On Fri, 24 Jul 2015 10:09:18 -0700 Asmus Freytag <asmusf at ix.netcom.com> wrote: > On 7/24/2015 2:59 AM, Frédéric Grosshans wrote: > > Let me rephrase my remark in a less "stupid and dangerous" way. > > > > If a LTR character has the BidiMirrored=No property, it may > > either be mirrored or not when typeset in RTL, depending on other > > factors. Specifically, the BidiMirrored property has not been > > specified for ancient LTR scripts which are mirrored when RTL or > > boustrephodon, like Italic, Runic, Archaic Greek, Archaic Latin, > > Egyptian Hieroglyphs. Note that some RTL script, like Old North > > Arabian, are mirrored when LTR. > We do want "BidiMirrorred=No" to be honored; for example for the > arrows and the ornate parens. And we do not want that to be overridden And at present, that may be overridden in a right-to-left context! I think Frédéric meant Bidi_Class=Left_To_Right by 'LTR', in which case the only paired arrows included are U+2347 APL FUNCTIONAL SYMBOL QUAD LEFTWARDS ARROW and U+2348 APL FUNCTIONAL SYMBOL QUAD RIGHTWARDS ARROW. It's definitely appropriate for U+101D9 PHAISTOS DISC SIGN ARROW.
> The issue with the ancient scripts (or any script used to capture > paleographic texts) seems to be primarily with letter shapes, not > punctuation, > and further would apply only to unpaired forms. > > A carefully written note would keep in scope all paired characters. > > It would be nice if there was a property that covered them, but I'm > afraid that BidiMirroringGlyph does not cover the character pairs to > use when BidiMirrored=No and code points need to be substituted to > get the RTL layout correct. That kind of property would be useful for > modern text, e.g. to allow support for automatic re-layout from RTL > to LTR and vice versa for texts containing arrows. Microsoft has frozen BidiMirroringGlyph. Text rendering honours it up to Unicode 5.1 (I think), but thereafter it's up to the font. That may be appropriate for some bidirectional writing systems - I dimly recall that mirroring had a tendency to fail with some letters. > Declaring all unpaired code points overridable "in certain contexts" > or "depending on other factors" might then work. I think a stronger indication is needed. U+2044 FRACTION SLASH had better not be overridable between European numbers or between Arabic numbers, for with a generally linear layout the number on the left is the numerator and the number on the right is the denominator. Am I missing something on the options for this character in a wider right-to-left context? A sequence looking like (numerator, on right) (backslash) (denominator, on left) seems to be known in Arabic maths. I think it is useful to gather the information together in one list, albeit informative. Richard. 
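On the fraction question from earlier in the thread: the UCD's Decomposition_Mapping field marks a compatibility mapping with a tag in angle brackets, such as <fraction> or <super>. A canonical mapping carries no tag. The sketch below uses Python's unicodedata module to show both kinds, the NFD/NFKD difference for VULGAR FRACTION ONE HALF, and the bidi properties of U+2044 FRACTION SLASH that Richard mentions. Exact results depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Raw Decomposition_Mapping values: a leading <tag> means a compatibility
# mapping, and no tag means a canonical one.
for ch in ["\u00bd", "\u00b2", "\u00e9"]:  # one half, superscript two, e-acute
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)!r}")

half = "\u00bd"
# Canonical decomposition (NFD) leaves the precomposed fraction unchanged...
print("NFD unchanged:", unicodedata.normalize("NFD", half) == half)
# ...while compatibility decomposition (NFKD) yields 1, FRACTION SLASH, 2.
print("NFKD:", [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", half)])

# Bidi_Class and Bidi_Mirrored of FRACTION SLASH itself.
print("U+2044:", unicodedata.bidirectional("\u2044"), unicodedata.mirrored("\u2044"))
```

Both decompositions are plain character sequences. The <fraction> tag only records what kind of equivalence the mapping expresses, which is Doug's point that the distinction is not one of plain text versus formatting.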
From eliz at gnu.org Sat Jul 25 02:14:58 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Sat, 25 Jul 2015 10:14:58 +0300 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <20150724212352.101e5030@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> Message-ID: <83380c3dzh.fsf@gnu.org> > Date: Fri, 24 Jul 2015 21:23:52 +0100 > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > On Fri, 24 Jul 2015 10:09:18 -0700 > Asmus Freytag <asmusf at ix.netcom.com> wrote: > > > On 7/24/2015 2:59 AM, Frédéric Grosshans wrote: > > > Let me rephrase my remark in a less "stupid and dangerous" way. > > > > > > If a LTR character has the BidiMirrored=No property, it may > > > either be mirrored or not when typeset in RTL, depending on other > > > factors. Specifically, the BidiMirrored property has not been > > > specified for ancient LTR scripts which are mirrored when RTL or > > > boustrephodon, like Italic, Runic, Archaic Greek, Archaic Latin, > > > Egyptian Hieroglyphs. Note that some RTL script, like Old North > > > Arabian, are mirrored when LTR. > > > We do want "BidiMirrorred=No" to be honored; for example for the > > arrows and the ornate parens. And we do not want that to be overridden > > And at present, that may be overridden in a right-to-left context! What do you mean by "overridden" in this context? AFAIK, mirroring indeed depends on context, but a character whose BidiMirrorred property is No will _never_ be mirrored, according to the UBA. There are no overrides for that property, AFAIK.
From richard.wordingham at ntlworld.com Sat Jul 25 02:17:07 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 25 Jul 2015 08:17:07 +0100 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <9f243f7007da4b13aece8f6226cf3a2c@DFM-TK5MBX15-06.exchange.corp.microsoft.com> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B26795.3020209@att.net> <20150724202958.5bf3399f@JRWUBU2> <9f243f7007da4b13aece8f6226cf3a2c@DFM-TK5MBX15-06.exchange.corp.microsoft.com> Message-ID: <20150725081707.66c86ac6@JRWUBU2> On Fri, 24 Jul 2015 23:11:24 +0000 Murray Sargent <murrays at exchange.microsoft.com> wrote: > Richard questions when mirroring is used. As Ken points out, in > modern BiDi text, such as Arabic and Hebrew, the answer is given by > the Unicode BiDi Algorithm and associated tables. In ancient scripts > and in Boustrophedon, it's given by a higher level protocol. Do you just mean it's determined by the font? Please give me an actual example of any other higher level protocol. So far as I am aware, in OpenType, anything beyond the Unicode 5.1 Bidi Mirroring Glyph property actually resides in the font, in features ltrm, ltra, rtlm and rtla. According to the documentation (https://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl), these features are applied automatically whenever mirroring appears to be appropriate for a run. I'd guess that this means for a resolved level greater than zero. Boustrophedon could be given by a higher level protocol. Are there any examples of such a higher level protocol? 
There are issues with DIY implementations - text with commas and contour integrals would come unstuck! More seriously, I believe there may be a minority of letters which don't mirror in writing systems where mirroring otherwise happens. > The UBA wasn't designed to handle mirroring for those scripts. It wasn't designed for N'ko or Kharoshthi either. It just happens to work for them as well. What is true is that the properties of Egyptian hieroglyphs weren't designed to work with the UBA. It is also true that they weren't set up to work as a script - they're currently more like mathematical symbols. As I've noted, there is work in progress to enable the writing of 'plain-text' Egyptian. The UBA's got LRO and RLO. While not ideal, they ought to work for scripts where both directions are regularly used. Of course, footnote numbers would complicate matters, but here we would be getting away from plain text. Richard. From richard.wordingham at ntlworld.com Sat Jul 25 02:44:22 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 25 Jul 2015 08:44:22 +0100 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <83380c3dzh.fsf@gnu.org> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> Message-ID: <20150725084422.2b96491b@JRWUBU2> On Sat, 25 Jul 2015 10:14:58 +0300 Eli Zaretskii <eliz at gnu.org> wrote: > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > Asmus Freytag <asmusf at ix.netcom.com> wrote: > > > We do want "BidiMirrorred=No" to be honored; for example for the > > > 
arrows and the ornate parens. And we do not want that to be > > > overridden > > And at present, that may be overridden in a right-to-left context! > What do you mean by "overridden" in this context? AFAIK, mirroring > indeed depends on context, but a character whose BidiMirrorred > property is No will _never_ be mirrored, according to the UBA. There > are no overrides for that property, AFAIK. Reread the Bidi algorithm, especially http://www.unicode.org/reports/tr9/#L4 and http://www.unicode.org/reports/tr9/#HL6. In principle, I could have a higher-level protocol that mirrors lamedh on Wednesdays, but I must follow the rules for parentheses. It's part of the tendency to write specifications as 'Do what you want, but we recommend...'. It eliminates non-compliances without increasing compatibility. Richard, From eliz at gnu.org Sat Jul 25 02:51:19 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Sat, 25 Jul 2015 10:51:19 +0300 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <20150725084422.2b96491b@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> Message-ID: <83r3nw1xqg.fsf@gnu.org> > Date: Sat, 25 Jul 2015 08:44:22 +0100 > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > On Sat, 25 Jul 2015 10:14:58 +0300 > Eli Zaretskii <eliz at gnu.org> wrote: > > > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > > Asmus Freytag <asmusf at ix.netcom.com> wrote: > > > > > We do want "BidiMirrorred=No" to be honored; for example for the > > 
> > arrows and the ornate parens. And we do not want that to be > > > > overridden > > > > And at present, that may be overridden in a right-to-left context! > > > What do you mean by "overridden" in this context? AFAIK, mirroring > > indeed depends on context, but a character whose BidiMirrorred > > property is No will _never_ be mirrored, according to the UBA. There > > are no overrides for that property, AFAIK. > > Reread the Bidi algorithm, especially > http://www.unicode.org/reports/tr9/#L4 and > http://www.unicode.org/reports/tr9/#HL6. > > In principle, I could have a higher-level protocol that mirrors lamedh > on Wednesdays, but I must follow the rules for parentheses. I don't see how this is related. What HL6 describes is something that should make sense. For example, Emacs uses '/' as a kind of "mirrored" '\', when it needs to indicate that a line in an R2L paragraph is continued on the next screen line. By contrast, indiscriminately mirroring random characters that don't really have mirrored glyphs, in the context of modern scripts, doesn't make any sense, IMO, so it should never be done. > It's part of the tendency to write specifications as 'Do what you want, > but we recommend...'. It eliminates non-compliances without increasing > compatibility. Just say no. 
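Eli describes mirroring as substituting another character's glyph. That corresponds to the Bidi_Mirroring_Glyph data in the UCD file BidiMirroring.txt. The sketch below is not any particular implementation. It parses a small hypothetical excerpt written in that file's "code; mirrored code # name" line format. It also models an HL6-style extra substitution as a hypothetical opt-in table, showing REVERSE SOLIDUS as SOLIDUS after Eli's Emacs continuation-glyph example. The names parse_mirroring, HL6_EXTRAS and display_char are illustrative only.

```python
# Hypothetical four-line excerpt in the BidiMirroring.txt line format.
# A real implementation would load the whole file for its Unicode version.
SAMPLE = """\
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
"""

def parse_mirroring(text):
    """Map each character to its Bidi_Mirroring_Glyph character."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        src, dst = (field.strip() for field in data.split(";"))
        table[chr(int(src, 16))] = chr(int(dst, 16))
    return table

BIDI_MIRRORING_GLYPH = parse_mirroring(SAMPLE)

# Hypothetical HL6-style extra: REVERSE SOLIDUS is not Bidi_Mirrored, but a
# display might choose SOLIDUS for it in right-to-left paragraphs.
HL6_EXTRAS = {"\\": "/"}

def display_char(ch, rtl, use_hl6=False):
    """Pick the character whose glyph is shown; rtl means resolved direction R."""
    if not rtl:
        return ch
    if ch in BIDI_MIRRORING_GLYPH:  # rule L4 substitution
        return BIDI_MIRRORING_GLYPH[ch]
    if use_hl6:  # optional higher-level protocol
        return HL6_EXTRAS.get(ch, ch)
    return ch

print(display_char("(", rtl=True), display_char("\\", rtl=True, use_hl6=True))
```

Keeping the HL6 table separate and off by default reflects Eli's position that such substitutions belong to a deliberate higher-level protocol rather than to the default path.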
From richard.wordingham at ntlworld.com Sat Jul 25 04:11:02 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 25 Jul 2015 10:11:02 +0100 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <83r3nw1xqg.fsf@gnu.org> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> Message-ID: <20150725101102.16dbf4ed@JRWUBU2> On Sat, 25 Jul 2015 10:51:19 +0300 Eli Zaretskii <eliz at gnu.org> wrote: > > Reread the Bidi algorithm, especially > > http://www.unicode.org/reports/tr9/#L4 and > > http://www.unicode.org/reports/tr9/#HL6. > > > > In principle, I could have a higher-level protocol that mirrors > > lamedh on Wednesdays, but I must follow the rules for parentheses. > > I don't see how this is related. What HL6 describes is something that > should make sense. For example, Emacs uses '/' as a kind of > "mirrored" '\', when it needs to indicate that a line in an R2L > paragraph is continued on the next screen line. HL6 reads: "Certain characters that do not have the Bidi_Mirrored property can also be depicted by a mirrored glyph in specialized contexts. Such contexts include, but are not limited to, historic scripts and associated punctuation, private-use characters, and characters in mathematical expressions. (See Section 7, Mirroring.) 
These characters are those that fit at least one of the following conditions: 1) Characters with a resolved directionality of R 2) Characters with a resolved directionality of L and whose bidirectional type is R or AL" The logic of my statement is as follows: a) 'Specialised contexts' is undefined; 'specialised context' may therefore include 'whenever I see fit'. b) The bidirectional type of lamedh is 'R', and it will always have a resolved directionality. The resolved directionalities are 'L' and 'R'. c) Therefore I may choose to mirror all lamedhs on Wednesdays. Similarly, an arrow with a resolved directionality of R may be mirrored if a higher level protocol so dictates. The issue lies with the wording of condition (1). One might expect it to apply only to characters with a bidirectional type of L. That should work for text whose directionality is known when written. It would be interesting to hear the rationale for the wording. My surmise is that it attempts to address text whose directionality is not known before rendering. The most obvious example would be where an application is laying out boustrophedon text in. The author would not be able to correctly choose between COMMA and REVERSED COMMA (an anachronistic example) depending on text direction if line-breaks were not fixed. Richard. 
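Read literally, the two HL6 conditions quoted above are easy to encode, and that literal reading is what makes Richard's lamedh example possible. The function below is a sketch under that reading. Its name and its resolved argument ('L' or 'R', assumed to come from a full UBA run) are hypothetical. It reports only whether a character falls within the set HL6 describes. It cannot decide whether a given higher-level protocol "makes sense", which is Eli's objection.

```python
import unicodedata

def hl6_candidate(ch, resolved):
    """Literal reading of UAX #9 HL6 (hypothetical helper name).

    `resolved` is the character's resolved directionality, 'L' or 'R'.
    """
    if unicodedata.mirrored(ch):
        # Bidi_Mirrored=Yes characters are governed by rule L4, not HL6.
        return False
    if resolved == "R":
        # Condition 1: any character with a resolved directionality of R.
        return True
    # Condition 2: resolved to L (e.g. under LRO) but of bidi type R or AL.
    return resolved == "L" and unicodedata.bidirectional(ch) in ("R", "AL")

LAMEDH = "\u05dc"          # HEBREW LETTER LAMED, Bidi_Class R
HIEROGLYPH = "\U00013000"  # EGYPTIAN HIEROGLYPH A001, Bidi_Class L
ARROW = "\u2192"           # RIGHTWARDS ARROW, Bidi_Class ON

for label, ch, resolved in [
    ("lamedh in RTL text", LAMEDH, "R"),
    ("lamedh under LRO", LAMEDH, "L"),
    ("hieroglyph in LTR text", HIEROGLYPH, "L"),
    ("hieroglyph under RLO", HIEROGLYPH, "R"),
    ("arrow resolved to R", ARROW, "R"),
    ("parenthesis resolved to R", "(", "R"),
]:
    print(f"{label}: {hl6_candidate(ch, resolved)}")
```

Under this reading, an ordinary lamedh in right-to-left text already satisfies condition 1. Condition 2 only adds strong R/AL characters forced to L by an override.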
From eliz at gnu.org Sat Jul 25 04:52:53 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Sat, 25 Jul 2015 12:52:53 +0300 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <20150725101102.16dbf4ed@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> Message-ID: <83mvyk1s3u.fsf@gnu.org> > Date: Sat, 25 Jul 2015 10:11:02 +0100 > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > On Sat, 25 Jul 2015 10:51:19 +0300 > Eli Zaretskii <eliz at gnu.org> wrote: > > > > Reread the Bidi algorithm, especially > > > http://www.unicode.org/reports/tr9/#L4 and > > > http://www.unicode.org/reports/tr9/#HL6. > > > > > > In principle, I could have a higher-level protocol that mirrors > > > lamedh on Wednesdays, but I must follow the rules for parentheses. > > > > I don't see how this is related. What HL6 describes is something that > > should make sense. For example, Emacs uses '/' as a kind of > > "mirrored" '\', when it needs to indicate that a line in an R2L > > paragraph is continued on the next screen line. > > HL6 reads: > > "Certain characters that do not have the Bidi_Mirrored property can also > be depicted by a mirrored glyph in specialized contexts. Such contexts > include, but are not limited to, historic scripts and associated > punctuation, private-use characters, and characters in mathematical > expressions. (See Section 7, Mirroring.) 
These characters are those > that fit at least one of the following conditions: > > 1) Characters with a resolved directionality of R > 2) Characters with a resolved directionality of L and whose > bidirectional type is R or AL" Yes. > The logic of my statement is as follows: > > a) 'Specialised contexts' is undefined; 'specialised context' may > therefore include 'whenever I see fit'. No. HLn clauses are for implementations that use their specialized logic on top of the UBA-mandated behavior. That logic must make sense, in the context of the implemented functionality. "Whenever I see fit" doesn't fulfill that requirement, certainly not when the implementation has anything to do with presenting human-readable text. > b) The bidirectional type of lamedh is 'R', and it will always have > a resolved directionality. The resolved directionalities are 'L' and > 'R'. But it doesn't have a mirrored glyph, at least not in most fonts. > c) Therefore I may choose to mirror all lamedhs on Wednesdays. If your implementation's purpose is to illustrate random permutations of glyphs, or artificially scrambling the text appearance, maybe. But if the implementation's purpose is to present a legible text using that character in some modern script, then no, it makes no sense and would be perceived as a bug. Although it'd probably be rendered "not guilty for lack of evidence" in a court of UBA law. > Similarly, an arrow with a resolved directionality of R may be mirrored > if a higher level protocol so dictates. Again, you'd have to present a protocol that makes sense in the context of the specific implementation. Otherwise, it's a bug. > The issue lies with the wording of condition (1). One might expect it > to apply only to characters with a bidirectional type of L. I see no reason to restrict this to L characters. I'd be interested to hear your rationale for that. > My surmise is that it attempts to address text whose directionality > is not known before rendering. 
Indeed, UBA mirroring is only relevant to neutral characters. > The most obvious example would be where an application is laying out > boustrophedon text in. I don't think so. I agree with those who maintain that boustrophedon is unidirectional text, and so out of scope for the UBA. From charupdate at orange.fr Sat Jul 25 05:38:39 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Sat, 25 Jul 2015 12:38:39 +0200 (CEST) Subject: Plain text custom fraction input Message-ID: <1685728552.8207.1437820719699.JavaMail.www@wwinf1e23> On 24 Jul 2015, at 21:24, Doug Ewell wrote: > It's not a matter of one being plain text and the other not. Read > Section 3.7, "Decomposition" [1] to learn about canonical and > compatibility decomposition. > > In general, the Glossary [2] and FAQ [3] are useful resources. > > [1] http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G729 > [2] http://www.unicode.org/glossary/ > [3] http://www.unicode.org/faq/ Thank you, this is indeed indispensible to know, I'll try to get the time of learning thoroughly how it works and how not to abuse of terminology. Best regards, Marcel P.S.: Below I'll try to recomplete my last e-mail as it was when I wrote it in plain text, before applying the font formatting. The use of lt/gt as angle brackets is very tricky because engines may confuse them with valid HTML tags and make disappear the whole. We can type them as & l t ;? and? & g t ; but this is not safe. Now I'll use curly and square brackets. ? > Marcel Schneider wrote: > > > Representing fractions as {fraction} [digit] U+2044 [digit] is known as a compatibility mapping, > > equally like representing a superscript as {super} [digit] , while (I go on checking > > my knowledge...) representing a precomposed diacriticized letter as [letter] [combining diacritic] is > > known as a decomposition mapping. The difference between the two ways > > of getting the same thing is in plain text. 
With decomposition we stay > > in plain text, while compatibility mappings need formatting, thus > > leaving the field of plain text. From richard.wordingham at ntlworld.com Sat Jul 25 08:36:51 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 25 Jul 2015 14:36:51 +0100 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <83mvyk1s3u.fsf@gnu.org> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> <83mvyk1s3u.fsf@gnu.org> Message-ID: <20150725143651.059e466a@JRWUBU2> On Sat, 25 Jul 2015 12:52:53 +0300 Eli Zaretskii <eliz at gnu.org> wrote: > > Date: Sat, 25 Jul 2015 10:11:02 +0100 > > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > > > On Sat, 25 Jul 2015 10:51:19 +0300 > > Eli Zaretskii <eliz at gnu.org> wrote: > If your implementation's purpose is to illustrate random permutations > of glyphs, or artificially scrambling the text appearance, maybe. Obviously the purpose would be to demonstrate that a cart and horses can be driven through the Unicode standard. > But > if the implementation's purpose is to present a legible text using > that character in some modern script, then no, it makes no sense and > would be perceived as a bug.
Although it'd probably be rendered "not > guilty for lack of evidence" in a court of UBA law. No, it should be "not guilty because acting lawfully". > > Similarly, an arrow with a resolved directionality of R may be > > mirrored if a higher level protocol so dictates. > > Again, you'd have to present a protocol that makes sense in the > context of the specific implementation. Otherwise, it's a bug. No, it's a feature. :-) It's only a bug if there's a requirement to be fit for purpose. If the purpose of the implementation is to gobble up disk space, then it's not a bug. > > The issue lies with the wording of condition (1). One might expect > > it to apply only to characters with a bidirectional type of L. > I see no reason to restrict this to L characters. I'd be interested > to hear your rationale for that. A) A strong character's form in the corresponding directional context is the form identified by the Unicode charts. If it is of type AL or R, it will , by definition, not be mirrored. B) A weak or neutral character's form in the charts is the form that occurs in the left-to-right direction. Such a character has Bidi-mirrored set to Yes if it has different forms for left-to-right and right-to-left. By rule L4, it will be mirrored if it receives a resolved direction of R. C) A character of type L may need to be mirrored if it receives a resolved directionality of R. The most notable example is Egyptian hieroglyphs, but the same applies to Greek. There is a definite hole in my argument for non-spacing marks; marks used primarily in the Arabic script are shown in a form they take in a right-to-left context. > > > My surmise is that it attempts to address text whose directionality > > is not known before rendering. > > Indeed, UBA mirroring is only relevant to neutral characters. Then how do you explain condition (2): "Characters with a resolved directionality of L and whose bidirectional type is R or AL" Obviously these characters are not neutral characters. 
The only way they can acquire a resolved directionality of R is by application of RLO. > I don't think so. I agree with those who maintain that boustrophedon > is unidirectional text, and so out of scope for the UBA. There are three main parts to the UBA: 1) Interpreting the text as nested runs of text in the same order. 2) Sorting out the left-to-right order in which to write them (L2) 3) Sorting out mirroring (L4) Interpreting LRO and RLO is part of (1). I'd like to know what the justification for have directionality overrides is. Now, ancient boustrophedon text, to the best of my knowledge, does not need parts 1 to 2. Modern numerical place notation should be a problem when writing boustrophedon. Boustrophedon starts from the assumption that text has an order from start to finish, but numbers in place notation have a left and a right. Where we may part company is in our view of Hebrew text (no Arabic numbers) with parentheses in a right-to-left paragraph. I think such text is really just as unidirectional as equivalent Latin text in a left-to-right paragraph. However, one needs the UBA to sort out the rendering of the parentheses in the Hebrew text. Indeed, one may rely on the bidi algorithm to declare the Latin example unidirectional. If one can determine that text to be rendered boustrophedon is genuinely 'unidirectional', it seems entirely reasonable to call upon the Bidi algorithm to sort out the mirroring of glyphs on a *line* once one has chosen the direction of a line. Where we may have a problem is that the Latin and Hebrew commas have the same codepoint, *despite* having the same appearance. I can accept is that the handling a mixture of boustrophedon, left-to-right and right-to-left text is to much to ask of the Bidi algorithm. The very first problem is that of defining what would constitute unidirectional boustrophedon text Richard. 
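Richard suggests that, once a line's direction has been chosen, L4-style mirroring could be applied line by line to genuinely unidirectional boustrophedon text. That idea can be sketched as below. Everything here is a hypothetical simplification, not the UBA. Each line is assumed to be unidirectional and the first line is assumed to run left to right. Only Bidi_Mirrored characters are swapped, using a small hypothetical pair table. The function name render_boustrophedon is illustrative.

```python
import unicodedata

# Small hypothetical subset of Bidi_Mirroring_Glyph pairs; a real renderer
# would load BidiMirroring.txt instead.
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}

def render_boustrophedon(lines):
    """Return each line's characters in visual (left-to-right) order.

    Assumes every line is genuinely unidirectional, the first line is
    dextroverse, and directions alternate line by line.
    """
    out = []
    for i, line in enumerate(lines):
        if i % 2 == 0:
            # Dextroverse line: logical order is already visual order.
            out.append(line)
        else:
            # Sinistroverse line: reverse it, then apply L4-style mirroring
            # to Bidi_Mirrored characters only.
            visual = [MIRROR_PAIRS.get(c, c) if unicodedata.mirrored(c) else c
                      for c in reversed(line)]
            out.append("".join(visual))
    return out

for row in render_boustrophedon(["ab(c)", "de(f)"]):
    print(row)
```

The letters in the reversed line keep their ordinary code points. Mirroring of letter shapes, which Asmus identifies as the real issue for ancient scripts, would have to come from the font or from some other higher-level protocol.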
From eliz at gnu.org Sat Jul 25 09:26:14 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Sat, 25 Jul 2015 17:26:14 +0300 Subject: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input) In-Reply-To: <20150725143651.059e466a@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2> Message-ID: <83fv4c1fg9.fsf@gnu.org> > Date: Sat, 25 Jul 2015 14:36:51 +0100 > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > > > The issue lies with the wording of condition (1). One might expect > > > it to apply only to characters with a bidirectional type of L. > > > I see no reason to restrict this to L characters. I'd be interested > > to hear your rationale for that. > > A) A strong character's form in the corresponding directional context > is the form identified by the Unicode charts. If it is of type AL or > R, it will , by definition, not be mirrored. > > B) A weak or neutral character's form in the charts is the form that > occurs in the left-to-right direction. Such a character has > Bidi-mirrored set to Yes if it has different forms for left-to-right and > right-to-left. By rule L4, it will be mirrored if it receives a > resolved direction of R. > > C) A character of type L may need to be mirrored if it receives a > resolved directionality of R. The most notable example is Egyptian > hieroglyphs, but the same applies to Greek. 
Mirroring is not changing a character's shape. It is a replacement of a character's glyph with a glyph of a different character. Thus, your reasons make no sense to me, because a character's shape, any character's shape, be it L, R, AL, or anything else, is immutable. > There is a definite hole in my argument for non-spacing marks; marks > used primarily in the Arabic script are shown in a form they take in a > right-to-left context. I don't think it's a hole. I think your interpretation of this is entirely wrong. > > > My surmise is that it attempts to address text whose directionality > > > is not known before rendering. > > > > Indeed, UBA mirroring is only relevant to neutral characters. > > Then how do you explain condition (2): > > "Characters with a resolved directionality of L and whose > bidirectional type is R or AL" I never saw an example of it. Can you show something like that? Note that those conditions are "at least one of", so they are not all required to be true at the same time. > Obviously these characters are not neutral characters. The only way > they can acquire a resolved directionality of R is by application of > RLO. You mean, resolved directionality of L and LRO, right? Anyway, let's talk about a concrete example of applying this rule, shall we? I'm guessing this is for some very specific characters in a script I never used. > > I don't think so. I agree with those who maintain that boustrophedon > > is unidirectional text, and so out of scope for the UBA. > > There are three main parts to the UBA: > > 1) Interpreting the text as nested runs of text in the same order. I take it that by this you mean resolving the level of each character. To me, that is the main part of the UBA; all the rest is almost trivial. > 2) Sorting out the left-to-right order in which to write them (L2) > > 3) Sorting out mirroring (L4) > > Interpreting LRO and RLO is part of (1). I'd like to know what the > justification for have directionality overrides is. 
One justification is when you want to present characters in some particular order that overrides their innate bidirectional properties. For example, imagine you want to tell your readers what some bidirectional text will look like after reordering by the UBA, and you want to do that without relying on the UBA implementation of whatever software is used to view your presentation. > Where we may part company is in our view of Hebrew text (no Arabic > numbers) with parentheses in a right-to-left paragraph. I think such > text is really just as unidirectional as equivalent Latin text in a > left-to-right paragraph. No, not as soon as numbers or Latin characters are involved, IMO. > However, one needs the UBA to sort out the rendering of the > parentheses in the Hebrew text. Not really, you can short-cut it, the same as in strictly left-to-right text. > Indeed, one may rely on the bidi algorithm to declare the Latin > example unidirectional. One might, but to what purpose and goal? > If one can determine that text to be rendered boustrophedon is genuinely > 'unidirectional', it seems entirely reasonable to call upon the Bidi > algorithm to sort out the mirroring of glyphs on a *line* once one has > chosen the direction of a line. No, not as soon as characters of different or weak/neutral directionality are involved, IMO.
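Eli's example of showing readers the result of reordering can be sketched as follows, assuming the author has already written the text out in the left-to-right order in which it should appear. Wrapping it in LEFT-TO-RIGHT OVERRIDE (U+202D) and POP DIRECTIONAL FORMATTING (U+202C) makes a conforming UBA implementation treat every enclosed character as strong L, so it keeps their relative order and, since their resolved direction is L, does not mirror them. The helper name `show_in_display_order` is hypothetical.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Explicit embedding, override and isolate controls (LRE..RLO, LRI..PDI).
EXPLICIT_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def show_in_display_order(display_text: str) -> str:
    """Hypothetical helper: force `display_text`, already written in the
    left-to-right order it should have on screen, to be shown as given.

    Inside an LRO...PDF override every character is treated as strong L,
    so a conforming bidi implementation neither reorders the enclosed
    characters relative to each other nor mirrors any of them.
    """
    if EXPLICIT_CONTROLS.intersection(display_text):
        # A nested control would change or terminate the override early.
        raise ValueError("text must not contain explicit directional controls")
    return LRO + display_text + PDF


if __name__ == "__main__":
    # Hebrew letters typed in the order they would appear after reordering.
    print(repr(show_in_display_order("\u05d2\u05d1\u05d0 (car)")))
```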
From richard.wordingham at ntlworld.com Sat Jul 25 12:27:26 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 25 Jul 2015 18:27:26 +0100 Subject: BidiMirrored property and ancient scripts In-Reply-To: <83fv4c1fg9.fsf@gnu.org> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2> <83fv4c1fg9.fsf@gnu.org> Message-ID: <20150725182726.533d4b78@JRWUBU2> On Sat, 25 Jul 2015 17:26:14 +0300 Eli Zaretskii <eliz at gnu.org> wrote: > > Date: Sat, 25 Jul 2015 14:36:51 +0100 > > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > > > > > The issue lies with the wording of condition (1). One might > > > > expect it to apply only to characters with a bidirectional type > > > > of L. > > > > > I see no reason to restrict this to L characters. I'd be > > > interested to hear your rationale for that. > > > > A) A strong character's form in the corresponding directional > > context is the form identified by the Unicode charts. If it is of > > type AL or R, it will , by definition, not be mirrored. > > > > B) A weak or neutral character's form in the charts is the form that > > occurs in the left-to-right direction. Such a character has > > Bidi-mirrored set to Yes if it has different forms for > > left-to-right and right-to-left. By rule L4, it will be mirrored > > if it receives a resolved direction of R. 
> > > > C) A character of type L may need to be mirrored if it receives a > > resolved directionality of R. The most notable example is Egyptian > > hieroglyphs, but the same applies to Greek. > > Mirroring is not changing a character's shape. It is a replacement of > a character's glyph with a glyph of a different character. Mirroring is changing a glyph to one suitable for reading in the other direction. Note the following extract from BidiMirroring.txt in the Unicode Character Database: <quote> # The following characters have no appropriate mirroring character. # For these characters it is up to the rendering system # to provide mirrored glyphs. # 2140; DOUBLE-STRUCK N-ARY SUMMATION # 2201; COMPLEMENT # 2202; PARTIAL DIFFERENTIAL <snip/> </quote> > Thus, your reasons make no sense to me, because a character's shape, > any character's shape, be it L, R, AL, or anything else, is immutable. So go back and reread. > > There is a definite hole in my argument for non-spacing marks; marks > > used primarily in the Arabic script are shown in a form they take > > in a right-to-left context. > I don't think it's a hole. I think your interpretation of this is > entirely wrong. > > > > My surmise is that it attempts to address text whose > > > > directionality is not known before rendering. > > > > > > Indeed, UBA mirroring is only relevant to neutral characters. > > > > Then how do you explain condition (2): > > > > "Characters with a resolved directionality of L and whose > > bidirectional type is R or AL" > > I never saw an example of it. Can you show something like that? Frédéric gave the example of Old North Arabian - there are samples at http://www.mnh.si.edu/epigraphy/e_pre-islamic/safaitic.htm > Note that those conditions are "at least one of", so they are not all > required to be true at the same time. Obviously, since a character cannot simultaneously have both resolved directions. > > Obviously these characters are not neutral characters.
The only way > > they can acquire a resolved directionality of R is by application of > > RLO. > > You mean, resolved directionality of L and LRO, right? Sorry, you're correct. > Anyway, let's talk about a concrete example of applying this rule, > shall we? I'm guessing this is for some very specific characters in a > script I never used. I rather suspect it's for all current characters in a script you never used. Given half a chance, a script with weak directionality will be encoded with Bidi-class L letters. Old North Arabian has squeezed in as a right-to-left script. > > > I don't think so. I agree with those who maintain that > > > boustrophedon is unidirectional text, and so out of scope for the > > > UBA. > > > > There are three main parts to the UBA: > > > > 1) Interpreting the text as nested runs of text in the same order. > > I take it that by this you mean resolving the level of each > character. To me, that is the main part of the UBA; all the rest is > almost trivial. The nesting is implied by the levels, but the levels are just a means to store the nesting and an elegant way of storing the direction. There is a distressing tendency of Unicode algorithms to just record the algorithm, rather than to explain what is being done. Perfectly intelligible steps can end up looking like an arcane dance. > > 2) Sorting out the left-to-right order in which to write them (L2) > > > > 3) Sorting out mirroring (L4) > > > > Interpreting LRO and RLO is part of (1). I'd like to know what the > > justification for have directionality overrides is. > > One justification is when you want to present characters in some > particular order that overrides their innate bidirectional properties. > For example, imagine you want to tell your readers what will some > bidirectional text look like after reordering by the UBA, and you want > to do that without relying on the UBA implementation of whatever > software is used to view your presentation. Brute force layout! 
That makes it seem that overriding strong types was an error that leaves people hoping for support for switching text direction. > > Where we may part company is in our view of Hebrew text (no Arabic > > numbers) with parentheses in a right-to-left paragraph. I think > > such text is really just as unidirectional as equivalent Latin text > > in a left-to-right paragraph. > > No, not as soon as numbers or Latin characters are involved, IMO. My example, which your e-mail client may take as being in a left-to-right paragraph, is: ????? ????? / ???? (?? ?????? ?????? ??? ??????) > > However, one needs the UBA to sort out the rendering of the > > parentheses in the Hebrew text. > Not really, you can short-cut it, the same as in strictly > left-to-right text. It's the UBA that mandates that the opening and closing parentheses be rendered like right and left parentheses respectively rather than like left and right parentheses. I think it may be compatible with the character identity for the U+0028 glyph to be marked with a tiny 'o' regardless of whether it broadly looks like a left or a right parenthesis. > > Indeed, one may rely on the bidi algorithm to declare the Latin > > example unidirectional. > > One might, but to what purpose and goal? A right-to-left paragraph consisting of the two characters "(a" would be bidirectional and have a parenthesis on the right; a left-to-right paragraph with the same content would have a parenthesis on the left. The e-mail client I'm using has no higher-level protocol to determine whether a paragraph is left-to-right or right-to-left, but uses the first strong character. Notepad (Windows 7, at least) seems to have two options - all paragraphs are left-to-right, or all paragraphs are right-to-left. 
> > If one can determine that text to be rendered boustrophedon is > > genuinely 'unidirectional', it seems entirely reasonable to call > > upon the Bidi algorithm to sort out the mirroring of glyphs on a > > *line* once one has chosen the direction of a line. > > No, not as soon as characters of different or weak/neutral > directionality are involved, IMO. If the paragraph contains any digits, it is not genuinely unidirectional. If it is, and there are no unmatched PDF characters, one can just prefix LRO or RLO to each line to get the right directionality. If there are strong characters of different directionalities, then it is unlikely that the paragraph is genuinely unidirectional. The full tridirectional (left, right and boustrophedon) algorithm is likely to be extremely fiddly, as well as dependent on non-existent information. Richard. From eliz at gnu.org Sat Jul 25 13:05:41 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Sat, 25 Jul 2015 21:05:41 +0300 Subject: BidiMirrored property and ancient scripts In-Reply-To: <20150725182726.533d4b78@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2> <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2> Message-ID: <831tfw15ai.fsf@gnu.org> > Date: Sat, 25 Jul 2015 18:27:26 +0100 > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > > Mirroring is not changing a character's shape. 
It is a replacement of > > a character's glyph with a glyph of a different character. > > Mirroring is changing a glyph to suitable for reading in the other > direction. Sorry, I disagree. > Note the following extract from BidiMirroring.txt in the > Unicode Character Database: > > <quote> > # The following characters have no appropriate mirroring character. > # For these characters it is up to the rendering system > # to provide mirrored glyphs. How's that a contradiction to what I said? > > Thus, your reasons make no sense to me, because a character's shape, > > any character's shape, be it L, R, AL, or anything else, is immutable. > > So go back and reread. Did that; still no sense. > > > Interpreting LRO and RLO is part of (1). I'd like to know what the > > > justification for have directionality overrides is. > > > > One justification is when you want to present characters in some > > particular order that overrides their innate bidirectional properties. > > For example, imagine you want to tell your readers what will some > > bidirectional text look like after reordering by the UBA, and you want > > to do that without relying on the UBA implementation of whatever > > software is used to view your presentation. > > Brute force layout! That makes it seem that overriding strong types > was an error that leaves people hoping for support for switching text > direction. No, not at all. Think various needs of presenting error messages that quote bidirectional text, etc. I had plenty of those problems in Emacs. > > > Where we may part company is in our view of Hebrew text (no Arabic > > > numbers) with parentheses in a right-to-left paragraph. I think > > > such text is really just as unidirectional as equivalent Latin text > > > in a left-to-right paragraph. > > > > No, not as soon as numbers or Latin characters are involved, IMO. > > My example, which your e-mail client may take as being in a > left-to-right paragraph, is: > ????? ????? / ???? (?? ?????? ?????? ??? 
??????) I'm reading this in Emacs, so the layout is R2L, as it should be. But there are no numbers or Latin characters in this example, so it's not what I had in mind. > > > However, one needs the UBA to sort out the rendering of the > > > parentheses in the Hebrew text. > > > Not really, you can short-cut it, the same as in strictly > > left-to-right text. > > It's the UBA that mandates that the opening and closing parentheses be > rendered like right and left parentheses respectively rather than like > left and right parentheses. Mirroring comes after layout in the UBA, as you pointed out, and the short-cuts I mentioned are about layout, not about mirroring. > > > Indeed, one may rely on the bidi algorithm to declare the Latin > > > example unidirectional. > > > > One might, but to what purpose and goal? > > A right-to-left paragraph consisting of the two characters "(a" would > be bidirectional and have a parenthesis on the right; a left-to-right > paragraph with the same content would have a parenthesis on the left. I don't see how this answers my question. > The e-mail client I'm using has no higher-level protocol to determine > whether a paragraph is left-to-right or right-to-left, but uses the > first strong character. Notepad (Windows 7, at least) seems to have two > options - all paragraphs are left-to-right, or all paragraphs are > right-to-left. Emacs has those 3 options, and it also has a higher-level protocol, whereby the paragraph direction is only decided after an empty line. But I still don't see how this is relevant. 
From richard.wordingham at ntlworld.com Sat Jul 25 16:15:40 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 25 Jul 2015 22:15:40 +0100 Subject: BidiMirrored property and ancient scripts In-Reply-To: <831tfw15ai.fsf@gnu.org> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2> <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2> <831tfw15ai.fsf@gnu.org> Message-ID: <20150725221540.72e6ee48@JRWUBU2> On Sat, 25 Jul 2015 21:05:41 +0300 Eli Zaretskii <eliz at gnu.org> wrote: > > Date: Sat, 25 Jul 2015 18:27:26 +0100 > > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > > > > Mirroring is not changing a character's shape. It is a > > > replacement of a character's glyph with a glyph of a different > > > character. > > > > Mirroring is changing a glyph to suitable for reading in the other > > direction. > > Sorry, I disagree. > > > Note the following extract from BidiMirroring.txt in the > > Unicode Character Database: > > > > <quote> > > # The following characters have no appropriate mirroring character. > > # For these characters it is up to the rendering system > > # to provide mirrored glyphs. > > How's that a contradiction to what I said? U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its glyph is not replaced by any other character's glyph. Or are you claiming that left-to-right U+2140 and right-to-left U+2140 are two different characters? 
> > > Thus, your reasons make no sense to me, because a character's > > > shape, any character's shape, be it L, R, AL, or anything else, > > > is immutable. > > > > So go back and reread. > > Did that; still no sense. Because you still seem not to understand the concept of mirroring. It isn't just for characters that have a Bidi_Mirroring_Glyph property value other than <none>. > > > > Where we may part company is in our view of Hebrew text (no > > > > Arabic numbers) with parentheses in a right-to-left paragraph. > > > > I think such text is really just as unidirectional as > > > > equivalent Latin text in a left-to-right paragraph. > > My example <snip> is: > > > ????? ????? / ???? (?? ?????? ?????? ??? ??????) > > > > > However, one needs the UBA to sort out the rendering of the > > > > parentheses in the Hebrew text. > > > > > Not really, you can short-cut it, the same as in strictly > > > left-to-right text. > > > > It's the UBA that mandates that the opening and closing parentheses > > be rendered like right and left parentheses respectively rather > > than like left and right parentheses. > > Mirroring comes after layout in the UBA, as you pointed out, and the > short-cuts I mentioned are about layout, not about mirroring. So irrelevant. I take it we now agree that the right shape for the parentheses for the unidirectional right-to-left example is derived by the UBA. > > > > Indeed, one may rely on the bidi algorithm to declare the Latin > > > > example unidirectional. > > > > > > One might, but to what purpose and goal? > > > > A right-to-left paragraph consisting of the two characters "(a" > > would be bidirectional and have a parenthesis on the right; a > > left-to-right paragraph with the same content would have a > > parenthesis on the left. If there is no higher-level protocol in effect, the 'first strong character' rule (Rules P2 and P3 of the UBA) declares that the paragraph will be a left-to-right paragraph and will look like "(a". 
Had it been declared a right-to-left paragraph by a higher-level protocol, it would look like "a)". Thus the UBA has a rôle even for unidirectional left-to-right text. Richard. From wjgo_10009 at btinternet.com Sat Jul 25 11:43:09 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Sat, 25 Jul 2015 17:43:09 +0100 (BST) Subject: Emoji characters for food allergens Message-ID: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost> Emoji characters for food allergens An interesting document entitled Preliminary proposal to add emoji characters for food allergens by Hiroyuki Komatsu was added into the UTC (Unicode Technical Committee) Document Register yesterday. http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf This is a welcome development. I suggest that, in view of the importance of precision in conveying information about food allergens, the emoji characters for food allergens should be separate characters from other emoji characters. That is, encoded in a separate quite distinct block of code points far away in the character map from other emoji characters, with no dual meanings for any of the characters: a character for a food allergen should be quite separate and distinct from a character for any other meaning. I opine that having two separate meanings for the same character, one meaning as an everyday jolly good fun meaning in a text message and one meaning as a specialist food allergen meaning could be a source of confusion. Far better to encode a separate code block with separate characters right from the start than risk needless and perhaps medically dangerous confusion in the future. I suggest that for each allergen there be two characters. The glyph for the first character of the pair goes from baseline to ascender.
The glyph for the second character of the pair is a copy of the glyph for the first character of the pair augmented with a thick red line from lower left descender to higher right a little above the base line, the thick red line perhaps being at about thirty degrees from the horizontal. Thus the thick red line would go over the allergen part of the glyph yet just by clipping it a bit so that clarity is maintained. The glyphs are thus for the presence of the allergen and the absence of the allergen respectively. It is typical in the United Kingdom to label food packets not only with an ingredients list but also with a list of allergens in the food and also with a list of allergens not in the food. For example, a particular food may contain soya yet not gluten. Thus I opine that two characters are needed for each allergen. I have deliberately avoided a total strike through at forty-five degrees as I opine that that could lead to problems distinguishing clearly the glyph for the absence of one allergen from the glyph for the absence of another allergen. I have also wondered whether each glyph for an allergen should include within its glyph a number, maybe a three-digit number, so that clarity is precise. I opine that two separate characters for each allergen is desirable rather than some solution such as having one character for each allergen and a combining strike through character. The two separate characters approach keeps the system straightforward to use with many software packages. The matter of expressing food allergens is far too important to become entangled in problems for everyday users. For gluten, it might be necessary to have three distinct code points. In the United Kingdom there is a legal difference between "gluten-free" and "no gluten-containing ingredients". To be labelled gluten-free the product must have been tested. This is to ensure that there has been no cross-contamination of ingredients. 
For example, rice has no gluten, but was a particular load of rice transported in a lorry used for wheat on other days? Yet testing is not always possible in a restaurant situation. William Overington 25 July 2015 From gwalla at gmail.com Sun Jul 26 00:05:21 2015 From: gwalla at gmail.com (Garth Wallace) Date: Sat, 25 Jul 2015 22:05:21 -0700 Subject: Emoji characters for food allergens In-Reply-To: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost> References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost> Message-ID: <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com> On Sat, Jul 25, 2015 at 9:43 AM, William_J_G Overington <wjgo_10009 at btinternet.com> wrote: > Emoji characters for food allergens > > An interesting document entitled > > Preliminary proposal to add emoji characters for food allergens > > by Hiroyuki Komatsu > > was added into the UTC (Unicode Technical Committee) Document Register > yesterday. > > http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf > > This is a welcome development. I'm skeptical. I understand the rationale, but several of the proposed characters are essentially SMALL PILE OF BROWN DOTS and would be difficult to distinguish at typical sizes.
From eliz at gnu.org Sun Jul 26 10:08:00 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Sun, 26 Jul 2015 18:08:00 +0300 Subject: BidiMirrored property and ancient scripts In-Reply-To: <20150725221540.72e6ee48@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2> <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2> <831tfw15ai.fsf@gnu.org> <20150725221540.72e6ee48@JRWUBU2> Message-ID: <83oaizyn1r.fsf@gnu.org> > Date: Sat, 25 Jul 2015 22:15:40 +0100 > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > > > Mirroring is changing a glyph to suitable for reading in the other > > > direction. > > > > Sorry, I disagree. > > > > > Note the following extract from BidiMirroring.txt in the > > > Unicode Character Database: > > > > > > <quote> > > > # The following characters have no appropriate mirroring character. > > > # For these characters it is up to the rendering system > > > # to provide mirrored glyphs. > > > > How's that a contradiction to what I said? > > U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its glyph is > not replaced by any other character's glyph. Or are you claiming that > left-to-right U+2140 and right-to-left U+2140 are two different > characters? I'm saying that "providing a mirrored glyph" entails coming up with a character whose glyph can play that role, AFAIU. 
If you are saying that the "rendering system" here is the shaping engine using the rtlm OTF feature, then you are in fact saying that the mirroring of these characters cannot be implemented with most fonts in wide use today, and with most shaping engines. That would be a very strange claim, IMO, tantamount to saying that those characters cannot, or don't need to, be mirrored at all in most use cases. > > > > Thus, your reasons make no sense to me, because a character's > > > > shape, any character's shape, be it L, R, AL, or anything else, > > > > is immutable. > > > > > > So go back and reread. > > > > Did that; still no sense. > > Because you still seem not to understand the concept of mirroring. I think you will fare much better, and actually stand a chance of convincing others that you are right, if you assume your opponents do understand the issues, and just happen to disagree about their interpretation, or misinterpret what you write. > It isn't just for characters that have a Bidi_Mirroring_Glyph > property value other than <none>. Only "in specialized contexts", like "historic scripts and associated punctuation, private-use characters, and characters in mathematical expressions" (I believe the latter only happens in Arabic contexts, if it ever does). IOW, in extremely rare and marginal use cases. And all that is only in HL6, which is really a fire escape meant for applications whose scope is beyond simple text. That's a far cry from boustrophedon, which was the trigger for most of this exchange. In all other cases: L4. A character is depicted by a mirrored glyph if and only if (a) the resolved directionality of that character is R, and (b) the Bidi_Mirrored property value of that character is Yes. That's normative and unequivocal.
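Rule L4 as quoted above can be sketched with Python's standard `unicodedata.mirrored()`, which reports the Bidi_Mirrored property. The sketch assumes the resolved embedding levels are already known (an odd level means a resolved direction of R) and uses a small hand-copied excerpt of the Bidi_Mirroring_Glyph pairs from BidiMirroring.txt in place of the full file. Characters such as U+2140, which are Bidi_Mirrored but have no mirroring character, are only reported, leaving their mirrored glyph to the rendering system; the function name `apply_rule_l4` is hypothetical.

```python
import unicodedata
from typing import List, Sequence, Tuple

# Hand-copied excerpt of Bidi_Mirroring_Glyph pairs; a real implementation
# would load every entry from BidiMirroring.txt.
MIRRORING_GLYPH = {
    "(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{",
    "<": ">", ">": "<", "\u2264": "\u2265", "\u2265": "\u2264",
}


def apply_rule_l4(text: str, levels: Sequence[int]) -> Tuple[str, List[int]]:
    """Sketch of UBA rule L4.

    A character is mirrored iff its resolved direction is R (odd level) and
    its Bidi_Mirrored property is Yes. Returns the text with mirroring
    characters substituted where a pair is known, plus the indices of
    characters that must be mirrored but have no pair in the table (with
    this excerpt, that also includes paired characters missing from it).
    """
    if len(text) != len(levels):
        raise ValueError("one level per character is required")
    out = []
    needs_rendering_system = []
    for i, (ch, level) in enumerate(zip(text, levels)):
        if level % 2 == 1 and unicodedata.mirrored(ch):
            if ch in MIRRORING_GLYPH:
                out.append(MIRRORING_GLYPH[ch])
            else:
                out.append(ch)  # glyph must be mirrored by the renderer
                needs_rendering_system.append(i)
        else:
            out.append(ch)
    return "".join(out), needs_rendering_system
```

For example, in a right-to-left paragraph the opening parenthesis of "(a" resolves to level 1, so `apply_rule_l4("(a", [1, 2])` substitutes ")" for it, while `apply_rule_l4("\u2140", [1])` leaves the character unchanged and reports index 0.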
> > > > > > It's the UBA that mandates that the opening and closing parentheses > > > be rendered like right and left parentheses respectively rather > > > than like left and right parentheses. > > > > Mirroring comes after layout in the UBA, as you pointed out, and the > > short-cuts I mentioned are about layout, not about mirroring. > > So irrelevant. No, not irrelevant. You can sort out rendering of parentheses in such text without applying the BPA, just by considering the parentheses as neutrals. That's one shortcut I alluded to. > I take it we now agree that the right shape for the parentheses for the > unidirectional right-to-left example is derived by the UBA. The mirroring is dictated by the UBA, yes. But that just delineates the difference between boustrophedon and bidirectional text, the latter being subject to the UBA, while the former isn't. > > > > > Indeed, one may rely on the bidi algorithm to declare the Latin > > > > > example unidirectional. > > > > > > > > One might, but to what purpose and goal? > > > > > > A right-to-left paragraph consisting of the two characters "(a" > > > would be bidirectional and have a parenthesis on the right; a > > > left-to-right paragraph with the same content would have a > > > parenthesis on the left. > > If there is no higher-level protocol in effect, the 'first strong > character' rule (Rules P2 and P3 of the UBA) declares that the > paragraph will be a left-to-right paragraph and will look > like "(a". Had it been declared a right-to-left paragraph by a > higher-level protocol, it would look like "a)". Thus the UBA has a > rôle even for unidirectional left-to-right text. Once the paragraph direction is overridden by a higher-level protocol, the text is no longer unidirectional. Such overriding is equivalent to enclosing the paragraph in an RLE..PDF pair, which makes the text bidirectional by definition.
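The 'first strong character' rule (P2 and P3) can be sketched with Python's `unicodedata.bidirectional()`, which returns a character's Bidi_Class. P2 skips characters between an isolate initiator and its matching PDI (or to the end of the paragraph if there is none); embedding and override controls are not strong and are simply passed over. The optional `override` argument is a hypothetical stand-in for a higher-level protocol that sets the paragraph level directly (HL1); the function name is likewise illustrative.

```python
import unicodedata
from typing import Optional

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def paragraph_level(text: str, override: Optional[int] = None) -> int:
    """Sketch of UBA rules P2 and P3: return 0 (LTR) or 1 (RTL)."""
    if override is not None:
        return override  # a higher-level protocol decided (HL1)
    isolate_depth = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            isolate_depth += 1  # P2: skip the contents of isolates
        elif bidi_class == "PDI":
            if isolate_depth > 0:
                isolate_depth -= 1
        elif isolate_depth == 0 and bidi_class in ("L", "R", "AL"):
            # P3: the first strong character decides the paragraph level.
            return 0 if bidi_class == "L" else 1
    return 0  # P3: no strong character found, default to left-to-right
```

Without an override, "(a" gets level 0 and keeps its parenthesis on the left; with `override=1` the same two characters form a right-to-left paragraph.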
From richard.wordingham at ntlworld.com Mon Jul 27 09:32:01 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Mon, 27 Jul 2015 15:32:01 +0100 Subject: BidiMirrored property and ancient scripts In-Reply-To: <83oaizyn1r.fsf@gnu.org> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2> <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2> <831tfw15ai.fsf@gnu.org> <20150725221540.72e6ee48@JRWUBU2> <83oaizyn1r.fsf@gnu.org> Message-ID: <20150727153201.4f7cd9e1@JRWUBU2> On Sun, 26 Jul 2015 18:08:00 +0300 Eli Zaretskii <eliz at gnu.org> wrote: > > Date: Sat, 25 Jul 2015 22:15:40 +0100 > > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > > > > > Mirroring is changing a glyph to suitable for reading in the > > > > other direction. > > > > > > Sorry, I disagree. > > > > > > > Note the following extract from BidiMirroring.txt in the > > > > Unicode Character Database: > > > > > > > > <quote> > > > > # The following characters have no appropriate mirroring > > > > character. # For these characters it is up to the rendering > > > > system # to provide mirrored glyphs. > > > > > > How's that a contradiction to what I said? > > > > U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its glyph is > > not replaced by any other character's glyph. Or are you claiming > > that left-to-right U+2140 and right-to-left U+2140 are two different > > characters? 
> > I'm saying that "providing a mirrored glyph" entails coming up with a > character whose glyph can play that role, AFAIU. I'll take that as 'No' - the left-to-right and right-to-left forms are the same character. (Unicode has no consistency in this matter.) > If you are saying that the "rendering system" here is the shaping > engine using the rtlm OTF feature, then you are in fact saying that > the mirroring of these characters cannot be implemented with most > fonts in wide use today, and with most shaping engines. That would be > a very strange claim, IMO, tantamount to saying that those characters > cannot, or don't need to, be mirrored at all in most use cases. OpenType can handle it - feature rtlm effectively provides a supplementary RTL cmap, and ltrm an LTR cmap. It's conceivable that DirectWrite and Uniscribe don't support it, but that's unlikely. It looks as though the HarfBuzz implementation of OpenType also supports mirroring for right-to-left runs, but I can't find the code subsequent to tagging characters that weren't reversed using the Bidi_Mirroring_Glyph property. I have a similar lack of progress with finding the code for fractions, which also tags characters. Fractions using U+2044 are supported by HarfBuzz, for all that I can't find the code. I can't find any evidence of AAT support. The OpenType scheme for mirroring for right-to-left text is:
1) Apply the Unicode 5.1 Bidi_Mirroring_Glyph property where applicable.
2) For other characters, apply the rtlm feature. This is intended to be applied character by character.
3) Apply the rtla feature to the resulting glyph sequence.
Note that the font-writer is responsible for determining whether a character is to be mirrored at Step 2. Also note that there is no need for font support if all the Bidi mirrored characters the font supports have the Bidi_Mirroring_Glyph property.
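The three-step OpenType scheme for right-to-left mirroring can be sketched in Python as below. This is a hedged illustration, not the actual code of HarfBuzz, Uniscribe or DirectWrite: the mapping table is a tiny hypothetical excerpt of BidiMirroring.txt, `prepare_rtl_run` is a made-up name, and the font lookups themselves (rtlm per character, then rtla on the glyph run) are left to a shaper.

```python
from __future__ import annotations

# Hypothetical excerpt of BidiMirroring.txt (Bidi_Mirroring_Glyph); a real
# implementation would load every mapping from the Unicode Character Database.
BIDI_MIRRORING_GLYPH = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "<": ">", ">": "<",
}


def prepare_rtl_run(run: str) -> tuple[str, list[bool]]:
    """Steps 1 and 2 of the scheme for a run whose characters resolved to R.

    Returns the text after Bidi_Mirroring_Glyph substitution, plus a
    per-character flag saying whether the font's 'rtlm' feature should be
    applied to that character. Step 3 ('rtla' on the whole glyph sequence)
    is left to the shaper.
    """
    chars: list[str] = []
    rtlm_mask: list[bool] = []
    for ch in run:
        mirror = BIDI_MIRRORING_GLYPH.get(ch)
        if mirror is not None:
            # Step 1: substitute the character named by Bidi_Mirroring_Glyph.
            chars.append(mirror)
            rtlm_mask.append(False)
        else:
            # Step 2: keep the character; the font's rtlm lookup decides
            # whether (and how) its glyph is mirrored.
            chars.append(ch)
            rtlm_mask.append(True)
    return "".join(chars), rtlm_mask
```

With this split, U+222B INTEGRAL (Bidi_Mirrored=Yes, Bidi_Mirroring_Glyph=<none>) passes through step 1 unchanged and is flagged for rtlm, so whether it actually changes shape is decided by the font, which is the point being made about Step 2.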
There is similar logic for mirroring for left-to-right text, except that there is no Bidi_Mirroring_Glyph support from Unicode tables. The decision to mirror is entirely up to the font. Now, you may be right about font support being lacking, just as it is often lacking for U+2044 FRACTION SLASH. If you still don't believe me, please explain why U+222B INTEGRAL has Bidi_Mirrored=Yes but Bidi_Mirroring_Glyph=<none>. > > > > > Thus, your reasons make no sense to me, because a character's > > > > > shape, any character's shape, be it L, R, AL, or anything > > > > > else, is immutable. > > > > > > > > So go back and reread. > > > > > > Did that; still no sense. > > > > Because you still seem not to understand the concept of mirroring. > > I think you will fare much better, and actually stand a chance of > convincing you are right, if you assume your opponents do understand > the issues, and just happen to disagree about their interpretation, or > misinterpret what you write. You won't understand my reasoning unless you accept that Bidi mirroring can change a glyph's shape rather than substitute the glyph of another character. If you don't accept that, my argument will make no sense, because you don't accept the premisses. > > It isn't just for characters that have a Bidi_Mirroring_Glyph > > property value other than <none>. > > Only "in specialized contexts", like "historic scripts and associated > punctuation, private-use characters, and characters in mathematical > expressions" (I believe the latter is only happening in Arabic > context, if it ever does). IOW, in extremely rare and marginal use > cases. And all that is only in HL6, which is really a fire escape > meant for applications whose scope is beyond simple text. L4 calls for mandatory 'mirroring'. Note that mirroring is not exact mirroring. My interpretation works for both Arabic and Hebrew. The UBA Rule L4 calls for some mathematical symbols to take the form appropriate for a right-to-left context. 
(HL6 allows this set to be extended.) However, from what you say this form depends on the language. For example, the basic integral sign flips for Arabic maths, but from what you say, I think not for Hebrew maths. OpenType can make the mirrored shape dependent on the language of the text. > That's a > far cry from boustrophedon, which was the trigger for most of this > exchange. In all other cases: > > L4. A character is depicted by a mirrored glyph if and only if (a) > the resolved directionality of that character is R, and (b) the > Bidi_Mirrored property value of that character is Yes. > > That's normative and unequivocal. And therefore applies to U+222B INTEGRAL. Formally, HL6 is irrelevant for this character. Now, you might wish for HL6 to be modified to allow it not to be mirrored, but I think we can stretch the definition of mirroring to handle it. UBA Section 7 "Mirroring" says: "Implementing rule L4 calls for mirrored glyphs. These glyphs may not be exact graphical mirror images. For example, clearly an italic parenthesis is not an exact mirror image of another: '(' is not the mirror image of ')'. Instead, mirror glyphs are those acceptable as mirrors within the normal parameters of the font in which they are represented." This opens up the possibility of the degree of mirroring depending on the language being supported. > > > > > > However, one needs the UBA to sort out the rendering of the > > > > > > parentheses in the Hebrew text. > > > > > > > > > Not really, you can short-cut it, the same as in strictly > > > > > left-to-right text. > > > > > > > > It's the UBA that mandates that the opening and closing > > > > parentheses be rendered like right and left parentheses > > > > respectively rather than like left and right parentheses. > > > > > > Mirroring comes after layout in the UBA, as you pointed out, and > > > the short-cuts I mentioned are about layout, not about mirroring. > > > > So irrelevant. > > No, not irrelevant.
You can sort out rendering of parentheses in such > text without applying the BPA, just by considering the parentheses as > neutrals. That's one shortcut I alluded to. > > I take it we now agree that the right shape for the parentheses for > > the unidirectional right-to-left example is derived by the UBA. > The mirroring is dictated by the UBA, yes. Which was my point - the UBA applies to unidirectional text. > But that just delineates > the difference between boustrophedon and bidirectional text, the > latter being subject to the UBA, while the former isn't. I didn't say boustrophedon text was subject to the UBA. I said a boustrophedon renderer may modify the text to be rendered so that the UBA will lay out the text properly. This modification is heavily dependent on line length. Ideally one would lay it out line-by-line. > > > > > > Indeed, one may rely on the bidi algorithm to declare the > > > > > > Latin example unidirectional. > > > > > > > > > > One might, but to what purpose and goal? > > > > > > > > A right-to-left paragraph consisting of the two characters "(a" > > > > would be bidirectional and have a parenthesis on the right; a > > > > left-to-right paragraph with the same content would have a > > > > parenthesis on the left. > > > > If there is no higher-level protocol in effect, the 'first strong > > character' rule (Rules P2 and P3 of the UBA) declares that the > > paragraph will be a left-to-right paragraph and will look > > like "(a". Had it been declared a right-to-left paragraph by a > > higher-level protocol, it would look like "a)". Thus the UBA has a > > rôle even for unidirectional left-to-right text. > Once the paragraph direction is overridden by a higher-level protocol, > the text is no longer unidirectional. Such overriding is equivalent > to enclosing the paragraph in RLE..PDF pair, which makes the text > bidirectional by definition. And if it isn't overridden, it is the UBA which makes it unidirectional.
The UBA specifies the appearance of an opening parenthesis. Richard. From eliz at gnu.org Mon Jul 27 10:18:09 2015 From: eliz at gnu.org (Eli Zaretskii) Date: Mon, 27 Jul 2015 18:18:09 +0300 Subject: BidiMirrored property and ancient scripts In-Reply-To: <20150727153201.4f7cd9e1@JRWUBU2> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2> <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2> <831tfw15ai.fsf@gnu.org> <20150725221540.72e6ee48@JRWUBU2> <83oaizyn1r.fsf@gnu.org> <20150727153201.4f7cd9e1@JRWUBU2> Message-ID: <83bnexzl1q.fsf@gnu.org> I no longer see where this is going. If there's still some goal, something you think we should agree on or discuss, perhaps you could spell that out. Otherwise, I think it's time to quit. Some random comments: > Date: Mon, 27 Jul 2015 15:32:01 +0100 > From: Richard Wordingham <richard.wordingham at ntlworld.com> > Cc: unicode at unicode.org > > > U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its glyph is > > > not replaced by any other character's glyph. Or are you claiming > > > that left-to-right U+2140 and right-to-left U+2140 are two different > > > characters? > > > > I'm saying that "providing a mirrored glyph" entails coming up with a > > character whose glyph can play that role, AFAIU. > > I'll take that as 'No' - the left-to-right and right-to-left forms are > the same character. (Unicode has no consistency in this matter.)
I don't know what is meant by "left-to-right and right-to-left forms" here. To me, a character has only one form. > > If you are saying that the "rendering system" here is the shaping > > engine using the rtlm OTF feature, then you are in fact saying that > > the mirroring of these characters cannot be implemented with most > > fonts in wide use today, and with most shaping engines. That would be > > a very strange claim, IMO, tantamount to saying that those characters > > cannot, or don't need to, be mirrored at all in most use cases. > > OpenType can handle it - feature rtlm effectively provides a > supplementary an RTL cmap, and ltrm an LTR cmap. It's conceivable that > DirectWrite and Uniscribe don't support it, but that's unlikely. Most popular fonts don't, so this support is basically useless, if it turns out to be a must. > The decision to mirror is entirely up to the font. Not at all. A display engine can make those decisions on its own, even if it consults the fonts while making those decisions. > If you still don't believe me, please explain why U+222B INTEGRAL has > Bidi_Mirrored=Yes but Bidi_Mirroring_Glyph=<none>. The explanation is in the file: there's no glyph for that. > > > Because you still seem not to understand the concept of mirroring. > > > > I think you will fare much better, and actually stand a chance of > > convincing you are right, if you assume your opponents do understand > > the issues, and just happen to disagree about their interpretation, or > > misinterpret what you write. > > You won't understand my reasoning unless you accept that Bidi mirroring > can change a glyph's shape rather than substitute the glyph of another > character. Try to convince me of that. > L4 calls for mandatory 'mirroring'. Note that mirroring is not exact > mirroring. My interpretation works for both Arabic and Hebrew. The > UBA Rule L4 calls for some mathematical symbols to take the form > appropriate for a right-to-left context.
(HL6 allows this set > to be extended.) However, from what you say this form depends on the > language. For example, the basic integral sign flips for Arabic maths, > but from what you say, I think not for Hebrew maths. Hebrew always typesets math left to right, so no mirroring of math symbols, including U+222B INTEGRAL, is ever necessary. > OpenType can make the mirrored shaped dependent on the language of > the text. The language of the text is not always well defined, alas. > > L4. A character is depicted by a mirrored glyph if and only if (a) > > the resolved directionality of that character is R, and (b) the > > Bidi_Mirrored property value of that character is Yes. > > > > That's normative and unequivocal. > > And therefore applies to U+222B INTEGRAL. Yes, but since there's no glyph, it's a non-issue. > UBA Section 7 "Mirroring" says: > > "Implementing rule L4 calls for mirrored glyphs. These glyphs may not be > exact graphical mirror images. For example, clearly an italic > parenthesis is not an exact mirror image of another: '(' is not the > mirror image of ')'. Instead, mirror glyphs are those acceptable as > mirrors within the normal parameters of the font in which they are > represented." > > This opens up the possibility of the degree of mirroring depending on > the language being supported. My reading of that is that there's some freedom in choosing the shape of the mirrored glyph, but the degree of mirroring is non-negotiable. > > But that just delineates > > the difference between boustrophedon and bidirectional text, the > > latter being subject to the UBA, while the former isn't. > > I didn't say boustrophedon text was subject to the UBA. I said a > boustrophedon renderer may modify the text to be rendered so that the > UBA will layout the text properly. Given the directional overrides, this is a trivium, I think.
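A minimal sketch of the directional-override approach to boustrophedon, assuming the text has already been broken into lines (the breaking depends on line length) and that each line is then laid out as its own paragraph. The function `boustrophedon_overrides` is hypothetical; the three control characters are the real U+202D, U+202E and U+202C. Under RLO every character on a reversed line resolves to R, so rule L4 mirrors only characters with Bidi_Mirrored=Yes, such as parentheses. Ordinary letters keep their left-to-right shapes, which is the limitation for ancient boustrophedon scripts that this thread turns on.

```python
from __future__ import annotations

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def boustrophedon_overrides(lines: list[str], first_ltr: bool = True) -> list[str]:
    """Wrap each pre-broken line in an override so that lines alternate direction."""
    wrapped = []
    for i, line in enumerate(lines):
        # Even-numbered lines take the starting direction, odd ones the reverse.
        ltr = (i % 2 == 0) == first_ltr
        wrapped.append((LRO if ltr else RLO) + line + PDF)
    return wrapped
```

Because the overrides are inserted per line, any change in line width means re-breaking and re-wrapping the text, which is why this kind of preprocessing is tied so closely to the final layout.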
> > > If there is no higher-level protocol in effect, the 'first strong > > > character' rule (Rules P2 and P3 of the UBA) declares that the > > > paragraph will be a left-to-right paragraph and will look > > > like "(a". Had it been declared a right-to-left paragraph by a > > > higher-level protocol, it would look like "a)". Thus the UBA has a > > > rôle even for unidirectional left-to-right text. > > > Once the paragraph direction is overridden by a higher-level protocol, > > the text is no longer unidirectional. Such overriding is equivalent > > to enclosing the paragraph in RLE..PDF pair, which makes the text > > bidirectional by definition. > > And if it isn't overridden, it is the UBA which makes it > unidirectional. No, it doesn't. > The UBA specifies the appearance of an opening parenthesis. That's bidirectional, not unidirectional. From charupdate at orange.fr Mon Jul 27 12:30:25 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Mon, 27 Jul 2015 19:30:25 +0200 (CEST) Subject: Emoji characters for food allergens Message-ID: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215> On 26 Jul 2015, at 05:45, William_J_G Overington wrote: > Emoji characters for food allergens > An interesting document entitled > Preliminary proposal to add emoji characters for food allergens > by Hiroyuki Komatsu > was added into the UTC (Unicode Technical Committee) Document Register yesterday. > http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf > This is a welcome development. > I suggest that, in view of the importance of precision in conveying information about food allergens, that the emoji characters for food allergens should be separate characters from other emoji characters.
That is, encoded in a separate quite distinct block of code points far away in the character map from other emoji characters, with no dual meanings for any of the characters: a character for a food allergen should be quite separate and distinct from a character for any other meaning. > I opine that having two separate meanings for the same character, one meaning as an everyday jolly good fun meaning in a text message and one meaning as a specialist food allergen meaning could be a source of confusion. Far better to encode a separate code block with separate characters right from the start than risk needless and perhaps medically dangerous confusion in the future. > I suggest that for each allergen that there be two characters. > The glyph for the first character of the pair goes from baseline to ascender. > The glyph for the second character of the pair is a copy of the glyph for the first character of the pair augmented with a thick red line from lower left descender to higher right a little above the base line, the thick red line perhaps being at about thirty degrees from the horizontal. Thus the thick red line would go over the allergen part of the glyph yet just by clipping it a bit so that clarity is maintained. > The glyphs are thus for the presence of the allergen and the absence of the allergen respectively. > It is typical in the United Kingdom to label food packets not only with an ingredients list but also with a list of allergens in the food and also with a list of allergens not in the food. > For example, a particular food may contain soya yet not gluten. > Thus I opine that two characters are needed for each allergen. > I have deliberately avoided a total strike through at forty-five degrees as I opine that that could lead to problems distinguishing clearly the glyph for the absence of one allergen from the glyph for the absence of another allergen. 
> I have also wondered whether each glyph for an allergen should include within its glyph a number, maybe a three-digit number, so that clarity is precise. I'm not sure whether another code would facilitate the handling of these warnings. IMHO the allergen name in natural language is more efficient in communication. This, however, requires identifying and learning the words before travelling to a country where another language is spoken, whereas a code point is easier to read if its meaning is at hand. > I opine that two separate characters for each allergen is desirable rather than some solution such as having one character for each allergen and a combining strike through character. This is consistent with the Unicode policy of not decomposing overlay diacritics in letters. Symbols, however, are intended for use with combining marks for symbols, like U+20E0 COMBINING ENCLOSING CIRCLE BACKSLASH. We hope that the importance of the food allergen issue will lead to the implementation of an efficient system of language-independent labelling. > The two separate characters approach keeps the system straightforward to use with many software packages. The matter of expressing food allergens is far too important to become entangled in problems for everyday users. > For gluten, it might be necessary to have three distinct code points. > In the United Kingdom there is a legal difference between "gluten-free" and "no gluten-containing ingredients". > To be labelled gluten-free the product must have been tested. This is to ensure that there has been no cross-contamination of ingredients. For example, rice has no gluten, but was a particular load of rice transported in a lorry used for wheat on other days? > Yet testing is not always possible in a restaurant situation. All the best, Marcel -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150727/235e098b/attachment.html> From charupdate at orange.fr Mon Jul 27 12:45:39 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Mon, 27 Jul 2015 19:45:39 +0200 (CEST) Subject: Emoji characters for food allergens In-Reply-To: <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com> References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost> <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com> Message-ID: <942647388.13410.1438019139444.JavaMail.www@wwinf2215> On 26 Jul 2015 at 07:14, Garth Wallace wrote: > On Sat, Jul 25, 2015 at 9:43 AM, William_J_G Overington > wrote: > > Emoji characters for food allergens > > > > An interesting document entitled > > > > Preliminary proposal to add emoji characters for food allergens > > > > by Hiroyuki Komatsu > > > > was added into the UTC (Unicode Technical Committee) Document Register > > yesterday. > > > > http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf > > > > This is a welcome development. > > I'm skeptical. I understand the rationale, but several of the proposed > characters are essentially SMALL PILE OF BROWN DOTS and would be > difficult to distinguish at typical sizes. Only two, buckwheat and sesame. As the proposal states, none of the glyphs is final. For buckwheat we could opt for an ear of buckwheat rather than a pile of grains. Alternatively, the characteristic shape of the buckwheat grain could be used, as its resemblance to a beechnut led to its German name "Buchweizen". But a single grain scaled to almost 1:1 might be hard to recognize at the glyph level. Marcel -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150727/92929047/attachment.html> From richard.wordingham at ntlworld.com Mon Jul 27 13:16:40 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Mon, 27 Jul 2015 19:16:40 +0100 Subject: BidiMirrored property and ancient scripts In-Reply-To: <83bnexzl1q.fsf@gnu.org> References: <20150721093317.665a7a7059d7ee80bb4d670165c8327d.360e33cb5e.wbe@email03.secureserver.net> <1481805038.2684.1437548438103.JavaMail.www@wwinf1f21> <20150722085240.00f61ba2@JRWUBU2> <1458488239.6582.1437560492379.JavaMail.www@wwinf1d31> <20150722235402.7770e30a@JRWUBU2> <55B0BB26.5080601@gmail.com> <20150723194250.1cc05710@JRWUBU2> <55B20C7B.5020000@gmail.com> <55B2713E.4030006@ix.netcom.com> <20150724212352.101e5030@JRWUBU2> <83380c3dzh.fsf@gnu.org> <20150725084422.2b96491b@JRWUBU2> <83r3nw1xqg.fsf@gnu.org> <20150725101102.16dbf4ed@JRWUBU2> <83mvyk1s3u.fsf@gnu.org> <20150725143651.059e466a@JRWUBU2> <83fv4c1fg9.fsf@gnu.org> <20150725182726.533d4b78@JRWUBU2> <831tfw15ai.fsf@gnu.org> <20150725221540.72e6ee48@JRWUBU2> <83oaizyn1r.fsf@gnu.org> <20150727153201.4f7cd9e1@JRWUBU2> <83bnexzl1q.fsf@gnu.org> Message-ID: <20150727191640.1257b5f1@JRWUBU2> On Mon, 27 Jul 2015 18:18:09 +0300 Eli Zaretskii <eliz at gnu.org> wrote: > I no longer see where this is going. If there's still some goal, > something you think we should agree or discuss, perhaps you could > spell that out. Otherwise, I think it' time to quit. It's basically to establish that for UBA-compliant bidirectional support of some characters, a font must have both a left-to-right and a right-to-left glyph for the character. > Some random comments: > > > Date: Mon, 27 Jul 2015 15:32:01 +0100 > > From: Richard Wordingham <richard.wordingham at ntlworld.com> > > Cc: unicode at unicode.org > > > > > U+2140 DOUBLE-STRUCK N-ARY SUMMATION gets mirrored, but its > > > > glyph is not replaced by any other character's glyph. 
Or are > > > > you claiming that left-to-right U+2140 and right-to-left U+2140 > > > > are two different characters? > > > > > > I'm saying that "providing a mirrored glyph" entails coming up > > > with a character whose glyph can play that role, AFAIU. > > > > I'll take that as 'No' - the left-to-right and right-to-left forms > > are the same character. (Unicode has no consistency in this > > matter.) > > I don't know what is meant by "left-to-right and right-to-left forms" > here. To me, a character has only one form. I trust you've just forgotten that that's not true. Soft-dotted characters like 'i' and 'j' lose their dot when a mark above (ccc=230) is attached, e.g. <U+0069 LATIN SMALL LETTER I, U+1DC4 COMBINING MACRON-ACUTE>. Indic scripts have some more spectacular variations. In a font that supports both left-to-right and Arabic right-to-left maths, U+222B INTEGRAL will have at least two forms, one for left-to-right and one for right-to-left. > > > If you are saying that the "rendering system" here is the shaping > > > engine using the rtlm OTF feature, then you are in fact saying > > > that the mirroring of these characters cannot be implemented with > > > most fonts in wide use today, and with most shaping engines. > > > That would be a very strange claim, IMO, tantamount to saying > > > that those characters cannot, or don't need to, be mirrored at > > > all in most use cases. Is this an expression of disbelief, or a lament that the UBA demands too much? If it's a lament, I believe I've made my point. > > OpenType can handle it - feature rtlm effectively provides a > > supplementary an RTL cmap, and ltrm an LTR cmap. It's conceivable > > that DirectWrite and Uniscribe don't support it, but that's > > unlikely. > > Most popular fonts don't, so this support is basically useless, if it > turns out to be a must. No, it's a 'shall'. One won't be arrested for not doing it. > > The decision to mirror is entirely up to the font. > > Not at all. 
A display engine can make those decisions on its own, > even if it consults the fonts while making those decisions. If application of the rtlm and rtla features does not change the glyph used for U+222B INTEGRAL, then the font has refused to mirror the character. Now it is possible, in this circumstance, that the rendering engine might synthesise a reflected glyph. The font could then deceive the rendering engine by substituting an identical glyph. > > If you still don't believe me, please explain why U+222B INTEGRAL > > has Bidi_Mirrored=Yes but Bidi_Mirroring_Glyph=<none>. > > The explanation is in the file: there's no glyph for that. You mean, I hope, that there's no other character with the glyph for that rôle. > > I didn't say boustrophedon text was subject to the UBA. I said a > > boustrophedon renderer may modify the text to be rendered so that > > the UBA will layout the text properly. > > Given the directional overrides, this is a trivium, I think. Yes. I couldn't see why you were making such a fuss about it. > > The UBA specifies the appearance of an opening parenthesis. > > That's bidirectional, not unidirectional There may not be any more point in arguing about what is unidirectional and what is bidirectional. Richard. From gwalla at gmail.com Mon Jul 27 13:49:47 2015 From: gwalla at gmail.com (Garth Wallace) Date: Mon, 27 Jul 2015 11:49:47 -0700 Subject: Olympic sports emoji Message-ID: <CA+p4_H0UnA9-teCtZ-Sak1-Fa3j1MiJyVD4D1MLG53tWLScZcw@mail.gmail.com> I read this proposal and was a little confused. Why aren't they proposing the actual sports pictograms that are in use for international events like the Olympics? Those are generally stylized human figures shown engaging in sports, but the suggested symbols in this proposal seem to mostly be pictures of sports equipment. It seems like reinventing the wheel. Are the Olympic-style pictograms not felt to be sufficiently emoji-like?
Singling out modern pentathlon, water polo, and team handball to be encoded as ZWJ sequences instead of atomic characters also seems arbitrary. Why would PERSON WITH BALL plus GOAL NET specifically imply team handball? It seems like that combination covers a lot of sports. Why should modern pentathlon require nine characters for a single symbol? From doug at ewellic.org Mon Jul 27 15:10:06 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 27 Jul 2015 13:10:06 -0700 Subject: Olympic sports emoji Message-ID: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net> Garth Wallace <gwalla at gmail dot com> wrote: > I read this proposal [L2/15-196R] and was a little confused. Why > aren't they proposing the actual sports pictograms that are in use for > international events like the Olympics? Those are generally stylized > human figures shown engaging in sports, but the suggested symbols in > this proposal seem to mostly be pictures of sports equipment. It seems > like reinventing the wheel. Are the Olympic-style pictograms not felt > to be sufficiently emoji-like? The official Summer Olympics pictograms change each time the Games are held: http://www.olympic.org/Assets/OSC%20Section/pdf/QR_sports_pictograms_of_the_olympic_summer_games_1964_2016.pdf Although the symbols introduced for the 1972 Munich Games were particularly influential and are often thought to be canonical, these symbols have been styled quite differently since 1992. Additionally, the images are copyrighted, for the most part by the International Olympic Committee (see page 2 of the PDF document). -- Doug Ewell | http://ewellic.org | Thornton, CO ???? 
From leob at mailcom.com Mon Jul 27 15:40:04 2015 From: leob at mailcom.com (Leo Broukhis) Date: Mon, 27 Jul 2015 13:40:04 -0700 Subject: Olympic sports emoji In-Reply-To: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net> References: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net> Message-ID: <CAFmvRsfW-b4auFvORCTQ4QGCtBEzA9CR8xGJDyvkfgJ29Bw=mA@mail.gmail.com> Fonts vary and can be copyrighted, no doubt, but Unicode is not about fonts. Leo On Mon, Jul 27, 2015 at 1:10 PM, Doug Ewell <doug at ewellic.org> wrote: > Garth Wallace <gwalla at gmail dot com> wrote: > >> I read this proposal [L2/15-196R] and was a little confused. Why >> aren't they proposing the actual sports pictograms that are in use for >> international events like the Olympics? Those are generally stylized >> human figures shown engaging in sports, but the suggested symbols in >> this proposal seem to mostly be pictures of sports equipment. It seems >> like reinventing the wheel. Are the Olympic-style pictograms not felt >> to be sufficiently emoji-like? > > The official Summer Olympics pictograms change each time the Games are > held: > > http://www.olympic.org/Assets/OSC%20Section/pdf/QR_sports_pictograms_of_the_olympic_summer_games_1964_2016.pdf > > Although the symbols introduced for the 1972 Munich Games were > particularly influential and are often thought to be canonical, these > symbols have been styled quite differently since 1992. > > Additionally, the images are copyrighted, for the most part by the > International Olympic Committee (see page 2 of the PDF document). > > -- > Doug Ewell | http://ewellic.org | Thornton, CO ???? 
> > From gwalla at gmail.com Mon Jul 27 15:41:11 2015 From: gwalla at gmail.com (Garth Wallace) Date: Mon, 27 Jul 2015 13:41:11 -0700 Subject: Olympic sports emoji In-Reply-To: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net> References: <20150727131006.665a7a7059d7ee80bb4d670165c8327d.83211e9477.wbe@email03.secureserver.net> Message-ID: <CA+p4_H2umcUvZ91hzJ66bUhGbebkDOpYgWPS8MwObZGYTLw0Bg@mail.gmail.com> On Mon, Jul 27, 2015 at 1:10 PM, Doug Ewell <doug at ewellic.org> wrote: > Garth Wallace <gwalla at gmail dot com> wrote: > >> I read this proposal [L2/15-196R] and was a little confused. Why >> aren't they proposing the actual sports pictograms that are in use for >> international events like the Olympics? Those are generally stylized >> human figures shown engaging in sports, but the suggested symbols in >> this proposal seem to mostly be pictures of sports equipment. It seems >> like reinventing the wheel. Are the Olympic-style pictograms not felt >> to be sufficiently emoji-like? > > The official Summer Olympics pictograms change each time the Games are > held: > > http://www.olympic.org/Assets/OSC%20Section/pdf/QR_sports_pictograms_of_the_olympic_summer_games_1964_2016.pdf > > Although the symbols introduced for the 1972 Munich Games were > particularly influential and are often thought to be canonical, these > symbols have been styled quite differently since 1992. > > Additionally, the images are copyrighted, for the most part by the > International Olympic Committee (see page 2 of the PDF document). The style of them changes with each Games, but the identities do not. To my mind, this is equivalent to the glyph/character distinction. The individual Olympiad-specific images are copyrighted but not even the IOC can copyright the idea of "stick figure playing hockey, used to symbolize the sport of ice hockey". The UCS even includes a few of them already: U+26F7 SKIER is the symbol for Alpine skiing. 
From doug at ewellic.org Mon Jul 27 17:12:00 2015 From: doug at ewellic.org (Doug Ewell) Date: Mon, 27 Jul 2015 15:12:00 -0700 Subject: Olympic sports emoji Message-ID: <20150727151200.665a7a7059d7ee80bb4d670165c8327d.8d64c4d981.wbe@email03.secureserver.net> Leo Broukhis <leob at mailcom dot com> wrote: > Fonts vary and can be copyrighted, no doubt, but Unicode is not about > fonts. I was going to bust out the Apple logo as an analogy to the Olympic symbols, but apparently the Apple logo is trademarked and not merely copyrighted, so never mind. In any case, if this is just a character/glyph thing, then there shouldn't be a problem using either the existing emoji or the ones proposed in L2/15-196R for Olympic sports, since the glyphs can simply be styled as needed. -- Doug Ewell | http://ewellic.org | Thornton, CO From gwalla at gmail.com Mon Jul 27 18:46:52 2015 From: gwalla at gmail.com (Garth Wallace) Date: Mon, 27 Jul 2015 16:46:52 -0700 Subject: Hentaigana and the Kana Supplement block Message-ID: <CA+p4_H1GmoXt=ZaFPNr-hKpXKjqWi8TEvyhcSwe+4xH1Y4y_Zg@mail.gmail.com> The recent hentaigana proposal requests that they be encoded as Standardized Variation Sequences of hiragana. This seems like a good idea, since fallback in the absence of font support would be to the standard hiragana, so the results would still be readable. But where does that leave the Kana Supplement block? That block contains only two encoded characters, but was allocated 256 code points, presumably for the future encoding of hentaigana. With hentaigana handled by SVSes, it seems unlikely that many of those points would ever get filled. I realize there's no shortage of code points in the UCS, but still. One thing I noticed: the hentaigana proposal contains a duplicate of an existing character. MJ090014 (? variant with mother ideograph ?) looks like it's already encoded in the Kana Supplement block as U+1B001 HIRAGANA LETTER ARCHAIC YE.
From markus.icu at gmail.com Mon Jul 27 18:59:44 2015 From: markus.icu at gmail.com (Markus Scherer) Date: Mon, 27 Jul 2015 16:59:44 -0700 Subject: Hentaigana and the Kana Supplement block In-Reply-To: <CA+p4_H1GmoXt=ZaFPNr-hKpXKjqWi8TEvyhcSwe+4xH1Y4y_Zg@mail.gmail.com> References: <CA+p4_H1GmoXt=ZaFPNr-hKpXKjqWi8TEvyhcSwe+4xH1Y4y_Zg@mail.gmail.com> Message-ID: <CAN49p6pKe2NTeu0iFZMnd94HSNUCS3Eu-xR0=iXAZdk8=Cbf3w@mail.gmail.com> On Mon, Jul 27, 2015 at 4:46 PM, Garth Wallace <gwalla at gmail.com> wrote: > where > does that leave the Kana Supplement block? That block contains only > two encoded characters, but was allocated 256 code points, presumably > for the future encoding of hentaigana. With hentaigana handled by > SVSes, it seems unlikely that many of those points would ever get > filled. I realize there's no shortage of code points in the UCS, but > still. > I don't think the committee fills blocks with characters just because there is space and some glyphs are related :-) One thing I noticed: the hentaigana proposal contains a duplicate of > an existing character. MJ090014 (? variant with mother ideograph ?) > looks like it's already encoded in the Kana Supplement block as > U+1B001 HIRAGANA LETTER ARCHAIC YE. > Please submit this via http://www.unicode.org/reporting.html Best regards, markus -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://unicode.org/pipermail/unicode/attachments/20150727/c99e29ba/attachment.html> From gwalla at gmail.com Tue Jul 28 00:55:09 2015 From: gwalla at gmail.com (Garth Wallace) Date: Mon, 27 Jul 2015 22:55:09 -0700 Subject: Fwd: Olympic sports emoji In-Reply-To: <CA+p4_H2=tw5J_TQDAr-irvnJ0V-8Cy_Q=Hp4j928kfmTufC1=g@mail.gmail.com> References: <20150727151200.665a7a7059d7ee80bb4d670165c8327d.8d64c4d981.wbe@email03.secureserver.net> <CA+p4_H2=tw5J_TQDAr-irvnJ0V-8Cy_Q=Hp4j928kfmTufC1=g@mail.gmail.com> Message-ID: <CA+p4_H1p2LPbm7-CZjFxTFgx6STFTv2gQ425=drMQiB0H1kJig@mail.gmail.com> (sorry, meant to send this to the list) On Mon, Jul 27, 2015 at 3:12 PM, Doug Ewell <doug at ewellic.org> wrote: > Leo Broukhis <leob at mailcom dot com> wrote: > >> Fonts vary and can be copyrighted, no doubt, but Unicode is not about >> fonts. > > I was going to bust out the Apple logo as an analogy to the Olympic > symbols, but apparently the Apple logo is trademarked and not merely > copyrighted, so never mind. > > In any case, if this is just a character/glyph thing, then there > shouldn't be a problem using either the existing emoji or the ones > proposed in L2/15-196R for Olympic sports, since the glyphs can simply > be styled as needed. Would this be considered within the normal range of glyphic variation? Would an icon of two pugilists fighting be an acceptable rendering of a BOXING GLOVE emoji? BTW, speaking as a martial artist myself, I have to say an empty dogi is an odd representation for martial arts, even specifically Japanese ones. The proposal says that it could be used for judo, karate, and tae kwon do; it at least matches the first two (they are distinct, but not in a way that would , and practice uniforms for TKD are similar, but competitive TKD under WTF rules (including Olympic competition) uses several pieces of protective equipment (helmet, gloves, chest guard) with colored padding over the dobok. 
From gwalla at gmail.com Tue Jul 28 01:11:36 2015 From: gwalla at gmail.com (Garth Wallace) Date: Mon, 27 Jul 2015 23:11:36 -0700 Subject: Hentaigana and the Kana Supplement block In-Reply-To: <CAN49p6pKe2NTeu0iFZMnd94HSNUCS3Eu-xR0=iXAZdk8=Cbf3w@mail.gmail.com> References: <CA+p4_H1GmoXt=ZaFPNr-hKpXKjqWi8TEvyhcSwe+4xH1Y4y_Zg@mail.gmail.com> <CAN49p6pKe2NTeu0iFZMnd94HSNUCS3Eu-xR0=iXAZdk8=Cbf3w@mail.gmail.com> Message-ID: <CA+p4_H2o050Qr3sqmhu1GDfAuuzXc022MMtYdpqOLEyiw3po9A@mail.gmail.com> On Mon, Jul 27, 2015 at 4:59 PM, Markus Scherer <markus.icu at gmail.com> wrote: > On Mon, Jul 27, 2015 at 4:46 PM, Garth Wallace <gwalla at gmail.com> wrote: >> >> where >> does that leave the Kana Supplement block? That block contains only >> two encoded characters, but was allocated 256 code points, presumably >> for the future encoding of hentaigana. With hentaigana handled by >> SVSes, it seems unlikely that many of those points would ever get >> filled. I realize there's no shortage of code points in the UCS, but >> still. > > > I don't think the committee fills blocks with characters just because there > is space and some glyphs are related :-) Yes, but it looked like that was the intent. I'm not saying the hentaigana should be encoded as atomic characters in that block just because there is space; I think the SVS approach sounds like the right one (though I'm hardly an expert on hentaigana). I'm just wondering what's to be done with all of those code points if they won't be used for hentaigana, since it seems unlikely that there would be many other kana that couldn't be handled by existing characters or the proposed SVSes. Is it possible for a block to be later renamed as something more general to allow for some non-kana, or even to carve out some of the empty columns for a new block? Or does the stability policy apply to block allocations? 
From everson at evertype.com Tue Jul 28 08:00:03 2015 From: everson at evertype.com (Michael Everson) Date: Tue, 28 Jul 2015 14:00:03 +0100 Subject: Emoji characters for food allergens In-Reply-To: <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com> References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost> <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com> Message-ID: <474B9A36-2A8C-449A-8019-60B373459914@evertype.com> I do NOT understand the rationale. Emojis are not for labelling things. They're for the playful expression of emotions. Standardized symbols for allergens might be useful, if there were a textual use for them. > On 26 Jul 2015, at 06:05, Garth Wallace <gwalla at gmail.com> wrote: > > On Sat, Jul 25, 2015 at 9:43 AM, William_J_G Overington > <wjgo_10009 at btinternet.com> wrote: >> Emoji characters for food allergens >> >> An interesting document entitled >> >> Preliminary proposal to add emoji characters for food allergens >> >> by Hiroyuki Komatsu >> >> was added into the UTC (Unicode Technical Committee) Document Register >> yesterday. >> >> http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf >> >> This is a welcome development. > > I'm skeptical. I understand the rationale, but several of the proposed > characters are essentially SMALL PILE OF BROWN DOTS and would be > difficult to distinguish at typical sizes.
Michael Everson * http://www.evertype.com/ From wjgo_10009 at btinternet.com Tue Jul 28 05:19:21 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Tue, 28 Jul 2015 11:19:21 +0100 (BST) Subject: Emoji characters for food allergens In-Reply-To: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215> References: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215> Message-ID: <15994373.18168.1438078761970.JavaMail.defaultUser@defaultHost> Hi Marcel >> I have also wondered whether each glyph for an allergen should include within its glyph a number, maybe a three-digit number, so that clarity is precise. > I'm not sure whether another code would facilitate the handling of these warnings. IMHO the allergen name in natural language is more efficient in communication. This needs however to identify and learn the words prior to travelling into a foreign language country, while a code point is more obvious to read if its meaning is at hand. Well a lot could be done information technology-wise to facilitate communication through the language barrier. For example in text messages, sent by email, or over a mobile telephone link or maybe thrown to a device nearby, to communicate dietary needs, using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the receiving device; For example, by using a smartphone by reading from an RFID tag (radio-frequency identification tag) on a shelf label in a supermarket display about a product . The RFID tag could contain the food allergen information about the food encoded using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the smartphone. Best regards, William Overington 28 July 2015 -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150728/1c459fd4/attachment.html> From rscook at wenlin.com Tue Jul 28 08:34:54 2015 From: rscook at wenlin.com (Richard Cook) Date: Tue, 28 Jul 2015 06:34:54 -0700 Subject: Emoji characters for food allergens In-Reply-To: <474B9A36-2A8C-449A-8019-60B373459914@evertype.com> References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost> <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com> <474B9A36-2A8C-449A-8019-60B373459914@evertype.com> Message-ID: <4F3ECCC5-34D0-4D9F-9FC9-08926EB10885@wenlin.com> On Jul 28, 2015, at 6:00 AM, Michael Everson <everson at evertype.com> allegedly wrote: > > Emojis are not for labelling things. They're for the playful expression of emotions. Is that what they're for? I thought they were (encoded) to satisfy certain device manufacturers. And, what is the emotion playfully expressed by ???? ? From eric.muller at efele.net Tue Jul 28 09:48:34 2015 From: eric.muller at efele.net (Eric Muller) Date: Tue, 28 Jul 2015 07:48:34 -0700 Subject: Toki Pona: A Language With a Hundred Words - The Atlantic Message-ID: <55B79642.9000103@efele.net> http://www.theatlantic.com/technology/archive/2015/07/toki-pona-smallest-language/398363/ Eric. From doug at ewellic.org Tue Jul 28 09:53:53 2015 From: doug at ewellic.org (Doug Ewell) Date: Tue, 28 Jul 2015 07:53:53 -0700 Subject: Emoji characters for food allergens Message-ID: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net> Richard Cook <rscook at wenlin dot com> wrote: > And, what is the emotion playfully expressed by ???? ? "I'm having a burger and fries for lunch but can't be bothered to type all that into this text message lol" -- Doug Ewell | http://ewellic.org | Thornton, CO
From rscook at wenlin.com Tue Jul 28 10:07:37 2015 From: rscook at wenlin.com (Richard Cook) Date: Tue, 28 Jul 2015 08:07:37 -0700 Subject: Emoji characters for food allergens In-Reply-To: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net> References: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net> Message-ID: <30F31338-7226-4EB9-ABAB-B55100D7ADDC@wenlin.com> On Jul 28, 2015, at 7:53 AM, Doug Ewell <doug at ewellic.org> wrote: > > Richard Cook <rscook at wenlin dot com> wrote: > >> And, what is the emotion playfully expressed by ???? ? > > "I'm having a burger and fries for lunch but can't be bothered to type > all that into this text message lol" > Is all that one emotion or two? > -- > Doug Ewell | http://ewellic.org | Thornton, CO > > From asmusf at ix.netcom.com Tue Jul 28 10:56:33 2015 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Tue, 28 Jul 2015 08:56:33 -0700 Subject: Emoji characters for food allergens In-Reply-To: <30F31338-7226-4EB9-ABAB-B55100D7ADDC@wenlin.com> References: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net> <30F31338-7226-4EB9-ABAB-B55100D7ADDC@wenlin.com> Message-ID: <55B7A631.7070301@ix.netcom.com> An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150728/98c62581/attachment.html> From c933103 at gmail.com Tue Jul 28 12:46:28 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Wed, 29 Jul 2015 01:46:28 +0800 Subject: Emoji characters for food allergens In-Reply-To: <474B9A36-2A8C-449A-8019-60B373459914@evertype.com> References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost> <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com> <474B9A36-2A8C-449A-8019-60B373459914@evertype.com> Message-ID: <CAGHjPP+LUEkn=w3DEEcyfwhFen7Tq=jqV2DMuQ6cenQMShz1yg@mail.gmail.com> Probably, if these symbols are to be added to Unicode, it would be better to allocate blocks that do not belong to emoji for them. Also, note that emoji can look very different in different places; see http://unicode.org/faq/emoji_dingbats.html and http://www.unicode.org/reports/tr51/index.html#Design_Guidelines -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150729/77a00611/attachment.html> From gwalla at gmail.com Tue Jul 28 13:26:35 2015 From: gwalla at gmail.com (Garth Wallace) Date: Tue, 28 Jul 2015 11:26:35 -0700 Subject: Emoji characters for food allergens In-Reply-To: <474B9A36-2A8C-449A-8019-60B373459914@evertype.com> References: <29292306.26076.1437842589469.JavaMail.defaultUser@defaultHost> <CA+p4_H2HwubnE92C2Zoo7=_paj1VnZPR76Fs0AmZ+-OcsgJBiA@mail.gmail.com> <474B9A36-2A8C-449A-8019-60B373459914@evertype.com> Message-ID: <CA+p4_H2+6QSLEEXWMSBrRGPamwT6uMa9iE+hjPe8XXFBrfYMAQ@mail.gmail.com> Well, there are several emoji for various items encountered in daily life, and I think the reasoning is that allergens are important things to refer to because of their health effects. It's a bit of a leap to say that means there's a need for dedicated pictograms though. I agree, it does seem to be putting the cart before the horse.
On Tue, Jul 28, 2015 at 6:00 AM, Michael Everson <everson at evertype.com> wrote: > I do NOT understand the rationale. > > Emojis are not for labelling things. They're for the playful expression of emotions. > > Standardized symbols for allergens might be useful, if there were a textual use for them. > >> On 26 Jul 2015, at 06:05, Garth Wallace <gwalla at gmail.com> wrote: >> >> On Sat, Jul 25, 2015 at 9:43 AM, William_J_G Overington >> <wjgo_10009 at btinternet.com> wrote: >>> Emoji characters for food allergens >>> >>> An interesting document entitled >>> >>> Preliminary proposal to add emoji characters for food allergens >>> >>> by Hiroyuki Komatsu >>> >>> was added into the UTC (Unicode Technical Committee) Document Register >>> yesterday. >>> >>> http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf >>> >>> This is a welcome development. >> >> I'm skeptical. I understand the rationale, but several of the proposed >> characters are essentially SMALL PILE OF BROWN DOTS and would be >> difficult to distinguish at typical sizes. > > Michael Everson * http://www.evertype.com/ > > From doug at ewellic.org Tue Jul 28 14:24:16 2015 From: doug at ewellic.org (Doug Ewell) Date: Tue, 28 Jul 2015 12:24:16 -0700 Subject: Emoji characters for food allergens Message-ID: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> gfb hjjhjh <c933103 at gmail dot com> wrote: > Probably, if these symbols are to be added to Unicode, it would be better > to allocate blocks that do not belong to emoji for them. I'm curious what this is supposed to accomplish. It's not as though people viewing such a symbol on a screen or in print, or entering it on a phone keypad, will know or care what its Unicode code point is, or what other types of symbols have nearby code points. The Miscellaneous Symbols block contains U+2620 SKULL AND CROSSBONES, U+2623 BIOHAZARD SIGN, and U+263A WHITE SMILING FACE. -- Doug Ewell | http://ewellic.org | Thornton, CO
From rscook at wenlin.com Tue Jul 28 15:07:26 2015 From: rscook at wenlin.com (Richard Cook) Date: Tue, 28 Jul 2015 13:07:26 -0700 Subject: Emoji characters for food allergens In-Reply-To: <55B7A631.7070301@ix.netcom.com> References: <20150728075353.665a7a7059d7ee80bb4d670165c8327d.d31cd0be5e.wbe@email03.secureserver.net> <30F31338-7226-4EB9-ABAB-B55100D7ADDC@wenlin.com> <55B7A631.7070301@ix.netcom.com> Message-ID: <F92549C7-A6F1-410E-9F82-336F79404E54@wenlin.com> On Jul 28, 2015, at 8:56 AM, Asmus Freytag <asmusf at ix.netcom.com> wrote: > >> On 7/28/2015 8:07 AM, Richard Cook wrote: >>> On Jul 28, 2015, at 7:53 AM, Doug Ewell <doug at ewellic.org> wrote: >>> Richard Cook <rscook at wenlin dot com> wrote: >>> >>>> And, what is the emotion playfully expressed by ???? ? >>> "I'm having a burger and fries for lunch but can't be bothered to type >>> all that into this text message lol" >>> >> Is all that one emotion or two? > > Remember: > e-moji == picto-graph > > and > > emoji != emoticon. > hey Michael, You want ?? with that? ?? -R > A./ >> >>> -- >>> Doug Ewell | http://ewellic.org | Thornton, CO >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150728/572fc633/attachment.html> From c933103 at gmail.com Tue Jul 28 15:26:21 2015 From: c933103 at gmail.com (gfb hjjhjh) Date: Wed, 29 Jul 2015 04:26:21 +0800 Subject: Emoji characters for food allergens In-Reply-To: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> Message-ID: <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> According to http://unicode.org/faq/emoji_dingbats.html, emoji characters do not have a single fixed meaning, which I think is not what the original proposer wants. Or am I misunderstanding that?
On 29 Jul 2015, 3:28 AM, "Doug Ewell" <doug at ewellic.org> wrote: > gfb hjjhjh <c933103 at gmail dot com> wrote: > > > Probably, if these symbols are to be added to Unicode, it would be better > > to allocate blocks that do not belong to emoji for them. > > I'm curious what this is supposed to accomplish. It's not as though > people viewing such a symbol on a screen or in print, or entering it on > a phone keypad, will know or care what its Unicode code point is, or > what other types of symbols have nearby code points. > > The Miscellaneous Symbols block contains U+2620 SKULL AND CROSSBONES, > U+2623 BIOHAZARD SIGN, and U+263A WHITE SMILING FACE. > > -- > Doug Ewell | http://ewellic.org | Thornton, CO > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150729/b51bd4ae/attachment.html> From gwalla at gmail.com Tue Jul 28 17:27:08 2015 From: gwalla at gmail.com (Garth Wallace) Date: Tue, 28 Jul 2015 15:27:08 -0700 Subject: Emoji characters for food allergens In-Reply-To: <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> Message-ID: <CA+p4_H1Y2FrCKv038Q+UxMZbp0Xcy0gURndkgASO88HkEwv2QQ@mail.gmail.com> That's what Mr. Overington wants, but he's not the original proposer. The proposal by Hiroyuki Komatsu <http://www.unicode.org/L2/L2015/15197r-emoji-food-allergens.pdf> does not say anything of the sort, and by unifying some with existing characters implies otherwise. On Tue, Jul 28, 2015 at 1:26 PM, gfb hjjhjh <c933103 at gmail.com> wrote: > According to http://unicode.org/faq/emoji_dingbats.html, emoji > characters do not have a single fixed meaning, which I think is not what the > original proposer wants. Or am I misunderstanding that? > > On 29 Jul 2015, 3:28 AM, "Doug Ewell" <doug at ewellic.org> wrote:
>> >> gfb hjjhjh <c933103 at gmail dot com> wrote: >> >> > Probably, if these symbols are to be added to Unicode, it would be better >> > to allocate blocks that do not belong to emoji for them. >> >> I'm curious what this is supposed to accomplish. It's not as though >> people viewing such a symbol on a screen or in print, or entering it on >> a phone keypad, will know or care what its Unicode code point is, or >> what other types of symbols have nearby code points. >> >> The Miscellaneous Symbols block contains U+2620 SKULL AND CROSSBONES, >> U+2623 BIOHAZARD SIGN, and U+263A WHITE SMILING FACE. >> >> -- >> Doug Ewell | http://ewellic.org | Thornton, CO >> >> > From mark at kli.org Tue Jul 28 21:21:27 2015 From: mark at kli.org (Mark Shoulson) Date: Tue, 28 Jul 2015 22:21:27 -0400 Subject: Revenge of pIqaD Message-ID: <55B838A7.30603@kli.org> An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150728/a3fe7171/attachment.html> From Shawn.Steele at microsoft.com Tue Jul 28 21:50:19 2015 From: Shawn.Steele at microsoft.com (Shawn Steele) Date: Wed, 29 Jul 2015 02:50:19 +0000 Subject: Revenge of pIqaD In-Reply-To: <55B838A7.30603@kli.org> References: <55B838A7.30603@kli.org> Message-ID: <BLUPR03MB13789D75233C4ECAE4F5A40F828C0@BLUPR03MB1378.namprd03.prod.outlook.com> You missed Bing translate? http://www.bing.com/translator/?from=en&to=tlh-Qaak&text=Success - Shawn From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mark Shoulson Sent: Tuesday, July 28, 2015 7:21 PM To: unicode at unicode.org; Chris Lipscombe <qurgh at wizage.net> Subject: Revenge of pIqaD OK! I'm freshly back from the qep'a' cha'maH cha'DIch in Chicago, and I have to report that Klingon pIqaD really is out there and getting some use, despite having been banished to the PUA.
I've seen it on a wine-bottle label (commercially produced, not someone's homebrew), on the Klingon version of the Monopoly game, a book or two (NOT published by the KLI); there are websites using it (but then there were last time I mentioned this and that didn't seem to count then), and apparently support for it on several platforms, including a smartphone keypad, to say nothing of quite a few T-shirts. Apparently there is a small community actually using pIqaD to (*gasp*) exchange information via SMS. I'm copying Chris Lipscombe on this email; he is better plugged in to the use of pIqaD in Real Life? (don't forget to Reply All if you want to include him, since I think he isn't on the list at the moment). What has to be done to get this encoded? The proposal is likely still more or less what we need, and it probably has at least as much online information interchange as, say, Gondi does ("Well, what do you expect, Gondi isn't encoded yet!" "Neither is pIqaD.") Are we ready to revisit this question again? ~mark -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150729/8fbb91f4/attachment.html> From Shawn.Steele at microsoft.com Tue Jul 28 21:53:08 2015 From: Shawn.Steele at microsoft.com (Shawn Steele) Date: Wed, 29 Jul 2015 02:53:08 +0000 Subject: Revenge of pIqaD References: <55B838A7.30603@kli.org> Message-ID: <BLUPR03MB1378ACB02E3FAE9AA9E594F9828C0@BLUPR03MB1378.namprd03.prod.outlook.com> Ooo, I forgot that means everything is in pIqaD! http://www.microsofttranslator.com/bv.aspx?from=en&to=tlh-Qaak&a=http%3A%2F%2Fwww.cnn.com%2F From: Shawn Steele Sent: Tuesday, July 28, 2015 7:50 PM To: 'Mark Shoulson' <mark at kli.org>; unicode at unicode.org; Chris Lipscombe <qurgh at wizage.net> Subject: RE: Revenge of pIqaD You missed Bing translate? 
http://www.bing.com/translator/?from=en&to=tlh-Qaak&text=Success - Shawn From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mark Shoulson Sent: Tuesday, July 28, 2015 7:21 PM To: unicode at unicode.org<mailto:unicode at unicode.org>; Chris Lipscombe <qurgh at wizage.net<mailto:qurgh at wizage.net>> Subject: Revenge of pIqaD OK! I'm freshly back from the qep'a' cha'maH cha'DIch in Chicago, and I have to report that Klingon pIqaD really is out there and getting some use, despite having been banished to the PUA. I've seen it on a wine-bottle label (commercially produced, not someone's homebrew), on the Klingon version of the Monopoly game, a book or two (NOT published by the KLI); there are websites using it (but then there were last time I mentioned this and that didn't seem to count then), and apparently support for it on several platforms, including a smartphone keypad, to say nothing of quite a few T-shirts. Apparently there is a small community actually using pIqaD to (*gasp*) exchange information via SMS. I'm copying Chris Lipscombe on this email; he is better plugged in to the use of pIqaD in Real Life? (don't forget to Reply All if you want to include him, since I think he isn't on the list at the moment). What has to be done to get this encoded? The proposal is likely still more or less what we need, and it probably has at least as much online information interchange as, say, Gondi does ("Well, what do you expect, Gondi isn't encoded yet!" "Neither is pIqaD.") Are we ready to revisit this question again? ~mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://unicode.org/pipermail/unicode/attachments/20150729/f3a7ee11/attachment.html> From mark at kli.org Tue Jul 28 21:57:40 2015 From: mark at kli.org (Mark Shoulson) Date: Tue, 28 Jul 2015 22:57:40 -0400 Subject: Revenge of pIqaD In-Reply-To: <BLUPR03MB13789D75233C4ECAE4F5A40F828C0@BLUPR03MB1378.namprd03.prod.outlook.com> References: <55B838A7.30603@kli.org> <BLUPR03MB13789D75233C4ECAE4F5A40F828C0@BLUPR03MB1378.namprd03.prod.outlook.com> Message-ID: <55B84124.7030108@kli.org> For added amusement, type "Seqram" into Bing translate, translating from Klingon back to English, and see what you get. ~mark On 07/28/2015 10:50 PM, Shawn Steele wrote: > > You missed Bing translate? > http://www.bing.com/translator/?from=en&to=tlh-Qaak&text=Success > > - Shawn > > *From:*Unicode [mailto:unicode-bounces at unicode.org] *On Behalf Of > *Mark Shoulson > *Sent:* Tuesday, July 28, 2015 7:21 PM > *To:* unicode at unicode.org; Chris Lipscombe <qurgh at wizage.net> > *Subject:* Revenge of pIqaD > > OK! I'm freshly back from the qep'a' cha'maH cha'DIch in Chicago, and > I have to report that Klingon pIqaD really is out there and getting > some use, despite having been banished to the PUA. I've seen it on a > wine-bottle label (commercially produced, not someone's homebrew), on > the Klingon version of the Monopoly game, a book or two (NOT published > by the KLI); there are websites using it (but then there were last > time I mentioned this and that didn't seem to count then), and > apparently support for it on several platforms, including a smartphone > keypad, to say nothing of quite a few T-shirts. Apparently there is a > small community actually using pIqaD to (*gasp*) exchange information > via SMS. I'm copying Chris Lipscombe on this email; he is better > plugged in to the use of pIqaD in Real Life? (don't forget to Reply > All if you want to include him, since I think he isn't on the list at > the moment). > > What has to be done to get this encoded? 
The proposal is likely still > more or less what we need, and it probably has at least as much online > information interchange as, say, Gondi does ("Well, what do you > expect, Gondi isn't encoded yet!" "Neither is pIqaD.") Are we ready > to revisit this question again? > > ~mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150728/daadfd0e/attachment.html> From charupdate at orange.fr Wed Jul 29 02:48:22 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 29 Jul 2015 09:48:22 +0200 (CEST) Subject: Emoji characters for food allergens In-Reply-To: <15994373.18168.1438078761970.JavaMail.defaultUser@defaultHost> References: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215> <15994373.18168.1438078761970.JavaMail.defaultUser@defaultHost> Message-ID: <1592423653.5282.1438156103036.JavaMail.www@wwinf1k37> Hi William, Sorry. On 28 Jul 2015, at 12:19, William_J_G Overington wrote: > Well a lot could be done information technology-wise to facilitate communication through the language barrier. > For example in text messages, sent by email, or over a mobile telephone link or maybe thrown to a device nearby, to communicate dietary needs, using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the receiving device; > For example, by using a smartphone by reading from an RFID tag (radio-frequency identification tag) on a shelf label in a supermarket display about a product . The RFID tag could contain the food allergen information about the food encoded using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the smartphone. Alternately, scanning the EAN barcode on the package could give access to a database intended for food information. 
This requires the use of a smartphone or other compatible device. Another use of allergen emojis would be to respond to an invitation by SMS. Someone inviting guests to dinner at home can gather information from them about which allergens to keep out of the ingredients when cooking. This is typically an emoji case. The emotions implied with food allergens are concern, fear and anxiety. But, as already discussed in this thread, emoticons/emojis need not convey an emotion, the term having become a generic word for symbols. Best regards, Marcel Schneider > Message dated 28/07/15 12:19 > From: "William_J_G Overington" > To: "Marcel Schneider" > Cc: gwalla at gmail.com, unicode at unicode.org, komatsu at google.com > Subject: re: Emoji characters for food allergens > > > Hi Marcel > >> I have also wondered whether each glyph for an allergen should include within its glyph a number, maybe a three-digit number, so that clarity is precise. > > > I'm not sure whether another code would facilitate the handling of these warnings. IMHO the allergen name in natural language is more efficient in communication. However, this requires identifying and learning the words before travelling to a country where another language is spoken, while a code point is easier to read if its meaning is at hand. > Well a lot could be done information technology-wise to facilitate communication through the language barrier. > For example in text messages, sent by email, or over a mobile telephone link or maybe thrown to a device nearby, to communicate dietary needs, using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the receiving device; > For example, by using a smartphone by reading from an RFID tag (radio-frequency identification tag) on a shelf label in a supermarket display about a product .
The RFID tag could contain the food allergen information about the food encoded using the emoji characters for food allergens that we are discussing in this thread: this information could then be localized into text automatically in the smartphone. > Best regards, > William Overington > 28 July 2015 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150729/9bb54540/attachment.html> From charupdate at orange.fr Wed Jul 29 03:10:02 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Wed, 29 Jul 2015 10:10:02 +0200 (CEST) Subject: Windows 10 release (is still: Re: WORD JOINER vs ZWNBSP) Message-ID: <264002605.5965.1438157402178.JavaMail.www@wwinf1k37> On 02 Jul 2015, at 12:22, I replied: > However, I believe that WJs being a part of plain text, they should be properly supported on all text handling applications. And they should be on the keyboard. > The solution I suggest is therefore to have the word joiner (and the sequences containing it) on Ctrl+Alt or Kana, and the zero width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users working efficiently on good software may access the preferred character a bit easier than users who must use the deprecated character because their word processor does not properly support the preferred one. Unfortunately that doesn't work on at least one recent version of Windows. An unambiguous bug was caused by the presence of 0x2060 in the Ligatures table. It cost me a whole workday to track down, fix, and verify. The effect of the bug was that Word, Excel, Firefox and Zotero would not start. As a result, the WORD JOINER cannot be implemented in a driver-based keyboard layout for general use on Windows, whereas the ZWNBSP can. We therefore hope that bugs of this kind are fixed in Windows 10, which is to be released today.
If everybody using Windows 7 or 8 can upgrade for free, Windows 10 will become the standard and we will be able to build upon it. It should be underscored that this kind of keyboard-driver bug is normally impossible when using Keyman. I don't see any way for the OS to detect the presence of 0x2060 in a ligatures table and block the system from running, when this character is part of keyboard layout software that is fully managed and executed by an additional framework like Keyman. Under the current circumstances, and for ease and flexibility of development and use, Keyman seems to me indispensable for thorough and complete Unicode implementations. Best regards, Marcel Schneider -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150729/5ec35f9e/attachment.html> From pandey at umich.edu Wed Jul 29 06:09:59 2015 From: pandey at umich.edu (Anshuman Pandey) Date: Wed, 29 Jul 2015 07:09:59 -0400 Subject: Revenge of pIqaD In-Reply-To: <55B838A7.30603@kli.org> References: <55B838A7.30603@kli.org> Message-ID: <7AB838BC-4DCD-4F39-8858-E39238F8E6B4@umich.edu> Dear Mark and Chris, I wonder if copyright or other IP issues might hinder the suitability of encoding Klingon, similar to the Tolkien scripts? And to be sure, Klingon certainly does have a larger digital presence than the Gondi scripts... All the best, Anshu > On Jul 28, 2015, at 10:21 PM, Mark Shoulson <mark at kli.org> wrote: > > OK! I'm freshly back from the qep'a' cha'maH cha'DIch in Chicago, and I have to report that Klingon pIqaD really is out there and getting some use, despite having been banished to the PUA.
I've seen it on a wine-bottle label (commercially produced, not someone's homebrew), on the Klingon version of the Monopoly game, a book or two (NOT published by the KLI); there are websites using it (but then there were last time I mentioned this and that didn't seem to count then), and apparently support for it on several platforms, including a smartphone keypad, to say nothing of quite a few T-shirts. Apparently there is a small community actually using pIqaD to (*gasp*) exchange information via SMS. I'm copying Chris Lipscombe on this email; he is better plugged in to the use of pIqaD in Real Life (don't forget to Reply All if you want to include him, since I think he isn't on the list at the moment). > > What has to be done to get this encoded? The proposal is likely still more or less what we need, and it probably has at least as much online information interchange as, say, Gondi does ("Well, what do you expect, Gondi isn't encoded yet!" "Neither is pIqaD.") Are we ready to revisit this question again? > > ~mark From wjgo_10009 at btinternet.com Wed Jul 29 03:21:17 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Wed, 29 Jul 2015 09:21:17 +0100 (BST) Subject: Emoji characters for food allergens In-Reply-To: <1592423653.5282.1438156103036.JavaMail.www@wwinf1k37> References: <2128502277.13162.1438018225882.JavaMail.www@wwinf2215> <15994373.18168.1438078761970.JavaMail.defaultUser@defaultHost> <1592423653.5282.1438156103036.JavaMail.www@wwinf1k37> Message-ID: <19038120.7776.1438158077110.JavaMail.defaultUser@defaultHost> Hi Marcel > Alternately, scanning the EAN barcode on the package could give access to a database intended for food information. This requires the use of a smartphone or other compatible device. That is a good idea. In that case the emoji would not need to be encoded on the package, but would be sent by the database facility.
Going from the EAN barcode to a database, with the results sent to the end user, would need a two-way communication link, and that could mean queueing problems, as the database facility could be answering requests from many people. Another possibility would be to encode the Unicode characters for the allergens contained in the food within a QR code (Quick Response Code) on the package. Decoding could then be local, in the device being used to scan the QR code. Both of these methods, EAN barcode and QR code, could be used to communicate through the language barrier, either by viewing the emoji, or by the emoji being converted to localized text in the device used by the end user. Best regards, William Overington 29 July 2015 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150729/79bfc06d/attachment.html> From wjgo_10009 at btinternet.com Wed Jul 29 03:38:38 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Wed, 29 Jul 2015 09:38:38 +0100 (BST) Subject: Emoji characters for food allergens Message-ID: <5794935.9139.1438159118596.JavaMail.defaultUser@defaultHost> >> Probably if these symbols are to be added to Unicode, it would be better to allocate blocks that do not belong to emoji for them. > I'm curious what this is supposed to accomplish. It's not as though people viewing such a symbol on a screen or in print, or entering it on a phone keypad, will know or care what its Unicode code point is, or what other types of symbols have nearby code points. Yet some people might be using a system with an Insert Symbol... facility to prepare an email or to design a label or whatever. In such Insert Symbol... facilities it is often the case that characters are listed in Unicode code point order.
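A minimal Python sketch of the two ideas in this exchange: decoding a scanned payload locally on the device, and why a contiguous block of code points simplifies software. It assumes, hypothetically, that the allergen symbols occupy one contiguous range. No such characters are encoded in Unicode, so the Private Use Area range U+E000..U+E00F, the `LOCALIZED` table, and the function names are placeholders. It also assumes the QR code carries the text as UTF-8 bytes in byte mode. With a contiguous range, picking the symbols out of decoded text takes a single comparison per character:

```python
from typing import Dict, List

# HYPOTHETICAL: no allergen symbols are encoded in Unicode. For illustration
# only, pretend they occupy one contiguous Private Use Area range.
ALLERGEN_FIRST = 0xE000
ALLERGEN_LAST = 0xE00F

# HYPOTHETICAL per-locale display names for the first three slots.
LOCALIZED: Dict[str, Dict[int, str]] = {
    "en": {0xE000: "gluten", 0xE001: "milk", 0xE002: "egg"},
    "fr": {0xE000: "gluten", 0xE001: "lait", 0xE002: "œuf"},
}


def is_allergen(ch: str) -> bool:
    """A contiguous block makes membership a single range comparison."""
    return ALLERGEN_FIRST <= ord(ch) <= ALLERGEN_LAST


def allergens_in(text: str) -> List[int]:
    """Return the code points of all allergen symbols in text, in order."""
    return [ord(ch) for ch in text if is_allergen(ch)]


def localize(payload: bytes, locale: str) -> List[str]:
    """Decode a UTF-8 payload locally and map each symbol to a name.

    Symbols without a name for the locale fall back to their U+XXXX form.
    """
    text = payload.decode("utf-8")
    names = LOCALIZED.get(locale, {})
    return [names.get(cp, f"U+{cp:04X}") for cp in allergens_in(text)]


# Example payload: plain text plus three hypothetical symbols.
payload = "Contains: \ue000\ue002\ue005".encode("utf-8")
print(localize(payload, "fr"))  # ['gluten', 'œuf', 'U+E005']
```

No network round trip is involved: the payload is decoded, filtered, and localized entirely on the scanning device. If the symbols were scattered across several blocks, `is_allergen` would need a lookup set instead of one range check.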
My original purpose in suggesting separate blocks of code points was to avoid a symbol relating to a food allergen having more than one meaning: one precise and medical, and one or more others just everyday chat. The fact, discussed in other posts in this thread, that the meaning of an emoji character is not precisely defined makes it very important to have separate blocks, and maybe not even to term the characters emoji but "precise emoji" or some other new term, so as to avoid confusion in the application of the symbols. Also, suppose that a person programming an app wishes to have the software in the app notice whatever food allergen emoji characters are in a message. Having them all within two contiguous blocks of code points would assist the programming process. There was also a coding aesthetics aspect: separate blocks seem better to me as a way to organize such an encoding. William Overington 29 July 2015 From wjgo_10009 at btinternet.com Wed Jul 29 08:42:59 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Wed, 29 Jul 2015 14:42:59 +0100 (BST) Subject: Emoji characters for food allergens In-Reply-To: <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> Message-ID: <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> > According to http://unicode.org/faq/emoji_dingbats.html , emoji characters do not have single semantics, which I think is not what the original proposer wants? Or am I misunderstanding that? Garth Wallace has already indicated in his reply to your post that it was me, not the original proposer, who wanted single semantics. Thank you for the link. I have followed it and read in the document what it says about single semantics. Oh!
Well, it seems to me that something has got to give in order for "Emoji characters for food allergies" to work effectively. The easiest thing appears to be to not call the items emoji. I opine that a new word is needed to mean the following. A character that looks like it is an emoji character yet has precise semantics. There is an issue here that is, in my opinion, quite fundamental to the future of encoding items that are currently all regarded as emoji: an issue that goes far beyond the matter of encoding emoji characters for food allergens. Communication through the language barrier is of huge importance and may become more so in the future. Emoji seemed like a wonderful way to achieve communication through the language barrier. Yet if semantics are not defined, then there is a problem. Please consider the matter of text to speech in the draft Unicode Technical Report 51. I remember years ago I was asked in this mailing list what chat means. I think that discussing the meaning of chat is a classic Unicode cultural matter. In English it is an informal talk between two or more people; in French it is a cat. So a sequence of Unicode characters only has meaning in the context in which it is used. Now the big opportunity with emoji could be to assist communication through the language barrier. From reading about semantics in the linked document it appears that that opportunity might be disappearing or may have gone already. This, in my opinion, is unfortunate. The food allergen characters could, by being precisely defined with one and only one meaning, be either an exception to the general situation or could be the start of a trend. A name other than emoji is needed for such characters that have one and only one meaning, that meaning precisely defined. Those characters could still be colourful and could look emoji-ish. Maybe they could be double width so as to show their distinctiveness?
Would double-width characters currently be a problem when used on systems such as mobile telephones? Now, such precisely defined emoji could be entirely representational pictures, yet there could also be abstract pictures and also pictures that are partly representational and partly abstract. For example, one such character could be placed before a list of emoji characters for food allergens to indicate that a list of dietary needs follows. For example, My dietary need is no gluten no dairy no egg There could be a way to indicate the following. My diet can include soya There is a situation that affects further discussion of some aspects of this matter, though not all aspects of this matter, as a totally symbolic representation could still be discussed. http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0208.html However, there is also the following. http://www.oxforddictionaries.com/definition/english/moratorium Please note the use of the word temporary in the definition. So maybe all is not lost and discussion of all aspects will become possible at some future time. William Overington 29 July 2015 -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150729/7e743275/attachment.html> From andrewcwest at gmail.com Wed Jul 29 09:27:13 2015 From: andrewcwest at gmail.com (Andrew West) Date: Wed, 29 Jul 2015 15:27:13 +0100 Subject: Emoji characters for food allergens In-Reply-To: <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> Message-ID: <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com> On 29 July 2015 at 14:42, William_J_G Overington <wjgo_10009 at btinternet.com> wrote: > > For example, one such character could be used to be placed before a list of > emoji characters for food allergens to indicate that that a list of dietary > need follows. > > For example, > > My dietary need is no gluten no dairy no egg > > There could be a way to indicate the following. > > My diet can include soya There already is, you can write "My diet can include soya". If you are likely to swell up and die if you eat a peanut (for example), you will not want to trust your life to an emoji picture of a peanut which could be mistaken for something else or rendered as a square box for the recipient. There may be a case to be made for encoding symbols for food allergens for labelling purposes, but there is no case for encoding such symbols as a form of symbolic language for communication of dietary requirements. 
Andrew From doug at ewellic.org Wed Jul 29 11:39:51 2015 From: doug at ewellic.org (Doug Ewell) Date: Wed, 29 Jul 2015 09:39:51 -0700 Subject: Emoji characters for food allergens Message-ID: <20150729093951.665a7a7059d7ee80bb4d670165c8327d.bef66cbee0.wbe@email03.secureserver.net> Andrew West <andrewcwest at gmail dot com> wrote: > There may be a case to be made for encoding symbols for food allergens > for labelling purposes, but there is no case for encoding such symbols > as a form of symbolic language for communication of dietary > requirements. For what little it is worth, I agree with Andrew on this. Earlier I mentioned U+2620 SKULL AND CROSSBONES and U+2623 BIOHAZARD SIGN, two symbols which have been in Unicode since the dawn of time. Both of these are Level 2 emoji, according to emoji-data.txt [1], and are accorded no special treatment, placement, or display guidelines beyond that. While communication about food allergens is undoubtedly important, it's hard to imagine that communication about poisons and biohazards is any less important. [1] http://www.unicode.org/Public/emoji/1.0//emoji-data.txt -- Doug Ewell | http://ewellic.org | Thornton, CO From richard.wordingham at ntlworld.com Wed Jul 29 13:48:00 2015 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 29 Jul 2015 19:48:00 +0100 Subject: Windows 10 release (is still: Re: WORD JOINER vs ZWNBSP) In-Reply-To: <264002605.5965.1438157402178.JavaMail.www@wwinf1k37> References: <264002605.5965.1438157402178.JavaMail.www@wwinf1k37> Message-ID: <20150729194800.5dba3d0b@JRWUBU2> On Wed, 29 Jul 2015 10:10:02 +0200 (CEST) Marcel Schneider <charupdate at orange.fr> wrote: > On 02 Jul 2015, at 12:22, I replied: > > > However, I believe that WJs being a part of plain text, they should > > be properly supported on all text handling applications. And they > > should be on the keyboard.
> > > The solution I suggest is therefore to have the word joiner (and > > the sequences containing it) on Ctrl+Alt or Kana, and the zero > > width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users > > working efficiently on good software may access the preferred > > character a bit easier than users who must use the deprecated > > character because their word processor does not properly support > > the preferred one. > Unfortunately that doesn't work on at least one recent version of > Windows. An unambiguous bug was due to the presence of 0x2060 in the > Ligatures table. This has cost me a whole workday to retrieve, fix, > and verify. > The effect of the bug was that Word, Excel, Firefox and Zotero were > unstartable. > As a result, the WORD JOINER cannot be implemented on a driver based > keyboard layout for general use on Windows. By contrast, the ZWNBSP > can. Your lament is a bit vague - I'm not sure what U+2060 is doing in a 'ligature table'. I can say that a Windows keyboard mapping that maps AltGr-M to WJ which was created using MSKLC on Windows 7 in April 2011 still works. Richard. From duerst at it.aoyama.ac.jp Wed Jul 29 20:06:43 2015 From: duerst at it.aoyama.ac.jp (Martin J. Dürst) Date: Thu, 30 Jul 2015 10:06:43 +0900 Subject: Emoji characters for food allergens In-Reply-To: <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com> Message-ID: <55B978A3.1010807@it.aoyama.ac.jp> On 2015/07/29 23:27, Andrew West wrote: > On 29 July 2015 at 14:42, William_J_G Overington >> My diet can include soya > > There already is, you can write "My diet can include soya".
> > If you are likely to swell up and die if you eat a peanut (for > example), you will not want to trust your life to an emoji picture of > a peanut which could be mistaken for something else Yes, in the worst case for something like "I like peanuts". > or rendered as a > square box for the recipient. There may be a case to be made for > encoding symbols for food allergens for labelling purposes, but there > is no case for encoding such symbols as a form of symbolic language > for communication of dietary requirements. > > Andrew > . > From mark at kli.org Wed Jul 29 20:15:45 2015 From: mark at kli.org (Mark E. Shoulson) Date: Wed, 29 Jul 2015 21:15:45 -0400 Subject: Emoji characters for food allergens In-Reply-To: <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <CALgEMhwZnoNTe1=xQa6R9X7N553vjqEmj5TRHbd2hHiMOK4whQ@mail.gmail.com> Message-ID: <55B97AC1.3080902@kli.org> Indeed; depending on special Emoji characters to convey a crucial sentence unambiguously across language barriers also treads very close to using those "localizable sentences" we mustn't talk about. ~mark On 07/29/2015 10:27 AM, Andrew West wrote: > On 29 July 2015 at 14:42, William_J_G Overington > <wjgo_10009 at btinternet.com> wrote: >> For example, one such character could be used to be placed before a list of >> emoji characters for food allergens to indicate that that a list of dietary >> need follows. >> >> For example, >> >> My dietary need is no gluten no dairy no egg >> >> There could be a way to indicate the following. >> >> My diet can include soya > There already is, you can write "My diet can include soya".
> > If you are likely to swell up and die if you eat a peanut (for > example), you will not want to trust your life to an emoji picture of > a peanut which could be mistaken for something else or rendered as a > square box for the recipient. There may be a case to be made for > encoding symbols for food allergens for labelling purposes, but there > is no case for encoding such symbols as a form of symbolic language > for communication of dietary requirements. > > Andrew From mark at kli.org Wed Jul 29 23:01:06 2015 From: mark at kli.org (Mark E. Shoulson) Date: Thu, 30 Jul 2015 00:01:06 -0400 Subject: Emoji characters for food allergens In-Reply-To: <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> Message-ID: <55B9A182.2030504@kli.org> On 07/29/2015 09:42 AM, William_J_G Overington wrote: > > The easiest thing appears to be to not call the items emoji. > > I opine that a new word is needed to mean the following. > > A character that looks like it is an emoji character yet has > precise semantics. > So, like, a localizable sentence character? Something that has a precise, sentence-level meaning that is not linguistically determined? We aren't doing those here, as far as I know. ~mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://unicode.org/pipermail/unicode/attachments/20150730/a010f638/attachment.html> From wjgo_10009 at btinternet.com Thu Jul 30 03:51:35 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Thu, 30 Jul 2015 09:51:35 +0100 (BST) Subject: Emoji characters for food allergens In-Reply-To: <55B9A182.2030504@kli.org> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <55B9A182.2030504@kli.org> Message-ID: <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost> >> The easiest thing appears to be to not call the items emoji. >> I opine that a new word is needed to mean the following. >> A character that looks like it is an emoji character yet has precise semantics. > So, like, a localizable sentence character? Well, a localizable sentence character with an emoji-like symbol would indeed be an example of such a character. Yet not every character that looks like it is an emoji character yet has precise semantics would be a localizable sentence. Indeed, not every localizable sentence symbol would look like an emoji character. My research has used symbols 23 units in width by 7 units in height. For example, please consider an emoji symbol to mean "railway station" and, for example, please consider an emoji symbol to mean "peppermint tea". If, for example, an emoji symbol that starts off to mean "railway station" became used to mean "transportation station" then the way to express specifically a railway station as an emoji rather than expressing just a place that may be either or both of a railway station and a bus station would become lost. 
If, for example, a symbol that starts off to mean "peppermint tea" became used to mean "herbal tea", then the way to express specifically peppermint tea as an emoji rather than expressing just a cup of herbal tea that might be peppermint or one of many other flavours of herbal tea would become lost. The emoji characters for food allergens are not localizable sentences, yet they do need, in my opinion, precise definitions and should be encoded in a separate block and given not the name emoji but some other name that reflects their emoji-like appearance yet emphasises the precision of their definition: maybe they should be double width so as to avoid confusion: maybe each glyph should include a surrounding landscape format ellipse so as to emphasise their difference from ordinary emoji. > Something that has a precise, sentence-level meaning that is not linguistically determined? We aren't doing those here, as far as I know. Well, I am not a linguist and I do not fully understand that question or the comment that follows it. I have just tried to state a problem that I feel exists and hope that people who are expert in such matters can consider it and hopefully find a solution. William Overington 30 July 2015 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150730/4440e211/attachment.html> From charupdate at orange.fr Thu Jul 30 10:56:11 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 30 Jul 2015 17:56:11 +0200 (CEST) Subject: Windows 10 release (is still: Re: WORD JOINER vs ZWNBSP) Message-ID: <792325334.18912.1438271771971.JavaMail.www@wwinf1e26> On Wed 29 Jul 2015, at 20:57, Richard Wordingham wrote: > On Wed, 29 Jul 2015 10:10:02 +0200 (CEST) > Marcel Schneider wrote: > > > On 02 Jul 2015, at 12:22, I replied: > > > > > However, I believe that WJs being a part of plain text, they should > > > be properly supported on all text handling applications. And they
And they > > > should be on the keyboard. > > > > > The solution I suggest is therefore to have the word joiner (and > > > the sequences containing it) on Ctrl+Alt or Kana, and the zero > > > width no-break space on Shift+Ctrl+Alt or Shift+Kana, so that users > > > working efficently on good software may access the preferred > > > character a bit easier than users who must use the deprecated > > > character because their word processor does not properly support > > > the preferred one. > > > Unfortunately that doesn?t work on at least one recent version of > > Windows. An unambigous bug was due to the presence of 0x2060 in the > > Ligatures table. This has cost me a whole workday to retrieve, fix, > > and verify. > > > The effect of the bug was that Word, Excel, Firefox and Zotero were > > unstartable. > > > As a result, the WORD JOINER cannot be implemented on a driver based > > keyboard layout for general use on Windows. By contrast, the ZWNBSP > > can. > > Your lament is a bit vague - I'm not sure what U+2060 is doing in a > 'ligature table'. I can say that a Windows keyboard mapping that > maps AltGr-M to WJ which was created using MSKLC on Windows 7 in April > 2011 still works. I'm really pleased to learn about every initiative to implement Unicode in input practice, and I take notice that an MSKLC layout with U+2060 does not make Windows block heavy applications. Indeed I wasn't very clear, as in the deadlist I can keep 0x2060 without any problem (Compose, Space, G). This is just not very speedful. The so-called ligatures, by contrast, must not be constructed with 0x2060. This however was the case of three items: - A justifying no-break space emulation 0x2060 0x0020 0x2060, for use in word processors where the NBSP is not justifying, unlike as in desktop publishing and high-end editing software as Philippe Verdy referred to, where U+00A0 is justifying. 
That U+00A0 is not justifying in word processing is consistent with the need to use it next to punctuation in French, given the lack of U+202F in many fonts. - A colon with such a justifying no-break space, for use in documents that imitate the usage of at least part, if not the mainstream, of old-fashioned typography: 0x2060 0x0020 0x2060 0x003a. - A punctuation apostrophe emulation 0x2060 0x0027 0x2060, mapped to Kana + I. I'm about to test on another Windows edition. I wonder whether there is a real issue or not, as you are suggesting. Nevertheless I believe that no such bugs should occur in any version or edition of Windows. Thank you for your feedback. Best regards, Marcel From charupdate at orange.fr Thu Jul 30 12:07:36 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Thu, 30 Jul 2015 19:07:36 +0200 (CEST) Subject: Emoji characters for food allergens In-Reply-To: <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <55B9A182.2030504@kli.org> <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost> Message-ID: <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34> I'll try to respond to all, as I have little time outside my main concerns, sorry. Indeed I agree that there are limits to the automation of interhuman communication. In practice, whenever we are in contact with one another, natural language is preferable. Emoticons and other pictographs, IMHO, are intended to complement what written language cannot express in a reasonably small number of words, or to give quick orientation. Since sooner or later we fall back on natural language, using it from the beginning seems more efficient.
My bad idea of responding to an invitation with a set of dietary-constraint pictographs comes down, rather, to preparing a predefined message in every language in which we expect invitations. As for reading packaging information, avoiding allergens might not be enough: we should also pay attention to the presence of palm oil, because of the needless devastation of primates' habitats, while enough fallow land exists in one of the countries concerned for palm oil production until 2050. This is just one example of how complex food choices are, requiring thorough awareness of numerous parameters far beyond allergens, however life-threatening these often are. Moreover, the lives of everybody on earth are threatened by imminent climate change (please see http://avaaz.org/en/ too). The Babel issue of how to communicate amid language confusion might soon be resolved, if there is no more communication at all... Best regards, Marcel Schneider -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150730/861eee0c/attachment.html> From asmus-inc at ix.netcom.com Thu Jul 30 13:45:42 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Thu, 30 Jul 2015 11:45:42 -0700 Subject: Emoji characters for food allergens In-Reply-To: <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <55B9A182.2030504@kli.org> <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost> <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34> Message-ID: <55BA70D6.2070002@ix.netcom.com> An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150730/ec2e4604/attachment.html> From doug at ewellic.org Thu Jul 30 13:46:31 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 30 Jul 2015 11:46:31 -0700 Subject: Windows 10 release (is still: Re: WORD JOINER vs ZWNBSP) Message-ID: <20150730114631.665a7a7059d7ee80bb4d670165c8327d.7c3d4b8766.wbe@email03.secureserver.net> Marcel Schneider <charupdate at orange dot fr> wrote: >>> Unfortunately that doesn't work on at least one recent version of >>> Windows. An unambiguous bug was due to the presence of 0x2060 in the >>> Ligatures table. This has cost me a whole workday to track down, fix, >>> and verify. >>> >>> The effect of the bug was that Word, Excel, Firefox and Zotero would >>> not start. >>> >>> As a result, the WORD JOINER cannot be implemented on a driver-based >>> keyboard layout for general use on Windows. By contrast, the ZWNBSP >>> can. and: > The so-called ligatures, by contrast, must not be constructed with > 0x2060. This, however, was the case for three items: > > - A justifying no-break space emulation 0x2060 0x0020 0x2060, for use > in word processors where the NBSP is not justifying, unlike in > desktop publishing and high-end editing software, which Philippe Verdy > referred to, where U+00A0 is justifying. That it is not justifying in > word processors is consistent with the need to use U+00A0 next to > punctuation marks in French, and with the lack of U+202F in many fonts. > > - A colon with such a justifying no-break space, for use in documents > that imitate old-fashioned typography as used by at least part of the > trade, if not the mainstream: 0x2060 0x0020 0x2060 0x003a. > > - A punctuation apostrophe emulation 0x2060 0x0027 0x2060, mapped to > Kana + I. > > I'm about to test on another Windows edition. I wonder whether there is a > real issue or not, as you suggest. Nevertheless, I believe that > no such bug should occur in any version or edition of Windows.
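For reference, the three WJ sequences quoted above can be spelled out as Python strings and printed back as code points. This is a minimal illustrative sketch, not part of either poster's setup; the dictionary labels and the helper name are hypothetical, and nothing here models how a Windows keyboard layout stores its ligature entries.

```python
import unicodedata

WJ = "\u2060"  # U+2060 WORD JOINER

# Hypothetical labels for the three sequences described in the thread.
SEQUENCES = {
    "justifying_nbsp": WJ + "\u0020" + WJ,                   # 0x2060 0x0020 0x2060
    "colon_with_justifying_nbsp": WJ + "\u0020" + WJ + ":",  # 0x2060 0x0020 0x2060 0x003a
    "apostrophe": WJ + "\u0027" + WJ,                        # 0x2060 0x0027 0x2060
}


def code_points(s: str) -> str:
    """Render a string as space-separated 0xXXXX code points (lowercase hex)."""
    return " ".join(f"0x{ord(c):04x}" for c in s)


if __name__ == "__main__":
    print(unicodedata.name(WJ))  # WORD JOINER
    for label, seq in SEQUENCES.items():
        print(f"{label}: {code_points(seq)}")
```

Dumping pasted text this way can help check whether an application actually received the full sequence, which is relevant to the later report that LibreOffice does not insert it.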
I created, installed, and activated an MSKLC keyboard with the three WJ sequences described above, mapped for convenience to AltGr+Z, AltGr+X, and AltGr+C respectively (not the Kana key, which I don't have), and had no trouble opening or using any applications on Windows 7, including the four mentioned above (except Zotero, which I don't use). KLC source available on request. I wouldn't have wasted the 15 minutes but for the continuing, tiresome rhetoric about Windows bugs. -- Doug Ewell | http://ewellic.org | Thornton, CO From andrewcwest at gmail.com Thu Jul 30 14:07:12 2015 From: andrewcwest at gmail.com (Andrew West) Date: Thu, 30 Jul 2015 20:07:12 +0100 Subject: Emoji characters for food allergens In-Reply-To: <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <55B9A182.2030504@kli.org> <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost> <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34> Message-ID: <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com> On 30 July 2015 at 18:07, Marcel Schneider <charupdate at orange.fr> wrote: > > I'll try to respond to all, Please don't.
Andrew From asmus-inc at ix.netcom.com Thu Jul 30 15:56:00 2015 From: asmus-inc at ix.netcom.com (Asmus Freytag (t)) Date: Thu, 30 Jul 2015 13:56:00 -0700 Subject: Emoji characters for food allergens In-Reply-To: <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <55B9A182.2030504@kli.org> <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost> <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34> <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com> Message-ID: <55BA8F60.3050104@ix.netcom.com> An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150730/fdc23603/attachment.html> From wjgo_10009 at btinternet.com Fri Jul 31 04:16:42 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Fri, 31 Jul 2015 10:16:42 +0100 (BST) Subject: Emoji characters for food allergens In-Reply-To: <55BA70D6.2070002@ix.netcom.com> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <55B9A182.2030504@kli.org> <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost> <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34> <55BA70D6.2070002@ix.netcom.com> Message-ID: <12270613.13467.1438334202211.JavaMail.defaultUser@defaultHost> >> it might not be enough to avoid allergens, we should pay attention to the presence of palm oil because of the useless devastation of primates' habitats while enough fallow land exists in a concerned country for palm oil production until 2050 > I believe that for topics like this, there are other lists or forums that are more appropriate. 
Well, Marcel was writing in the context of reading packaging information in a thread about emoji characters for food allergens. Now, it could perhaps be said that encoding a symbol to indicate the presence of palm oil is off-topic for this thread, and that a new thread spun off from it, yet still on this mailing list, would be desirable. However, it could also be said that, as this thread is about emoji, food ingredients, and knowing what is in a particular foodstuff, it is relevant, although not strictly on-topic, to discuss in this thread encoding a symbol to indicate the presence of palm oil. I had considered suggesting an emoji to express that a food is vegan, yet held back as it is not an allergen issue, more a lifestyle choice. Yet a statement that a foodstuff is suitable for a vegan diet does appear on some food packaging. Some packages also have an indication of spice strength, though, within the range of my observations, this is only for things that are regarded as spicy as such, like curries, not for products with just a little spice in, say, the ingredients list of a soup. For me, as a gluten-avoiding vegan who avoids spicy food, encoding an emoji for gluten, yet not one for vegan or for no spice, seems an issue that could reasonably be addressed while considering emoji for food allergens. So, I thank Marcel for raising the issue of palm oil in this thread. William Overington 31 July 2015 -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150731/38b84ef9/attachment.html> From wjgo_10009 at btinternet.com Fri Jul 31 04:37:52 2015 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Fri, 31 Jul 2015 10:37:52 +0100 (BST) Subject: Emoji characters for food allergens In-Reply-To: <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <55B9A182.2030504@kli.org> <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost> <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34> <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com> Message-ID: <29774584.15201.1438335472835.JavaMail.defaultUser@defaultHost> >> I'll try to respond to all, > Please don't. What Marcel wrote was as follows: quote I'll try to respond to all, having not much time outside my main concerns, sorry. end quote When I first read that, and indeed when I read it again after reading Andrew's comment, I read it as Marcel wishing that he could reply individually to each of several posts in this thread, but as he was busy, he would reply in just the one post, the post he was then writing, to various points. Thus there was no need to ask him not to do so, as he had already done it in that same post. As someone else has decided to post supporting the request, I reply that I enjoy reading Marcel's posts and that I hope that he continues. These are important issues for end users of encoding standards and for consumers generally as they are about food allergens and the labelling of food packaging. A request not to post and support for a request not to post without stating any reason whatsoever is, in my opinion, unfair. 
William Overington 31 July 2015 From charupdate at orange.fr Fri Jul 31 15:51:27 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 31 Jul 2015 22:51:27 +0200 (CEST) Subject: Windows 10 release (was: Re: WORD JOINER vs ZWNBSP) In-Reply-To: <20150730114631.665a7a7059d7ee80bb4d670165c8327d.7c3d4b8766.wbe@email03.secureserver.net> References: <20150730114631.665a7a7059d7ee80bb4d670165c8327d.7c3d4b8766.wbe@email03.secureserver.net> Message-ID: <1621810209.26857.1438375887376.JavaMail.www@wwinf1j14> On 30 Jul 2015 at 20:56, Doug Ewell wrote: > I created, installed, and activated an MSKLC keyboard with the three WJ > sequences described above, mapped for convenience to AltGr+Z, AltGr+X, > and AltGr+C respectively (not the Kana key, which I don't have), and had > no trouble opening or using any applications on Windows 7, including the > four mentioned above (except Zotero, which I don't use). KLC source > available on request. > > I wouldn't have wasted the 15 minutes but for the continuing, tiresome > rhetoric about Windows bugs. Thank you for testing. Indeed, the problem turned out to lie at another level. I'm still unsure, but the WJ now works, except that LibreOffice doesn't insert the sequence. Sorry for my complaint. Best regards, Marcel -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150731/8291b605/attachment.html> From charupdate at orange.fr Fri Jul 31 15:58:40 2015 From: charupdate at orange.fr (Marcel Schneider) Date: Fri, 31 Jul 2015 22:58:40 +0200 (CEST) Subject: Emoji characters for food allergens In-Reply-To: <29774584.15201.1438335472835.JavaMail.defaultUser@defaultHost> References: <20150728122416.665a7a7059d7ee80bb4d670165c8327d.e45e67032a.wbe@email03.secureserver.net> <CAGHjPPKjvtLhNE_N+V30YmHS+1GTk0uPeTf0ffhZm2A=boDR1w@mail.gmail.com> <27556497.34395.1438177379567.JavaMail.defaultUser@defaultHost> <55B9A182.2030504@kli.org> <32045577.9821.1438246295044.JavaMail.defaultUser@defaultHost> <1369192884.16904.1438276057084.JavaMail.www@wwinf1h34> <CALgEMhzMHj1JmP3X3_hp3MWPiE5x=RR1c_OyAafxJU6FKAwkkA@mail.gmail.com> <29774584.15201.1438335472835.JavaMail.defaultUser@defaultHost> Message-ID: <1958555392.26939.1438376320812.JavaMail.www@wwinf1j14> On 31 Jul 2015 at 15:32, William_J_G Overington wrote: > A request not to post and support for a request not to post without stating any reason whatsoever is, in my opinion, unfair. Thank you; however, I believe that Mr West's and Mr Freytag's reactions were also triggered by my hasty complaints about Microsoft. Fundamentally, I didn't respect a mailing list rule, which is always to respond to a particular request or statement and to stick with the thread. I'm sorry to have vented needlessly; nevertheless, I'm preparing some specific replies, which I'll send soon. All the best, Marcel Schneider > Message of 31/07/15 15:32 > From: "William_J_G Overington" > To: komatsu at google.com, andrewcwest at gmail.com, asmus-inc at ix.netcom.com, charupdate at orange.fr > Cc: unicode at unicode.org > Subject: Re: Emoji characters for food allergens > > >> I'll try to respond to all, > > > Please don't. > > What Marcel wrote was as follows: > > quote > > I'll try to respond to all, having not much time outside my main concerns, sorry.
> > end quote > > When I first read that, and indeed when I read it again after reading Andrew's comment, I read it as Marcel wishing that he could reply individually to each of several posts in this thread, but as he was busy, he would reply in just the one post, the post he was then writing, to various points. > > Thus there was no need to ask him not to do so, as he had already done it in that same post. > > As someone else has decided to post supporting the request, I reply that I enjoy reading Marcel's posts and that I hope that he continues. > > These are important issues for end users of encoding standards and for consumers generally as they are about food allergens and the labelling of food packaging. > > A request not to post and support for a request not to post without stating any reason whatsoever is, in my opinion, unfair. > > William Overington > > 31 July 2015 > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://unicode.org/pipermail/unicode/attachments/20150731/079ddeea/attachment.html>