From public at khwilliamson.com Wed Apr 1 12:08:46 2015
From: public at khwilliamson.com (Karl Williamson)
Date: Wed, 01 Apr 2015 11:08:46 -0600
Subject: Meroitic cursive fractions numerical values
In-Reply-To: <20150331103058.665a7a7059d7ee80bb4d670165c8327d.f6b0d19fa7.wbe@email03.secureserver.net>
References: <20150331103058.665a7a7059d7ee80bb4d670165c8327d.f6b0d19fa7.wbe@email03.secureserver.net>
Message-ID: <551C261E.80006@khwilliamson.com>

On 03/31/2015 11:30 AM, Doug Ewell wrote:
> Karl Williamson wrote:
>
>> It's a small matter to add code to reduce the UCD-specified rational
>> numbers, but it's just one more complication to have to deal with
>> along with the many that the UCD already presents, and if there is not
>> a good reason the data for these new characters is specified contrary
>> to mathematical convention, then the data should be changed instead of
>> having to code around it.
>
> UAX #44, Section 5.9.1 says:
>
> | For all numeric properties, and for properties such as
> | Unicode_Radical_Stroke which are constructed from combinations of
> | numeric values, use loose matching rule UAX44-LM1 when comparing
> | property values.
> |
> | UAX44-LM1. Apply numeric equivalences.
> | * "01.00" is equivalent to "1".
> | * "1.666667" in the UCD is a repeating fraction, and equivalent to
> |   "10/6" or "5/3".
>
> This strongly suggests that the implementation should be changed, not to
> match the data, but to match the specification.

Ok. I've made the change.

Is it a problem that DerivedNumericValues.txt doesn't match
UnicodeData.txt in this regard? (That is, the derived file comes with
irreducible rationals)

>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
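
For readers implementing the loose matching quoted above, here is a minimal Python sketch of comparing numeric property values in the spirit of UAX44-LM1. It is not taken from Karl's implementation or from the UCD tooling; the names `parse_numeric` and `loosely_equal`, and the one-millionth tolerance used to accept "1.666667" as 5/3, are assumptions chosen for the illustration.

```python
from fractions import Fraction

# Assumed tolerance: the quoted example "1.666667" has six decimal places,
# so values closer than one millionth are treated as the same number.
# UAX44-LM1 itself does not state a tolerance; this is an illustration only.
TOLERANCE = Fraction(1, 10**6)


def parse_numeric(value: str) -> Fraction:
    """Parse a numeric field such as "6/12", "01.00" or "0.5".

    Fraction() accepts both "numerator/denominator" and decimal strings
    and always stores the result in lowest terms.
    """
    return Fraction(value.strip())


def loosely_equal(a: str, b: str, tolerance: Fraction = TOLERANCE) -> bool:
    """Return True when two numeric fields denote (nearly) the same value."""
    return abs(parse_numeric(a) - parse_numeric(b)) <= tolerance


if __name__ == "__main__":
    print(parse_numeric("6/12"))              # reduced form of the UCD value
    print(loosely_equal("01.00", "1"))
    print(loosely_equal("1.666667", "10/6"))
    print(loosely_equal("1/2", "1/3"))
```

With this kind of comparison it does not matter whether a data file spells a value as 6/12, 1/2 or 0.5, which is the point the quoted rule makes.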
>
>
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode
>

From doug at ewellic.org Wed Apr 1 13:40:17 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 01 Apr 2015 11:40:17 -0700
Subject: Meroitic cursive fractions numerical values
Message-ID: <20150401114017.665a7a7059d7ee80bb4d670165c8327d.bf71ebbfc1.wbe@email03.secureserver.net>

Karl Williamson wrote:

> Is it a problem that DerivedNumericValues.txt doesn't match
> UnicodeData.txt in this regard? (That is, the derived file comes with
> irreducible rationals)

Not if you follow the "loose matching" rule UAX44-LM1 (see earlier
message) which says that mathematically equivalent (or nearly so)
numbers should be treated as equivalent.

UnicodeData-8.0.0d8.txt:
109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;

DerivedNumericValues-8.0.0d10.txt:
109FB ; 0.5 ; ; 1/2 # No MEROITIC CURSIVE FRACTION SIX TWELFTHS

--
Doug Ewell | http://ewellic.org | Thornton, CO

From leob at mailcom.com Wed Apr 1 16:48:58 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Wed, 1 Apr 2015 14:48:58 -0700
Subject: Almost not a joke
Message-ID:

In light of http://www.unicode.org/reports/tr51/#Faces_Hands_Zodiac
mentioning, among other things,

ZIPPER-MOUTH FACE
MONEY-MOUTH FACE
FACE WITH THERMOMETER
NERD FACE
THINKING FACE
FACE WITH ROLLING EYES
UPSIDE-DOWN FACE
FACE WITH HEAD-BANDAGE
ROBOT FACE
HUGGING FACE

(some, like UPSIDE-DOWN FACE or ROBOT FACE, with obscure meanings),

would today be a good day to ask for the FACEPALM emoji?

Thanks,
Leo

From leob at mailcom.com Thu Apr 2 01:30:27 2015
From: leob at mailcom.com (Leo Broukhis)
Date: Wed, 1 Apr 2015 23:30:27 -0700
Subject: Almost not a joke
In-Reply-To:
References:
Message-ID:

Apparently (h.t.
to Ken) FACE PALM is in the plans:
http://www.unicode.org/L2/L2015/15054r-emoji-tranche5.pdf

Before the day is over, another related question: my blog post about
upcoming emoji elicited a request for the SHRUG emoji. Any plans for that?

Thanks,
Leo

On Wed, Apr 1, 2015 at 2:48 PM, Leo Broukhis wrote:

> In light of http://www.unicode.org/reports/tr51/#Faces_Hands_Zodiac
> mentioning, among other things,
>
> ZIPPER-MOUTH FACE
> MONEY-MOUTH FACE
> FACE WITH THERMOMETER
> NERD FACE
> THINKING FACE
> FACE WITH ROLLING EYES
> UPSIDE-DOWN FACE
> FACE WITH HEAD-BANDAGE
> ROBOT FACE
> HUGGING FACE
>
> (some, like UPSIDE-DOWN FACE or ROBOT FACE, with obscure meanings),
>
> would today be a good day to ask for the FACEPALM emoji?
>
> Thanks,
> Leo
>

From verdy_p at wanadoo.fr Thu Apr 2 02:27:24 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 2 Apr 2015 09:27:24 +0200
Subject: Almost not a joke
In-Reply-To:
References:
Message-ID:

I wonder why there are no requests for common heraldic elements
("meubles" in French) as seen in coats of arms, seals, and flags.

Their symbolic association is common enough to qualify as emojis, given
that this heraldic semantic is shared across multiple flags and coats of
arms from various cultures (before many people could even read and write
in most countries, they recognized these symbols, including on
battlefields where most soldiers were uneducated). These symbols have
also been added to lots of written texts to exhibit at least their
official status, even if people could not read these texts!

They have also long been present on coins and banknotes, so that people
could recognize the currency of origin and judge its associated value on
markets even if they could not read the written language.

Many of them are already encoded (e.g. stars) but many are missing (the
eagle and shark are present in this proposal and the fish is encoded,
but flags and coats of arms use more specific species; other "animals"
are the armed lion, the dragon, the dove as a symbol of peace, and there
are various royal attributes, the scales as a symbol of justice, various
weapons of war, and religious symbols such as totems, the Egyptian cross
and those of other old religions from the Middle East to Asia and
Africa).

Later, colors (metals and furs) were also added, but they were hard to
reproduce in books and were only used in costly pieces of art and a few
banners for battlefields (before they became flags in common civil use).

For most documents (including, for a long time, coins and banknotes)
they were only monochromatic seals which positioned these symbols, more
or less decorated with various styled strokes, but with very few letters
or digits. For a long time, monochromatic seals were more important in
daily use than colorful flags (and they were also better substitutes
than handwritten personal signatures based on people's names, which they
did not know how to write correctly).

2015-04-02 8:30 GMT+02:00 Leo Broukhis :

> Apparently (h.t. to Ken) FACE PALM is in the plans:
> http://www.unicode.org/L2/L2015/15054r-emoji-tranche5.pdf
>
> Before the day is over, another related question: my blog post about
> upcoming emoji elicited a request for the SHRUG emoji. Any plans for that?
>
> Thanks,
> Leo
>
> On Wed, Apr 1, 2015 at 2:48 PM, Leo Broukhis wrote:
>
>> In light of http://www.unicode.org/reports/tr51/#Faces_Hands_Zodiac
>> mentioning, among other things,
>>
>> ZIPPER-MOUTH FACE
>> MONEY-MOUTH FACE
>> FACE WITH THERMOMETER
>> NERD FACE
>> THINKING FACE
>> FACE WITH ROLLING EYES
>> UPSIDE-DOWN FACE
>> FACE WITH HEAD-BANDAGE
>> ROBOT FACE
>> HUGGING FACE
>>
>> (some, like UPSIDE-DOWN FACE or ROBOT FACE, with obscure meanings),
>>
>> would today be a good day to ask for the FACEPALM emoji?
>>
>> Thanks,
>> Leo
>>

From gwalla at gmail.com Thu Apr 2 21:39:41 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Thu, 2 Apr 2015 19:39:41 -0700
Subject: Meroitic cursive fractions numerical values
In-Reply-To:
References: <55170997.70109@khwilliamson.com>
Message-ID:

On Sunday, March 29, 2015, Andrew West wrote:
>
> Having said that, I note that the
> numeric value of one character has been reduced in the Unicode data:
> U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of "0"
> rather than "0/3".
>

Could that be because it's intended less as an actual fraction than as a
shorthand for "0 out of 3" (for outs and strikes in baseball)?

From webalorixa at gmail.com Fri Apr 10 12:24:04 2015
From: webalorixa at gmail.com (Luis de la Orden)
Date: Fri, 10 Apr 2015 18:24:04 +0100
Subject: Fwd: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To:
References:
Message-ID:

Hi to all in the list,

This is my first post and apologies in advance if I make any mistakes.

Today I enrolled as an individual member seeking to support the Unicode
effort. I would like to congratulate you all for the good work you are
doing. You make the world a much better and easier place for many people
out there.

My journey into Unicode started a bit more than four years ago when I
started playing with the creation of keyboard layouts that allowed me to
write Brazilian Portuguese, my mother tongue, on a British keyboard by
unlocking the Latin accents already present in that keyboard layout to
write accented Portuguese characters.

My interest widened to African languages, which nowadays find themselves
in the same situation even in their own geography, and to make a long
story short I also enabled the output of Yorùbá characters from a UK
keyboard (Mac and PC): using the ALT/ALT GR keys to make e, o and s
output ẹ, ọ and ṣ, and allowing them to be tonalised with the accents in
the UK keyboard changed to combining diacritics or dead key
combinations: ??, ??, ?, etc.

Whilst working on the creation of the layouts I realised that combined
characters (diacritic + character combined in one code) made life much
easier as dead key outputs than using combining diacritics. The
advantages and challenges I discovered are:

1. Dead keys (pressing accent key and then letter) are the way most
   (perhaps all) European keyboards work. Combining diacritics work the
   other way around: first one types the letter, then the combining
   diacritic. There is an element of familiarity that is lost with using
   combining characters;

2. The dead key layout system prevents diacritics piling up on top of a
   character if you press them more than once, something essential for
   less technology-savvy typists as it limits the number of mistakes one
   could make. You also avoid getting any and every character accented,
   as would happen with combining diacritics;

3. In the African techno-social context where local languages have to be
   typed from a European keyboard, if one decides to use the single
   quote as a rising tonal, making it a combining acute, they will lose
   the single quote forever. As a dead key the single quote will behave
   as an acute or tonal acute if it is followed by the vowels and
   consonants you chose; otherwise, if followed by space, it works as a
   single quote again.

4. In Windows 8 and probably earlier, combining diacritics (one code)
   added to a character (another code) misalign when cut and pasted from
   one document to another. If I typed Ẹ́ (capital letter e with dot
   below and combining acute) in MS Word and copied to Excel or
   vice-versa, the rendering would display something like Ẹ'.

5. Both Windows and Mac sometimes re-adjust the line spacing and
   consequently length when one uses combining diacritics, which makes
   the line shrink or expand. Terrible if you are dyslexic.

Needless to say, in my experience so far dead keys are the most
friendly, familiar and supported way to produce accented or tonalised
characters.

You might be asking, so why don't you go on and use dead keys from now
on and be happy? There is a limitation with dead keys: the combination
of two characters (accent and character) can only output one code. In
the case of Yorùbá, I could go on setting the dead key combinations for:
?, ?, ?, ?, ? and even ? as they have one single code for the
tonalised/accented character, but I wouldn't be able to create a dead
key for ẹ́, ọ́, etc., as they don't have a single code combining the
character (e with dot below) and the diacritic (combining acute). We
need a single code (e with dot below with acute) in order to make this
work well for Yorùbá.

If you are still reading this I would like to submit a proposal for the
creation of the following:

ẹ́ - LATIN SMALL LETTER E WITH DOT BELOW WITH ACUTE TONE MARK
ọ́ - LATIN SMALL LETTER O WITH DOT BELOW WITH ACUTE TONE MARK
Ẹ́ - LATIN CAPITAL LETTER E WITH DOT BELOW WITH ACUTE TONE MARK
Ọ́ - LATIN CAPITAL LETTER O WITH DOT BELOW WITH ACUTE TONE MARK
ẹ̀ - LATIN SMALL LETTER E WITH DOT BELOW WITH GRAVE TONE MARK
ọ̀ - LATIN SMALL LETTER O WITH DOT BELOW WITH GRAVE TONE MARK
Ẹ̀ - LATIN CAPITAL LETTER E WITH DOT BELOW WITH GRAVE TONE MARK
Ọ̀ - LATIN CAPITAL LETTER O WITH DOT BELOW WITH GRAVE TONE MARK

Would you be so kind as to provide any advice on whether you think this
would be acceptable for submission?

Many thanks to all,

Luis Morais
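
The dead-key behaviour described in points 1 to 3 above, together with the single-code-point limitation, can be modelled in a few lines of Python. This is only an illustrative sketch, not how MSKLC or any Windows or Mac keyboard driver is implemented: the `DEAD_KEYS` table, the `press` function and the fallback for unsupported pairs (emit the two keys literally) are assumptions made for the example.

```python
import unicodedata

# Hypothetical dead-key assignments: key cap -> combining mark it stands for.
DEAD_KEYS = {
    "'": "\u0301",  # COMBINING ACUTE ACCENT
    "`": "\u0300",  # COMBINING GRAVE ACCENT
}


def press(dead_key: str, next_key: str, single_code_point_only: bool = True) -> str:
    """Model typing a dead key followed by one more key.

    dead key + space  -> the plain character (the quote is not "lost")
    dead key + letter -> the NFC-composed accented letter
    With single_code_point_only=True the result must fit in one code
    point; otherwise the two keys are emitted literally (an assumption
    about the fallback, made for this sketch).
    """
    mark = DEAD_KEYS[dead_key]
    if next_key == " ":
        return dead_key
    composed = unicodedata.normalize("NFC", next_key + mark)
    if len(composed) == 1 or not single_code_point_only:
        return composed
    return dead_key + next_key


if __name__ == "__main__":
    for keys in [("'", "e"), ("'", " "), ("'", "\u1eb9")]:
        out = press(*keys)
        print([f"U+{ord(ch):04X}" for ch in out])
    # Lifting the restriction lets the dead key emit base + combining mark.
    out = press("'", "\u1eb9", single_code_point_only=False)
    print([f"U+{ord(ch):04X}" for ch in out])
```

In this model e with acute comes out as one code point, while e with dot below (U+1EB9) plus acute has no single code point to map to, so it only works once the dead key is allowed to emit a sequence.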
From dzo at bisharat.net Fri Apr 10 13:00:05 2015
From: dzo at bisharat.net (dzo at bisharat.net)
Date: Fri, 10 Apr 2015 18:00:05 +0000
Subject: Re: Fwd: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To:
References:
Message-ID: <859719802-1428688807-cardhu_decombobulator_blackberry.rim.net-194887692-@b13.c4.bise6.blackberry>

Hi Luis,

This harks back to discussions some years ago on this list and the old
A12n-collaboration list. The short answer, which I assume is still
valid, is that Unicode will not encode more "precomposed" characters
such as you propose.

That said, you highlight ongoing issues with what I've called "category
4" Latin orthographies, which include extended Latin characters plus
combining diacritics.

It has been a while, but one workaround proposed was for glyphs
representing a base character plus combining diacritics. Perhaps someone
else has more recent information than I do re that concept.

Don Osborn
Sent via BlackBerry by AT&T

From doug at ewellic.org Fri Apr 10 13:30:22 2015
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 10 Apr 2015 11:30:22 -0700
Subject: Combined Yorùbá characters with dot below and tonal diacritics
Message-ID: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net>

Luis de la Orden wrote:

> 4.
> In Windows 8 and probably earlier, combining diacritics (one code)
> added to a character (another code) misalign when cut and pasted from
> one document to another. If I typed Ẹ́ (capital letter e with dot
> below and combining acute) in MS Word and copied to Excel or
> vice-versa, the rendering would display something like Ẹ'.

This is almost always a font problem. Try experimenting with different
fonts and notice how some do much better than others.

On Windows 7, using Segoe UI, all of the combinations of {e, o, E, O}
plus dot-below plus {acute, grave} that you mentioned look just about
perfect. Windows 8 usually does at least as well.

--
Doug Ewell | http://ewellic.org | Thornton, CO

From webalorixa at gmail.com Fri Apr 10 19:19:23 2015
From: webalorixa at gmail.com (Luis de la Orden)
Date: Sat, 11 Apr 2015 01:19:23 +0100
Subject: Re: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net>
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net>
Message-ID:

Dear all,

Many thanks for your responses, which have given me enough insight to
look into several different ways of achieving what I want to do!

@Tom and @Don
I can see the logic behind stopping the creation of pre-composed
characters and agree with it; it is just not sustainable.

@Tom
Thanks for challenging my understanding of dead keys. I have a layout on
my Mac that works like a charm to write Yorùbá, Portuguese and Spanish
with the UK layout. I am having trouble with the Windows layout and
should have mentioned that more clearly. Nevertheless, I was using
Microsoft Keyboard Layout Creator and assumed that the limitations of
the software (or the limitations of my knowledge of the software) were
the limitations of the technology as a whole. I have just got myself
KBDedit Premium and realised the existence of ligatures, which I will
give a try; I also got Keyboard Layout Manager 2000, as I will learn
best with two tools.

@Doug
I will check the fonts.

@Don
I got your book btw. It inspired me a lot. Apologies for the plug, I
wrote an article on my impressions of the usage of African languages by
natives after I read your book
(https://www.linkedin.com/pulse/why-i-believe-almost-all-african-languages-endangered-luis-morais).
The article was also prompted by a chat I had with Jimmy Wales from
Wikipedia in 2012 about the reason behind the small amount of African
language content in Wikipedia, and by several conversations with Yorùbás
living in London.

@All
I am not sure if I am overstretching the purpose of this mailing list,
but I was wondering whether someone would know how I could get started
on bringing Yorùbá to OCR. I have self-funded the digitisation of a
couple of dictionaries whose entries at the moment are being tonalised
manually (mainly because OCR doesn't recognise Yorùbá words from the
PDF'ed dictionaries, but I am hoping to end this catch-22). On top of
OCR, when this glossary is fully tonalised it can be used to power all
kinds of digital functions western languages already enjoy, such as
auto-correct and so forth.

Many thanks to all of you,

Luis (Louie) - https://uk.linkedin.com/in/empathlabs
Currently involved with the creation of www.yorubaname.com (led by the
Fulbright Fellow and linguist Kọ́lá Tubosun)

On 10 April 2015 at 19:30, Doug Ewell wrote:

> Luis de la Orden wrote:
>
> > 4. In Windows 8 and probably earlier, combining diacritics (one code)
> > added to a character (another code) misalign when cut and pasted from
> > one document to another. If I typed Ẹ́ (capital letter e with dot
> > below and combining acute) in MS Word and copied to Excel or
> > vice-versa, the rendering would display something like Ẹ'.
>
> This is almost always a font problem. Try experimenting with different
> fonts and notice how some do much better than others.
>
> On Windows 7, using Segoe UI, all of the combinations of {e, o, E, O}
> plus dot-below plus {acute, grave} that you mentioned look just about
> perfect. Windows 8 usually does at least as well.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
>

From nospam-abuse at ilyaz.org Sat Apr 11 16:50:53 2015
From: nospam-abuse at ilyaz.org (Ilya Zakharevich)
Date: Sat, 11 Apr 2015 14:50:53 -0700
Subject: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To:
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net>
Message-ID: <20150411215053.GA8041@math.berkeley.edu>

On Sat, Apr 11, 2015 at 01:19:23AM +0100, Luis de la Orden wrote:
> Thanks for challenging my understanding of dead keys. I have a layout in my
> Mac that works like a charm to write Yorùbá, Portuguese and Spanish with
> the UK layout. I am having trouble with the Windows layout and should have
> mentioned that more clearly. Nevertheless, I was using Microsoft Keyboard
> Layout Creator and assumed that the limitations of the software (or the
> limitations of my knowledge of the software) were the limitations of the
> technology as a whole.

I see no problem with using MSKLC with Yorùbá. Just make
  AltGr-e, AltGr-o, AltGr-s
produce
  e̩, o̩, and s̩.
Then make AltGr--, AltGr-' and AltGr-` into prefix keys (deadkeys)
converting characters into accented forms. IIRC, this would work fine
also with "base keys" producing Unicode clusters (like those above)
(check in the document below).

For details, see the corresponding sections of
  http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm
[I do not think the "standard" keyboard input on Windows is documented
anywhere else :-( ].

Hope this helps,
Ilya

From lang.support at gmail.com Sat Apr 11 22:06:51 2015
From: lang.support at gmail.com (Andrew Cunningham)
Date: Sun, 12 Apr 2015 13:06:51 +1000
Subject: Re: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To: <20150411215053.GA8041@math.berkeley.edu>
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>
Message-ID:

Hi Ilya,

The problem with the approach documented below is twofold:

1) the characters required do not all exist as precomposed characters,
thus Microsoft's dead key sequences will not work for Yoruba.

2) certain AltGr sequences are not guaranteed to work in all programs.
Some programs treat the AltGr sequence as equivalent to the Alt key
sequence, with program shortcuts overriding keyboard input.

From memory this was a problem we would have with MS Word.

Care needs to be taken selecting AltGr sequences to implement in a
keyboard.

And adding frequently typed characters like vowels and tone marks to
AltGr is usually a bad idea. Easier to move less needed sequences to the
AltGr state, putting frequently typed characters on the normal and shift
states.

Andrew

On Sunday, 12 April 2015, Ilya Zakharevich wrote:
> On Sat, Apr 11, 2015 at 01:19:23AM +0100, Luis de la Orden wrote:
>> Thanks for challenging my understanding of dead keys. I have a layout in my
>> Mac that works like a charm to write Yorùbá, Portuguese and Spanish with
>> the UK layout. I am having trouble with the Windows layout and should have
>> mentioned that more clearly. Nevertheless, I was using Microsoft Keyboard
>> Layout Creator and assumed that the limitations of the software (or the
>> limitations of my knowledge of the software) were the limitations of the
>> technology as a whole.
>
> I see no problem with using MSKLC with Yorùbá. Just make
> AltGr-e, AltGr-o, AltGr-s
> produce
> e̩, o̩, and s̩.
> Then make AltGr--, AltGr-' and AltGr-` into prefix keys (deadkeys)
> converting characters into accented forms. IIRC, this would work fine
> also with "base keys" producing Unicode clusters (like those above)
> (check in the document below).
>
> For details, see the corresponding sections of
> http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm
> [I do not think the "standard" keyboard input on Windows is documented
> anywhere else :-( ].
>
> Hope this helps,
> Ilya
>

--
Andrew Cunningham
Project Manager, Research and Development
(Social and Digital Inclusion)
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia

Ph: +61-3-8664-7430
Mobile: 0459 806 589
Email: acunningham at slv.vic.gov.au
lang.support at gmail.com

http://www.openroad.net.au/
http://www.mylanguage.gov.au/
http://www.slv.vic.gov.au/

From nospam-abuse at ilyaz.org Sat Apr 11 23:52:05 2015
From: nospam-abuse at ilyaz.org (Ilya Zakharevich)
Date: Sat, 11 Apr 2015 21:52:05 -0700
Subject: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To:
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>
Message-ID: <20150412045205.GA10644@math.berkeley.edu>

On Sun, Apr 12, 2015 at 01:06:51PM +1000, Andrew Cunningham wrote:
> The problem with the approach documented below is twofold:

> 1) the characters required do not all exist as precomposed characters,
> thus Microsoft's dead key sequences will not work for Yoruba.

As I explained in my mail, this is wrong.

> 2) certain AltGr sequences are not guaranteed to work in all programs.
> Some programs treat the AltGr sequence as equivalent to the Alt key
> sequence, with program shortcuts overriding keyboard input.

Some programs are broken. This is a fact of life. This should not be
an issue to discuss here. (They may be broken with AltGr. They may be
broken with deadkeys.)

> From memory this was a problem we would have with MS Word.

I have no experience with MS programs; however, I doubt your conclusion
very much.

> And adding frequently typed characters like vowels and tone marks to
> AltGr is usually a bad idea.

Who cares? As long as it works?

> Easier to move less needed sequences to the AltGr
> state, putting frequently typed characters on the normal and shift
> states.

If you have a 400-key keyboard, fine with you. However, with Yorùbá,
this may even be feasible, since there are many Latin characters
excluded.

The approach I explained does not require AltGr. It is just the logic
of combining a prefix key with a key producing a cluster.

Ilya

P.S. If it was not clear, the AltGr-keys in my initial message should
produce combinations with U+0329.

From verdy_p at wanadoo.fr Sun Apr 12 00:07:01 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 12 Apr 2015 07:07:01 +0200
Subject: Re: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To:
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>
Message-ID:

2015-04-12 5:06 GMT+02:00 Andrew Cunningham :
>
> Hi Ilya,
>
> The problem with the approach documented below is twofold:
>
> 1) the characters required do not all exist as precomposed characters,
> thus Microsoft's dead key sequences will not work for Yoruba.

It is indeed a good catch that MSKLC (version 1.4) cannot produce
sequences of characters (only a single code point) with dead key
sequences or anywhere in its keymap (independently of the current
keyboard state). This is indeed a strong limitation.
Dropping that limitation would require updating the parsing format of the *.klc files to accept strings or sequences of codepoints, and also update the internal code of its generated driver so that it can map to a string (that the driver at run time will send with multiple WM_CHAR events) For now, all you can do is to generate a keymap that generates single code points, and map additional keys for the combining characters needed. May be Michael S. Kaplan (author of MSKLC at Microsoft) could review that limitation and propose an upgrade so that MSKLC would really be compliant with the Unicode character encoding policy without forcing users to enter every code point separately and map them separately on their keymaps. Also drivers generated by MSKLC 1.4 do not fully comply with Windows 8 and Windows Server 2012: notably they have serious issues when used with a touche interface (pressing AltGr + one character for example with the onscreen maintains the Altgr state active if we don't type another character WITHOUT the AltGr character; other keyboard states are also mixed, including SHIFT, CAPSLOCK, where they do not match the state for the physical keyboard). The problem being that if you use the onscreen keyboard to locate a character and then type with the physical keyboard, the physical keyboard will continue to use that state (in my opinion it is a bug of Windows 8 and Windows Server 2012, this bug does not exist in Windows 7 when we use the its onscreen keyboard!) Windows 8 nad Windows server 2012 also have TWO separate onscreen keyboards that do not work the same way. 
Note also that drivers generated by MSKLC 1.4 lack some data needed to map correctly onto an onscreen geometry. These drivers are based only on "VK_*" virtual key numeric identifiers; there is no data in the MSKLC drivers to map the virtual keys onto a geometric layout. Windows 8 and Windows Server 2012 just provide a default mapping of vkeys using some default geometries (but in fact they cannot correctly infer the effective geometry to use, such as the shape and placement of the Return key and the keys around it). MSKLC does not provide a way to build another geometry and map geometric keys to vkeys (or the reverse).

---

Note also that MSKLC-generated drivers have never allowed us to change the mapping of scancodes (from hardware keyboards) to virtual keys, aka "vkeys", or to "WM_SYSKEY" (this is hardwired at a lower internal level). These drivers only map sequences of one or more "vkeys" (and a few supported states; it's not possible to add keyboard states other than CTRL, SHIFT, CAPSLOCK, ALTGR2, and custom states for dead keys) to only one WM_CHAR. And it's not possible to change the mapping of vkeys to WM_SYSCHAR (this is also hardwired at a lower level).

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From verdy_p at wanadoo.fr Sun Apr 12 01:46:31 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 12 Apr 2015 08:46:31 +0200
Subject: Re: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To: <20150411215053.GA8041@math.berkeley.edu>
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>
Message-ID:

2015-04-11 23:50 GMT+02:00 Ilya Zakharevich :
> For details, see the corresponding sections of
>
> http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm

On that doc page on CPAN, *all* the links pointing to the former MSDN blog "archives" lead nowhere (they all now redirect to " http://blogs.msdn.com/msgs/?messageid=6", which displays "page not found"). The archives are apparently lost, or Microsoft no longer redirects correctly to wherever they have been relocated, if they still exist somewhere.

Microsoft apparently does not know what an "archive" is; why did it break them? (Or maybe Michael S. Kaplan deleted them if this was his personal account, or deleted his former personal account on the former MSDN blogs.)

Well, that CPAN doc page is also full of junk, with considerations about a particular layout design for extended Latin that should have been placed on a separate page for that layout.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From richard.wordingham at ntlworld.com Sun Apr 12 03:18:02 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 12 Apr 2015 09:18:02 +0100
Subject: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To:
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>

Message-ID: <20150412091802.5991570b@JRWUBU2>

On Sat, 11 Apr 2015 21:52:05 -0700 Ilya Zakharevich wrote:

> P.S. If it was not clear, the AltGr-keys in my initial message should
> produce combinations with U+0329.

And that is why it won't work. The dead key mechanism is based on trios of dead key, plain character and composed character, but the plain character and the composed character have to be single codepoints in the BMP. Dead keys don't work for cuneiform!

On Sun, 12 Apr 2015 07:07:01 +0200 Philippe Verdy wrote:

> It's effectively a good catch that MSKLC (version 1.4) cannot produce
> sequences of characters (just one code point) with dead key sequences
> or anywhere in its keymap (independently of the current keyboard
> state).

> Maybe Michael S. Kaplan (author of MSKLC at Microsoft) could review
> that limitation and propose an upgrade so that MSKLC would really be
> compliant with the Unicode character encoding policy without forcing
> users to enter every code point separately and map them separately on
> their keymaps.

My understanding is that the driver used is now obsolete ('legacy'), and one should write a keyboard using the Text Services Framework, as Tavultesoft Keyman does.

Conceptually, a dead key takes up a modified key for a single character. It is typed in an unnatural order to mimic the limitation of a mechanical typewriter. It's perhaps taken on a life of its own as a way of providing another modifier, and it does also support an obsession with writing text in NFC.

Richard.

From nospam-abuse at ilyaz.org Sun Apr 12 04:27:09 2015
From: nospam-abuse at ilyaz.org (Ilya Zakharevich)
Date: Sun, 12 Apr 2015 02:27:09 -0700
Subject: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To:
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>

Message-ID: <20150412092709.GA11913@math.berkeley.edu>

On Sun, Apr 12, 2015 at 07:07:01AM +0200, Philippe Verdy wrote:
> > 1) the characters required do not all exist as precomposed characters
> thus microsoft's dead key sequences will not work for yoruba.

(As I said in my other email, the conclusion is wrong.)

> It's effectively a good catch that MSKLC (version 1.4) cannot produce
> sequences of characters (just one code point) with dead key sequences or
> anywhere in its keymap (independently of the current keyboard state).

Irrelevant, due to the way the kernel processes a combination of
  1. Prefix key
  2. Key producing a multi-char string.

> This is effectively a strong limitation. Dropping that limitation would
> require updating the parsing format of the *.klc files to accept strings or
> sequences of codepoints, and also update the internal code of its generated
> driver so that it can map to a string (that the driver at run time will
> send with multiple WM_CHAR events)

This shows that you have no clue about this topic. The format of .klc files is absolutely irrelevant here. A .klc file is just one of the preprocessing steps, and the final result is a static table controlling the kernel. It is the format of THIS TABLE which is the relevant limitation. And as far as the topics we discuss here go, the .klc format covers all the relevant features of this table.

(See Microsoft's header files for details: they are in kbd.h, with finer details documented in the documentation for my Perl module.)

> For now, all you can do is to generate a keymap that generates single code
> points,

Wrong.

> and map additional keys for the combining characters needed.

Wrong: there is nothing special about combining characters. Deadkeys (prefix keys) map UTF-16 codepoints to UTF-16 codepoints, but the way this interacts with multi-codepoint keys makes the Yorùbá input possible.

> MSKLC does not provide a way to build another geometry and map geometric
> keys to vkeys (or the reverse).
Again, this has nothing to do with MSKLC.

> Note also that (since always), MSKLC generated drivers have never allowed
> us to change the mapping of scancodes (from hardware keyboards) to virtual
> keys, aka "vkeys", or to "WM_SYSKEY" (this is hardwired in a lower internal
> level).

Wrong. Look for any French or German keyboard.

> These drivers only map sequences of one or more "vkeys" (and a few
> supported states, it's not possible to add keyboard states other than CTRL,
> SHIFT, CAPSLOCK, ALTGR2, and custom states for dead keys)

How do you think I do it in my layout?

> to only one WM_CHAR.

I have no idea why you would mix WM_* stuff into this discussion?

> And it's not possible to change the mapping of vkeys to WM_SYSCHAR
> (this is also hardwired at a lower level).

I have no clue what you are talking about now?

Ilya

From nospam-abuse at ilyaz.org Sun Apr 12 04:37:31 2015
From: nospam-abuse at ilyaz.org (Ilya Zakharevich)
Date: Sun, 12 Apr 2015 02:37:31 -0700
Subject: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To:
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>
Message-ID: <20150412093731.GC11913@math.berkeley.edu>

On Sun, Apr 12, 2015 at 08:46:31AM +0200, Philippe Verdy wrote:
> Well that CPAN doc page is also full of junk, with considerations about a
> particular layout design for extended Latin, that should have been placed
> on a separate page for that layout.

If you think it is junk, please write a better one.
Thanks,
Ilya

From lang.support at gmail.com Sun Apr 12 05:38:52 2015
From: lang.support at gmail.com (Andrew Cunningham)
Date: Sun, 12 Apr 2015 20:38:52 +1000
Subject: Re: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To: <20150412092709.GA11913@math.berkeley.edu>
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>

<20150412092709.GA11913@math.berkeley.edu>
Message-ID:

On 12/04/2015 7:27 PM, "Ilya Zakharevich" wrote:
>
> On Sun, Apr 12, 2015 at 07:07:01AM +0200, Philippe Verdy wrote:
>
> > MSKLC does not provide a way to build another geometry and map geometric
> > keys to vkeys (or the reverse).
>
> Again, this has nothing to do with MSKLC.
>

If you are compiling a keyboard driver from source, then it has nothing to do with MSKLC. But as a general answer, for the average user who needs to develop a keyboard, MSKLC is very pertinent.

> > Note also that (since always), MSKLC generated drivers have never allowed
> > us to change the mapping of scancodes (from hardware keyboards) to virtual
> > keys, aka "vkeys", or to "WM_SYSKEY" (this is hardwired in a lower internal
> > level).
>
> Wrong. Look for any French or German keyboard.

Microsoft has a tendency never to change a keyboard or how it operates; there are a lot of bad design decisions and cruft still there. Just because something can be done doesn't mean it should be done.

> > These drivers only map sequences of one or more "vkeys" (and a few
> > supported states, it's not possible to add keyboard states other than CTRL,
> > SHIFT, CAPSLOCK, ALTGR2, and custom states for dead keys)
>
> How do you think I do it in my layout?
>

There are Microsoft keyboard layouts that use other states; the Canadian multilingual keyboard comes to mind, mainly to comply with a Canadian standard. But Microsoft themselves recommend keeping to the four keyboard states Philippe lists.

> > to only one WM_CHAR.
>
> I have no idea why you would mix in WM_* stuff into this discussion?
>

Depending on your perspective it is pertinent or not.

> > And it's not possible to change the mapping of vkeys to WM_SYSCHAR
> > (this is also hardwired at a lower level).
>
> I have no clue what you are talking about now?
>

Andrew

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From richard.wordingham at ntlworld.com Sun Apr 12 05:40:20 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 12 Apr 2015 11:40:20 +0100
Subject: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To: <20150412092709.GA11913@math.berkeley.edu>
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>

<20150412092709.GA11913@math.berkeley.edu>
Message-ID: <20150412114020.0582fc55@JRWUBU2>

On Sun, 12 Apr 2015 02:27:09 -0700 Ilya Zakharevich wrote:

> On Sun, Apr 12, 2015 at 07:07:01AM +0200, Philippe Verdy wrote:

> > and map additional keys for the combining characters needed.

> Wrong: there is nothing special for combining characters.

He didn't say there was.

> Deadkeys
> (prefix keys) map UTF-16 codepoints to UTF-16 codepoints, but the way
> this interacts with multi-codepoint keys makes the Yorùbá input
> possible.

How does it interact? I'm guessing that the first character of the 'ligature' combines with the dead key, and the subsequent characters of the ligature are not lost. Is this documented anywhere?

Richard.

From verdy_p at wanadoo.fr Sun Apr 12 06:47:36 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 12 Apr 2015 13:47:36 +0200
Subject: Re: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To: <20150412092709.GA11913@math.berkeley.edu>
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>

<20150412092709.GA11913@math.berkeley.edu>
Message-ID:

2015-04-12 11:27 GMT+02:00 Ilya Zakharevich :
> > It's effectively a good catch that MSKLC (version 1.4) cannot produce
> > sequences of characters (just one code point) with dead key sequences or
> > anywhere in its keymap (independently of the current keyboard state).
>
> Irrelevant, due to the way the kernel processes a combination of
> 1. Prefix key
> 2. Key producing a multi-char string.
>

It is *your* response that is not relevant. I did not speak about the *kernel* but about *MSKLC* (version 1.4), which *does not* support mapping keys to more than one code point! You did not read correctly.

It's simply impossible with MSKLC, which cannot generate such mappings even if the kernel supports them.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From verdy_p at wanadoo.fr Sun Apr 12 07:22:08 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 12 Apr 2015 14:22:08 +0200
Subject: Re: Combined Yorùbá characters with dot below and tonal diacritics
In-Reply-To: <20150412092709.GA11913@math.berkeley.edu>
References: <20150410113022.665a7a7059d7ee80bb4d670165c8327d.6410f83a0d.wbe@email03.secureserver.net> <20150411215053.GA8041@math.berkeley.edu>

<20150412092709.GA11913@math.berkeley.edu>
Message-ID:

2015-04-12 11:27 GMT+02:00 Ilya Zakharevich :
> > Note also that (since always), MSKLC generated drivers have never allowed
> > us to change the mapping of scancodes (from hardware keyboards) to virtual
> > keys, aka "vkeys", or to "WM_SYSKEY" (this is hardwired in a lower internal
> > level).
>
> Wrong. Look for any French or German keyboard.
>

Wrong. I use a French keyboard every day. I have even used MSKLC to extend this French keyboard (though now with weird compatibility problems in Windows Server 2012 and Windows 8+; this is a separate issue). Seriously, how can you imagine that I never looked at a French keyboard?

You're still speaking about something else. I only spoke about MSKLC, which has not been revised at all to generate compatible drivers. But even if it had been, it DOES NOT support the extended interface in "kbd.h" (which is NOT the ".klc" format that MSKLC supports).

And yes, MSKLC *does not* allow changing the mapping of scancodes (from hardware keyboards) to vkeys; it only allows editing the mapping from sequences of vkeys (plus limited keyboard states) to ONE and only ONE code point. The first part is not accessible (and is not described at all in .klc source files, which have nothing in common with the internal binary structures generated in installable keyboard drivers and then used by the kernel).

It also offers absolutely no way to describe a layout geometry (as needed for touch interfaces) where we can reposition the vkeys accurately (for example for typing with the left hand only, or for an ABCD arrangement, or for defining other custom input methods similar to phones using 10 digits or predictive "T9" methods, or for creating specific selection grid layouts for symbols and emojis with multiple selectable sets). MSKLC still assumes legacy 101/102-key geometries (and does not in fact distinguish the numeric keypad, with the *only* exception of the decimal separator key, which is distinguished).
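The one-code-point restriction debated in this thread can be pictured with a small toy model. The Python sketch below is an illustration under stated assumptions, not the kbd.h structures, the .klc format, or anything MSKLC generates: it models a dead-key table as trios of (dead key, base character, composed character), each limited to a single UTF-16 code unit, as Richard Wordingham described the mechanism earlier. The names utf16_units and DeadKeyTable are hypothetical, and the fallback in press() (emit both characters unchanged) is simply a choice made for this model. It does not model the interaction with multi-codepoint keys that Ilya Zakharevich refers to.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


class DeadKeyTable:
    """Toy model: (dead key, base) -> composed, one UTF-16 unit each."""

    def __init__(self) -> None:
        self._trios: dict[tuple[str, str], str] = {}

    def add(self, dead: str, base: str, composed: str) -> None:
        """Register a trio, rejecting anything longer than one code unit."""
        for name, value in (("dead", dead), ("base", base), ("composed", composed)):
            if utf16_units(value) != 1:
                raise ValueError(f"{name} {value!r} is not a single UTF-16 code unit")
        self._trios[(dead, base)] = composed

    def press(self, dead: str, base: str) -> str:
        """Return the composed character, or both inputs if no trio matches."""
        return self._trios.get((dead, base), dead + base)


if __name__ == "__main__":
    table = DeadKeyTable()
    table.add("\u00b4", "e", "\u00e9")      # acute dead key + e -> e with acute
    print(repr(table.press("\u00b4", "e")))
    try:
        # e with dot below plus acute has no single-code-point form
        table.add("\u00b4", "\u1eb9", "\u1eb9\u0301")
    except ValueError as err:
        print("rejected:", err)
```

In this model a non-BMP character such as a cuneiform sign is rejected for the same reason, since it needs two UTF-16 code units.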
And there's no way to change the layout of "function keys" (those keys that are shown with a black/dark grey background in this editor): you can only define mappings for the vkeys in the alphanumeric part plus the spacebar. Everything else is assumed (there is also no distinction between the two Alt keys for keymaps that want to distinguish Ctrl+LeftAlt and Ctrl+AltGr, which may also be shifted).

In fact the *.klc source file format (specific to the MSKLC tool) is not even documented (and not supported at all by "kbd.h", which just describes the binary structures: the kernel has absolutely no support for processing .klc files directly; it just wants the binary-encoded tables exposed by the driver, which must be compiled).

And did I say that the MSKLC tool does not even work in Windows 8+ or Windows Server 2012+? It cannot compile the driver (it fails to launch the linker of the Windows SDK, even if it's correctly installed). To run this tool you still need to use a previous version of Windows. The generated driver, however, is installable in later versions of Windows (with some quirks to coexist with the onscreen interface).

And even in Windows 8 and Windows Server 2012 there are TWO distinct touch interfaces:
- one compatible with accessibility tools (it works even without a touch screen; you can click on it with your mouse, or touch it),
- another, newer one specific to touch screens (not available without a touch screen; when it exists, it really behaves differently, but with too-frequent bugs if you use it simultaneously with a physical keyboard, notably each time you have used it to touch the onscreen shift keys, whose state is incorrectly defined; the only way to solve these bugs in Windows 8 is to disable the touch screen device completely, but it is enabled automatically...).

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From verdy_p at wanadoo.fr Sun Apr 12 07:43:09 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 12 Apr 2015 14:43:09 +0200
Subject: Daesh/ISIS anhilates the last tracks of the cuneiform script in Nimrud, Iraq
Message-ID:

Another bad day for cultures, and in this case for old scripts.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sdaoden at yandex.com Mon Apr 13 05:19:18 2015
From: sdaoden at yandex.com (Steffen Nurpmeso)
Date: Mon, 13 Apr 2015 12:19:18 +0200
Subject: Daesh/ISIS anhilates the last tracks of the cuneiform script in Nimrud, Iraq
In-Reply-To:
References:
Message-ID: <20150413111918.7clJPqgBrtU=%sdaoden@yandex.com>

Philippe Verdy wrote:
 |Another bad day for cultures, and in this case for old scripts.

Oh yes. I personally do still suffer whenever i recall the demolition of the Buddhas of Bamiyan, and i think this is a spectacular pain that will last. I never saw them in real life. Of course there is “When you point one finger, there are three fingers pointing back to you.” In no way do i defend ISIS or the Taliban, nor would i ever have supported them. Regarding philosophy and culture… everything such a shame. Some “sapiens sapiens” materialization… in the plain existence of Unicode.

--steffen

From gwalla at gmail.com Mon Apr 13 16:18:54 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Mon, 13 Apr 2015 14:18:54 -0700
Subject: Chess symbol rotations (revisited)
Message-ID:

I'm much further along on my research for a proposal to encode heterodox chess symbols. I asked about terms for rotations last November and was told that the terms in use in the standard are CLOCKWISE-ROTATED and ANTICLOCKWISE-ROTATED (e.g. U+29BC), but I wasn't sure I would be proposing the knights in intermediate 45 degree rotations. Now I believe I have sufficient evidence for their use in running text, which brings up the question of how to name them.
In my current draft I'm using terms like BLACK 45 DEGREE CLOCKWISE-ROTATED CHESS KNIGHT and WHITE 135 DEGREE ANTICLOCKWISE-ROTATED CHESS KNIGHT for them. It seems awkward, but I can't think of any better naming convention. The precedent in arrows for terms like NORTH EAST doesn't seem applicable: chess symbols have defined bases but do not technically point in a direction. The SignWriting rotation modifiers aren't much help when it comes to naming, since they are just numbered. The clock faces use 30 degree increments. Also, would it be useful to include a section on current practice, such as existing fonts and LaTeX? From haberg-1 at telia.com Mon Apr 13 17:14:31 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Tue, 14 Apr 2015 00:14:31 +0200 Subject: Chess symbol rotations (revisited) In-Reply-To: References: Message-ID: <705B7AED-4C2F-4FD8-819F-F2B28E0A4FE7@telia.com> > On 13 Apr 2015, at 23:18, Garth Wallace wrote: > > I'm much further along on my research for a proposal to encode > heterodox chess symbols. I asked about terms for rotations last > November and was told that the terms in use in the standard are > CLOCKWISE-ROTATED and ANTICLOCKWISE-ROTATED (e.g. U+29BC), but I > wasn't sure I would be proposing the knights in intermediate 45 degree > rotations. Now I believe I have sufficient evidence for their use in > running text, which brings up the question of how to name them. In my > current draft I'm using terms like BLACK 45 DEGREE CLOCKWISE-ROTATED > CHESS KNIGHT and WHITE 135 DEGREE ANTICLOCKWISE-ROTATED CHESS KNIGHT > for them. It seems awkward, but I can't think of any better naming > convention. 
Have you checked if they are here: http://www.chessvariants.org/index/mainquery.php?type=Piececlopedia&orderby=LinkText&displayauthor=1&displayinventor=1&usethisheading=Piececlopedia From gwalla at gmail.com Mon Apr 13 19:21:40 2015 From: gwalla at gmail.com (Garth Wallace) Date: Mon, 13 Apr 2015 17:21:40 -0700 Subject: Chess symbol rotations (revisited) In-Reply-To: <705B7AED-4C2F-4FD8-819F-F2B28E0A4FE7@telia.com> References: <705B7AED-4C2F-4FD8-819F-F2B28E0A4FE7@telia.com> Message-ID: On Monday, April 13, 2015, Hans Aberg wrote: > > > On 13 Apr 2015, at 23:18, Garth Wallace > > wrote: > > > > I'm much further along on my research for a proposal to encode > > heterodox chess symbols. I asked about terms for rotations last > > November and was told that the terms in use in the standard are > > CLOCKWISE-ROTATED and ANTICLOCKWISE-ROTATED (e.g. U+29BC), but I > > wasn't sure I would be proposing the knights in intermediate 45 degree > > rotations. Now I believe I have sufficient evidence for their use in > > running text, which brings up the question of how to name them. In my > > current draft I'm using terms like BLACK 45 DEGREE CLOCKWISE-ROTATED > > CHESS KNIGHT and WHITE 135 DEGREE ANTICLOCKWISE-ROTATED CHESS KNIGHT > > for them. It seems awkward, but I can't think of any better naming > > convention. > > Have you checked if they are here: > > http://www.chessvariants.org/index/mainquery.php?type=Piececlopedia&orderby=LinkText&displayauthor=1&displayinventor=1&usethisheading=Piececlopedia > > (Oops, meant to reply to the list) The Piececlopedia doesn't really address symbols directly, it describes pieces by their moves. Rotated chess piece symbols are used as placeholders, with their actual identities as pieces assigned on a problem-by-problem basis (only the 180 degree turned queen and knight are fixed by convention, to the grasshopper and nightrider). Think variables, rather than constants. 
So, for example, in one problem a knight turned 90 degrees clockwise may be a camel (1,3 leaper), in another problem a mao (xiangqi horse), and still another problem may use a knight turned 90 degrees counter-clockwise for the camel instead. Without context, it means "a knight-like piece of some variety, but not an actual knight". This is long-standing practice in fairy chess problems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From haberg-1 at telia.com Tue Apr 14 03:54:53 2015 From: haberg-1 at telia.com (Hans Aberg) Date: Tue, 14 Apr 2015 10:54:53 +0200 Subject: Chess symbol rotations (revisited) In-Reply-To: References: <705B7AED-4C2F-4FD8-819F-F2B28E0A4FE7@telia.com> Message-ID: <6E35F023-4AA3-4241-BC39-A4890E0A4183@telia.com> > On 14 Apr 2015, at 02:21, Garth Wallace wrote: > >> On Monday, April 13, 2015, Hans Aberg wrote: >> >>> On 13 Apr 2015, at 23:18, Garth Wallace wrote: >>> >>> I'm much further along on my research for a proposal to encode >>> heterodox chess symbols. I asked about terms for rotations last >>> November and was told that the terms in use in the standard are >>> CLOCKWISE-ROTATED and ANTICLOCKWISE-ROTATED (e.g. U+29BC), but I >>> wasn't sure I would be proposing the knights in intermediate 45 degree >>> rotations. >> >> Have you checked if they are here: >> http://www.chessvariants.org/index/mainquery.php?type=Piececlopedia&orderby=LinkText&displayauthor=1&displayinventor=1&usethisheading=Piececlopedia >> > The Piececlopedia doesn't really address symbols directly, it > describes pieces by their moves. Rotated chess piece symbols are used as placeholders, with their actual identities as pieces assigned on a problem-by-problem basis (only the 180 degree turned queen and knight are fixed by convention, to the grasshopper and nightrider). Think variables, rather than constants. 
> So, for example, in one problem a knight turned 90 degrees clockwise may be a camel (1,3 leaper), in
> another problem a mao (xiangqi horse), and still another problem may use a knight turned 90 degrees counter-clockwise for the camel instead. Without context, it means "a knight-like piece of some variety, but not an actual knight". This is long-standing practice in fairy chess problems.

The mathematical symbols are a mixture of graphical and semantic descriptions. For example
 ⊂ SUBSET OF U+2282
 ⇒ RIGHTWARDS DOUBLE ARROW U+21D2
So one can have both.

From gwalla at gmail.com Tue Apr 14 09:46:47 2015
From: gwalla at gmail.com (Garth Wallace)
Date: Tue, 14 Apr 2015 07:46:47 -0700
Subject: Chess symbol rotations (revisited)
In-Reply-To:
References: <705B7AED-4C2F-4FD8-819F-F2B28E0A4FE7@telia.com>

Message-ID:

On Tuesday, April 14, 2015, Hans Åberg wrote:
>
> > On 14 Apr 2015, at 02:21, Garth Wallace wrote:
> >
> >> On Monday, April 13, 2015, Hans Aberg wrote:
> >>
> >> > On 13 Apr 2015, at 23:18, Garth Wallace wrote:
> >> >
> >> > I'm much further along on my research for a proposal to encode
> >> > heterodox chess symbols. I asked about terms for rotations last
> >> > November and was told that the terms in use in the standard are
> >> > CLOCKWISE-ROTATED and ANTICLOCKWISE-ROTATED (e.g. U+29BC), but I
> >> > wasn't sure I would be proposing the knights in intermediate 45 degree
> >> > rotations.
> >>
> >> Have you checked if they are here:
> >>
> http://www.chessvariants.org/index/mainquery.php?type=Piececlopedia&orderby=LinkText&displayauthor=1&displayinventor=1&usethisheading=Piececlopedia
> >>
> > The Piececlopedia doesn't really address symbols directly, it
> > describes pieces by their moves. Rotated chess piece symbols are used as
> placeholders, with their actual identities as pieces assigned on a
> problem-by-problem basis (only the 180 degree turned queen and knight are
> fixed by convention, to the grasshopper and nightrider). Think variables,
> rather than constants. So, for example, in one problem a knight turned 90
> degrees clockwise may be a camel (1,3 leaper), in
> > another problem a mao (xiangqi horse), and still another problem may use
> a knight turned 90 degrees counter-clockwise for the camel instead. Without
> context, it means "a knight-like piece of some variety, but not an actual
> knight". This is long-standing practice in fairy chess problems.
>
> The mathematical symbols are a mixture of graphical and semantic
> descriptions. For example
> ⊂ SUBSET OF U+2282
> ⇒ RIGHTWARDS DOUBLE ARROW U+21D2
> So one can have both.
>
>
>
Yes, and so far my proposal also covers some dedicated compound piece symbols, but my question is about naming some of the rotated ones.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From webalorixa at gmail.com Tue Apr 14 18:05:51 2015
From: webalorixa at gmail.com (Luis de la Orden)
Date: Wed, 15 Apr 2015 00:05:51 +0100
Subject: Re: Combined Yorùbá characters with dot below and tonal diacritics
Message-ID:

Dear all,

I am delighted with the amount of information you have kindly shared with me. I was watching the discussion as it evolved; please see my comments below:

@Andrew Cunningham @Tom Gewecke

Hi again. Thanks for clarifying the MSKLC limitation. I spent a whole year spewing fire at this limitation, not knowing it was a software limitation.

----------------

@Don

Hi Don, I saw the Konyin layout and it is excellent in many respects; nevertheless, I don't think user behaviour makes it a successful alternative. It still requires learning a new way of typing, which, although easy, is a barrier in people's minds. And then there is the question of having another keyboard when your computer already shipped with one. It is one of those things that, as you highlight in your book, requires a concerted effort from policymakers and manufacturers to make it the first option every time one buys a computer in Nigeria, for example, so that it can be successful.

With regard to this keyboard specifically, there is an issue of marketing and distribution, as most of the people I talk to in Nigeria either have never heard of it or don't know where to buy it. It seems that this initiative died out, as the last we heard from the company manufacturing those keyboards was in 2006, in disused forums and old posts.

-----------

@Andrew Cunningham @Ilya Zakharevich

> From memory this was a problem we would have with MS Word. Care needs to be
> taken selecting AltGr sequences to implement in keyboard.
> And adding frequently typed characters like vowels and tone marks to altgr
> is usually a bad idea.
> Easier to move less needed sequences to the altgr
> state putting frequently typed characters on the normal and shift states

Hi Andrew, just a clarification: there are pre-composed characters for Nigerian languages which use letters with a dot below.

But with regard to Alt-Gr, there goes my innocence and feeling of accomplishment :)). I had everything linked to the Alt-Gr key and did exactly as Ilya said... MS Word is fine, but very specialised software such as Photoshop is a pain, as its power-user shortcuts all use Alt-Gr indeed. Although I will resort to Ilya's argument as far as I can, this is an issue I must consider if I want to be inclusive, or at least warn people using a localised layout with a European keyboard.

------------

@Ilya Zakharevich

Many thanks for sharing your link at http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm. Reading it avidly at the moment.

---------------

@Philippe Verdy

Perhaps Unicode could give Microsoft a gentle nudge in that direction; this is the only free software I know of so far.

-----------

#OCR #Tesseract

Actually, I am thinking a little bit bigger than my boots here. I am doing all this work of compiling an accented glossary of words from existing printed dictionaries so that I can help Adobe, Microsoft, Google and the like to speed up their language support for PDF and whichever technology they have out there. I feel quite strongly that it is high time these stopped being niche solutions that require effort from ordinary people to implement and became mainstream out-of-the-box solutions. But from the content of what I just said above, you are right to assume I have no clue how to get started; I am just a guy with a growing Excel spreadsheet and a dwindling bank account as a result :).

------------

Many thanks to all of you! I still have questions with regard to a set of binary Yorùbá characters/numbers system and will open a new discussion to keep things well organised.
Regards,
Luis

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From doug at ewellic.org Wed Apr 15 10:59:52 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 15 Apr 2015 08:59:52 -0700
Subject: Combined Yorùbá characters with dot below and tonal diacritics
Message-ID: <20150415085952.665a7a7059d7ee80bb4d670165c8327d.9dd37058c0.wbe@email03.secureserver.net>

Luis de la Orden wrote:

>> From memory this was a problem we would have with MS Word. Care needs
>> to be taken selecting AltGr sequences to implement in keyboard.
>> And adding frequently typed characters like vowels and tone marks to
>> altgr is usually a bad idea. Easier to move less needed sequences to
>> the altgr state putting frequently typed characters on the normal and
>> shift states
>
> Hi Andrew, just a clarification there are pre-composed characters for
> Nigerian languages which use letters with a dot below.
>
> But with regards to the ALt-Gr, there it goes my innocence and feeling
> of accomplishment :)), I had everything linked to the Alt-Gr key and
> did exactly as Ilya said... MS Word is fine, but very specialised
> software such as Photoshop are a pain as their power-user shortcuts
> all use ALT-Gr indeed.

It's true that some very popular software packages define their own Alt+key or Ctrl+Alt+key combinations, and keyboard layouts that use the same combinations (as AltGr+key) will conflict with them. I just annoyed a colleague the other day because my keyboard layout (John Cowan's delightful Moby Latin [1]) co-opts one of his favorite Visual Studio shortcuts; he tried to build a project and got U+2022 BULLET instead.

But there are also a lot of keyboard layouts worldwide that do use AltGr keys. There are only so many keys on a standard keyboard, and if you're designing a layout and you've made all the tough choices and you still need to find room for more characters, you pretty much have to go to AltGr.
It helps to educate users of such a keyboard layout, especially Americans, that the left and right Alt keys aren't the same. Americans tend not to expect this, because the standard U.S. keyboard doesn't use AltGr at all and the key isn't marked as such.

[1] http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html

--
Doug Ewell | http://ewellic.org | Thornton, CO

From mark at macchiato.com Thu Apr 16 03:01:07 2015
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Thu, 16 Apr 2015 10:01:07 +0200
Subject: Combining character example
Message-ID:

I happened to run across a good example of productive use of combining marks, the Duden site (a great online dictionary for German). They use U+0323 ( ̣) COMBINING DOT BELOW to indicate the stress. Here is an example:

ụnterbuttern

http://www.duden.de/rechtschreibung/unterbuttern

They aren't, however, consistent; you also see underlining for stress.

e̲i̲nschränken

But not, interestingly, with the HTML underline, but with U+0332 ( ̲ ) COMBINING LOW LINE.

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jknappen at web.de Thu Apr 16 04:32:21 2015
From: jknappen at web.de (=?UTF-8?Q?=22J=C3=B6rg_Knappen=22?=)
Date: Thu, 16 Apr 2015 11:32:21 +0200
Subject: Aw: Combining character example
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL:

From mark at macchiato.com Thu Apr 16 05:45:35 2015
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Thu, 16 Apr 2015 12:45:35 +0200
Subject: Combining character example
In-Reply-To:
References:

Message-ID: Thanks for the corrections; I should have looked for a key to the conventions they use. Mark *? Il meglio ? l?inimico del bene ?* On Thu, Apr 16, 2015 at 11:32 AM, "J?rg Knappen" wrote: > Hi Mark, > > the use of DOT BELOW and LINE BELOW is in fact consistent in German Duden. > The > difference in the diacritics is used to denote length of the stressed > vowel, DOT BELOW > denotes a short vowel and LINE BELOW denotes a long vowel. > > Diphthongs are always long and there is a single line under the whole > Diphthong. > > Digraphs (e.g. the "ou" in words borrowed from French) also have either a > single line > under the whole digraph or (this happens rarely) a single dot in the > middle of the > digraph. > > --J?rg Knappen > > *Gesendet:* Donnerstag, 16. April 2015 um 10:01 Uhr > *Von:* "Mark Davis [image: ?]?" > *An:* "Unicode Public" , "Unicode Book" < > book at unicode.org> > *Betreff:* Combining character example > I happened to run across a good example of productive use of combining > marks, the Duden site (a great online dictionary for German). They use > U+0323 ( ?) COMBINING DOT BELOW to indicate the stress. Here is an > example: > > u?nterbuttern > > http://www.duden.de/rechtschreibung/unterbuttern > > They aren't, however, consistent; you also see underlining for stress. > > e?i?nschr?nken > But not, interestingly, with the HTML underline, but with U+0332 ( ? ) > COMBINING LOW LINE. > > Mark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: emoji_u2615.png Type: image/png Size: 1547 bytes Desc: not available URL: From verdy_p at wanadoo.fr Thu Apr 16 07:11:37 2015 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 16 Apr 2015 14:11:37 +0200 Subject: Combining character example In-Reply-To: References:

Message-ID:

2015-04-16 11:32 GMT+02:00 "Jörg Knappen" :

> Digraphs (e.g. the "ou" in words borrowed from French) also have either a
> single line
> under the whole digraph or (this happens rarely) a single dot in the
> middle of the
> digraph.
>

The Standard French digraph "ou" (or "o?") /u/ is never long, or its length is not significant (it was significant in Old French and remains significant in some regional variants of French such as Acadian French). If we need the single dot, it is not really to represent the /u/ vowel but the /w/ semi-vowel. For example, compare:

- "mouette" /mwɛt/ with a single phonetic syllable, where we would place the dot below the "ou" digraph to indicate that this is the half-vowel /w/
- "brouette" /bʁu.ɛt/ with two phonetic syllables, where we would place the single line below the "ou" digraph to indicate that this is the vowel /u/

And for such use, the distinction between /u/ and /w/ is definitely NOT "rare" in French (for each one of its variants)!

----

There's another French digraph using the /w/ half-vowel: "oi" (or "oî" in a few words like "boîte", where the circumflex denotes an old etymological "s" that is now completely mute; such a circumflex over "i" or "?" is now optional in most words unless it orthographically disambiguates homophones, such as "du" vs. "dû"). The "oi" digraph is now read as the diphthong /wa/ in Standard French (but as the diphthong /wɛ/ or just the vowel /ɛ/ in Old French or in some regional variants). Where it was used as a verbal desinence, it is now consistently written /ai/ and pronounced as a single vowel /ɛ/ without a diphthong.

In all cases, Standard French no longer has any phonetic distinction of vowel length; it also no longer has any distinction of stress or tone. And orthographically, vowel length, stress, or tone is never written (there's no standard diacritic for them). So the spoken language can freely alter these phonetic variations without changing the meaning (e.g.
in poetry or songs, where this gives much more freedom to authors or performers), except for emphasis or, in extremely rare cases, in the spoken language only. In the written form, if needed, these variations are marked using typographic styles (bold or underlining for emphasis), or by using separators or punctuation. To distinguish a single digraph from a pair of vowels, the diaeresis diacritic is used orthographically over one of the two vowels: traditionally the diaeresis ("tréma" in French) was held by the second vowel (but it could be over the first vowel if the second vowel already had another diacritic such as an acute accent), and in reformed orthography it is the first vowel that consistently holds the diaeresis.

----

Another possible usage of the "dot below" diacritic in French text with the German Duden notation would be to denote the "unaspirated h" (which is completely mute in all contexts, allows liaisons and contractions, and sometimes even allows diphthongs to appear with a preceding vowel in fast speech by merging two syllables, e.g. "cohabiter" /ko.a.bi.te/ may become /kwa.bi.te/ in fast speech). The "low line" (or "low macron"?) below "h" with the German Duden notation would denote the "aspirated" h, which is now *also* mute in Standard French (except that it prohibits all phonetic "liaisons" with the final consonants of a previous word, as well as contractions of a previous article or preposition). The "aspirated h" may however be emphatically pronounced /h/ (and it is still the norm in regional variants of French). But traditionally, French dictionaries denote the "aspirated h" (which only exists at the start of words) with a leading asterisk (or a similar symbol such as a bullet) before the orthographic word entry; very few use the low line (or low macron) diacritic, which is not visible enough.
For such use, the dot below would definitely not be "rare" (even if it won't be in the middle of a digraph but below a single mute "h" letter).

----

You could also note with these diacritics the main difference in pronunciation of "ch":

- /k/, traditionally for most words with Greek etymology (such as "choriste"), would use a dot below the mute "h", or
- /ʃ/, for other words (like "machine" in French and English, but compare with the Latin expression "Deus ex machina", which still pronounces "machina" with /k/ as in Greek!), would use a line below the whole digraph

(in both cases, it does not denote tone, stress, or length/gemination of the consonant).

Length/gemination of consonants in Standard French is no longer significant orally; it just persists orthographically (or in some regional variants), and speakers can freely alter it if they want to mark emphasis; in written form, they would use typographical styles (such as bold, underlining, capitals in the middle of words, or bigger font sizes) or would insert additional separators such as hyphens or middle dots.

Some words in French still hesitate between the two main pronunciations /k/ and /ʃ/ of "ch" (e.g. "chorizo", borrowed directly from Spanish into French, where it means the same kind of dried hot-spiced sausage). A few words borrowed from English are also rarely pronounced with /tʃ/ but more often /ʃ/, notably those words that English itself borrowed from French with minor orthographic changes before they came back to French. /tʃ/ is just for some "purists" who want to maintain the English distinction, but for most users it is not incorrect, and even recommended, to change it back to the standard French /ʃ/ (including for English personal names such as "Prince Charles", or for English toponyms like "Chicago").
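
As an aside on the Duden-style notation discussed in this thread, here is a minimal Python sketch of how such marks are encoded: a base letter followed by U+0323 COMBINING DOT BELOW or U+0332 COMBINING LOW LINE. It uses only the standard `unicodedata` module. The helper name `mark_stress` and its index arguments are made up for illustration; this is not a description of how Duden produces its pages.

```python
import unicodedata

DOT_BELOW = "\u0323"   # COMBINING DOT BELOW: short stressed vowel in the Duden notation
LOW_LINE = "\u0332"    # COMBINING LOW LINE: long stressed vowel or diphthong

def mark_stress(word, start, length, mark):
    """Append `mark` after each of `length` letters starting at index `start`.

    Hypothetical helper for illustration; indexes count code points in `word`.
    """
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        if start <= i < start + length:
            out.append(mark)
    return "".join(out)

short = mark_stress("unterbuttern", 0, 1, DOT_BELOW)        # dot under the first "u"
long_ = mark_stress("einschr\u00e4nken", 0, 2, LOW_LINE)    # line under the diphthong "ei"

# The marks are separate code points, so the marked string is longer.
print(len("unterbuttern"), len(short))
print([unicodedata.name(c) for c in short[:2]])

# NFC composes u + dot below into the single code point U+1EE5, but no
# precomposed letter carries U+0332, so the low line stays a combining mark.
print(unicodedata.normalize("NFC", short)[0] == "\u1ee5")
print(LOW_LINE in unicodedata.normalize("NFC", long_))
```

The same approach would apply to the French "ou" and "ch" cases above, since the marks attach to whichever base letters precede them.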
If an author wanted to annotate a French text to mark where "ch" should be pronounced /tʃ/, he could insert an additional "t" in parentheses or in superscript, or another custom diacritic over the "ch" digraph or one of its letters (there's no orthographic standard for such notation). The rare French words where this phonetic mutation of /tʃ/ to /ʃ/ is prohibited are written explicitly with the trigraph "tch" (e.g. "Tchad", the African country or lake; or the interjection "Tchin !", to contrast it phonetically with "Chine", the East Asian country, or "chine", a verbal form of "chiner", both never pronounced with /tʃ/ in French).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From verdy_p at wanadoo.fr Thu Apr 16 11:21:23 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 16 Apr 2015 18:21:23 +0200
Subject: =?UTF-8?Q?Re=3A_Combined_Yor=C3=B9b=C3=A1_characters_with_dot_below_and_?= =?UTF-8?Q?tonal_diacritics?=
In-Reply-To: <20150415085952.665a7a7059d7ee80bb4d670165c8327d.9dd37058c0.wbe@email03.secureserver.net>
References: <20150415085952.665a7a7059d7ee80bb4d670165c8327d.9dd37058c0.wbe@email03.secureserver.net>
Message-ID:

2015-04-15 17:59 GMT+02:00 Doug Ewell :

> Luis de la Orden wrote:
>
> > But with regards to the ALt-Gr, there it goes my innocence and feeling
> > of accomplishment :)), I had everything linked to the Alt-Gr key and
> > did exactly as Ilya said... MS Word is fine, but very specialised
> > software such as Photoshop are a pain as their power-user shortcuts
> > all use ALT-Gr indeed.
>
> It's true that some very popular software packages define their own
> Alt+key or Ctrl+Alt+key combinations, and keyboard layouts that use the
> same combinations (as AltGr+key) will conflict with them. I just annoyed
> a colleague the other day because my keyboard layout (John Cowan's
> delightful Moby Latin [1]) co-opts one of his favorite Visual Studio
> shortcuts; he tried to build a project and got U+2022 BULLET instead.
>
> But there are also a lot of keyboard layouts worldwide that do use AltGr
> keys. There are only so many keys on a standard keyboard, and if you're
> designing a layout and you've made all the tough choices and you still
> need to find room for more characters, you pretty much have to go to
> AltGr.
>
> It helps to educate users of such a keyboard layout, especially
> Americans, that the left and right Alt keys aren't the same. Americans
> tend not to expect this, because the standard U.S. keyboard doesn't use
> AltGr at all and the key isn't marked as such.
>

Another candidate key for modifiers that you can use on PC keyboards is the useless "NumLock" key on 101/102-key keyboards (there's actually no need to switch the working mode of the numeric keypad, given that you also have a separate set of keys for cursor movements, which remain active independently of the NumLock setting). So NumLock can be reused as another modifier (e.g. to support input in Japanese without a physical Japanese keyboard). Of course this will not work if your connected keyboard does not have a separate numpad and you need to share it with the cursor keys.

Some keyboard layouts also use ScrollLock for a similar purpose (but this function key is frequently not easy to access on notebooks, where you activate it by using the "Fn" key plus another function key). NumLock in that case remains more accessible, as it stays under a single keystroke. Several keyboard drivers (e.g. Logitech) have settings that disable ScrollLock by default (or that recommend disabling it...), a good sign that this function is useless in standard layouts but more useful for extended layouts (in that case don't disable it in the device control panel!).
Outside physical keyboards, virtual touch keyboards have no such limitation, and these layouts have much more freedom in how they switch their visible panels between input modes. Many more facilities can be integrated, including facilities specific to the application that has the input focus (applications don't necessarily have to modify the virtual driver of the OS; they can provide these facilities directly in their own UI, but the OS-provided virtual keyboard can reserve a space for such application-specific customizations of the touch panel). For these virtual panels, even common modifiers like CapsLock, Shift, Control, or NumLock are not productive, as the proposed lists of characters may be arranged in very different groups or input modes, with more choice in terms of geometry. However, applications providing a custom UI should offer clear mappings to other common input devices, including physical keyboards and mice or other pointing devices. These interfaces should also not assume that alternate devices will be able to emulate a multitouch capability (with multiple pointing positions on screen).

On touch interfaces, the "AltGr" modifier itself does not have any meaning. It is more important for the layout to offer the best selection of characters for the current language, and to offer extended subsets with logical groupings that make sense for that language. We are not limited to just 2 or 3 "modifier" keys: for example, long presses or clicks on a basic virtual key can open a popup panel that contains dozens of characters to choose from, and contextual input can also place the most common or most recent choice in an easy position without having to look through other panels.
Virtual onscreen keyboards (not necessarily on touch screens, because you may also point and click in them) in fact act more like an IME than like a traditional keyboard. These virtual keyboards are true applications with a visual UI, accessible with several other input devices: a tactile area, possibly multitouch, pointing devices, physical keyboards, or other sensors. These applications process all these inputs, present some selections, present the result to be sent to other applications, and send these results either by emulating standard keyboard events or mouse events, or even as image captures or clipboard contents to be pasted into other applications. These IMEs can also be designed to manage the content of an external input control widget completely, becoming its standard text editor.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From asmus-inc at ix.netcom.com Thu Apr 16 11:23:04 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Thu, 16 Apr 2015 09:23:04 -0700
Subject: Combining character example
In-Reply-To:
References:

Message-ID: <552FE1E8.7010204@ix.netcom.com> On 4/16/2015 3:45 AM, Mark Davis ?? wrote: > Thanks for the corrections; I should have looked for a key to the > conventions they use. > > It's clear why they would not want to use the HTML underline. The additional information is content, not style. A./ > Mark > / > / > /? Il meglio ? l?inimico del bene ?/ > // > > On Thu, Apr 16, 2015 at 11:32 AM, "J?rg Knappen" > wrote: > > Hi Mark, > the use of DOT BELOW and LINE BELOW is in fact consistent in > German Duden. The > difference in the diacritics is used to denote length of the > stressed vowel, DOT BELOW > denotes a short vowel and LINE BELOW denotes a long vowel. > Diphthongs are always long and there is a single line under the > whole Diphthong. > Digraphs (e.g. the "ou" in words borrowed from French) also have > either a single line > under the whole digraph or (this happens rarely) a single dot in > the middle of the > digraph. > --J?rg Knappen > *Gesendet:* Donnerstag, 16. April 2015 um 10:01 Uhr > *Von:* "Mark Davis ??" > > *An:* "Unicode Public" >, "Unicode Book" > > *Betreff:* Combining character example > I happened to run across a good example of productive use of > combining marks, the Duden site (a great online dictionary for > German). They use U+0323 ( ?) COMBINING DOT BELOW to indicate the > stress. Here is an example: > u?nterbuttern > http://www.duden.de/rechtschreibung/unterbuttern > They aren't, however, consistent; you also see underlining for stress. > > e?i?nschr?nken > > But not, interestingly, with the HTML underline, but with U+0332 ( > ? ) COMBINING LOW LINE. > Mark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: image/png
Size: 1547 bytes
Desc: not available
URL:

From doug at ewellic.org Thu Apr 16 11:53:34 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 16 Apr 2015 09:53:34 -0700
Subject: Combined Yoruba characters with dot below and tonal diacritics
Message-ID: <20150416095334.665a7a7059d7ee80bb4d670165c8327d.65d9e37ae2.wbe@email03.secureserver.net>

Philippe Verdy wrote:

> Another candidate key for modifiers that you can use on PC keyboards
> is the useless "NumLock" key on 101/102-keys keyboards (there's
> actually no need to switch the working mode of the numeric keypad,
> given you have also a separate set of keys for cursor movements, which
> remain active independently of the NumLock setting).

[+544 words]

Speaking only about Windows here, not other platforms:

1. AltGr is the industry standard for this sort of "Level 3" shifting function. Users would probably not expect NumLock or Scroll Lock, which are far from the normal typewriter keys, to perform this function.

2. Windows (or at least MSKLC) doesn't allow NumLock to be remapped in this way. You'd have to drop down to a lower level of key processing.

3. Some users actually prefer the arrow keys on the numeric keypad, and for them, NumLock isn't "useless."

--
Doug Ewell | http://ewellic.org | Thornton, CO

From verdy_p at wanadoo.fr Thu Apr 16 13:08:45 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 16 Apr 2015 20:08:45 +0200
Subject: Combined Yoruba characters with dot below and tonal diacritics
In-Reply-To: <20150416095334.665a7a7059d7ee80bb4d670165c8327d.65d9e37ae2.wbe@email03.secureserver.net>
References: <20150416095334.665a7a7059d7ee80bb4d670165c8327d.65d9e37ae2.wbe@email03.secureserver.net>
Message-ID:

2015-04-16 18:53 GMT+02:00 Doug Ewell :

> Philippe Verdy wrote:
>
> > Another candidate key for modifiers that you can use on PC keyboards
> > is the useless "NumLock" key on 101/102-keys keyboards (there's
> > actually no need to switch the working mode of the numeric keypad,
> > given you have also a separate set of keys for cursor movements, which
> > remain active independently of the NumLock setting).
>
> [+544 words]
>
> Speaking only about Windows here, not other platforms:
>
> 1. AltGr is the industry standard for this sort of "Level 3" shifting
> function. Users would probably not expect NumLock or Scroll Lock, which
> are far from the normal typewriter keys, to perform this function.

I have not contested the use of the AltGr key on 101/102-key physical keyboards for that shifting function; my suggestion was about adding other modifier keys (e.g. emulating a Japanese keyboard that has another "kana" modifier). Also, this working mode for physical keyboards is absolutely not a requirement for visual input on virtual onscreen keyboards, which are not required to use or even emulate these layouts with the same working modes.

And yes, NumLock is useless on keyboards that have BOTH a cursor control keypad AND a numeric keypad; you use the cursor control keypad directly and want the numeric pad to remain in numeric mode (so, logically, the Logitech driver proposes to disable NumLock completely). NumLock is a very old feature inherited from the initial IBM PC keyboards (and it had no equivalent before the PC), when there was no separate cursor control keypad.
Many notebooks don't even have it (or, if you activate it, it remaps the nonexistent numeric keypad on top of the main alphabetic keys; the cursor keys remain active where they are, independently of the NumLock state!).

So NumLock is not a standard even in the PC/Windows world. Where it still really exists (only on external physical keyboards), it is most often useless. Its existence was justified only on keyboards with about 90 keys (ignoring other multimedia/power-control keys or the newer modifier keys for Windows/OS, Appl/Menu, and Fn). Likewise, not all physical keyboards have separate keys for ScrollLock, PrintScreen, Break/SysReq (they are remapped onto other keys by combining them with the "Fn" key, directly in their internal firmware, without control by Windows itself).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From schne59863 at laposte.net Fri Apr 17 12:08:24 2015
From: schne59863 at laposte.net (schne59863 at laposte.net)
Date: Fri, 17 Apr 2015 19:08:24 +0200 (CEST)
Subject: =?utf-8?Q?NamesList,_Code=C2=A0Charts,_ISO/IEC=C2=A010646?=
In-Reply-To: <23898299.11201876.1429290382097.JavaMail.zimbra@laposte.net>
Message-ID: <1672337588.11213982.1429290504770.JavaMail.zimbra@laposte.net>

Hi,

there seems to be a mistake with character names. In fact they are designations, and they are handled as such. The goal of a character’s name is to give an accurate idea of what the character is, and to facilitate referring to it in natural language. As an immutable identifier there is the code point. Systems handle code points, not character names. Software does not need any other identifier. This is why freezing character names is an abuse, especially when they have proved to be wrong.

There is a very strong desire to design the most accurate names, which led to passionate discussions at the merger of ISO/IEC 10646 with Unicode.
But the renaming of U+00C6/U+00E6 back to their original letter status surprisingly produced a prohibition on name updates, a Stability Policy that extends to names instead of ensuring code point stability only. Suddenly, character names were called “convenient identifiers” by ISO, nothing more. And nothing less.

Fortunately Unicode found a workaround, giving characters that are completely misnamed a Formal Alias, thanks to which software that is aware of Formal Aliases is able to display a true designation in most cases. Unfortunately, the remedy is not applied to characters such as U+002F SOLIDUS, a slash that bears the scholarly name of the fraction slash (U+2044 FRACTION SLASH may with some reason be called a solidus). And even more unfortunately, there would be far too many Formal Aliases if all the abusive lateralization of bidi-mirrored paired punctuation were corrected. Even outside bidirectional contexts, the “LEFT” qualifier is unfitting for U+2018 and U+201C in a Universal Character Set.

UnicodeData shows clearly where most of the awkward names are from. Or, more accurately, where they are NOT from. By misnaming characters in an ethnocentric way, ISO acted against its mission as an international standards body. It is obvious that an international organization for standardization must respect its members’ wishes. And when one of the countries complains about misnaming, it must correct and apologize, not rage and protest. Nor prohibit further updates.

Therefore I suggest doing a general overhaul, beginning with the Stability Policy. To avoid lateralization where it is undue, LEFT and RIGHT may be replaced with the original OPENING and CLOSING where it is unambiguous, or with BACKWARD-POINTING and FORWARD-POINTING.

Best regards,
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From asmus-inc at ix.netcom.com Fri Apr 17 12:31:47 2015
From: asmus-inc at ix.netcom.com (Asmus Freytag (t))
Date: Fri, 17 Apr 2015 10:31:47 -0700
Subject: NamesList, =?UTF-8?B?Q29kZcKgQ2hhcnRzLCBJU08vSUVDwqAxMDY0Ng==?=
In-Reply-To: <1672337588.11213982.1429290504770.JavaMail.zimbra@laposte.net>
References: <1672337588.11213982.1429290504770.JavaMail.zimbra@laposte.net>
Message-ID: <55314383.5070507@ix.netcom.com>

On 4/17/2015 10:08 AM, schne59863 at laposte.net wrote:
> > Hi, > > > > there seems to be a mistake with character names. >

Dear schne5983,

There seems to be a mistake in this message. It does not include a signature or a name.

I'll reserve responding in detail until I know who I have the pleasure of conversing with. But permit me to ask one question up front. What would be served by making such a sweeping change at this juncture, after 25 years of established practice?

Best wishes,
A./
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From verdy_p at wanadoo.fr Fri Apr 17 12:36:02 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 17 Apr 2015 19:36:02 +0200
Subject: NamesList, Code Charts, ISO/IEC 10646
In-Reply-To: <1672337588.11213982.1429290504770.JavaMail.zimbra@laposte.net>
References: <23898299.11201876.1429290382097.JavaMail.zimbra@laposte.net> <1672337588.11213982.1429290504770.JavaMail.zimbra@laposte.net>
Message-ID:

2015-04-17 19:08 GMT+02:00 schne59863 at laposte.net :

> As to avoid lateralization where it is undue, LEFT and RIGHT may be
> replaced with the original OPENING and CLOSING where it is unambiguous, or
> with BACKWARD-POINTING and FORWARD-POINTING.
>

LEFT and RIGHT are accurate for characters that MUST NOT be mirrored in BiDi contexts (this applies notably to punctuation and symbols that have a weak or contextual direction and do not force the direction of the text encoded after them).
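
Whether a given character is actually mirrored can be checked against the Bidi_Mirrored property rather than inferred from its name. The following is a minimal sketch using Python's standard `unicodedata` module; the function name `describe` and the choice of sample characters are illustrative assumptions, and the values reported come from whichever Unicode version the interpreter bundles.

```python
import unicodedata

def describe(ch):
    """Return (name, Bidi_Mirrored as bool, Bidi_Class) for one character."""
    return (
        unicodedata.name(ch),
        bool(unicodedata.mirrored(ch)),   # Bidi_Mirrored property
        unicodedata.bidirectional(ch),    # Bidi_Class, e.g. "ON" (other neutral)
    )

# Sample characters picked for illustration: parentheses, curly quotes,
# a guillemet, and the solidus mentioned earlier in the thread.
samples = ["(", ")", "\u2018", "\u201c", "\u00ab", "/"]

for ch in samples:
    name, mirrored, bidi_class = describe(ch)
    print(f"U+{ord(ch):04X} {name:45} mirrored={mirrored} bidi={bidi_class}")
```

With such a listing, names containing LEFT or RIGHT can be compared directly against the property that governs mirroring.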
Those characters that MUST be mirrored according to the resolved BiDi direction are the only ones for which OPENING/CLOSING or BACKWARD/FORWARD should be used instead of LEFT/RIGHT.

Note also that some characters will rotate in vertical writing modes, and in that case they have a FORWARD/BACKWARD direction, even if those characters do not mirror in bidirectional horizontal modes.

So a single naming scheme that avoids LEFT/RIGHT is not desirable for all characters. It depends on how they behave when the writing direction changes, whenever these characters do not have a strong direction.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From m_kato at ga2.so-net.ne.jp Mon Apr 27 21:20:29 2015
From: m_kato at ga2.so-net.ne.jp (Makoto Kato)
Date: Tue, 28 Apr 2015 11:20:29 +0900
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
Message-ID: <553EEE6D.2020004@ga2.so-net.ne.jp>

Hi.

http://www.unicode.org/reports/tr14/proposed.html#ID defines Ideographic (ID). Although full-width katakana is included in ID, half-width katakana (U+FF66 and U+FF71-U+FF9D) isn't. Why?

Also, Conditional Japanese Starter (CJ, http://www.unicode.org/reports/tr14/proposed.html#CJ) considers half-width variants such as half-width katakana letter small a.

-- Makoto

From mpsuzuki at hiroshima-u.ac.jp Mon Apr 27 22:14:54 2015
From: mpsuzuki at hiroshima-u.ac.jp (suzuki toshiya)
Date: Tue, 28 Apr 2015 12:14:54 +0900
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To: <553EEE6D.2020004@ga2.so-net.ne.jp>
References: <553EEE6D.2020004@ga2.so-net.ne.jp>
Message-ID: <553EFB2E.3010808@hiroshima-u.ac.jp>

Kato-san,

At present, I have no objection to adding halfwidth katakana to the ideographic class in UAX#14, but I'm unfamiliar with the (negative) impact caused by the lack of halfwidth katakana in it. Could you tell me if you know anything?
I guess the inclusion or exclusion in other classes, like AI, AL, CJ, JL, JV, JT, SA, might be quite important to achieve appropriate line breaking, but the inclusion or exclusion in the ID class does not seem to be important. If the inclusion in the ID class is important, more characters (e.g. Bopomofo) should be considered for full coverage. What do you think?

Regards,
mpsuzuki

Makoto Kato wrote:
> Hi.
>
> http://www.unicode.org/reports/tr14/proposed.html#ID defines Ideographic
> (ID). Although full-width katakana is included in ID, half-width
> katakana (U+FF66 and U+FF71-U+FF9D) isn't. Why?
>
> Also, Conditional Japanese Starter (CJ,
> http://www.unicode.org/reports/tr14/proposed.html#CJ) considers
> half-width variants such as half-width katakana letter small a.
>
>
> -- Makoto

From m_kato at ga2.so-net.ne.jp Tue Apr 28 00:57:56 2015
From: m_kato at ga2.so-net.ne.jp (Makoto Kato)
Date: Tue, 28 Apr 2015 14:57:56 +0900
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To: <553EFB2E.3010808@hiroshima-u.ac.jp>
References: <553EEE6D.2020004@ga2.so-net.ne.jp> <553EFB2E.3010808@hiroshima-u.ac.jp>
Message-ID:

Hi, Suzuki-san. Thank you for your reply.

> At present, I have no objection to add halfwidth katakana
> to ideographic-class in UAX#14, but I'm unfamiliar with the
> (negative) impact caused by the lack of halfwidth katakana
> in it. Could you tell me if you know anything?

Since half-width katakana isn't ID, lines of it are not broken the way lines of full-width katakana are.

This is a sample for line breaking of half-width katakana. (It is a good sample of web browser implementations.)
http://mxr.mozilla.org/mozilla-central/source/layout/reftests/line-breaking/ja-3.html

Firefox and IE11 treat half-width katakana as ID. The line breaking of half-width katakana is the same as that of full-width katakana.
Chrome doesn't treat it as ID. Half-width katakana is not broken per character.
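
For readers who want to check such classifications themselves: Python's standard library does not expose the Line_Break property, so one option is to parse data in the `range;class # comment` layout used by the UCD file LineBreak.txt. The sketch below is only an illustration under stated assumptions. The three sample lines are written to mirror the classification described in this thread (halfwidth katakana letters as AL, a fullwidth katakana letter as ID) and are not copied from any particular Unicode release; the function names are hypothetical. A real check should load the actual LineBreak.txt for the version of interest.

```python
import unicodedata

# Illustrative lines in the "range;class # comment" layout of LineBreak.txt.
# They mirror the classification described in this thread and are NOT copied
# from a specific Unicode release.
SAMPLE = """\
30A2;ID # KATAKANA LETTER A
FF66;AL # HALFWIDTH KATAKANA LETTER WO
FF71..FF9D;AL # HALFWIDTH KATAKANA LETTER A..N
"""

def parse_line_break(text):
    """Return a list of (first, last, line_break_class) tuples."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not data:
            continue
        span, lb_class = (field.strip() for field in data.split(";"))
        first, _, last = span.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), lb_class))
    return ranges

def line_break_class(ch, ranges, default="XX"):
    """Look up the class of one character; "XX" (unknown) if not listed."""
    cp = ord(ch)
    for first, last, lb_class in ranges:
        if first <= cp <= last:
            return lb_class
    return default

ranges = parse_line_break(SAMPLE)
half, full = "\uff71", "\u30a2"   # halfwidth and fullwidth katakana letter A

print(line_break_class(half, ranges), line_break_class(full, ranges))
# Compatibility normalization (NFKC) maps the halfwidth form to the fullwidth one.
print(unicodedata.normalize("NFKC", half) == full)
# East_Asian_Width distinguishes the two forms.
print(unicodedata.east_asian_width(half), unicodedata.east_asian_width(full))
```

Swapping `SAMPLE` for the contents of a real LineBreak.txt would show which class a given release assigns to U+FF66 and U+FF71-U+FF9D.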

Although I read JIS X 4051, it doesn't specify that half-width katakana and full-width katakana should be treated differently.

> I guess, the inclusion or exclusion in other classes, like,
> AI, AL, CJ, JL, JV, JT, SA might be quite important to realize
> the appropriate line breaking, but the inclusion or exclusion
> in ID-class does not seem to be important. If the inclusion
> in ID-class is important, more characters (e.g. Bopomofo)
> should be considered for full coverage. What do you think?

My question is why half-width katakana characters aren't in the same class as full-width katakana characters. In the current spec, half-width katakana is defined as AL. So if it moves to ID, the break rule changes significantly (non-break -> break before or after).

-- Makoto

From mpsuzuki at hiroshima-u.ac.jp Tue Apr 28 01:27:08 2015
From: mpsuzuki at hiroshima-u.ac.jp (suzuki toshiya)
Date: Tue, 28 Apr 2015 15:27:08 +0900
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To:
References: <553EEE6D.2020004@ga2.so-net.ne.jp> <553EFB2E.3010808@hiroshima-u.ac.jp>
Message-ID: <553F283C.9090307@hiroshima-u.ac.jp>

# Sorry, I failed to consider the size of
# the large picture attachment. I reduced
# the image size and am resending to the
# Unicode mailing list.

Kato-san,

Thank you very much for the prompt response.

> This is a sample of line breaking for half-width katakana. (It is a
> good sample from a web browser implementation.)
> http://mxr.mozilla.org/mozilla-central/source/layout/reftests/line-breaking/ja-3.html

I wish the sample text were longer, to show the line breaking behaviour. I attached jugem.txt and a screenshot from Firefox and Chromium.

> Firefox and IE11 treat half-width katakana as ID, so its line breaking
> is the same as for full-width katakana. Chrome doesn't treat it as ID,
> so half-width katakana is not broken per character.

Oh, Google Chrome does not break half-width katakana text per character! It is a very good example showing that the lack of an explicit definition causes incompatibility (and inconvenience). I'm sorry for troubling you for the explanation.

I agree with your proposal to add halfwidth katakana to ID-class, even if further discussion is needed for other scripts.

Regards,
mpsuzuki

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: jugem-halfwidth-katakana.txt
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jugem-firefox-vs-chromium.png
Type: image/png
Size: 23506 bytes
Desc: not available
URL:

From verdy_p at wanadoo.fr Tue Apr 28 02:47:40 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 28 Apr 2015 09:47:40 +0200
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To: <553EEE6D.2020004@ga2.so-net.ne.jp>
References: <553EEE6D.2020004@ga2.so-net.ne.jp>
Message-ID:

My feeling is that half-width kanas behave like Latin letters and do not even have to follow the ideographic composition square to line up with them (unlike standard kanas). So effectively their line breaking behavior is very different.

Those "half-width letters" are in fact similar to linear jamos (not composed into syllabic squares) in the Korean script, and to Bopomofo letters. And maybe we could add the CJK key letters (radicals used for example in IDS) to this list, or Yi radicals.

They are harmonized to be used along with other alphabetic scripts. In fact they may not even really be "half-width" but proportional. They are also used with non-ideographic punctuation (notably the ASCII punctuation) and standard SPACE (U+0020).

If rendered in vertical lines, they could be either rotated (just like Latin letters), or not (aligned horizontally like letters in columns of crosswords, but they may also have proportional height, like in Latin/Greek/Cyrillic where it is sometimes needed for example with capital letters with stacked accents, or when using sized spaces).

So IMHO, those "half-width" letters are in fact to be considered as another separate script, for typographic purposes. They are "unified" with non-halfwidth letters, only for collation with minor differences (plain-text searching and sorting).

From verdy_p at wanadoo.fr Tue Apr 28 03:03:14 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 28 Apr 2015 10:03:14 +0200
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To:
References: <553EEE6D.2020004@ga2.so-net.ne.jp>
Message-ID:

Note: is it really allowed to break between a Latin letter and a half-width kana? Such sequences are frequent when there are untranslated foreign Latin (or maybe Greek/Cyrillic/Hebrew/Arabic) insertions in Japanese (toponyms, trademarks, personal names...) that are followed by a semantic kana terminator. If you allow this break, the terminator will lose its meaning.

There are probably similar exceptions between [ideographs or fullwidth Latin/Greek/Cyrillic] and [half-width or full-width kana], for those script boundaries.

From mpsuzuki at hiroshima-u.ac.jp Tue Apr 28 03:04:12 2015
From: mpsuzuki at hiroshima-u.ac.jp (suzuki toshiya)
Date: Tue, 28 Apr 2015 17:04:12 +0900
Subject: ["Unicode"] Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To:
References: <553EEE6D.2020004@ga2.so-net.ne.jp>
Message-ID: <553F3EFC.9060603@hiroshima-u.ac.jp>

Dear Philippe,

Philippe Verdy wrote:
> My feeling is that half-width kanas behave like Latin letters and do not
> even have to follow the ideographic composition square to line up with them
> (unlike standard kanas). So effectively their line breaking behavior is
> very different.

Excuse me, do you mean that half-width kana text should have spaces between the words, although full-width (standard) kana text may not? Could you tell me more about the community preferring such a distinction?

I think the orthography proposed for writing the Japanese language in kana without kanji has a word-breaking space, like
http://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:Kana_no_Hikari,_number_1,_page_1.png
but it is not official, and it does not distinguish full-width kana and half-width kana.

Regards,
mpsuzuki

From wl at gnu.org Tue Apr 28 03:09:29 2015
From: wl at gnu.org (Werner LEMBERG)
Date: Tue, 28 Apr 2015 10:09:29 +0200 (CEST)
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To:
References: <553EEE6D.2020004@ga2.so-net.ne.jp>
Message-ID: <20150428.100929.286390887.wl@gnu.org>

> My feeling is that half-width kanas behave like Latin letters and
> do not even have to follow the ideographic composition square to
> line up with them (unlike standard kanas).

It's exactly half of the ideographic square.

> So effectively their line breaking behavior is very different.

Maybe. However, the most important property is to be able to start a new line after (almost) any half-width kana.

> They are harmonized to be used along with other alphabetic
> scripts. In fact they may not even really be "half-width" but
> proportional.

Do you have an example for that?
I've *exclusively* seen fonts where half-width kanas are really half the CJK width.

> If rendered in vertical lines, they could be either rotated (just
> like Latin letters),

Actually, I have never seen half-width kanas used in a vertical context. Does this exist?

> So IMHO, those "half-width" letters are in fact to be considered as
> another separate script, for typographic purposes.

Yes, for typographic purposes. But typographic issues are not covered by Unicode. AFAIK, the existence of half-width kanas in Unicode is purely for backwards and round-trip compatibility.

Werner

From verdy_p at wanadoo.fr Tue Apr 28 03:10:56 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 28 Apr 2015 10:10:56 +0200
Subject: ["Unicode"] Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To: <553F3EFC.9060603@hiroshima-u.ac.jp>
References: <553EEE6D.2020004@ga2.so-net.ne.jp> <553F3EFC.9060603@hiroshima-u.ac.jp>
Message-ID:

I just gave an opinion about what I have seen. I don't know if this is correct or preferred.

Half-width text is a modern invention that does not obey the traditions used in CJK composition squares (which should also be rendered vertically by default; even if this is not the case on the Internet today, it is still the case for printed texts).

They started being used at the same time that Latin letters started to be mixed into text, and that computers appeared offering only half-width character cells in monospaced fonts (to display other ideographs, those old computers needed to allocate two cells and use separate fonts for the left side and the right side).

I don't know if whitespace is preferred or not in halfwidth text; I have seen both...

From wl at gnu.org Tue Apr 28 03:12:34 2015
From: wl at gnu.org (Werner LEMBERG)
Date: Tue, 28 Apr 2015 10:12:34 +0200 (CEST)
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To: <20150428.100929.286390887.wl@gnu.org>
References: <553EEE6D.2020004@ga2.so-net.ne.jp> <20150428.100929.286390887.wl@gnu.org>
Message-ID: <20150428.101234.474804781.wl@gnu.org>

> However, the most important property is to be able to start a new
> line after (almost) any half-width kana.

Bad formulation, sorry. I mean:

However, the most important property is to be able to break a line after (almost) any half-width kana.

Werner

From verdy_p at wanadoo.fr Tue Apr 28 03:14:30 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Tue, 28 Apr 2015 10:14:30 +0200
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To: <20150428.100929.286390887.wl@gnu.org>
References: <553EEE6D.2020004@ga2.so-net.ne.jp> <20150428.100929.286390887.wl@gnu.org>
Message-ID:

2015-04-28 10:09 GMT+02:00 Werner LEMBERG :
> Yes, for typographic purposes. But typographic issues are not covered
> by Unicode. AFAIK, the existence of half-width kanas in Unicode is
> purely for backwards and round-trip compatibility.

Yes, compatibility with typographic conventions. And yes, I have seen half-width text rendered vertically (always rotated: so far I have not seen it aligned like in crosswords...).

From albrecht.dreiheller at siemens.com Tue Apr 28 04:02:31 2015
From: albrecht.dreiheller at siemens.com (Dreiheller, Albrecht)
Date: Tue, 28 Apr 2015 09:02:31 +0000
Subject: AW: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To: <20150428.100929.286390887.wl@gnu.org>
References: <553EEE6D.2020004@ga2.so-net.ne.jp> <20150428.100929.286390887.wl@gnu.org>
Message-ID: <3E10480FE4510343914E4312AB46E74212AD7F78@DEFTHW99EH5MSX.ww902.siemens.net>

No. They are still in use.

One typical usage of half-width kanas is the display of short texts on small devices in embedded systems, such as status messages of control units: for example, a one-line display, 30 characters wide, monospace, with 8x10 pixels per character.

Albrecht

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Werner LEMBERG
Sent: Tuesday, 28 April 2015 10:09
To: verdy_p at wanadoo.fr
Cc: m_kato at ga2.so-net.ne.jp; unicode at unicode.org
Subject: Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

(...) AFAIK, the existence of half-width kanas in Unicode is purely for backwards and round-trip compatibility.

From kenwhistler at att.net Tue Apr 28 20:22:47 2015
From: kenwhistler at att.net (Ken Whistler)
Date: Tue, 28 Apr 2015 18:22:47 -0700
Subject: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
In-Reply-To:
References: <553EEE6D.2020004@ga2.so-net.ne.jp> <553EFB2E.3010808@hiroshima-u.ac.jp>
Message-ID: <55403267.9060202@att.net>

Taking this thread back to the original question...

The Line_Break property values for halfwidth katakana (lb=AL) and regular katakana (lb=ID) have been stable since they were first defined for Unicode 3.0 -- 15 years ago.

Regardless of whether lb=AL is the optimal assignment for the halfwidth katakana, it seems to me that trying to *change* that Line_Break assignment, just for halfwidth katakana, at this late date, would likely be more destabilizing for existing implementations than helpful.

The citations below show *different* behavior between browsers for linebreaking around halfwidth katakana. That suggests that Firefox and IE11 have already provided tailoring to better match expectations. The correct avenue forward, it seems to me, would be to pursue bugs against browsers that do not show expected behavior, to see if improvements there are feasible, rather than to modify the base Line_Break property values that everybody has to tailor *from*.

Note that this is not *just* a Japanese problem nor a matter of not matching JIS X 4051. UAX #14 is *not* a direct implementation of JIS X 4051 rules, although it is certainly informed by them and has many Line_Break values introduced to get default behavior closer to the Japanese rules for linebreaking. And the compatibility halfwidth characters in the standard also include halfwidth jamo and symbols, so any changes also would need to be considered in the context of consistency for those and for *Korean* rules, as well as for Japanese.

--Ken

On 4/27/2015 10:57 PM, Makoto Kato wrote:
> Hi, Suzuki-san. Thank you for your reply.
>
>> At present, I have no objection to add halfwidth katakana
>> to ideographic-class in UAX#14, but I'm unfamiliar with the
>> (negative) impact caused by the lack of halfwidth katakana
>> in it. Could you tell me if you know anything?
> Since half-width katakana isn't ID, it doesn't break lines like
> full-width katakana.
>
>
> Firefox and IE11 treat half-width katakana as ID, so its line breaking
> is the same as for full-width katakana. Chrome doesn't treat it as ID,
> so half-width katakana is not broken per character.
>
> Although I read JIS X 4051, it doesn't specify that half-width katakana
> and full-width katakana should be treated differently.
>
>
>> I guess, the inclusion or exclusion in other classes, like,
>> AI, AL, CJ, JL, JV, JT, SA might be quite important to realize
>> the appropriate line breaking, but the inclusion or exclusion
>> in ID-class does not seem to be important. If the inclusion
>> in ID-class is important, more characters (e.g. Bopomofo)
>> should be considered for full coverage. What do you think?
> My question is why half-width katakana characters aren't in the same
> class as full-width katakana characters. In the current spec,
> half-width katakana is defined as AL. So if it moves to ID, the break
> rule changes significantly (non-break -> break before or after).
>
>
> -- Makoto
>
>

From verdy_p at wanadoo.fr Thu Apr 30 17:53:20 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 1 May 2015 00:53:20 +0200
Subject: alternate hyphens for word breaking
Message-ID:

I've seen several usages of a distinct hyphen in English dictionaries, where it appears for breaking words (sometimes not even between syllables but at arbitrary positions within words).

This hyphen is slightly different from a regular hyphen: it is slanted about 30° (2 o'clock), and a bit longer than standard orthographic hyphens (more or less the length of an en dash, the difference being that it is still not horizontal).

Some other books also use wavy forms (slanted tilde), or a curved form similar to a mirrored "?" (except that the angle is rounded), or something like a small "/?" or a small "_/" (here also with a rounded angle).

It is typically used in books printed in small formats, with narrow columns and a compact presentation (to reduce the number of printed pages and the volume of the book).
I've also seen them in some pocket versions of the Bible and in other inexpensive books in pocket format (including literature, tourist guides, and phone directories in small formats).

In technical documentation, such as documents providing source code, it is used to indicate the presence of a forced line break which is not part of the source code itself (in which case other typical forms include an arrowhead, a hyphen with a diacritical ellipsis above it, or a hyphen or another symbol enclosed in a dotted box).

Have you seen other forms of these special hyphens (used exclusively at the end of a line, and aligned with the right margin of text columns)?