From rick at unicode.org Thu Jan 2 12:18:35 2014
From: rick at unicode.org (Rick McGowan)
Date: Thu, 02 Jan 2014 10:18:35 -0800
Subject: Mail list changes for 2014
In-Reply-To: <52C2F57A.2020108@unicode.org>
References: <529E619E.7030305@unicode.org> <52C2F57A.2020108@unicode.org>
Message-ID: <52C5AD7B.9020000@unicode.org>

Hello everyone.

The Unicode mail list has now been re-activated. If you experience trouble with subscription issues or functionality, please feel free to e-mail me directly.

Regards,
Rick

On 12/31/2013 8:48 AM, Rick McGowan wrote:
> The mail list will now go off-line shortly, and be back after the new
> year.
> Regards,
> Rick
>
> On 12/3/2013 2:56 PM, Rick McGowan wrote:
>> At the end of the year, we will be changing the mail list server for
>> the public-access mail lists, including this one. The new system will
>> be Gnu "Mailman", an interface familiar to many. This should make it
>> easier for users to handle their subscriptions and options in one
>> place, via the web interface.
>>
>> We will thus be shutting down the public mail lists over the "holiday
>> break" in the final days of 2013, and re-open with the new system in
>> January 2014.
>>
>> Affected mail lists are those listed on the Mail Lists page here:
>> http://www.unicode.org/consortium/distlist.html
>> including Unicode, CLDR-Users, ULI-Users, and Indic.
>>
>> The new mail list system is documented here:
>> http://www.gnu.org/software/mailman/
>>
>

From richard.wordingham at ntlworld.com Sun Jan 5 18:11:03 2014
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 6 Jan 2014 00:11:03 +0000
Subject: Codepoint Support for Phonetically-Aware Collation
Message-ID: <20140106001103.3356960c@JRWUBU2>

Several languages with phonetically ambiguous spelling take pronunciation into account when sorting words alphabetically. Typical examples are Welsh and Slovak, where contractions are not applied for chance combinations of characters ('ng' in Welsh and 'ch' in Slovak).
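As a rough illustration of what "not applying a contraction" means for a sort key, here is a minimal Python sketch. It is not how CLDR or ICU implement Welsh collation; the simplified alphabet table, the welsh_key function and the choice of U+034F COMBINING GRAPHEME JOINER as the blocking character are all assumptions made for this example.

```python
# Toy Welsh sort key: digraphs such as "ng" count as single letters
# unless a blocking character (here CGJ, U+034F) separates the two parts.
CGJ = "\u034f"

# Simplified Welsh alphabet order (an assumption for this sketch).
WELSH = ["a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h",
         "i", "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s",
         "t", "th", "u", "w", "y"]
RANK = {letter: i for i, letter in enumerate(WELSH)}


def welsh_key(word):
    """Return a list of letter ranks, contracting digraphs where possible."""
    w = word.lower()
    key = []
    i = 0
    while i < len(w):
        if w[i] == CGJ:
            # The blocker itself carries no weight; it only splits digraphs.
            i += 1
            continue
        pair = w[i:i + 2]
        if pair in RANK:
            key.append(RANK[pair])
            i += 2
        elif w[i] in RANK:
            key.append(RANK[w[i]])
            i += 1
        else:
            key.append(len(WELSH))  # unknown characters sort last
            i += 1
    return key


# "Bangor" is n + g, not the letter "ng", so the blocked spelling sorts
# under "n" while the unmarked spelling is (wrongly) treated as "ng".
print(sorted(["banc", "bangor"], key=welsh_key))
print(sorted(["banc", "ban" + CGJ + "gor"], key=welsh_key))
```

In this toy table "ng" ranks directly after "g", so the unmarked spelling sorts before "banc" and the blocked one after it.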
Less typically, visually opaque syllable boundaries are taken into account, e.g. in Lao and in some older Thai dictionaries (though the Thai examples I know of were compiled by Europeans).

There are two approaches to these ambiguities for correct automated collation. One can either use a vocabulary-based collation table (as is done for Tibetan-script languages) or use mark-up characters such as U+00AD SOFT HYPHEN, U+200B ZERO WIDTH SPACE or U+034F COMBINING GRAPHEME JOINER (CGJ) as appropriate to prevent contractions in collation. In the latter case, it is reasonable to assume that such characters will only be used when it is likely that the text will be subject to culturally-sensitive sorting. For example, the 'search' collation settings for Welsh in the CLDR do not use the contractions used for sorting Welsh, so one does not have to worry about the encoding of the town name 'Bangor' unless it will be presented in an index in Welsh - in which case Welsh inflections will be a greater source of trouble.

CGJ may also be used to distinguish umlaut and diaeresis (both usually encoded U+0308) in German, by encoding the diaeresis as .

In some SE Asian dictionaries, an ordering distinction is made between the use of the letter corresponding to Indic PA to represent a voiced sound similar to /b/, used for native words, and the unvoiced sound /p/, used in Indic loan words. The examples I know of are U+1794 KHMER LETTER BA and U+1A37 TAI THAM LETTER BA. While it is possible to represent the contrasting sound /p/ by or U+1A38 TAI THAM LETTER HIGH PA respectively instead, in many Indic loan words this is not done.

Is there any encoding-level mark-up available to distinguish between the two pronunciations of BA when necessary? I had thought the problem had been solved for Khmer, but I can now find no evidence of a solution.

The usage of the two scripts shares the feature that as the first element of what is or was a true consonant cluster, BA usually (always?)
has an unvoiced sound, not the voiced sound. (Sound changes have made the situation more complicated to describe in Tai Lue, Tai Khuen and Northern Thai, but the principle remains unchanged.) This complicates the use of what to me had seemed obvious, namely to use to represent the unvoiced sound. It would be more natural to use to indicate the voiced sound should it appear in clusters in foreign loanwords.

Richard.

From naenaguru at gmail.com Wed Jan 8 22:43:38 2014
From: naenaguru at gmail.com (Naena Guru)
Date: Thu, 9 Jan 2014 10:13:38 +0530
Subject: interaction of Arabic ligatures with vowel marks
In-Reply-To: <51B7E66B.1050101@gmail.com>
References: <51B7E66B.1050101@gmail.com>
Message-ID:

Please see this page (for IE, use v 2010 and up):
http://lovatasinhala.com/

The font is almost all ligatures. If you copy and inspect the text, you'll notice that it is simple romanized Singhala.

I am currently in Sri Lanka demonstrating this. People at the president's office and one of the powerful ministers have seen it. They are elated that, after all, Singhala, the most complex of the 'abugidas', is much like a Western European language and amazingly computer and user friendly. This is contrary to how it was portrayed to them by local academics and technocrats, causing the poor country unnecessary debt.

The ideas of Abugida and Complex fade away if a font is made with a full understanding of Unicode's description of ligatures and how they are implemented by OpenType (now OpenFont).

I believe that Arabic and Hebrew can follow this model so that typing the script is simplified for users without compromising orthography.

On Wed, Jun 12, 2013 at 8:39 AM, Stephan Stiller wrote:
> Hi,
>
> How is the placement of vowel marks around ligatures handled in Arabic
> text?
>
> Does anyone have good pointers on this topic?
>
> My guess is that this does not come up often (just like the topic of
> pointing for handwritten Hebrew), as vowel marks are mostly not added in
> ordinary text.
Nonetheless, any text making heavy use of ligatures will
> from time to time need to add vowel marks for a foreign name or as a
> reading aid, and (as many of us know) the Quran is traditionally printed
> with vowel marks.
>
> I'm also wondering how font designers normally handle this. I think there
> are analogous questions for various ligature-heavy abugidas, so there must
> be an existing body of knowledge. There should be better answers than
> "squeeze the vowels around the consonant clusters in whatever way seems
> most intuitive". Do traditional printing presses use extra metal types for
> such glyph clusters, or do they manually add and adjust the positioning of
> vowels?
>
> Stephan
>
>

From pravin.d.s at gmail.com Fri Jan 10 04:15:00 2014
From: pravin.d.s at gmail.com (pravin.d.s at gmail.com)
Date: Fri, 10 Jan 2014 15:45:00 +0530
Subject: Handling Malayalam "NTA" issue for Lohit2
Message-ID:

Hi All,

We are working on the lohit2 [1] project, which plans to create standard and reusable OpenType tables with additional improvements. Lohit, as a default system font in most open source distros, always follows the standards around language technology (font specification, storage, guidelines related to languages).

Recently we started working on the Lohit Malayalam font [2] with some planned improvements and came across a couple of bugs [3][4] related to the well-known "NTA" issue introduced with the addition of atomic chillu characters in Unicode 5.1.

The dilemma is that a number of users are already using

A. u0D28 + u0D4D + u0D31 to get the NTA character, from even before Unicode 5.1.

B. But Unicode from 5.1 onward says (TUS 6.2 chapter 9.9 p 321) to use u0D7B + u0D4D + u0D31 to get the same "NTA".

In my humble opinion, one thing is very clear here: Unicode forgot to add normalization (backward compatibility) for the newly added sequence in (B). I have still not seen any improvement on this in a long time.
The dilemma for lohit2 development is:

- Lohit 1 has supported sequence (A) for a long time (even before Unicode 5.1), so for backward compatibility lohit2 should support the same.
- Since Lohit follows standards, it is important to support sequence (B) in order to follow Unicode 6.3.

But following Unicode 6.3 in this case clearly invites dual encoding without any normalization rules at hand. Good documentation on the NTA issues is available at [5].

At present I am in favour of not supporting the Unicode-defined sequence (B) in lohit2 and of continuing to use (A), which has been used in the Lohit font family for a long time.

Please let me know your views on this. Is there any chance of getting this mentioned in Unicode chapter 9? Is there any chance of a normalization rule for this?

Regards,
Pravin Satpute

1. http://pravin-s.blogspot.in/2013/08/project-creating-standard-and-reusable.html
2. http://pravin-s.blogspot.in/2013/12/lohit2-lohit-malayalam-development-plans.html
3. https://bugzilla.redhat.com/show_bug.cgi?id=1016984
4. https://bugzilla.redhat.com/show_bug.cgi?id=1016989
5. http://thottingal.in/documents/Malayalam-NTA.pdf

From samjnaa at gmail.com Fri Jan 10 06:24:46 2014
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Fri, 10 Jan 2014 17:54:46 +0530
Subject: [Lohit-devel-list] Handling Malayalam "NTA" issue for Lohit2
In-Reply-To:
References:
Message-ID:

On Fri, Jan 10, 2014 at 3:45 PM, pravin.d.s at gmail.com wrote:
> In my humble opinion here one thing is very clear that Unicode forgot to
> add normalization (backward compatibility) for newly added sequence in (B).

Dear Pravin,

If by normalization you mean http://www.unicode.org/glossary/#normalization -- then it is not possible in this case since the individually encoded chillus do not have canonical decomposition to their related consonants. Indeed, that would defeat the purpose of the separate encoding, which was to provide semantically distinct chillus!
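The point about decomposition can be checked against the character database that ships with Python. The following is only a sketch: it assumes a Python build whose unicodedata tables cover Unicode 5.1 or later, and the names seq_a, seq_b and distinct_under_all_forms are made up for this example, mirroring sequences (A) and (B) from the original post.

```python
import unicodedata

NA = "\u0d28"        # MALAYALAM LETTER NA
VIRAMA = "\u0d4d"    # MALAYALAM SIGN VIRAMA
RRA = "\u0d31"       # MALAYALAM LETTER RRA
CHILLU_N = "\u0d7b"  # MALAYALAM LETTER CHILLU N

seq_a = NA + VIRAMA + RRA        # sequence (A), in use before Unicode 5.1
seq_b = CHILLU_N + VIRAMA + RRA  # sequence (B), using the atomic chillu


def distinct_under_all_forms(a, b):
    """True if a and b remain different in NFC, NFD, NFKC and NFKD."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(f, a) != unicodedata.normalize(f, b)
               for f in forms)


# An empty string means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(CHILLU_N)))
print(distinct_under_all_forms(seq_a, seq_b))
```

Since CHILLU N has no decomposition mapping, no normalization form can fold (B) into (A); any equivalence between the two has to be handled at another level, for example by font rules or search folding.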
The recent additional chillus trickling into the standard seem to indicate that one should have encoded a CHILLU MARKER back then, but there's no going back now, so chillus galore! ;-)

On a more serious note, I think it is important to adhere to the standard, as it is good for you in the long run even though it is difficult at first. If you delay the adoption of the standard, it only gets all the harder as time passes, since in the interim even more people continue to assume the old behaviour...

--
Shriramana Sharma

From paivakil at gmail.com Fri Jan 10 11:46:30 2014
From: paivakil at gmail.com (Mahesh T. Pai)
Date: Fri, 10 Jan 2014 23:16:30 +0530
Subject: Handling Malayalam "NTA" issue for Lohit2
In-Reply-To:
References:
Message-ID: <20140110174630.GA18104@localhost>

pravin.d.s at gmail.com said on Fri, Jan 10, 2014 at 03:45:00PM +0530,:

> - Lohit 1 is supporting sequence (A) from long time (even before
> Unicode 5.1), so for the backward compatibility lohit2 should support the
> same.
>

I believe that the UTC wanted to maintain compatibility with a _beta_ version of some Microsoft software in making the choice that it did regarding the /nta/ sequence.

> Presently i am in favour of not supporting Unicode defined
> sequence (B) in lohit2 and keep on using (A) which is used in Lohit
> fonts family from long time.

Allow me to go on a nostalgia trip. Almost a decade back, the then SMC team came across what was an obvious lack of clarity in the UTS. They decided, against my advice, to follow the suggestions in the OpenType definition. To be fair, I had no alternative to offer then, except not to implement the suggestion in the OpenType pages. Microsoft ultimately waited for some clarity in the UTS before implementing anything, and the community efforts went (mostly) in vain.
Right now, given a choice between supporting legacy data and standards, I will choose the latter, with some kind of jugaad based on the PUA / glyph name to enable support for legacy data.

Not the ideal situation, but when politics gets the upper hand over merits, efficiency and appropriateness always take a backseat.

--
Mahesh T. Pai ||
free - (adj) able to act at will; not hampered; not under compulsion or restraint; free from obligations or duties; not bound to servitude; at liberty.

From pravin.d.s at gmail.com Mon Jan 13 00:04:33 2014
From: pravin.d.s at gmail.com (pravin.d.s at gmail.com)
Date: Mon, 13 Jan 2014 11:34:33 +0530
Subject: [Lohit-devel-list] Handling Malayalam "NTA" issue for Lohit2
In-Reply-To:
References:
Message-ID:

On 10 January 2014 17:54, Shriramana Sharma wrote:

> On Fri, Jan 10, 2014 at 3:45 PM, pravin.d.s at gmail.com wrote:
> > In my humble opinion here one thing is very clear that Unicode forgot to
> > add normalization (backward compatibility) for newly added sequence in (B).
>
> Dear Pravin,
>
> If by normalization you mean
> http://www.unicode.org/glossary/#normalization -- then it is not
> possible in this case since the individually encoded chillus do not
> have canonical decomposition to their related consonants. Indeed, that
> would defeat the purpose of the separate encoding, which was to
> provide semantically distinct chillus!
>

OK, not normalization, but Unicode should at least mention the old habit of writing NTA alongside the new one that came with the addition of the atomic chillus. It will definitely help people working on NLP to handle data containing these two different sequences.

>
> On a more serious note, I think it is important to adhere to the
> standard, as it is good for you in the long run even though it is
> difficult at first. If you delay the adoption of the standard, it only
> gets all the harder as time passes, since in the interim even more
> people continue to assume the old behaviour...
>

From the font perspective, consider that the NTA sequence is present in both forms (A) and (B) in the data around us. We have to add the required rules for both. Unfortunately Unicode has not considered backward compatibility in this case, but the Lohit project definitely does.

So, to be on the safer side, I am now in favour of having both rules in the font.

Regards,
Pravin Satpute

From pravin.d.s at gmail.com Mon Jan 13 00:28:52 2014
From: pravin.d.s at gmail.com (pravin.d.s at gmail.com)
Date: Mon, 13 Jan 2014 11:58:52 +0530
Subject: Handling Malayalam "NTA" issue for Lohit2
In-Reply-To: <20140110174630.GA18104@localhost>
References: <20140110174630.GA18104@localhost>
Message-ID:

On 10 January 2014 23:16, Mahesh T. Pai wrote:

> pravin.d.s at gmail.com said on Fri, Jan 10, 2014 at 03:45:00PM +0530,:
> > - Lohit 1 is supporting sequence (A) from long time (even before
> > Unicode 5.1), so for the backward compatibility lohit2 should support the
> > same.
> >
>
> I believe thet the UTC wanted to maintain compatibility with some
> _beta_ version of Microsoft's some software in making the choice that
> it did regarding the /nta/ sequence.
>
> > Presently i am in favour of not supporting Unicode defined
> > sequence (B) in lohit2 and keep on using (A) which is used in Lohit
> > fonts family from long time.
>
> Allow me to go on a nostalgia trip. Almost a decade back, the then SMC
> team came accross what was obvious lack of clarity in the UTS. They
> decided, against my advise, to follow the suggestions in OpenType
> definition. To be fair, then, I had no alternative to offer, except
> not to implement the suggestion in the OpenType pages. Microsoft
> ultimately waited for some clarity in the UTS before implementing
> anything. and the communimity efforts went (mostly) in vain.
>

I was wondering how ISCII was handling this.
> > Right now, given a choice between supporting legacy data and > standards, I will choose the latter, with some kind of jugaad based on > the PUA / glyph name to enable support for legacy data. > Yeah, as said above will support both legacy and standard sequence. > > Not the ideal situation, but when politics get the uppoer hand over > merits, efficiency and appropriateness always takes a backseat. > That is pain point of standardization activities. Thanks & Regards, Pravin Satpute -------------- next part -------------- An HTML attachment was scrubbed... URL: From cibucj at gmail.com Mon Jan 13 00:32:16 2014 From: cibucj at gmail.com (=?UTF-8?B?4LS44LS/4LSs4LWBIOC0uOC0vyDgtJzgtYY=?=) Date: Sun, 12 Jan 2014 22:32:16 -0800 Subject: [Lohit-devel-list] Handling Malayalam "NTA" issue for Lohit2 In-Reply-To: References: Message-ID: In fact, there is one more sequence to consider. Kartika in Windows follows for NTA. However, the existing data in that sequence is quite less. In case, Chillus standard is asking display software to be prepared for data in both sequences. I agree, it could document NTA's legacy Vs standard sequences, likewise. 2014/1/12 pravin.d.s at gmail.com > > > > On 10 January 2014 17:54, Shriramana Sharma wrote: > >> On Fri, Jan 10, 2014 at 3:45 PM, pravin.d.s at gmail.com >> wrote: >> > In my humble opinion here one thing is very clear that Unicode >> forgot to >> > add normalization (backward compatibility) for newly added sequence in >> (B). >> >> Dear Pravin, >> >> If by normalization you mean >> http://www.unicode.org/glossary/#normalization -- then it is not >> possible in this case since the individually encoded chillus do not >> have canonical decomposition to their related consonants. Indeed, that >> would defeat the purpose of the separate encoding, which was to >> provide semantically distinct chillus! >> > > Ok not normalization but at least Unicode should mention old habit of > writing NTA and new with addition of atomic chillu. 
It will definitely help > people working on NLP to handle data having these two different sequence. > > >> >> On a more serious note, I think it is important to adhere to the >> standard, as it is good for you in the long run even though it is >> difficult at first. If you delay the adoption of the standard, it only >> gets all the harder as time passes, since in the interim even more >> people continue to assume the old behaviour... >> > > From font perspective if we consider there is NTA sequence is available in > both form (A) & (B) in data around. We have to add required rules for both > way. Unfortunately in this case Unicode has not consider for backward > compatibility but at least Lohit project definitely consider it. > > So to be in safer side now i am fever of having both rules in font. > > Regards, > Pravin Satpute > > > > _______________________________________________ > Indic mailing list > Indic at unicode.org > http://unicode.org/mailman/listinfo/indic > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From infofarmer at gmail.com Tue Jan 14 18:06:23 2014 From: infofarmer at gmail.com (Andrew Pantyukhin) Date: Wed, 15 Jan 2014 04:06:23 +0400 Subject: CJK IDS database Message-ID: Hi! I find Ideographic Description Sequences massively useful for studying and describing Chinese characters. However, I found only one comprehensive source of them ? http://macchiato.com/ids/ Does anyone know where the files come from? Were they part of the IRG process, or just an isolated effort? What are the private use characters in the sequences? I'd like to contribute to the IDS database and incorporate it into products like wiktionary and rikaikun. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From michel at suignard.com Tue Jan 14 21:36:07 2014 From: michel at suignard.com (Michel Suignard) Date: Wed, 15 Jan 2014 03:36:07 +0000 Subject: CJK IDS database In-Reply-To: References: Message-ID: <18d0f28f79804234b9d301aa40f4bf32@CO1PR02MB157.namprd02.prod.outlook.com> I guess you should ask the owner, our distinguished president. Michel From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Andrew Pantyukhin Sent: Tuesday, January 14, 2014 4:06 PM To: unicode at unicode.org Subject: CJK IDS database Hi! I find Ideographic Description Sequences massively useful for studying and describing Chinese characters. However, I found only one comprehensive source of them ? http://macchiato.com/ids/ Does anyone know where the files come from? Were they part of the IRG process, or just an isolated effort? What are the private use characters in the sequences? I'd like to contribute to the IDS database and incorporate it into products like wiktionary and rikaikun. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at macchiato.com Tue Jan 14 23:53:51 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJU=?=) Date: Wed, 15 Jan 2014 06:53:51 +0100 Subject: CJK IDS database In-Reply-To: <18d0f28f79804234b9d301aa40f4bf32@CO1PR02MB157.namprd02.prod.outlook.com> References: <18d0f28f79804234b9d301aa40f4bf32@CO1PR02MB157.namprd02.prod.outlook.com> Message-ID: Boy, I'd forgotten about those. There is an open-source collection of IDSs that I used to create those files. Unfortunately, I found that *that* data would take a lot of cleanup. I do agree that it would be very useful to have an open-source repository of IDSs for Unicode characters, but I don't know of one. Others? Mark *? Il meglio ? l?inimico del bene ?* On Wed, Jan 15, 2014 at 4:36 AM, Michel Suignard wrote: > I guess you should ask the owner, our distinguished president. 
> > Michel > > > > *From:* Unicode [mailto:unicode-bounces at unicode.org] *On Behalf Of *Andrew > Pantyukhin > *Sent:* Tuesday, January 14, 2014 4:06 PM > *To:* unicode at unicode.org > *Subject:* CJK IDS database > > > > Hi! > > I find Ideographic Description Sequences massively useful for studying and > describing Chinese characters. However, I found only one comprehensive > source of them ? http://macchiato.com/ids/ > > > Does anyone know where the files come from? Were they part of the IRG > process, or just an isolated effort? What are the private use characters in > the sequences? > > I'd like to contribute to the IDS database and incorporate it into > products like wiktionary and rikaikun. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpsuzuki at hiroshima-u.ac.jp Wed Jan 15 00:10:54 2014 From: mpsuzuki at hiroshima-u.ac.jp (suzuki toshiya) Date: Wed, 15 Jan 2014 15:10:54 +0900 Subject: ["Unicode"] Re: CJK IDS database In-Reply-To: References: <18d0f28f79804234b9d301aa40f4bf32@CO1PR02MB157.namprd02.prod.outlook.com> Message-ID: <52D6266E.6090104@hiroshima-u.ac.jp> Hi, The query of the latest IDS collection is periodical issue in Unihan mailing list, I think :-) The repository maintained by Kawabata (technical editor of IRG Working Document Set) is now located at: https://github.com/cjkvi # the users should be careful the location of the # repository is stablized. It is often changed (without # notice of new place to go), don't be afraid and ask # experts where to go. Kawabata-san's work is based on CHISE database, which is available at: http://git.chise.org/gitweb/?p=chise/ids.git Regards, mpsuzuki Mark Davis ? wrote: > Boy, I'd forgotten about those. There is an open-source collection of IDSs > that I used to create those files. Unfortunately, I found that *that* data > would take a lot of cleanup. 
> > I do agree that it would be very useful to have an open-source repository > of IDSs for Unicode characters, but I don't know of one. Others? > > > Mark > > *? Il meglio ? l?inimico del bene ?* > > > On Wed, Jan 15, 2014 at 4:36 AM, Michel Suignard wrote: > >> I guess you should ask the owner, our distinguished president. >> >> Michel >> >> >> >> *From:* Unicode [mailto:unicode-bounces at unicode.org] *On Behalf Of *Andrew >> Pantyukhin >> *Sent:* Tuesday, January 14, 2014 4:06 PM >> *To:* unicode at unicode.org >> *Subject:* CJK IDS database >> >> >> >> Hi! >> >> I find Ideographic Description Sequences massively useful for studying and >> describing Chinese characters. However, I found only one comprehensive >> source of them ? http://macchiato.com/ids/ >> >> >> Does anyone know where the files come from? Were they part of the IRG >> process, or just an isolated effort? What are the private use characters in >> the sequences? >> >> I'd like to contribute to the IDS database and incorporate it into >> products like wiktionary and rikaikun. >> >> >> > > > ------------------------------------------------------------------------ > > _______________________________________________ > Unicode mailing list > Unicode at unicode.org > http://unicode.org/mailman/listinfo/unicode From xn--mlform-iua at xn--mlform-iua.no Wed Jan 15 21:43:05 2014 From: xn--mlform-iua at xn--mlform-iua.no (Leif Halvard Silli) Date: Thu, 16 Jan 2014 04:43:05 +0100 Subject: Commercial minus as italic variant of division sign in German and Scandinavian context Message-ID: <20140116044305293116.f28ead07@xn--mlform-iua.no> Thanks to our discussion in July 2012,[1] the Unicode code charts now says, about 00F7 ? DIVISION SIGN, this: ?? occasionally used as an alternate, more visually distinct version of 2212 ? {MINUS SIGN} or 2011 ? {NON-BREAKING HYPHEN} in some contexts [? snip ?] ? 2052 ? commercial minus sign? 
However, I think it can also be added somewhere that commercial minus is just the italic variant of “division minus”. I’ll hereby argue for this based on an old German book on “commercial arithmetics” I have come across, plus the July 2012 discussion and what Unicode already says about the commercial sign:

FIRST: IDENTICAL CONTEXTS.

German is an important locale for the Commercial Minus. In German, the Commercial minus is referred to both as “kaufmännische Minus(zeichen)” and as "buchhalterische Minus" (“Commercial Minus Character” and “Bookkeeper Minus”). And, speaking of “division minus” in the context I know best, Norway, we find it in advertising (commercial context) and in bookkeeping documentation and taxation forms. Simply put, what the Unicode 6.2 “General Punctuation” section says about Commercial Minus can also be said about DIVISION SIGN used as minus: “U+2052 % commercial minus sign is used in commercial or tax related forms or publications in several European countries, including Germany and Scandinavia.” So, basically and for the most part, the commercial minus and the “division sign minus” occur in the very same contexts, with very much the same meaning. This is a strong hint that they are the same character.

SECOND: GERMAN USE OF DIVISION SIGN FOR MINUS IN COMMERCIAL CONTEXT.

Is there any proof that German used both an italics variant and a non-italics variant of the “division minus”? Seemingly yes. Take the book “Kaufmännische Arithmetik” (“Commercial arithmetics”) from 1825 by Johann Philipp Schellenberg. By reading section 118, “Anhang zur Addition und Subtraction der Brüche” [“Appendix about the addition and subtraction of fractions”], at page 213 and onwards,[2] we can conclude that he describes a “commercial” use of the ÷ “division minus”, where the ÷ signifies a _negative remainder_ of a division (while the plus sign is used to signify a positive remainder).
Or to quote, from page 214: “so wird das Fehlende durch das [Zei]chen ÷ (minus) bemerkt, und bei Berech[nung] der Preis der Waare abgezogen” [“then the lacking remainder is marked with the ÷ (minus) and withdrawn when the price of the commodity is calculated”]. {Note that some bits of the text are lacking; I marked my guesses in square brackets.} I did not find (yet) that he used the italic commercial minus; however, the context is correct. (My guess is that the italics variant has been put to more use in the computer age, partly to separate it from the DIVISION SIGN, or maybe simply because people started to see it often in handwriting but seldom in print, and so would not have recognized it in the form of the non-italic division sign.)

THIRD: IDENTICAL INTERPRETATION

The word “abgezogen” in the above quote is interesting since the Code Charts for 2052 ⁒ COMMERCIAL MINUS cite the related German word “abzüglich”. And from the Swedish context, the charts quote the expression “med avdrag”. An English translation might be “to be withdrawn” or “with subtraction/rebate [for]”. Simply put, we here see the commercial meaning.

WHAT ABOUT COMMERCIAL MINUS AS “CORRECT” SIGN IN SCANDINAVIAN SCHOOLS?

Unicode 6.3 notes that in some European (e.g. Finnish, Swedish and perhaps Norwegian) traditions, teachers use the Commercial Minus Sign to signify that something is correct (whereas a red check mark is used to signify error). If my theory is right, that commercial minus and division sign minus are the same sign, how on earth is that possible? How can a minus sign count as positive for the student? The answer is, I think, to be found in the Code Chart’s Swedish description ("med avdrag"/"with subtraction/rebate"). Because I think that the correct understanding is not that it means "correct" or "OK". Rather, it denotes something that is counted in the customer’s/student’s favor. So, you could say it really means "slack", or "rebate". So it really means “good answer”.
It is a “rebate” that the student rightfully deserves.

FOURTH: A DEEPER MEANING

If we look at it from a very high level, then we can say that the division minus is used to signify something that is the result of a calculation - such as a price, an entry in bookkeeping or, indeed, a character/mark/point/score in a (home)work evaluated by a teacher. Whereas the “normal” minus sign is used when we represent negative data. For example, in taxation, all the numbers one reports are the result of some calculation. Likewise, when a teacher ticks off an answer as “good answer”, it is because the teacher has evaluated (a.k.a. “calculated”) the answer and found it to be good, and found that the student has calculated correctly/well.

CIRCUMSTANTIAL EVIDENCE

The commercial minus looks like a percentage sign. And also, in programming, e.g. JavaScript, the percentage sign is often used for the modulo operator - which is an operator that finds the remainder of a division.

Hence, when we take all this together, I believe we have to conclude that the COMMERCIAL MINUS is just the italic variant of the DIVISION SIGN.

PS: For more German documentation of this custom, it would probably be wise to research books about bookkeeping as well as “commercial arithmetics”. I also have a suspicion that it would be worth investigating contexts where modulo/division remainder operations are found - for instance, in calendar calculations.
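For readers less familiar with the remainder operator mentioned above, here is a small Python sketch of a positive versus a "lacking" (negative) remainder. The helper signed_remainders and the reading of Schellenberg's notation as "round the quotient up and record the shortfall" are assumptions made for illustration only, and the function is meant for positive integers.

```python
def signed_remainders(dividend, divisor):
    """Return ((q, r_plus), (q_up, r_minus)) for positive integers.

    (q, r_plus) uses the quotient rounded down, leaving a remainder >= 0.
    (q_up, r_minus) uses the quotient rounded up, leaving a remainder <= 0,
    i.e. the amount that is lacking.
    """
    q, r_plus = divmod(dividend, divisor)
    if r_plus == 0:
        return (q, 0), (q, 0)
    q_up = q + 1
    r_minus = dividend - q_up * divisor
    return (q, r_plus), (q_up, r_minus)


# 17 = 3 * 5 + 2, or equally 17 = 4 * 5 - 3
print(17 % 5)                    # 2
print(signed_remainders(17, 5))  # ((3, 2), (4, -3))
```

Under that reading, the 3 in the second pair is what would be written with the ÷ sign in front of it.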
[1] http://www.unicode.org/mail-arch/unicode-ml/y2012-m07/0053.html
[2] https://archive.org/stream/kaufmnnischeari00schegoog#page/n229/mode/2up
--
leif halvard silli

From asmusf at ix.netcom.com Thu Jan 16 01:17:46 2014
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 15 Jan 2014 23:17:46 -0800
Subject: Commercial minus as italic variant of division sign in German and Scandinavian context
In-Reply-To: <20140116044305293116.f28ead07@xn--mlform-iua.no>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
Message-ID: <52D7879A.70103@ix.netcom.com>

I find it unhelpful to consider 2052 as the italic variant of 00F7, and further find the "evidence" for that not all that germane.

Both are variants of the "-" sign, and so ipso facto are variants of each other. However, to identify something as "italic" to me would require that one form is used in the context of italic fonts, while the other is not. I cannot see anything supporting that interpretation in the "evidence" adduced below. On the contrary, you would expect both forms available in sans-serif and typewriter fonts (those being perhaps the most common for accounting), and perhaps also roman.

Further, while italic (as well as oblique fonts) tend to slant the letter forms, there's not a universal, established practice of turning horizontal dashes into slashes to mark the alternation between roman and italic fonts. From that perspective, considering one the "italic" variant of the other also appears to be a non-starter.

However, it seems to be possible to establish that these two characters are indeed rather close variants: both are used to visually emphasize the minus sign by means of decorating it with a pair of dots. And both are employed in situations that have a large semantic overlap. (Not surprisingly, because their meaning is based on the minus sign.)

The choice of variant, though, is driven by context and tradition for a given type of document, not by choice of font style.
And, the choice of using 2052 instead of hyphen-minus or minus is deliberate and conscious, making it an alternate spelling rather than an alternate "glyph".

If 00F7 can be used to stand in as a marked 2011, as claimed in the Unicode namelist annotation, then that use is clearly NOT as a variant of 2052, because 2011 does not have any connotations of negation. That means the semantic relations between 00F7 and 2052 only partially overlap, which is yet another indication that thinking of one as a font-style variant of the other is not particularly helpful, even if the ultimate origin may have derived from the same sign.

At this stage of the game, they are properly disunified, just as i and j or u and v.

A./

From jknappen at web.de Thu Jan 16 02:26:10 2014
From: jknappen at web.de ("Jörg Knappen")
Date: Thu, 16 Jan 2014 09:26:10 +0100 (CET)
Subject: Aw: Commercial minus as italic variant of division sign in German and Scandinavian context
In-Reply-To: <20140116044305293116.f28ead07@xn--mlform-iua.no>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
Message-ID:

An HTML attachment was scrubbed...
URL:

From xn--mlform-iua at xn--mlform-iua.no Thu Jan 16 07:34:23 2014
From: xn--mlform-iua at xn--mlform-iua.no (Leif Halvard Silli)
Date: Thu, 16 Jan 2014 14:34:23 +0100
Subject: Commercial minus as italic variant of division sign in German and Scandinavian context
In-Reply-To: <52D7879A.70103@ix.netcom.com>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no> <52D7879A.70103@ix.netcom.com>
Message-ID: <20140116143423686172.a3f32e12@xn--mlform-iua.no>

Asmus,

I am not certain that commercial minus isn’t sometimes used as italics for the “division sign minus”. For instance, when looking at my message in Firefox [1], the commercial minus looks like a “handwritten”
variant of the division sign. I think it would be entirely possible to use a commercial minus that looks that way in a Norwegian tax form, for instance. (I attach a screenshot of it.) I suspect that it is a monospace Courier font.

Also, I wonder about the claim in the General Punctuation section that commercial minus is used in taxation forms in Scandinavia and Germany. I would dearly like to see the evidence for that claim. I must say that I suspect that the use of the division sign in Norwegian taxation forms for this purpose has been counted as evidence for that claim - could it be that our “straight commercial minus” was counted as, well, a commercial minus? Could it be that the wish to see oneself - or us - in the “German tradition” made one draw the wrong conclusion about which character we use?

Anyway, when I spoke of 2052 as an italic version of 00F7, I meant it in a, kind of, “mathematical” sense: Unicode for instance contains both MATHEMATICAL BOLD ITALIC CAPITAL A and MATHEMATICAL BOLD CAPITAL A, and even if they are (I believe) used for different mathematical purposes, everyone sees and knows that they are variants of one and the same letter - the capital A. And also, in some contexts, one might be able to use a normal capital A instead of the mathematical ones.

The same knowledge is not present about 00F7 and 2052. The best would have been if the two characters shared a similar name. For instance, if 00F7 got an additional, synonymous name, like STRAIGHT COMMERCIAL MINUS, or perhaps, better, COMMERCIAL HYPHEN-MINUS. Then the relationship would be clear - or at least clearer. As MATHEMATICAL BOLD ITALIC CAPITAL A and MATHEMATICAL BOLD CAPITAL A show, two characters do not need to be 100% synonymous just because their names only differ in a stylistic way, so to speak.

When reading Unicode, one is only left to guess about the relationship between 00F7 and 2052.
For instance, 2052 is described in the General Punctuation section - and distinguished there from the “normal” minus and hyphen-minus, whereas 00F7 is not described there. A sentence there saying that, in some countries, it is actually 00F7 and not 2052 that is used would be very helpful and enlightening. Likewise, there is no description of 00F7 amongst the dashes/hyphens.

You wrote:

> Further, while italic (as well as oblique) fonts tend to slant the letter forms, there's not a universal, established practice of turning horizontal dashes into slashes to mark the alternation between roman and italic fonts. From that perspective, considering one the "italic" variant of the other also appears to be a non-starter.

Right. And I can only underline once more that I meant “italic” as part of the name, see above.

You:

> However, it seems to be possible to establish that these two characters are indeed rather close variants: […]

Indeed.

> The choice of variant, though, is driven by context and tradition for a given type of document, not by choice of font style. And, the choice of using 2052 instead of hyphen-minus or minus is deliberate and conscious, making it an alternate spelling rather than an alternate "glyph".

Well, yes.

> If 00F7 can be used to stand in as a marked 2011, as claimed in the Unicode namelist annotation, then that use is clearly NOT as a variant of 2052, because 2011 does not have any connotations of negation.

It is an argument for seeing 00F7 as (also) a hyphen-minus variant, no?

> That means the semantic relations between 00F7 and 2052 only partially overlap, which is yet another indication that thinking of one as a font-style variant of the other is not particularly helpful - even if the ultimate origin may have derived from the same sign.
>
> At this stage of the game, they are properly disunified, just as i and j or u and v.
I am not really arguing for their unification, which is impossible anyhow, if I have understood the stability rules of Unicode. (Whereas an *additional* name is not ruled out, if I got it right.) I am “only” arguing that Unicode should include information that clearly links the two together. As it is today, no one seems to realize how commercial minus relates to “division sign minus”.

[1] http://unicode.org/pipermail/unicode/2014-January/000013.html
[2] attachment of the file “screenshot-of-minuses.png”

-------------- next part --------------
A non-text attachment was scrubbed...
Name: screenshot-of-minuses.png
Type: image/png
Size: 9512 bytes
Desc: not available
URL:
-------------- next part --------------

Leif Halvard Silli

From xn--mlform-iua at xn--mlform-iua.no Thu Jan 16 07:54:55 2014
From: xn--mlform-iua at xn--mlform-iua.no (Leif Halvard Silli)
Date: Thu, 16 Jan 2014 14:54:55 +0100
Subject: Aw: Commercial minus as italic variant of division sign in German and Scandinavian context
In-Reply-To:
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
Message-ID: <20140116145455777875.bc3e6891@xn--mlform-iua.no>

"Jörg Knappen", Thu, 16 Jan 2014 09:26:10 +0100 (CET):
> The most important word in the comment on 00F7 ÷ DIVISION SIGN is "occasionally".
>
> In fact, the occasions are so rare that you can live a whole life in Germany without encountering one of them.
>
> On the other hand, 00F7 ÷ DIVISION SIGN is used _frequently_ in German schoolbooks to denote ... division (books aimed at professionals doing math prefer : (COLON) or / (SLASH) for this purpose, but schoolbooks don't).

This sounds like Norway ...

> 2052 ⁒ commercial minus sign _always_ means subtraction and it has this shape (or the alternate shape ./.) in all contexts, roman or italic. It is not the italic version of some other symbol.

So, I can only once more emphasize that when I said “italics” I meant it the way Unicode already has many characters (primarily mathematical ones) which are distinguished, in name, only by a reference to the style of the letter. Hope this helps.

As for the clarity of 2052 ⁒ commercial minus sign, no, you are wrong. While it is clear to you, in Germany, perhaps, at least in some Scandinavian school contexts it has a different meaning, namely as a “well done” sign from the teacher.

As for the Norwegian context, I guess we can say that the use of ÷ DIVISION SIGN as a minus sign is more on the way down than on the way up.
But it has its contexts (and just last week, I received an ad for glasses where it was used), and no one thinks about it. It is not an issue. When we get the taxation form on paper or in PDF form, the division minus is there, and everyone understands it correctly. (Knock on wood - *some* probably stumble.) They don’t even realize what they see - it is knowledge that is unaccounted for. (For instance, until I took this up, Wikipedia made no mention of it. Hah! Even Unicode 6.3 talks about the “commercial minus sign” in _Scandinavian_ taxation forms, without (is my claim) understanding that it talks about DIVISION SIGN. See my reply to Asmus.)

So what I don’t want is that the “untraditional” uses of ÷ DIVISION SIGN are left in the dark as some strange traditions without any roots. Also, I don't want the commercial minus to live a life as if it is such a unique thing. Let us document things properly.

Leif Halvard Silli

From asmusf at ix.netcom.com Thu Jan 16 09:24:45 2014
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 16 Jan 2014 07:24:45 -0800
Subject: Commercial minus as italic variant of division sign in German and Scandinavian context
In-Reply-To: <20140116143423686172.a3f32e12@xn--mlform-iua.no>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no> <52D7879A.70103@ix.netcom.com> <20140116143423686172.a3f32e12@xn--mlform-iua.no>
Message-ID: <52D7F9BD.8060106@ix.netcom.com>

On 1/16/2014 5:34 AM, Leif Halvard Silli wrote:
> Asmus,
>
> I am not certain that commercial minus isn’t sometimes used as italics for the “division sign minus”. For instance, when looking at my message in Firefox [1], the commercial minus looks like a “handwritten” variant of the division sign. I think it would be entirely possible to use a commercial minus that looks that way in a Norwegian tax form, for instance. (I attach a screenshot of it.) I suspect that it is a monospace Courier font.

The screen shot indeed shows a glyph for 2052 that superficially looks like a *reverse* (!) oblique variant of the glyph for 00F7. I say "superficially" because the other distinction is the use of heavier dots. However, the fact that the "slant" is reverse, rather than forward, is contrary to the way oblique or italic fonts usually work. So, again, I find your suggestion of "italic variant" not helpful.

> Also, I wonder about the claim in the General Punctuation section that commercial minus is used in taxation forms in Scandinavia and Germany. I would dearly like to see the evidence for that claim.
> I must say that I suspect that the use of the division sign in Norwegian taxation forms for this purpose has been counted as evidence for that claim - could it be that our “straight commercial minus” was counted as, well, a commercial minus? Could it be that the wish to see oneself - or us - in the “German tradition” made one draw the wrong conclusion about which character we use?

I would not be surprised if the actual situation is a bit more detailed than expressed in Unicode's namelist annotations (or even the descriptions in the chapter texts). However, I can't assist you in tracking those down as I have access to no taxation forms that use any of these characters. :)

> Anyway, when I spoke of 2052 as an italic version of 00F7, I meant it in a, kind of, “mathematical” sense: Unicode for instance contains both MATHEMATICAL BOLD ITALIC CAPITAL A and MATHEMATICAL BOLD CAPITAL A, and even if they are (I believe) used for different mathematical purposes, everyone sees and knows that they are variants of one and the same letter - the capital A. And also, in some contexts, one might be able to use a normal capital A instead of the mathematical ones.

This is getting even less helpful. The mathematical alphabets exist because, in mathematics, you cannot substitute one shape for another without destroying the semantics (and there are general conventions about what shape to use where).

The latter is similar to the uses of both 00F7 and 2052. There are conventions where each of them is appropriate, and these conventions depend on rather select user communities (school books, tax forms, accounting, math), just like the use of certain mathematical alphabet styles in physics may not be shared in all mathematical disciplines.

Where the case for 00F7 and 2052 differs from the mathematical alphabets is that in the latter case the shape variants are (to a very large extent) accurately described by the typographical moniker. A bold is a bold.
The only exception that I can think of is in the realm of "script", where some authors prefer a slightly different style that isn't tied to 18th century copperplate. > > The same knowledge is not present about 00F7 and 2052. The best would > have been if the two characters shared a similar name. For instance, if > 00F7 got an additional, synonymous name, like STRAIGHT COMMERCIAL > MINUS, or perhaps, better, COMMERCIAL HYPHEN-MINUS. Then the > relationship would be clear - or at least clearer. Like MATHEMATICAL > BOLD ITALIC CAPITAL A, and MATHEMATICAL BOLD CAPITAL A show, two > characters do not need to be 100% synonymous just because their names > only differ in a stylistic way, so to speak. Well, 00F7 is *most often* used as a division sign. Check calculator keys. > > When reading Unicode, one is only left to guess about the relationship > between 00F7 and 2052. For instance, 2052 is described in the general > punctuation - and distinguished there from the "normal" minus and > hyphen-minus, whereas 00F7 is not described there. A sentence, there, > that said that, in some countries, it is actually the 00F7 and not the > 2052, that is used, would be very helpful and enlightening. Likewise, > there is no description of 00F7 amongst the dashes/hyphens. Suggest better text for the book chapter that details the precise places that have been established as using 00F7 in the capacity of "minus sign". That would be more helpful than trying to somehow treat 00F7 and 2052 as glyphic variants of each other. They are separate characters, with distinct usage conventions that simply happen to employ both a line and two dots. (The fallback of ./. for 2052 is interesting in this context). > > You wrote: > >> Further, while italic (as well as oblique fonts) tend to slant the letter >> forms, there's not a universal, established practice of turning horizontal >> dashes into slashes to mark the alternation between roman and >> italic fonts.
From that perspective, considering one the "italic" >> variant of the other also appears to be a non-starter. > Right. And I can only underline once more that I meant "italic" as part > of the name, see above. Actually, as I wrote at the top, you'd need "reverse italic" and in general, trying to establish this relation is a red herring. It does not improve the user experience. > > You: > >> However, it seems to be possible to establish that these two >> characters are indeed rather close variants: […] > Indeed. Less close than it appears, because when I wrote this I did not include the notion of the most common use of 00F7, which is indeed for DIVISION. I was focused only on the minority use of 00F7 as a minus sign, in which case it and 2052 AND 002D and 2212 all function as variants of each other (but not as glyphic variants --- they are spelling variants). > >> The choice of variant, though, is driven by context and tradition >> for a given type of document, not by choice of font style. >> And, the choice of using 2052 instead of hyphen-minus or minus >> is deliberate and conscious, making it an alternate spelling rather >> than an alternate "glyph". > Well, yes. Because it's spelling, the "italic" is a red herring. >> If 00F7 can be used to stand in as a marked 2011, as claimed in >> the Unicode namelist annotation then that use is clearly NOT >> as a variant of 2052, because 2011 does not have >> any connotations of negation. > It is an argument for seeing 00F7 as (also) a hyphen-minus variant, no? Once you get into the dashes, there's tons of variant usage. What's documented in Unicode tends to be from predominantly English-language style manuals, but if you extend this to all publications in all (Western) languages including recent historic times, I'm sure you'd find surprising variations. For quotation marks we ran this to earth and the story is truly complex.
> >> That means the semantic >> relations between 00F7 and 2052 only partially overlap, which >> is yet another indication that thinking of one as a font-style >> variant of the other is not particularly helpful - even if the >> ultimate origin may have derived from the same sign. >> >> At this stage of the game, they are properly disunified, >> just as i and j or u and v. > I am not really arguing for their unification - which anyhow is > impossible, if I have understood the stability rules of Unicode. > (Whereas an *additional* name is not ruled out, if I got it right.) I > am "only" arguing that Unicode takes information that clearly links the > two together. As it is today, no one seems to realize how commercial > minus relates to "division sign minus". "additional" names are ruled out - except to fix something that's badly broken. Neither of these characters has names that are misleading, mistyped or both. There are many characters with deep relations that many users do not know about. And, in this case, there seem to be some issues with the precise relation you are trying to implement. A./ > > > [1] http://unicode.org/pipermail/unicode/2014-January/000013.html > [2] attachment of the file "screenshot-of-minuses.png" > > > > Leif Halvard Silli > > Asmus Freytag, Wed, 15 Jan 2014 23:17:46 -0800: >> I find it unhelpful to consider 2052 as the italic variant of 00F7, and >> further find the "evidence" for that not all that germane. >> >> Both are variants of the "-" sign, and so ipso facto are variants of >> each other. >> >> However, to identify something as "italic" to me would require that >> one form is used in the context of italic fonts, while the other is not. >> >> I cannot see anything supporting that interpretation in the "evidence" >> adduced below. >> >> On the contrary, you would expect both forms available in sans-serif >> and typewriter fonts (those being perhaps the most common for >> accounting), and perhaps also roman.
>> >> Further, while italic (as well as oblique fonts) tend to slant the letter >> forms, there's not a universal, established practice of turning horizontal >> dashes into slashes to mark the alternation between roman and >> italic fonts. From that perspective, considering one the "italic" >> variant of the other also appears to be a non-starter. >> >> However, it seems to be possible to establish that these two >> characters are indeed rather close variants: both are used >> to visually emphasize the minus sign by means of decorating >> it with a pair of dots. And both are employed in situations that >> have a large semantic overlap. (Not surprisingly, because their >> meaning is based on the minus sign). >> >> The choice of variant, though, is driven by context and tradition >> for a given type of document, not by choice of font style. >> And, the choice of using 2052 instead of hyphen-minus or minus >> is deliberate and conscious, making it an alternate spelling rather >> than an alternate "glyph". >> >> If 00F7 can be used to stand in as a marked 2011, as claimed in >> the Unicode namelist annotation then that use is clearly NOT >> as a variant of 2052, because 2011 does not have >> any connotations of negation. That means the semantic >> relations between 00F7 and 2052 only partially overlap, which >> is yet another indication that thinking of one as a font-style >> variant of the other is not particularly helpful - even if the >> ultimate origin may have derived from the same sign. >> >> At this stage of the game, they are properly disunified, >> just as i and j or u and v. >> >> A./ >> >> >> >> >> On 1/15/2014 7:43 PM, Leif Halvard Silli wrote: >>> Thanks to our discussion in July 2012,[1] the Unicode code charts now >>> say, about 00F7 ÷ DIVISION SIGN, this: >>> >>> "• occasionally used as an alternate, more visually >>> distinct version of 2212 − {MINUS SIGN} or 2011 ‑ >>> {NON-BREAKING HYPHEN} in some contexts >>> [… snip …] >>> → 2052 ⁒
commercial minus sign" >>> >>> However, I think it can also be added somewhere that commercial minus >>> is just the italic variant of "division minus". I'll hereby argue for >>> this based on an old German book on "commercial arithmetic" I have >>> come across, plus the July 2012 discussion and what Unicode >>> already says about the commercial sign: >>> >>> FIRST: IDENTICAL CONTEXTS. >>> >>> German language is an important locale for the Commercial Minus. In >>> German, the Commercial minus is both referred to as "kaufmännische >>> Minus(zeichen)" and as "buchhalterische Minus" ("Commercial Minus >>> Character" and "Bookkeeper Minus"). And, speaking of "division minus" >>> in the context I know best, Norway, we find it in advertising >>> (commercial context) and in book keeping documentation and taxation >>> forms. Simply put, what the Unicode 6.2 "General Punctuation" section >>> says about Commercial Minus, can also be said about DIVISION SIGN used >>> as minus: "U+2052 % commercial minus sign is used in commercial or tax >>> related forms or publications in several European countries, including >>> Germany and Scandinavia." So, basically and for the most part, the >>> commercial minus and the "division sign minus" occur in the very same >>> contexts, with very much the same meaning. This is a strong hint that >>> they are the same character. >>> >>> SECOND: GERMAN USE OF DIVISION SIGN FOR MINUS IN COMMERCIAL CONTEXT. >>> >>> Is there any proof that German used both an italics variant and a >>> non-italics variant of the "division minus"? Seemingly yes. The book >>> "Kaufmännische Arithmetik" ("Commercial arithmetic") from 1825 by >>> Johann Philipp Schellenberg. By reading section 118 "Anhang zur >>> Addition und Subtraction der Brüche" ["Appendix about the addition and >>> subtraction of fractions"]) at page 213 and onwards,[2] we can conclude >>> that he describes a "commercial" use of the ÷ "division minus", where >>> the ÷
signifies a _negative remainder_ of a division (while the plus >>> sign is used to signify a positive remainder). Or to quote, from page >>> 214: "so wird das Fehlende durch das [Zei]chen ÷ (minus) bemerkt, und >>> bei Berechn[nung der Preis der Waare abgezogen" ["then the lacking >>> remainder is marked with the ÷ (minus) and withdrawn when the price of >>> the commodity is calculated"]. {Note that some bits of the text are >>> lacking, I marked my guesses in square brackets.} I did not find (yet) >>> that he used the italic commercial minus, however, the context is >>> correct. (My guess is that the italics variant has been put to more >>> use, in the computer age, partly to separate it from the DIVISION SIGN >>> or maybe simply because people started to see it often in handwriting >>> but seldom in print. And so would not have recognized it in the form of >>> the non-italic division sign.) >>> >>> THIRD: IDENTICAL INTERPRETATION >>> >>> The word "abgezogen" in the above quote is interesting since the Code >>> Charts for 2052 ⁒ COMMERCIAL MINUS cites the related German word >>> "abzüglich". And from the Swedish context, the charts quotes the >>> expression "med avdrag". English translation might be "to be withdrawn" >>> or "with subtraction/rebate [for]". Simply put, we here see the >>> commercial meaning. >>> >>> WHAT ABOUT COMMERCIAL MINUS AS "CORRECT" SIGN IN SCANDINAVIAN SCHOOLS? >>> >>> UNICODE 6.3 notes that in some European (e.g. Finnish, Swedish and >>> perhaps Norwegian) traditions, teachers use the Commercial Minus Sign >>> to signify that something is correct (whereas a red check mark is used >>> to signify error). If my theory is right, that commercial minus and >>> division sign minus are the same signs, how on earth is that possible? >>> How can a minus sign count as positive for the student? >>> >>> The answer is, I think, to be found in the Code Chart's Swedish >>> description ("med avdrag"/"with subtraction/rebate").
Because, I think >>> that the correct understanding is not that it means "correct" or "OK". >>> Rather, it denotes something that is counted in the customer/student's >>> favor. So, you could say it really means "slack", or "rebate". So >>> it really means "good answer". It is a "rebate" that the student >>> rightfully deserves. >>> >>> FOURTH: A DEEPER MEANING >>> >>> If we look at it from a very high level, then we can say that the >>> division minus is used to signify something that is the result of a >>> calculation - such as a price, an entry in bookkeeping or, indeed, a >>> character/mark/point/score in a (home)work evaluated by a teacher. >>> Whereas the "normal" minus sign is used when we represent negative >>> data. For example, in taxation, all the numbers one reports are the >>> result of some calculation. Likewise, when a teacher ticks off an answer >>> as "good answer", then it is because the teacher has evaluated (a.k.a. >>> "calculated") the answer and found it to be good and that the student >>> has calculated correctly/well. >>> >>> CIRCUMSTANTIAL EVIDENCE >>> >>> The commercial minus looks like a percentage sign. And also, in >>> programming, e.g. JavaScript, the percentage sign is often used for the >>> modulo operator - which is an operator that finds the remainder of a >>> division. >>> >>> Hence, when we take all this together, I believe we have to conclude >>> that the COMMERCIAL MINUS is just the italic variant of the DIVISION >>> SIGN. >>> >>> PS: For more German documentation of this custom, it would probably be >>> wise to research books about bookkeeping as well as "commercial >>> arithmetic". I also have a suspicion that it would be worth >>> investigating contexts where modulo/division remainder operations are >>> found - for instance, in calendar calculations.
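A side note on the modulo remark above: the % operator yields the remainder of an integer division. A minimal sketch, written in Python purely for illustration (the message mentions JavaScript, whose % gives the same result for non-negative operands); the numbers 17 and 5 are arbitrary:

```python
# Illustrative values only; nothing here comes from the thread itself.
dividend, divisor = 17, 5

# divmod returns the integer quotient and the remainder in one step.
quotient, remainder = divmod(dividend, divisor)

# The % operator gives the same remainder on its own.
assert remainder == dividend % divisor

# The defining identity: dividend = quotient * divisor + remainder.
print(f"{dividend} = {quotient} * {divisor} + {remainder}")
```

For negative operands the sign conventions differ between languages, so the sketch stays with non-negative numbers.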
>>> >>> [1] http://www.unicode.org/mail-arch/unicode-ml/y2012-m07/0053.html >>> [2] >>> https://archive.org/stream/kaufmnnischeari00schegoog#page/n229/mode/2up -------------- next part -------------- An HTML attachment was scrubbed... URL: From xn--mlform-iua at xn--mlform-iua.no Thu Jan 16 10:12:17 2014 From: xn--mlform-iua at xn--mlform-iua.no (Leif Halvard Silli) Date: Thu, 16 Jan 2014 17:12:17 +0100 Subject: Commercial minus as italic variant of division sign in German and Scandinavian context In-Reply-To: <52D7F9BD.8060106@ix.netcom.com> References: <20140116044305293116.f28ead07@xn--mlform-iua.no> <52D7879A.70103@ix.netcom.com> <20140116143423686172.a3f32e12@xn--mlform-iua.no> <52D7F9BD.8060106@ix.netcom.com> Message-ID: <20140116171217417407.024080bd@xn--mlform-iua.no> Asmus Freytag, Thu, 16 Jan 2014 07:24:45 -0800: > On 1/16/2014 5:34 AM, Leif Halvard Silli wrote: >> when looking at my message in Firefox [1], the commercial minus >> looks like a "handwritten" variant of the division sign. > the fact that the "slant" is reverse, rather than forward, > is contrary to the way oblique or italic fonts usually work. > > So, again, I find your suggestion of "italic variant" not helpful. Got it. ;-) Will stop using "italic" about it! Meanwhile, I think there *is* something to say about the slant, the slant does seem to be primarily linked to *style*. Just now, at colourbox.de, I found some vector icons which are simply labelled as minus icons, both of which are shaped like the DIVISION SIGN and occur side by side with a plus sign. The labels for the icons are simply "Icon - minus - schwarz weiß" and "Icon - minus - hellblau". See: You find it in Google if you search for "kaufmännische Minuszeichen". Take that as a hint. >> Also, I wonder about the claim in the General Punctuation section that >> commercial minus is used in taxation forms in Scandinavia and Germany. […]
> I would not be surprised if the actual situation is a bit more > detailed than expressed in Unicode's namelist annotations (or > even the descriptions in the chapter texts). > > However, I can't assist you in tracking those down as I have access > to no taxation forms that use any of these characters. :) :-) >> Anyway, when I spoke of 2052 as an italic version of 00F7, I meant in >> the, kind of, "mathematical" sense: […] > Where the case for 00F7 and 2052 differs from the mathematical alphabets is > that in the latter case the shape variants are (to a very large > extent) accurately described by the typographical moniker. A bold is a bold. > > The only exception that I can think of is in the realm of "script", > where some authors prefer a slightly different style that isn't tied > to 18th century copperplate. And by script you mean "handwriting style". That makes sense. That is how I perceive the German, commercial minus. > Suggest better text for the book chapter that details the precise > places that have been established as using 00F7 in the capacity > of "minus sign". That would be more helpful than trying to somehow > treat 00F7 and 2052 as glyphic variants of each other. They are > separate characters, with distinct usage conventions that simply > happen to employ both a line and two dots. (The fallback of ./. for > 2052 is interesting in this context). Ok. Will try. Though I think better text would tie them, rather than separate them. But I think you are artificially separating them. > I was focused only on the minority use of 00F7 as a minus sign, in > which case > it and 2052 AND 002D and 2212 all function as variants of each other (but > not as glyphic variants --- they are spelling variants). Good point. It is like the V and U - they have a common history. >> It is an argument for seeing 00F7 as (also) a hyphen-minus variant, no? > Once you get into the dashes, there's tons of variant usage.
What's > documented in Unicode tends to be from predominantly English-language > style manuals, but if you extend this to all publications in all > (Western) languages including recent historic times, I'm sure you'd > find surprising variations. Does the Unicode spec say this - that it is predominantly English language based? >> As it is today, no one seems to realize how commercial >> minus relates to "division sign minus". > "additional" names are ruled out - except to fix something that's > badly broken. > Neither of these characters has names that are misleading, mistyped or both. > > There are many characters with deep relations that many users do not know > about. And, in this case, there seem to be some issues with the precise > relation you are trying to implement. I saw it as if in a mist. Now it becomes clearer and clearer to me. :-) This fun page indicates that the ./. "fallback" has a 35 year history. http://www.wertpapier-forum.de/topic/14587-kennzahlenanalyse/page__st__20 Which could fit well together with a theory that the script variant grew in popularity when the "international" ÷ division sign of computers entered German math. That ÷ as minus "went back" due to computers and calculators, seems to be the general trend.
-- leif halvard silli From asmusf at ix.netcom.com Thu Jan 16 11:19:02 2014 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Thu, 16 Jan 2014 09:19:02 -0800 Subject: Commercial minus as italic variant of division sign in German and Scandinavian context In-Reply-To: <20140116171217417407.024080bd@xn--mlform-iua.no> References: <20140116044305293116.f28ead07@xn--mlform-iua.no> <52D7879A.70103@ix.netcom.com> <20140116143423686172.a3f32e12@xn--mlform-iua.no> <52D7F9BD.8060106@ix.netcom.com> <20140116171217417407.024080bd@xn--mlform-iua.no> Message-ID: <52D81486.6010700@ix.netcom.com> On 1/16/2014 8:12 AM, Leif Halvard Silli wrote: > Asmus Freytag, Thu, 16 Jan 2014 07:24:45 -0800: >> On 1/16/2014 5:34 AM, Leif Halvard Silli wrote: >>> when looking at my message in Firefox [1], the commercial minus >>> looks like a "handwritten" variant of the division sign. >> the fact that the "slant" is reverse, rather than forward, >> is contrary to the way oblique or italic fonts usually work. >> >> So, again, I find your suggestion of "italic variant" not helpful. > Got it. ;-) Will stop using "italic" about it! OK. I'll hold you to it. > Meanwhile, I think there > *is* something to say about the slant, the slant does seem to be > primarily linked to *style*. Style in Unicode is used in two ways. A) to indicate that a distinction is glyphic and can be ignored B) to indicate that a glyph shape relates to a typographical style A is wrong for 00F7 vs 2212 vs 2052. The distinctions are deliberate and authors (and readers) would take exception if you substituted another "style" of symbol. The fact is that these are not simply accidental but correct disunifications. In a sense, it's no different from "z" being used for the soft-s in English (if not exclusively), "s" being used for both soft and hard s in German and never being used for soft-s in Scandinavia.
When Unicode says it encodes the "semantics" of a character, it doesn't mean that these semantics can't be context sensitive or that different contexts can't call for different characters for the same semantics. (In the minus case we are talking mathematical semantics, while in the letter case we are talking phonetics, but otherwise there's not a whole lot of distinction in the context sensitive nature of character use). The most useful concept (I have found) in these kinds of investigations is "character identity". Here it is clear that something like 00F7 that can mean both division and minus (based on context) has a different identity from 2212 or 2052 that (in math use) can only mean minus. And 2052 is different from 2212 in that it is limited to certain contexts, and 2212 cannot be used in marking papers. So, just acknowledge that, and if you feel the need to add value, do so by better descriptions of which context which character is used in. B is relevant for math alphabets, because the glyphs really are constrained to match a typographical style. It's not relevant to the case here, because 2052 is not a specific "style" of the "same thing in another font". > Just now, at colourbox.de, I found some > vector icons which are simply labelled as minus icons, and which both > of them are shaped like the DIVISION SIGN, and which occurs side by > side with a plus sign. The labels for the icons are simply "Icon - > minus - schwarz weiß" and "Icon - minus - hellblau". See: > > > You find it in Google if you search for "kaufmännische Minuszeichen". > Take that as a hint. This could be for two reasons. A) there is some use where 00F7 has the semantics of minus. B) the icon is misnamed in the source because of the visual similarity with a minus Unfortunately, by itself, you can't use that source to distinguish A from B. > >>> Anyway, when I spoke of 2052 as an italic version of 00F7, I meant in >>> the, kind of, "mathematical" sense: […]
>> Where the case for 00F7 and 2052 differs from the mathematical alphabets is >> that in the latter case the shape variants are (to a very large >> extent) accurately described by the typographical moniker. A bold is a bold. >> >> The only exception that I can think of is in the realm of "script", >> where some authors prefer a slightly different style that isn't tied >> to 18th century copperplate. > And by script you mean "handwriting style". That makes sense. That is > how I perceive the German, commercial minus. It may be derived from a handwritten mark - most accounting wasn't typeset - but the exception that I was referring to are Knuth's "Euler" fonts which he uses instead of "script" in his mathematical works. Their ductus retains just faint traces of handwriting, and none of the elaborate styles of handwriting that typical "script" fonts are based on, but they serve their purpose in mathematics (unless you are a purist) because they are distinct from all the other styles and arguably a bit more readable. Your applying my comment to 2052 is taking it wildly out of context. > >> Suggest better text for the book chapter that details the precise >> places that have been established as using 00F7 in the capacity >> of "minus sign". That would be more helpful than trying to somehow >> treat 00F7 and 2052 as glyphic variants of each other. They are >> separate characters, with distinct usage conventions that simply >> happen to employ both a line and two dots. (The fallback of ./. for >> 2052 is interesting in this context). > Ok. Will try. Though I think better text would tie them, rather than > separate them. But I think you are artificially separating them. I am arguing that they have a distinct "identity". That doesn't mean that their usage can't overlap. (That's what you tend to think of as "ties".) I think it less helpful to consider the characters "tied" than to describe the usage. 
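The separate identities of the characters discussed in this thread can be checked against the Unicode Character Database. A minimal sketch, assuming Python 3 and its standard unicodedata module; the helper name describe and the selection of code points are made up for this illustration:

```python
import unicodedata

# Code points mentioned in this thread (the selection is illustrative).
CODE_POINTS = [0x002D, 0x00F7, 0x2011, 0x2012, 0x2052, 0x2212]

def describe(cp):
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    return "U+%04X %s" % (cp, unicodedata.name(chr(cp)))

for cp in CODE_POINTS:
    print(describe(cp))

# 00F7, 2052 and 2212 carry no decomposition mapping, so even
# compatibility normalization (NFKC) keeps them as three distinct characters.
normalized = {unicodedata.normalize("NFKC", chr(cp))
              for cp in (0x00F7, 0x2052, 0x2212)}
print(len(normalized))
```

The names only identify the characters; which one is appropriate in a given tax form, school book or ledger remains a matter of usage convention, as argued above.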
> >> I was focused only on the minority use of 00F7 as a minus sign, in >> which case >> it and 2052 AND 002D and 2212 all function as variants of each other (but >> not as glyphic variants --- they are spelling variants). > Good point. It is like the V and U - they have a common history. The U and V historically derive from the same letter. 00F7 and 2052 use the same elements in a different configuration. That's ALL that we know about them, unless you have additional research. Asserting a derivation is complete speculation at this point. What we can attest is that ./. is a typewriter-supported (if not caused) variant of 2052 (the exact elevation of the initial dot may have varied in hand writing as a "free variation", but the typewriter could only do the period). We cannot attest that ./. is a variant of 00F7 or that 2052 was ever a free variant of 00F7. In today's usage, the selection depends on context (user group, target audience) and is not a free variant. > >>> It is an argument for seeing 00F7 as (also) a hyphen-minus variant, no? >> Once you get into the dashes, there's tons of variant usage. What's >> documented in Unicode tends to be from predominantly English-language >> style manuals, but if you extend this to all publications in all >> (Western) languages including recent historic times, I'm sure you'd >> find surprising variations. > Does the Unicode spec say this - that it is predominantly English > language based? It goes without saying that authors working in English have easier access to manuals in that language. It's not intentional, but if you've been around you would find that in many cases, usage information from other languages has tended to be incorporated as changes to the original text, not from the start. So, go ahead and add more. > >>> As it is today, no one seems to realize how commercial >>> minus relates to "division sign minus". >> "additional" names are ruled out - except to fix something that's >> badly broken.
>> Neither of these characters has names that are misleading, mistyped or both. >> >> There are many characters with deep relations that many users do not know >> about. And, in this case, there seem to be some issues with the precise >> relation you are trying to implement. > I saw it as if in a mist. Now it becomes clearer and clearer to me. :-) > > This fun page indicates that the ./. "fallback" has a 35 year history. > http://www.wertpapier-forum.de/topic/14587-kennzahlenanalyse/page__st__20 No, it says that the history goes back *at least* 35 years. This figure is probably based on somebody's earliest *personal* recollection, not historical search, and 35 years tends to span a professional lifetime. > Which could fit well together with a theory that the script variant > grew in popularity when the "international" ÷ division sign of > computers entered German math. That ÷ as minus "went back" due to > computers and calculators, seems to be the general trend. That, my friend, is utter and pure nonsense. I would call it an urban legend in the making. Instead of "mists" you are creating "myths" here, from whole cloth, no less. Cheers, A./ From samjnaa at gmail.com Tue Jan 21 06:48:26 2014 From: samjnaa at gmail.com (Shriramana Sharma) Date: Tue, 21 Jan 2014 18:18:26 +0530 Subject: Offlist UniView mini-app Message-ID: Since I have connectivity problems now and then, I wrote a mini-app using PyQt to give me the basic features of Ishida's UniView (which also seems to have had some server problems recently)... Maybe it would be useful to others also so I'm posting here. It's under the GPL since I use PyQt under the GPL. Since it depends on PyQt, it is probably immediately usable by Linux users, esp. who use distros which have PyQt pre-installed or installable by a single command like apt-get or yum. On other platforms, you'll have to have installed Python and PyQt as appropriate... BTW I use Py3, so maybe a few tweaks would be needed to get it working with Py2.
Since it's GPL, please feel free to make derivatives. I hope the name "UniView" is not copyrighted or anything. Certainly don't intend to infringe... -- Shriramana Sharma -------------- next part -------------- A non-text attachment was scrubbed... Name: uniview.py Type: text/x-python Size: 6386 bytes Desc: not available URL: From stephan.stiller at gmail.com Wed Jan 22 02:38:25 2014 From: stephan.stiller at gmail.com (Stephan Stiller) Date: Wed, 22 Jan 2014 00:38:25 -0800 Subject: Egyptian Demotic Message-ID: <52DF8381.5080804@gmail.com> Hi all, Is Egyptian Demotic on somebody's roadmap for Unicode? (Egyptian Demotic is what's on the middle third of the Rosetta Stone.) Stephan From frederic.grosshans at gmail.com Wed Jan 22 07:48:05 2014 From: frederic.grosshans at gmail.com (Frédéric Grosshans) Date: Wed, 22 Jan 2014 14:48:05 +0100 Subject: Egyptian Demotic In-Reply-To: <52DF8381.5080804@gmail.com> References: <52DF8381.5080804@gmail.com> Message-ID: <52DFCC15.20200@gmail.com> An HTML attachment was scrubbed... URL: From samjnaa at gmail.com Wed Jan 22 23:52:12 2014 From: samjnaa at gmail.com (Shriramana Sharma) Date: Thu, 23 Jan 2014 11:22:12 +0530 Subject: Offlist UniView mini-app In-Reply-To: References: Message-ID: Not sure if anyone actually tried this app, but just wanted to notify that I found a small bug. To correct it, insert "4 < " after "elif " on line 15. Shriramana. From leob at mailcom.com Thu Jan 23 00:39:58 2014 From: leob at mailcom.com (Leo Broukhis) Date: Wed, 22 Jan 2014 22:39:58 -0800 Subject: Another Unicode viewing site Message-ID: I find http://unicode-table.com/ of which I cannot find a previous mention on the list, quite convenient (keep scrolling). Not all of Unicode 6.0 and 6.1 is there yet, though, as it is a hobby project of a multi-national team.
Interface languages include English, German, Russian, Ukrainian, Chinese, and Thai. Leo From boldewyn at gmail.com Thu Jan 23 08:19:50 2014 From: boldewyn at gmail.com (Manuel Strehl) Date: Thu, 23 Jan 2014 15:19:50 +0100 Subject: Another Unicode viewing site In-Reply-To: References: Message-ID: Yes, they have the huge advantage over my http://codepoints.net that they have a team already providing so many translations. I envy them for that a bit. But competition is good for business. :-) Cheers, Manuel 2014/1/23 Leo Broukhis > I find http://unicode-table.com/ of which I cannot find a previous > mention on the list, quite convenient (keep scrolling). Not all of Unicode > 6.0 and 6.1 is there yet, though, as it is a hobby project of a > multi-national team. > Interface languages include English, German, Russian, Ukrainian, Chinese, > and Thai. > > Leo From samjnaa at gmail.com Thu Jan 23 10:50:49 2014 From: samjnaa at gmail.com (Shriramana Sharma) Date: Thu, 23 Jan 2014 22:20:49 +0530 Subject: Offlist UniView mini-app In-Reply-To: <52E1472F.5040501@behdad.org> References: <52E1472F.5040501@behdad.org> Message-ID: On Thu, Jan 23, 2014 at 10:15 PM, Behdad Esfahbod wrote: > lol. How about you post on github at least? > OK good idea. Any objections to re-using the name UniView? I suppose Ishida is on either of these lists. I would like to hear from him especially. -- Shriramana Sharma -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ishida at w3.org Thu Jan 23 12:00:01 2014 From: ishida at w3.org (Richard Ishida) Date: Thu, 23 Jan 2014 18:00:01 +0000 Subject: Offlist UniView mini-app In-Reply-To: References: <52E1472F.5040501@behdad.org> Message-ID: <52E158A1.3080005@w3.org> Well, I would prefer you don't use the name UniView, since that would create confusion. The reason my UniView (and UniView lite) tool is currently unavailable is that my site was hacked a week or so ago and I'm rebuilding it online. I'm still working on a solution for hosting the pages that need to run in PHP, but I expect UniView to be back in operation soon. Thank you. RI On 23/01/2014 16:50, Shriramana Sharma wrote: > On Thu, Jan 23, 2014 at 10:15 PM, Behdad Esfahbod > wrote: > > lol. How about you post on github at least? > > > OK good idea. Any objections to re-using the name UniView? I suppose > Ishida is on either of these lists. I would like to hear from him > especially. > > -- > Shriramana Sharma > > > _______________________________________________ > Unicode mailing list > Unicode at unicode.org > http://unicode.org/mailman/listinfo/unicode > From samjnaa at gmail.com Thu Jan 23 12:01:25 2014 From: samjnaa at gmail.com (Shriramana Sharma) Date: Thu, 23 Jan 2014 23:31:25 +0530 Subject: Offlist UniView mini-app In-Reply-To: <52E158A1.3080005@w3.org> References: <52E1472F.5040501@behdad.org> <52E158A1.3080005@w3.org> Message-ID: On Thu, Jan 23, 2014 at 11:30 PM, Richard Ishida wrote: > Well, I would prefer you don't use the name UniView, since that would > create confusion. > OK thanks for that. I am not particular about the name -- though it is quite apt. Will think of something else... Just something bland like "Codepoint Viewer" would do I suppose... -- Shriramana Sharma -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From samjnaa at gmail.com Thu Jan 23 12:26:53 2014 From: samjnaa at gmail.com (Shriramana Sharma) Date: Thu, 23 Jan 2014 23:56:53 +0530 Subject: Offlist UniView mini-app In-Reply-To: <52E1472F.5040501@behdad.org> References: <52E1472F.5040501@behdad.org> Message-ID: On Thu, Jan 23, 2014 at 10:15 PM, Behdad Esfahbod wrote: > lol. How about you post on github at least? > Thanks for the encouragement. I didn't think it would be *that* important to do that. Please visit now: https://github.com/jamadagni/cpview -- Shriramana Sharma -------------- next part -------------- An HTML attachment was scrubbed... URL: From johannes at bergerhausen.com Fri Jan 24 07:55:07 2014 From: johannes at bergerhausen.com (Johannes Bergerhausen) Date: Fri, 24 Jan 2014 14:55:07 +0100 Subject: Another Unicode viewing site In-Reply-To: References: Message-ID: <9B1C6C9E-CE67-436B-A990-98F1B98F011C@bergerhausen.com> We are working on an update of decodeunicode.org Johannes From verdy_p at wanadoo.fr Fri Jan 24 11:14:54 2014 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Fri, 24 Jan 2014 18:14:54 +0100 Subject: Another Unicode viewing site In-Reply-To: References: Message-ID: The bad thing is that the whole BMP is loaded in a giant HTML table encoded in a very inefficient way. Worse, everything uses webfonts of 4K glyphs, so the page takes a lot of memory with those temporary fonts. The webfonts do not seem to load dynamically on demand. This is strange, because the page is also full of JavaScript, which could have loaded just the necessary webfonts on demand and generated the page on the fly, with just enough rows to fit the screen while still allowing the table to scroll, without leaving all those webfonts active in the document. JavaScript could also have detected suitable fonts already present on the PC running the browser, and the page would have been much lighter. 
2014/1/23 Leo Broukhis > I find http://unicode-table.com/ of which I cannot find a previous > mention on the list, quite convenient (keep scrolling). Not all of Unicode > 6.0 and 6.1 is there yet, though, as it is a hobby project of a > multi-national team. > Interface languages include English, German, Russian, Ukrainian, Chinese, > and Thai. > > Leo > > _______________________________________________ > Unicode mailing list > Unicode at unicode.org > http://unicode.org/mailman/listinfo/unicode > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From leob at mailcom.com Fri Jan 24 22:31:37 2014 From: leob at mailcom.com (Leo Broukhis) Date: Fri, 24 Jan 2014 20:31:37 -0800 Subject: Another Unicode viewing site In-Reply-To: References: Message-ID: Hi Philippe, I have no relation to the project; you may want to leave your feedback directly on the site. Leo On Fri, Jan 24, 2014 at 9:14 AM, Philippe Verdy wrote: > The bad thung is that the whole BMP is loaded in a giant HTML table > encoded in a very unefficient way, but worse, everything is using Webfonts > of 4K glyphs, the page takes a lot of memory with those temporary fonts. > > The webfonts do not seem to load dynamically on demand. Strane becaise the > page is also full of Javascript, and Javascript would have just loaded the > necessary webfonts on demand, and would have generated the page on the > flow, with just enough rows to fit the screen and still the possibility to > scroll the table, without leaving all those Webfonts active in the document. > Javascript could also have detected suitable fonts already existing on the > PC with the browser, and the page would have been much lighter. > > > > 2014/1/23 Leo Broukhis > >> I find http://unicode-table.com/ of which I cannot find a previous >> mention on the list, quite convenient (keep scrolling). Not all of Unicode >> 6.0 and 6.1 is there yet, though, as it is a hobby project of a >> multi-national team. 
>> Interface languages include English, German, Russian, Ukrainian, Chinese, >> and Thai. >> >> Leo >> >> _______________________________________________ >> Unicode mailing list >> Unicode at unicode.org >> http://unicode.org/mailman/listinfo/unicode >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kojiishi at gluesoft.co.jp Mon Jan 27 19:18:15 2014 From: kojiishi at gluesoft.co.jp (Koji Ishii) Date: Tue, 28 Jan 2014 01:18:15 +0000 Subject: [css-writing-modes] nit In-Reply-To: <5296D692.2020201@css-class.com> References: <52957E08.1060000@inkedblade.net> <52960D39.5050309@ix.netcom.com> <5296D692.2020201@css-class.com> Message-ID: > Possibly all notes and issues and use overflow:auto since any widths > also applied to the notes and issues may end up having things hidden. Thank you Alan for the suggestion, fixed in the editor's draft[1]. [1] http://dev.w3.org/csswg/css-writing-modes/ From kojiishi at gluesoft.co.jp Mon Jan 27 19:34:38 2014 From: kojiishi at gluesoft.co.jp (Koji Ishii) Date: Tue, 28 Jan 2014 01:34:38 +0000 Subject: [CSSWG][css-writing-modes] Last Call for Comments on CSS3 Writing Modes In-Reply-To: References: Message-ID: <5C1870EC-0ED7-400A-A469-FB6635D4FEB1@gluesoft.co.jp> On Dec 21, 2013, at 20:39, CE Whitehead > wrote: 4.3 "alphabetic The alphabetic baseline is assumed to be at the under margin edge. "central The central baseline is assumed to be halfway between the under and over margin edges of the box. " => "alphabetic The alphabetic baseline is assumed to be at the under-margin edge. "central The central baseline is assumed to be halfway between the under- and over-margin edges of the box. " {COMMENT: normally when you use two words to modify a single word, as when "under margin", "over margin" modify the word, "edge" or "edges", then it is customary to join the two modifying words with a hyphen.} Fixed. 6.2 inline-start "Nominally the side from which text of its inline base direction will start. 
For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side. " => "The side of a box from which text will start. For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side. " ? {COMMENT: This text is unclear to me; not sure what you mean by "its" -- the box's?; I am not sure thus how to reword "inline base direction" -- so I left this phrase out though you probably need something. Also do you need to say "Nominally"? Because "nominally" does not mean anything to me in this sentence, though normally "nominally" is defined as "in name" -- but I cannot see saying this here; it just seems to not be the right word. Also finally, and I know this is a dumb question, but why can the inline--start never be at the top or the bottom, when the lines run top-to-bottom or bottom-to-top? The diagram seems to suggest that inline-start can be at the bottom or top.} Please allow me to work on this later. 6.2 second paragraph (after the list of four "flow-relative directions" -- block-end, block-start, etc.) "Where unambiguous (or dual-meaning), the terms start and end are used in place of block-start/inline-start and block-end/inline-end, respectively." {COMMENT: "unambiguous" is the opposite of "dual-meaning" -- "dual meaning" means "ambiguous"; do you mean the following? (if so it's o.k. to eliminate the stuff in parentheses altogether):} Fixed. 6.3 Line-relative directions Figure 15, Figure 16 {COMMENT: is it possible to have more space between these two figures?} Fixed. /koji -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jknappen at web.de Wed Jan 29 08:59:43 2014 From: jknappen at web.de (=?UTF-8?Q?=22J=C3=B6rg_Knappen=22?=) Date: Wed, 29 Jan 2014 15:59:43 +0100 (CET) Subject: Aw: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com>, Message-ID: An HTML attachment was scrubbed... URL: From buck at yelp.com Wed Jan 29 12:21:55 2014 From: buck at yelp.com (Buck Golemon) Date: Wed, 29 Jan 2014 10:21:55 -0800 Subject: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> Message-ID: Jörg: This is the definition of cp1252 used by the whatwg and all current browser implementations. I've appealed to the cp1252 maintainer to update the definition so that we don't have two competing standards, but I was rejected. I've been considering naming it cp1252-whatwg. On Wed, Jan 29, 2014 at 6:59 AM, "Jörg Knappen" wrote: > A little postscriptum to this old thread: > > On PyPI, there is now a codec available that handles the peculiar > definition of "latin1" inside mysql. > The package is called mysql-latin1-codec and features an encoding > consisting of cp1252 plus > 0x81, 0x8D, 0x8F, 0x90, 0x9D (the latter five characters are undefined in > the python codec for cp1252). > > https://pypi.python.org/pypi/mysql-latin1-codec/1.0 > > --Jörg Knappen > > *Gesendet:* Mittwoch, 30. 
Oktober 2013 um 19:14 Uhr > *Von:* "Buck Golemon" > *An:* "Fr?d?ric Grosshans" > *Cc:* "J?rg Knappen" , unicode > *Betreff:* Re: Aw: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 > twice" > > > On Wed, Oct 30, 2013 at 9:56 AM, Fr?d?ric Grosshans < > frederic.grosshans at gmail.com> wrote: >> >> Le 30/10/2013 17:32, "J?rg Knappen" a ?crit : >> >>> >>> The data did not only contain latin-1 type mangling for the non-existent >>> Windows characters, but also sequences with the raw >>> C1 control characters for all of latin-1. So I had to do them, too. >>> The data weren't consistent at all, not even in their errors. >>> --J?rg Knappen >> >> Your question helped me dust off and repair a non working python snippet >> I wrote for a similar problem. I was stuck with the mixing of windows-1252 >> and latin1 controls (linked with a chinese characters). I write it below >> for reference. >> >> The python snippet below does not need sed, defines a function >> (unscramble(S)) which works on strings. The extension to files should be >> easy. >> >> Fr?d?ric Grosshans >> >> >> def Step1Filter(S): >> for c in S : >> #works character/character because of the cp1252/latin1 ambiguity >> try : >> yield c.encode('cp1252') >> except UnicodeEncodeError : >> yield c.encode('latin1') >> #Useful where cp1252 is undefined (81, 8D, 8F, 90, 9D) >> >> def unscramble(S): >> return b''.join(c for c in Step1Filter(S)).decode('utf8') >> >> PS: If anyone is interested in a licence, I consider this simple enough >> to be in the public domain an uncopyrightable. >> > > This encoding you've implemented above is known as windows-1252 by the > whatwg and all browsers [1][2]. > The implementation of cp1252 in python is instead a direct consequence of > the unicode.org definition [3]. 
> > [1] http://encoding.spec.whatwg.org/index-windows-1252.txt > [2] http://bukzor.github.io/encodings/cp1252.html > [3] > http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kojiishi at gluesoft.co.jp Wed Jan 29 12:24:11 2014 From: kojiishi at gluesoft.co.jp (Koji Ishii) Date: Wed, 29 Jan 2014 18:24:11 +0000 Subject: [CSSWG][css-writing-modes] Last Call for Comments on CSS3 Writing Modes In-Reply-To: <5C1870EC-0ED7-400A-A469-FB6635D4FEB1@gluesoft.co.jp> References: <5C1870EC-0ED7-400A-A469-FB6635D4FEB1@gluesoft.co.jp> Message-ID: On Jan 27, 2014, at 17:34, Koji Ishii > wrote: On Dec 21, 2013, at 20:39, CE Whitehead > wrote: 6.2 inline-start "Nominally the side from which text of its inline base direction will start. For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side. " => "The side of a box from which text will start. For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side. " ? {COMMENT: This text is unclear to me; not sure what you mean by "its" -- the box's?; I am not sure thus how to reword "inline base direction" -- so I left this phrase out though you probably need something. Also do you need to say "Nominally"? Because "nominally" does not mean anything to me in this sentence, though normally "nominally" is defined as "in name" -- but I cannot see saying this here; it just seems to not be the right word. Also finally, and I know this is a dumb question, but why can the inline--start never be at the top or the bottom, when the lines run top-to-bottom or bottom-to-top? The diagram seems to suggest that inline-start can be at the bottom or top.} Please allow me to work on this later. Fixed. 
/koji -------------- next part -------------- An HTML attachment was scrubbed... URL: From buck at yelp.com Wed Jan 29 12:32:05 2014 From: buck at yelp.com (Buck Golemon) Date: Wed, 29 Jan 2014 10:32:05 -0800 Subject: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> Message-ID: Jörg: In case you want to see the previous discussions on the subject, here they are: * "data for cp1252" http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0233.html * "cp1252 decoder implementation" http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0167.html * tangential "latin1 decoder implementation" http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0146.html On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon wrote: > Jörg: > > This is the definition of cp1252 used by the whatwg and all current > browser implementations. > I've appealed to the cp1252 maintainer to update the definition so that we > don't have two competing standards, but I was rejected. > I've been considering naming it cp1252-whatwg. > > > On Wed, Jan 29, 2014 at 6:59 AM, "Jörg Knappen" wrote: > >> A little postscriptum to this old thread: >> >> On PyPI, there is now a codec available that handles the peculiar >> definition of "latin1" inside mysql. >> The package is called mysql-latin1-codec and features an encoding >> consisting of cp1252 plus >> 0x81, 0x8D, 0x8F, 0x90, 0x9D (the latter five characters are undefined in >> the python codec for cp1252). >> >> https://pypi.python.org/pypi/mysql-latin1-codec/1.0 >> >> --Jörg Knappen >> >> *Gesendet:* Mittwoch, 30. 
Oktober 2013 um 19:14 Uhr >> *Von:* "Buck Golemon" >> *An:* "Fr?d?ric Grosshans" >> *Cc:* "J?rg Knappen" , unicode >> *Betreff:* Re: Aw: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 >> twice" >> >> >> On Wed, Oct 30, 2013 at 9:56 AM, Fr?d?ric Grosshans < >> frederic.grosshans at gmail.com> wrote: >>> >>> Le 30/10/2013 17:32, "J?rg Knappen" a ?crit : >>> >>>> >>>> The data did not only contain latin-1 type mangling for the >>>> non-existent Windows characters, but also sequences with the raw >>>> C1 control characters for all of latin-1. So I had to do them, too. >>>> The data weren't consistent at all, not even in their errors. >>>> --J?rg Knappen >>> >>> Your question helped me dust off and repair a non working python >>> snippet I wrote for a similar problem. I was stuck with the mixing of >>> windows-1252 and latin1 controls (linked with a chinese characters). I >>> write it below for reference. >>> >>> The python snippet below does not need sed, defines a function >>> (unscramble(S)) which works on strings. The extension to files should be >>> easy. >>> >>> Fr?d?ric Grosshans >>> >>> >>> def Step1Filter(S): >>> for c in S : >>> #works character/character because of the cp1252/latin1 ambiguity >>> try : >>> yield c.encode('cp1252') >>> except UnicodeEncodeError : >>> yield c.encode('latin1') >>> #Useful where cp1252 is undefined (81, 8D, 8F, 90, 9D) >>> >>> def unscramble(S): >>> return b''.join(c for c in Step1Filter(S)).decode('utf8') >>> >>> PS: If anyone is interested in a licence, I consider this simple enough >>> to be in the public domain an uncopyrightable. >>> >> >> This encoding you've implemented above is known as windows-1252 by the >> whatwg and all browsers [1][2]. >> The implementation of cp1252 in python is instead a direct consequence of >> the unicode.org definition [3]. 
>> >> [1] http://encoding.spec.whatwg.org/index-windows-1252.txt >> [2] http://bukzor.github.io/encodings/cp1252.html >> [3] >> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus.icu at gmail.com Wed Jan 29 13:22:35 2014 From: markus.icu at gmail.com (Markus Scherer) Date: Wed, 29 Jan 2014 11:22:35 -0800 Subject: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> Message-ID: On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon wrote: > I've been considering naming it cp1252-whatwg. > It would be nicer to put the organization name first, such as whatwg-cp1252 or maybe better html-cp1252. That would be more like ibm-932 and such. markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From buck at yelp.com Wed Jan 29 13:57:08 2014 From: buck at yelp.com (Buck Golemon) Date: Wed, 29 Jan 2014 11:57:08 -0800 Subject: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> Message-ID: Anne: Given that the intent is to implement exactly the whatwg spec, and the group is currently called "whatwg" (even though it may eventually become a historical artifact), is "whatwg-1252" most appropriate? Norbert Lindenberg previously suggested standardizing some kind of disambiguation. http://www.unicode.org/mail-arch/unicode-ml/y2012-m12/0022.html Do you most prefer the s/web-/cp/ pattern? 
On Wed, Jan 29, 2014 at 11:53 AM, Anne van Kesteren wrote: > On Wed, Jan 29, 2014 at 11:22 AM, Markus Scherer > wrote: > > On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon wrote: > >> I've been considering naming it cp1252-whatwg. > > > > It would be nicer to put the organization name first, such as > whatwg-cp1252 > > or maybe better html-cp1252. That would be more like ibm-932 and such. > > If you want to support more encodings than > http://encoding.spec.whatwg.org/ defines I suggest using the prefix > "web-". The organization may change and this is not tied to HTML. > > > -- > http://annevankesteren.nl/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From craig.gallacher at gmail.com Wed Jan 29 14:04:53 2014 From: craig.gallacher at gmail.com (Craig Gallacher) Date: Wed, 29 Jan 2014 20:04:53 +0000 Subject: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> Message-ID: Apologies I know this is on the website, but how do I unsubscribe from this list? Cheers C ? grampianmountains.net ?44 (0)7877 990538 On 29 January 2014 19:57, Buck Golemon wrote: > Anne: Given that the intent is to implement exactly the whatwg spec, and > the group is currently called "whatwg" (even though it may eventually > become a historical artifact), is "whatwg-1252" most appropriate? > > Norbert Lindenberg previously suggested standardizing some kind of > disambiguation. > http://www.unicode.org/mail-arch/unicode-ml/y2012-m12/0022.html > > Do you most prefer the s/web-/cp/ pattern? 
> > > On Wed, Jan 29, 2014 at 11:53 AM, Anne van Kesteren wrote: > >> On Wed, Jan 29, 2014 at 11:22 AM, Markus Scherer >> wrote: >> > On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon wrote: >> >> I've been considering naming it cp1252-whatwg. >> > >> > It would be nicer to put the organization name first, such as >> whatwg-cp1252 >> > or maybe better html-cp1252. That would be more like ibm-932 and such. >> >> If you want to support more encodings than >> http://encoding.spec.whatwg.org/ defines I suggest using the prefix >> "web-". The organization may change and this is not tied to HTML. >> >> >> -- >> http://annevankesteren.nl/ >> > > > _______________________________________________ > Unicode mailing list > Unicode at unicode.org > http://unicode.org/mailman/listinfo/unicode > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From textexin at xencraft.com Wed Jan 29 15:09:16 2014 From: textexin at xencraft.com (Tex Texin) Date: Wed, 29 Jan 2014 13:09:16 -0800 Subject: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> Message-ID: <00df01cf1d36$70f30d10$52d92730$@com> Since it isn?t cp1252 nor iso8859, perhaps call it whatwg-latin or whatwg-1. If, or when, 1252 is updated to assign a character to an undefined codepoint, it will be problematic to have them both refer to 1252. For example, if a new currency symbol is added in Latin America, as has been discussed from time to time. Anyone writing decoders for the Whatwg encoding should also be on notice that it is not necessarily a superset of 1252 going forward, and should design for the potential distinction down the road. I am tempted to suggest we call it ?Whatwg-Not-your-fathers-1252? 
which also would serve appropriate notice? tex From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Buck Golemon Sent: Wednesday, January 29, 2014 11:57 AM To: Anne van Kesteren Cc: unicode; unicode at norbertlindenberg.com; J?rg Knappen; Fr?d?ric Grosshans; Markus Scherer Subject: Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 twice" Anne: Given that the intent is to implement exactly the whatwg spec, and the group is currently called "whatwg" (even though it may eventually become a historical artifact), is "whatwg-1252" most appropriate? Norbert Lindenberg previously suggested standardizing some kind of disambiguation. http://www.unicode.org/mail-arch/unicode-ml/y2012-m12/0022.html Do you most prefer the s/web-/cp/ pattern? On Wed, Jan 29, 2014 at 11:53 AM, Anne van Kesteren wrote: On Wed, Jan 29, 2014 at 11:22 AM, Markus Scherer wrote: > On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon wrote: >> I've been considering naming it cp1252-whatwg. > > It would be nicer to put the organization name first, such as whatwg-cp1252 > or maybe better html-cp1252. That would be more like ibm-932 and such. If you want to support more encodings than http://encoding.spec.whatwg.org/ defines I suggest using the prefix "web-". The organization may change and this is not tied to HTML. -- http://annevankesteren.nl/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From prosfilaes at gmail.com Wed Jan 29 15:45:09 2014 From: prosfilaes at gmail.com (David Starner) Date: Wed, 29 Jan 2014 13:45:09 -0800 Subject: Do you know a tool to decode "UTF-8 twice" In-Reply-To: <00df01cf1d36$70f30d10$52d92730$@com> References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> <00df01cf1d36$70f30d10$52d92730$@com> Message-ID: On Wed, Jan 29, 2014 at 1:09 PM, Tex Texin wrote: > If, or when, 1252 is updated to assign a character to an undefined > codepoint, it will be problematic to have them both refer to 1252. > > For example, if a new currency symbol is added in Latin America, as has been > discussed from time to time. > > > > Anyone writing decoders for the Whatwg encoding should also be on notice > that it is not necessarily a superset of 1252 going forward, and should > design for the potential distinction down the road. I don't believe there's any chance that CP-1252 is going to get new changes. Unicode is king and the value for Microsoft of patching all the supported Windows editions versus just telling people to use Unicode is minimal. In any case, Microsoft has to interact with the Whatwg definition of Latin-1/CP-1252 just as much as anyone else. -- Kie ekzistas vivo, ekzistas espero. 
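[Editorial note, not part of the original thread.] The difference between the two definitions discussed above can be sketched in Python. This is only a minimal sketch, assuming (as stated earlier in the thread) that the WHATWG index for windows-1252 differs from Python's built-in cp1252 codec solely in the five bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D, which it maps to the C1 control characters with the same numbers. The names decode_web_1252 and encode_web_1252 are hypothetical and chosen for illustration; they are not an existing codec or package API.

```python
# Bytes that Python's 'cp1252' codec (which follows the unicode.org mapping)
# leaves undefined. Assumption from the thread: the WHATWG index maps each of
# them to the C1 control character with the same number.
UNDEFINED_IN_CP1252 = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def decode_web_1252(data: bytes) -> str:
    """Decode bytes roughly the way a browser treats windows-1252 (sketch)."""
    chars = []
    for byte in data:
        if byte in UNDEFINED_IN_CP1252:
            # Python's cp1252 would raise UnicodeDecodeError here.
            chars.append(chr(byte))
        else:
            chars.append(bytes([byte]).decode("cp1252"))
    return "".join(chars)


def encode_web_1252(text: str) -> bytes:
    """Inverse of decode_web_1252 (sketch)."""
    raw = bytearray()
    for ch in text:
        if ord(ch) in UNDEFINED_IN_CP1252:
            raw.append(ord(ch))
        else:
            # Raises UnicodeEncodeError for characters outside the code page.
            raw += ch.encode("cp1252")
    return bytes(raw)


# Every byte value survives a decode/encode round trip under this definition,
# which is not the case for Python's stricter cp1252 codec.
all_bytes = bytes(range(256))
round_trip_ok = encode_web_1252(decode_web_1252(all_bytes)) == all_bytes
```

Working byte by byte is slow but keeps the sketch short; a real codec would more likely be built from a 256-entry decoding table.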
From buck at yelp.com Wed Jan 29 17:17:32 2014 From: buck at yelp.com (Buck Golemon) Date: Wed, 29 Jan 2014 15:17:32 -0800 Subject: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> <00df01cf1d36$70f30d10$52d92730$@com> Message-ID: On Wed, Jan 29, 2014 at 1:45 PM, David Starner wrote: > On Wed, Jan 29, 2014 at 1:09 PM, Tex Texin wrote: > > If, or when, 1252 is updated to assign a character to an undefined > > codepoint, it will be problematic to have them both refer to 1252. > > > > For example, if a new currency symbol is added in Latin America, as has > been > > discussed from time to time. > > > > > > > > Anyone writing decoders for the Whatwg encoding should also be on notice > > that it is not necessarily a superset of 1252 going forward, and should > > design for the potential distinction down the road. > > I don't believe there's any chance that CP-1252 is going to get new > changes. Unicode is king and the value for Microsoft of patching all > the supported Windows editions versus just telling people to use > Unicode is minimal. In any case, Microsoft has to interact with the > Whatwg definition of Latin-1/CP-1252 just as much as anyone else. > > Shawn Steele, the cp1252 owner said: Our legacy code pages aren't going to change. We won't add more characters > to 1252. http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0202.html -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jknappen at web.de Thu Jan 30 02:21:46 2014 From: jknappen at web.de (=?UTF-8?Q?=22J=C3=B6rg_Knappen=22?=) Date: Thu, 30 Jan 2014 09:21:46 +0100 (CET) Subject: Aw: Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> , Message-ID: An HTML attachment was scrubbed... URL: From buck at yelp.com Thu Jan 30 12:15:47 2014 From: buck at yelp.com (Buck Golemon) Date: Thu, 30 Jan 2014 10:15:47 -0800 Subject: Aw: Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 twice" In-Reply-To: References: <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local> <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local> <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local> <527118E0.90501@gmail.com> <52712C94.7040102@gmail.com> <52713A3D.4090306@gmail.com> Message-ID: While I understand your argument, my intent was to suggest that "mysql-latin1" was *not* as good as some other name. Surely you're not arguing that all names are equivalently good. Obviously "mnmmmnmn" is a worse name than "mysql-latin1". "Mysql" has less to do with the issue than "whatwg" or "web", since this codec is necessary any time you want to reproduce browser decoding, regardless of whether mysql is involved. I contend that mysql adopted this implementation because it is so popularly used for web applications. "latin1" is less directly accurate than "cp1252". While whatwg requires that latin1 be an alias of cp1252, it does the same for ascii, and it maintains that the canonical name is "windows-1252". 
Ideally you'd want to update the name of your project, but if not, that's
your preference :) However, if I can get some consensus on a least-bad name
("web-cp1252" with alias "web-windows-1252" seems to be in the lead), I
plan to release such a codec.

This issue also extends far beyond Python. Any language that deals with the
web (i.e. all of them) and wants to be able to interpret (legacy) bytes
exactly as a browser would (admittedly a niche, but still important, task)
needs such a codec.

I believe unicode.org should eventually recognize such a codec. Ideally it
would reflect that this is the most common implementation of cp1252, but if
I need to use a different name, that's better than nothing at all.

On Jan 30, 2014 12:31 AM, Jörg Knappen wrote:

> When you are looking for a *new* name for that encoding, why don't you
> just adopt the pythonese precedent
> mysql-latin1 ? It is as good or as bad as any other name, but has some
> footing just now.
>
> --Jörg Knappen
>
> *Sent:* Wednesday, 29 January 2014 at 21:12
> *From:* "Anne van Kesteren"
> *To:* "Buck Golemon"
> *Cc:* "Markus Scherer", "Jörg Knappen" <jknappen at web.de>,
> "Frédéric Grosshans", unicode, unicode at norbertlindenberg.com
> *Subject:* Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8
> twice"
> On Wed, Jan 29, 2014 at 11:57 AM, Buck Golemon wrote:
> > Anne: Given that the intent is to implement exactly the whatwg spec, and
> > the group is currently called "whatwg" (even though it may eventually
> > become a historical artifact), is "whatwg-1252" most appropriate?
>
> It's up to you I suppose, but "whatwg-1252" just seems like long term
> it will lose its meaning. For the web "windows-1252" will always have
> this meaning due to deployed content, so "web-windows-1252" if you
> need to disambiguate from a different implementation of windows-1252
> makes sense to me.
>
> --
> http://annevankesteren.nl/

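For readers who want to see what the codec discussed in this thread would do, here is a minimal Python 3 sketch. It assumes the only difference from Python's built-in "cp1252" codec is the five bytes that codec leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), which the WHATWG windows-1252 index maps to the C1 control characters with the same numbers. The function names (decode_web_cp1252, encode_web_cp1252, fix_utf8_twice) are hypothetical and chosen for illustration only; this is not the codec Buck Golemon later released.

```python
# Sketch of a "web-cp1252" style codec, under the assumptions stated above.
# All names here are hypothetical, picked for this illustration.

# Bytes that Python's built-in "cp1252" codec refuses to decode.
_UNDEFINED = frozenset(b"\x81\x8d\x8f\x90\x9d")

# 256-entry decode table: the undefined bytes pass through as the C1
# control character with the same number; everything else follows cp1252.
_DECODE = "".join(
    chr(b) if b in _UNDEFINED else bytes([b]).decode("cp1252")
    for b in range(256)
)

# Reverse table for encoding (the decode table is one-to-one).
_ENCODE = {ch: b for b, ch in enumerate(_DECODE)}


def decode_web_cp1252(data: bytes) -> str:
    """Decode bytes the way a browser treats windows-1252 (never fails)."""
    return "".join(_DECODE[b] for b in data)


def encode_web_cp1252(text: str) -> bytes:
    """Inverse of decode_web_cp1252; raises KeyError on unmappable text."""
    return bytes(_ENCODE[ch] for ch in text)


def fix_utf8_twice(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as windows-1252'."""
    return encode_web_cp1252(text).decode("utf-8")


if __name__ == "__main__":
    # U+00DD is C3 9D in UTF-8, and 0x9D is undefined in plain cp1252,
    # so this repair needs the pass-through mapping to succeed.
    mangled = decode_web_cp1252("\u00dd".encode("utf-8"))
    print(repr(mangled), "->", repr(fix_utf8_twice(mangled)))
```

The table-based approach keeps decoding total (every byte has a mapping), which is the property that makes reproducing browser behaviour, and reversing text that was encoded as "UTF-8 twice", possible where the stock codec would raise an error.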