From rxaviers at gmail.com Mon Feb 13 10:07:13 2017
From: rxaviers at gmail.com (Rafael Xavier)
Date: Mon, 13 Feb 2017 14:07:13 -0200
Subject: Full wide numbers

Hi everyone,

I have a question for you about numbering systems...

Chinese uses latn for the default nu (zh), hanidec for the native nu (zh-u-nu-native), hans for the traditional nu (zh-u-nu-traditio), and hansfin for the finance nu (zh-u-nu-finance). CLDR also includes data for the *numeric* numbering systems cited, i.e., latn and hanidec (e.g., decimalFormats-numberSystem-latn and decimalFormats-numberSystem-hanidec). So far so good...

My question is, what decimalFormats (percentFormats, currencyFormats, etc.) should be used for another arbitrary numbering system? For example, using the fullwide numbering system in Chinese (zh-u-nu-fullwide). There's no decimalFormats-numberSystem-fullwide. I didn't find anything in UTS #35 Part 3: Numbers that defines what to do in such a case. ICU seems to handle that case fine though: http://demo.icu-project.org/icu4jweb/flexTest.jsp?pat=yMd&_=zh%40numbers%3Dfullwide

Is there any recommendation implementations should follow?

Thanks

--
+55 (16) 98138-1583, skype: rxaviers
http://rafael.xavier.blog.br

From mimckenna at paypal.com Mon Feb 13 16:42:31 2017
From: mimckenna at paypal.com (Mckenna, Mike)
Date: Mon, 13 Feb 2017 22:42:31 +0000
Subject: Full wide numbers

ICU made the right choice in allowing both. I would think that normalizing from fullwide to normal before parsing would be a good tack to use.

Mike McKenna
Internationalization Technology Architect
+1-408-967-3631 (desk), +1-510-332-7820 (mobile)
PayPal
2211 N. First Street, San Jose CA 95131 - USA

From rajavelmani at gmail.com Thu Feb 16 18:30:59 2017
From: rajavelmani at gmail.com (Manikandan Ramalingam Kandaswamy)
Date: Thu, 16 Feb 2017 16:30:59 -0800
Subject: Regarding Time zone name usage

Hi CLDR,

I need some clarification on the time zone names spec. I am implementing the date/time format for the date symbol `zzzz`, and I have questions regarding the time zone format spec. I am focusing on regionFormat-standard and regionFormat-daylight as they relate to the `zzzz` format "{0} Standard Time" *or* "{COUNTRY} Standard Time / {CITY} Standard Time". Based on the tr35 documentation, I use metaZone and golden zone in TimeZoneNames.
But I have these questions, which I could not answer from the tr35 documentation:

- Is there any time zone which is not in a metaZone?
- If there is a time zone which is not in a metaZone:
  o Where is the localized COUNTRY or CITY name for constructing "{COUNTRY} Standard Time / {CITY} Standard Time"?
  o Should I use exemplarCity in TimeZoneNames.json?

Secondly, for existing metaZones in TimeZoneNames, for `z..zzz` there are cases where there are no short time zone names, such as for Asia/Calcutta. In this case, I have these questions:

- Should I fall back to the "O" (GMT) format?
- Can we have some algorithm in the zone format spec about short time zone name usage?

Can someone help me clarify the above questions?

Thanks
Mani

From iz6445a at student.american.edu Thu Feb 23 11:18:25 2017
From: iz6445a at student.american.edu (Isabelle Zaugg)
Date: Thu, 23 Feb 2017 20:18:25 +0300
Subject: question about identifying CLDR coverage % for Amharic

Dear All,

I am working on my dissertation research and would like to identify the percentage of CLDR coverage for Amharic and the other languages utilizing the Ethiopic script. I would like to get a percentage coverage for today, as well as look at the increase over time. So far here is what I have been told:

- "The files in common/main are organized by locale, which uses a code, e.g. fr.xml for French." So you can look for the ones you want. To get the codes, you can look at http://www.unicode.org/cldr/charts/latest/supplemental/languages_and_scripts.html. For example, Amharic is 'am'. For more information see the spec: http://unicode.org/reports/tr35/
- To get changes over versions, you would download successive versions. Just the am.xml should be good enough; the regional variants typically inherit.

Following these guidelines, I was still unable to identify the percentage of CLDR coverage for Amharic.
Is there anyone who could help me with this issue?

Thank you,
Isabelle

Isabelle Zaugg
Fulbright-Hays Doctoral Dissertation Research Abroad Fellow in ?????
PhD Candidate in Communication
American University
Washington, D.C.

From cjl at sugarlabs.org Thu Feb 23 13:01:53 2017
From: cjl at sugarlabs.org (Chris Leonard)
Date: Thu, 23 Feb 2017 14:01:53 -0500
Subject: question about identifying CLDR coverage % for Amharic

Download the latest version of CLDR:

http://unicode.org/Public/cldr/30.0.3/

Specifically, get the core.zip file. Unzip it, open the common folder, then the main folder, and look for am.xml and am_ET.xml. Please find both files attached.

Is that what you are looking for?

cjl

On Thu, Feb 23, 2017 at 12:18 PM, Isabelle Zaugg <iz6445a at student.american.edu> wrote:

> Dear All,
>
> I am working on my dissertation research and would like to identify the
> percentage of CLDR coverage for Amharic and the other languages utilizing
> the Ethiopic script. I would like to get a percentage coverage for today,
> as well as look at the increase over time. So far here is what I have been
> told:
>
> - "The files in common/main are organized by locale, which uses a
>   code, e.g. fr.xml for French." So you can look for the ones you want.
>   To get the codes, you can look at
>   http://www.unicode.org/cldr/charts/latest/supplemental/languages_and_scripts.html.
>   For example, Amharic is 'am'. For more information see the spec:
>   http://unicode.org/reports/tr35/
> - To get changes over versions, you would download successive
>   versions. Just the am.xml should be good enough; the regional variants
>   typically inherit.
>
> Following these guidelines, I was still unable to identify the percentage
> of CLDR coverage for Amharic. Is there anyone who could help me with this
> issue?
>
> Thank you,
>
> Isabelle
>
> Isabelle Zaugg
> Fulbright-Hays Doctoral Dissertation Research Abroad Fellow in ?????
> PhD Candidate in Communication
> American University
> Washington, D.C.
>
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users

-------------- next part --------------
A non-text attachment was scrubbed...
Name: am.xml
Type: text/xml
Size: 341520 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: am_ET.xml
Type: text/xml
Size: 535 bytes
Desc: not available

From verdy_p at wanadoo.fr Thu Feb 23 13:39:04 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 23 Feb 2017 20:39:04 +0100
Subject: question about identifying CLDR coverage % for Amharic

You should not attach files ("am.xml" and "am_ET.xml") in such a discussion; just providing the download links is enough ("am.xml" alone is already 334 KB, and in fact even more once it is re-encoded as a MIME attachment).

From tangmu at wenlin.com Thu Feb 23 15:58:01 2017
From: tangmu at wenlin.com (Tom Bishop, Wenlin Institute)
Date: Thu, 23 Feb 2017 16:58:01 -0500
Subject: question about identifying CLDR coverage % for Amharic
Message-ID: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com>

In this part of am.xml the content all looks Amharic:

???? ???? ???? ??? ????
????? ???? ??? ????
????
????? ??? ????

In this part English and Amharic content are mixed:

Modifier
???? ?????
????
Nonspacing
????
Objects
???

Is there an established system to derive a meaningful "percentage of CLDR coverage for Amharic" from the data? Just from these (not really random) examples one might estimate 70%.

Tom

Wenlin Institute, Inc. SPC (a Social Purpose Corporation)
Software for Learning Chinese
E-mail: wenlin at wenlin.com Web: http://www.wenlin.com
Telephone: 1-877-4-WENLIN (1-877-493-6546)

From verdy_p at wanadoo.fr Thu Feb 23 18:17:28 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 24 Feb 2017 01:17:28 +0100
Subject: question about identifying CLDR coverage % for Amharic
In-Reply-To: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com>
References: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com>

Normally the English terms included should have a "provisional" status if they were not vetted and approved. This is probably a bug in the data, or in the way the XML files were derived from the survey.

From richard.wordingham at ntlworld.com Fri Feb 24 00:24:58 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 24 Feb 2017 06:24:58 +0000
Subject: question about identifying CLDR coverage % for Amharic
In-Reply-To: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com>
References: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com>
Message-ID: <20170224062458.23d13e27@JRWUBU2>

On Thu, Feb 23, 2017 at 12:18 PM, Isabelle Zaugg wrote:

> I am working on my dissertation research and would like to identify
> the percentage of CLDR coverage for Amharic and the other languages
> utilizing the Ethiopic script. I would like to get a percentage
> coverage for today, as well as look at the increase over time.

What about the decrease over time? If the 'number of items' in CLDR increases, the percentage will drop unless new entries are added for Amharic.

On Thu, 23 Feb 2017 16:58:01 -0500 "Tom Bishop, Wenlin Institute" wrote:

> Is there an established system to derive a meaningful "percentage of
> CLDR coverage for Amharic" from the data? Just from these (not really
> random) examples one might estimate 70%.

It gets worse than this. Sometimes default data is appropriate, sometimes it isn't. For example, there is no explicit coverage for collation (or at least, there wasn't back in Version 27.0.1). However, if the CLDR default gives the correct results for Amharic, then that part of the coverage is complete.

There may even be cases that CLDR refuses to cover. An example in English is that CLDR refuses to handle the difference in indefinite article between "a 3-page letter" and "an 8-page letter". What percentage of non-coverage would one calculate for this?

Richard.
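
The point about the percentage dropping as CLDR grows can be made concrete with a little arithmetic. This is only a sketch: the function name `coverage_percent` and the counts are made up for illustration and are not CLDR figures or CLDR tooling.

```python
def coverage_percent(items_present: int, items_required: int) -> float:
    """Share of the required items that a locale supplies, as a percentage."""
    if items_required <= 0:
        raise ValueError("items_required must be positive")
    return 100.0 * items_present / items_required


# Hypothetical counts, chosen only to illustrate the effect.
# The locale keeps the same 700 items while the required set grows.
earlier = coverage_percent(700, 1000)  # 70.0
later = coverage_percent(700, 1250)    # 56.0

print(f"earlier: {earlier:.1f}%  later: {later:.1f}%")
```

With no change at all to the locale's own data, the figure falls from 70% to 56%, so a time series of percentages is only meaningful if the denominator is stated for each release.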
From mark at macchiato.com Fri Feb 24 02:17:01 2017
From: mark at macchiato.com (Mark Davis ☕️)
Date: Fri, 24 Feb 2017 09:17:01 +0100
Subject: question about identifying CLDR coverage % for Amharic
In-Reply-To: <20170224062458.23d13e27@JRWUBU2>
References: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com> <20170224062458.23d13e27@JRWUBU2>

A few items.

> Modifier
> ???? ?????
> ????

We do flag error cases to vetters (contributors) where we can, and give warnings. But if they feel that a term is better in a different language or script, that is up to them to decide.

Mark

On Fri, Feb 24, 2017 at 7:24 AM, Richard Wordingham <richard.wordingham at ntlworld.com> wrote:

> > On Thu, Feb 23, 2017 at 12:18 PM, Isabelle Zaugg wrote:
> > I am working on my dissertation research and would like to identify
> > the percentage of CLDR coverage for Amharic and the other languages
> > utilizing the Ethiopic script. I would like to get a percentage
> > coverage for today, as well as look at the increase over time.
>
> What about the decrease over time? If the 'number of items' in CLDR
> increases, the percentage will drop unless new entries are added for
> Amharic.

Yes, that is what happens with http://cldr.unicode.org/index/downloads/cldr-30#TOC-Growth. We are moving the bar up all the time. What we do for that graph is measure the number of items in the past vs the current set.

> On Thu, 23 Feb 2017 16:58:01 -0500
> "Tom Bishop, Wenlin Institute" wrote:
>
> > Is there an established system to derive a meaningful "percentage of
> > CLDR coverage for Amharic" from the data? Just from these (not really
> > random) examples one might estimate 70%.

The way we measure modern coverage is against a set of data described in http://unicode.org/reports/tr35/tr35-info.html#Coverage_Levels.

> It gets worse than this. Sometimes default data is appropriate,
> sometimes it isn't.
> For example, there is no explicit coverage for
> collation (or at least, there wasn't back in Version 27.0.1). However,
> if the CLDR default gives the correct results for Amharic, then that
> part of the coverage is complete.

v27 is (relatively) ancient, so I'd suggest you look at more recent versions, rather than start off with "It gets worse...".

Where we have confirmation that the root collation is sufficient for the language, then an empty file can be added to http://unicode.org/repos/cldr/tags/latest/common/collation/. (We used to use a "validSublocales" attribute, but found that simply having empty files worked better, procedurally.)

In that directory, you'll see an item for am.xml. It isn't completely empty: it just rearranges the Ethi script ahead of Latin.

> There may even be cases that CLDR refuses to cover. An example in
> English is that CLDR refuses to handle the difference in indefinite
> article between "a 3-page letter" and "an 8-page letter". What
> percentage of non-coverage would one calculate for this?

"Refuses"?

That is a loaded term, usually part of an accusation. Joe could accuse, for example, you, Richard Wordingham, of "refusing" to run a 4-minute mile, even though: nobody ever asked you; it probably wouldn't be possible; and if it were, you probably wouldn't be able to spend the amount of time and effort to do so; or want to, given your busy life.

The scope of CLDR is to provide a core set of locale data for internationalization services. It does not have as a goal the ability to grammatically compose messages in all of the languages it covers. That is a huge task that many, many people are developing sophisticated ML models for doing.

We extend the scope of CLDR periodically when we get proposals for doing so that are feasible, and have a high enough priority given the many, many items on our "todo" list (1821, currently). We were able to do so with plural rules, for example.
And it isn't out of scope in the future for us to support data for doing a limited set of local-scope adjustments across languages, if we have a practical proposal for doing so. We haven't "refused" to do a/an.

If you or others are interested in contributing to CLDR, please let us know. (One caveat: sometimes there are practical limitations on our accepting contributions because the size of the contribution imposes too high a cost on just the assessment of it.)

> Richard.

From richard.wordingham at ntlworld.com Fri Feb 24 15:42:54 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 24 Feb 2017 21:42:54 +0000
Subject: question about identifying CLDR coverage % for Amharic
References: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com> <20170224062458.23d13e27@JRWUBU2>
Message-ID: <20170224214254.368ca8a7@JRWUBU2>

On Fri, 24 Feb 2017 09:17:01 +0100 Mark Davis wrote:

> v27 is (relatively) ancient, so I'd suggest you look at more recent
> versions, rather than start off with "It gets worse...".
>
> Where we have confirmation that the root collation is sufficient for
> the language, then an empty file can be added to
> http://unicode.org/repos/cldr/tags/latest/common/collation/. (We used
> to use a "validSublocales" attribute, but found that simply having
> empty files worked better, procedurally.)
>
> In that directory, you'll see an item for am.xml. It isn't completely
> empty: it just rearranges the Ethi script ahead of Latin.

I notice a very similar file lo.xml. When did Laos haul up the white flag and more or less adopt the modern Thai collation order for Lao? I was startled to see the following statement therein: "The root collation order is valid for this language.
Just move the native script first". The Lao collations I am acquainted with require large numbers of contractions, as one cannot leverage the lesser significance of tone marks; each syllable has its own primary weight. I suspended my development for one of the simpler systems when I discovered my tables were much more accurate than my test data - I need to buy a different Lao dictionary.

Now, if Laos hasn't more or less standardised on DUCET, assessing its coverage would not be easy. Even at the discrete level, would it have reached the 'core' level? Although the collation would be wrong, it would still be usable. This is how the task of assessment 'gets worse'.

> > There may even be cases that CLDR refuses to cover. An example in
> > English is that CLDR refuses to handle the difference in indefinite
> > article between "a 3-page letter" and "an 8-page letter". What
> > percentage of non-coverage would one calculate for this?
>
> "Refuses"?
>
> That is a loaded term, usually part of an accusation. Joe could
> accuse, for example, you, Richard Wordingham, of "refusing" to run a
> 4-minute mile, even though: nobody ever asked you; it probably
> wouldn't be possible; and if it were, you probably wouldn't be able
> to spend the amount of time and effort to do so; or want to, given
> your busy life.

Are you saying that this refusal in the LDML specification is not a response to my pointing out that the English plural rules didn't handle this subtlety? I suppose someone else may also have stumbled over the issue.

> The scope of CLDR is to provide a core set of locale data for
> internationalization services. It does not have as a goal the ability
> to grammatically compose messages in all of the languages it covers.
> That is a huge task that many, many people are developing
> sophisticated ML models for doing.

The 'plural rules' had already struck me as a large undertaking.
The occurrence of the nasal mutation after some Welsh numerals seems to vary from valley to valley, though I suspect it's even less systematic. And that's before one reaches the point where one has to ask whether the numbers are being said decimally or vigesimally.

> We extend the scope of CLDR periodically when we get proposals for
> doing so that are feasible, and have a high enough priority given the
> many, many items on our "todo" list (1821, currently). We were able
> to do so with plural rules, for example. And it isn't out of scope in
> the future for us to support data for doing a limited set of
> local-scope adjustments across languages, if we have a practical
> proposal for doing so. We haven't "refused" to do a/an.

UTS#35 Version 30 Part 3 Section 5 (http://unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules) reads like a refusal:

"On the other hand, the above constructions are relatively rare in messages constructed using numeric placeholders, so the disruption for implementations currently using CLDR plural categories wouldn't be worth the small gain."

There is a data synchronisation issue, unfortunately. Is 1800 "eighteen hundred" or "one thousand eight hundred"?

> If you or others are interested in contributing to CLDR, please let us
> know. (One caveat: sometimes there are practical limitations on our
> accepting contributions because the size of the contribution imposes
> too high a cost on just the assessment of it.)

I have pi_Thai word- and line-breaking rules to provide once there is a home for them. They're not perfect, as I don't resolve sandhi.

Richard.
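
The a/an question and the 1800 question above are linked: which article is right depends on how the number is read aloud. A minimal sketch of that dependency follows. Everything in it is an assumption for illustration: `leading_word`, `indefinite_article`, and the `hundreds_style` flag are hypothetical names, not CLDR data or any library's API, and the rule covers plain positive integers only.

```python
SMALL = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def leading_word(n: int, hundreds_style: bool = False) -> str:
    """First spoken word of n in a simplified English spell-out."""
    if n <= 0:
        raise ValueError("only positive integers are handled in this sketch")
    # "eighteen hundred" reading: 1800 is spoken starting from 18.
    if hundreds_style and 1100 <= n <= 9999 and (n // 100) % 10 != 0:
        n //= 100
    # Otherwise the first word comes from the most significant group.
    while n >= 1000:
        n //= 1000
    if n >= 100:
        n //= 100
    return TENS[n // 10] if n >= 20 else SMALL[n]


def indefinite_article(n: int, hundreds_style: bool = False) -> str:
    # Among these number words, exactly those starting with "e" (eight,
    # eleven, eighteen, eighty) begin with a vowel sound; "one" does not.
    return "an" if leading_word(n, hundreds_style).startswith("e") else "a"


for n in (3, 8, 11, 80):
    print(f"{indefinite_article(n)} {n}-page letter")
print(indefinite_article(1800), "/", indefinite_article(1800, hundreds_style=True))
```

The last line shows the synchronisation problem: the same digits 1800 take "a" under the "one thousand eight hundred" reading and "an" under the "eighteen hundred" reading, so the article cannot be chosen from the numeric value alone.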
From verdy_p at wanadoo.fr Sat Feb 25 09:22:49 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 25 Feb 2017 16:22:49 +0100
Subject: question about identifying CLDR coverage % for Amharic
In-Reply-To: <20170224214254.368ca8a7@JRWUBU2>
References: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com> <20170224062458.23d13e27@JRWUBU2> <20170224214254.368ca8a7@JRWUBU2>

2017-02-24 22:42 GMT+01:00 Richard Wordingham <richard.wordingham at ntlworld.com>:

> There is a data synchronisation issue, unfortunately. Is 1800
> "eighteen hundred" or "one thousand eight hundred"?

Both are valid, but not at the same rank in terms of use, depending on the context. The former is mostly encountered in dates (for years, which are not really cardinal quantities, but discrete ordinal values). But the generic pattern focuses on generic numbers for quantities (discrete or not). If used with amounts of currencies, "eighteen hundred dollars" is possible and understood, but rare.

The same remark applies to other European languages with a large Romance history (like French, but also including English with its historically important use of Latin in former administrations and in interchange across Europe for commercial transactions): they tend to prefer the thousands form, but the hundreds forms were kept due to the frequent reference to centuries in historical papers, arts, culture, and heritage (which are still promoted using centuries as the most frequent scale for dates, when the exact precision of years rapidly becomes confusing or imprecise after just a few decades). We start thinking about thousands in dates only for prehistoric dates and dates prior to the creation of the Greek empire (before the Roman empire itself), or recent dates since the start of the second millennium.
There's just an exception for year 1000 ("l'an mil") in French: note that the standard orthography for translating "thousand" changes from "mille" to just "mil", but only for year ordinals, and that "mil" is used only in the range 1000-1099. French then preferably switches back to the hundreds forms for all years in the range 1100-1699 ("onze cent" for 1100), and hesitates between both forms for years in the range 1700-1999: "dix-sept cents" or "mille sept cents", as they are allophonic and equally long to pronounce and hear, some people preferring one form to the other to avoid the pronunciation difficulties of some consonant clusters like /ls/ in "mille sept cents" /mi:lsets?/ using the CVCCVCV pattern (while "dix-sept cents" uses a simpler CVCVCV pattern which may be pronounced a bit faster).

When spelling amounts of money, we want to be clear and avoid fast speech: the more regular thousands form is preferred.

From richard.wordingham at ntlworld.com Sat Feb 25 12:33:40 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 25 Feb 2017 18:33:40 +0000
Subject: question about identifying CLDR coverage % for Amharic
References: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com> <20170224062458.23d13e27@JRWUBU2> <20170224214254.368ca8a7@JRWUBU2>
Message-ID: <20170225183340.565ac9c5@JRWUBU2>

On Sat, 25 Feb 2017 16:22:49 +0100 Philippe Verdy wrote:

> 2017-02-24 22:42 GMT+01:00 Richard Wordingham
> <richard.wordingham at ntlworld.com>:
> > There is a data synchronisation issue, unfortunately. Is 1800
> > "eighteen hundred" or "one thousand eight hundred"?
>
> Both are valid, but not at the same rank in terms of use, depending
> on the context. The former is mostly encountered in dates (for years,
> which are not really cardinal quantities, but discrete ordinal
> values). But the generic pattern focuses on generic numbers for
> quantities (discrete or not).
> If used with amounts of currencies,
> "eighteen hundred dollars" is possible and understood, but rare.

True, it's normally "eighteen hundred pounds" in en_GB. -:)

More seriously, the relative frequency appears to vary significantly, and it may well be that, in the absence of contextual information, en_GB and en_US should have different spell-out rules.

The synchronisation required would be between the plural rules and the spell-out rules - unless I've missed a feature in the spell-out rules that would make the plural rules redundant in this case.

Richard.

From emmo at us.ibm.com Sun Feb 26 21:34:33 2017
From: emmo at us.ibm.com (John Emmons)
Date: Sun, 26 Feb 2017 21:34:33 -0600
Subject: question about identifying CLDR coverage % for Amharic
In-Reply-To: <20170225183340.565ac9c5@JRWUBU2>
References: <49138035-387E-40A4-A7BB-B802AFFC45B4@wenlin.com> <20170224062458.23d13e27@JRWUBU2> <20170224214254.368ca8a7@JRWUBU2> <20170225183340.565ac9c5@JRWUBU2>

Both varieties ("one thousand eight hundred" and "eighteen hundred") are supported in CLDR. Use "%spellout-numbering-year" for the latter, and simply "%spellout-numbering" for the former.

Regards,

John C. Emmons
Senior Software Engineer
Unicode CLDR TC Vice Chairman
IBM Global Foundations Technology Team
e-mail: emmo at us.ibm.com

From eric.muller at efele.net Tue Feb 28 13:04:44 2017
From: eric.muller at efele.net (Eric Muller)
Date: Tue, 28 Feb 2017 11:04:44 -0800
Subject: Chinese typography and U+FF5E ～ FULLWIDTH TILDE

CLREQ currently says that U+FF5E ～ FULLWIDTH TILDE is prohibited at line start, not prohibited at line end (Appendix A). Its Unicode lb property is ID, which allows this character to be a line start in most cases, and therefore does not satisfy CLREQ. There is no mention of U+301C 〜 WAVE DASH.

JLREQ lists U+301C 〜 WAVE DASH in cl-03 hyphens, prohibits it at line start, and not at line end (just like CLREQ does for U+FF5E). Its Unicode lb property is NS, which satisfies JLREQ. There is no mention of U+FF5E (JLREQ ignores all fullwidth characters). U+007E TILDE is listed as a western character, proportional.

I can think of three solutions:

- use U+301C 〜 WAVE DASH in CLREQ
- tailor lb for Chinese to make U+FF5E have lb = NS
- just make U+FF5E have lb = NS

In a corpus of ~30K Chinese books, I find 681,803 occurrences of U+FF5E ～ FULLWIDTH TILDE, but only 3,258 occurrences of U+301C 〜 WAVE DASH.
It seems to me that Chinese users have voted for U+FF5E, and that the first solution is not viable. I don't see a downside to the third solution, so it is my current best proposal.

Other solutions? Suggestions?

Thanks,
Eric.
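
To make the second and third solutions concrete, here is a rough sketch of what changing the class of U+FF5E would do. The two lb values are the ones stated in the message; the table names, the `may_start_line` helper, and the reduction of the line breaking algorithm to a single "NS may not start a line" check are simplifying assumptions for illustration, not a real implementation of the Unicode line breaking rules.

```python
FULLWIDTH_TILDE = 0xFF5E
WAVE_DASH = 0x301C

# Line_Break classes as stated in the message above.
DEFAULT_LB = {FULLWIDTH_TILDE: "ID", WAVE_DASH: "NS"}

# Hypothetical tailoring: treat U+FF5E as a nonstarter as well.
TAILORED_LB = {**DEFAULT_LB, FULLWIDTH_TILDE: "NS"}


def may_start_line(code_point: int, lb_table: dict) -> bool:
    """Simplified check: a nonstarter (NS) may not begin a line."""
    return lb_table[code_point] != "NS"


def share(count: int, other: int) -> float:
    """Fraction of the combined occurrences accounted for by `count`."""
    return count / (count + other)


print(may_start_line(FULLWIDTH_TILDE, DEFAULT_LB))   # True: break allowed before it
print(may_start_line(FULLWIDTH_TILDE, TAILORED_LB))  # False: matches the CLREQ rule
print(f"{share(681_803, 3_258):.2%} of the two characters in the corpus is U+FF5E")
```

In a locale-tailored design the `TAILORED_LB` table would apply to Chinese text only (the second solution); making it the default table corresponds to the third.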