From emmo at us.ibm.com Thu May 1 11:44:20 2014
From: emmo at us.ibm.com (John Emmons)
Date: Thu, 1 May 2014 11:44:20 -0500
Subject: CLDR Survey Tool open for BETA testing.
Message-ID:

The Unicode CLDR Technical Committee is pleased to announce the opening of the CLDR Survey Tool for beta testing for Version 26 of CLDR, on May 1, 2014. CLDR provides key building blocks for software to support the world's languages.

The beta test will give CLDR contributors a chance to try out the new features of the tool, without having to worry about the potential impacts on CLDR itself. If all seems to be going well during the beta test period, we plan to open the survey tool for "official" data submission on or about Thursday, May 8. We plan to allow data submission until June 19, and data vetting until July 3. Version 26 is scheduled to be released in September 2014.

Highlights for the CLDR 26 release:

- Microsoft has agreed to join the CLDR project as a major contributing partner.
- The survey tool user interface has undergone a major overhaul, thanks to the hard work of our friends at Apple. Hopefully, users will find the interface more intuitive and easier to navigate.
- Google and IBM have also contributed significantly, especially in the area of improving performance. We have also upgraded our hardware, so we are hoping for less down-time and fewer interruptions to your work.
- Many new types of fields and structure, including many additional types of units.
- The first version to support the new characters in the Unicode encoding standard, Version 7.0, due for release in July, 2014.

The CLDR survey tool can be reached by going to http://unicode.org/cldr/apps/survey . For known issues in the beta version, see Known Issues.

Anyone is welcome to try out the tool, although only those with accounts will be able to make changes. To get an account, or if you have forgotten your login ID or password, please contact your CLDR TC representative.
If you don't belong to a Unicode member organization, and are a native speaker of a language other than American English, you can get an account.

Any bugs with the tool can be reported to the CLDR committee by opening a new ticket at http://unicode.org/cldr/trac/newticket

Thanks in advance for your participation in the Unicode CLDR project!

Regards,

John C. Emmons
Globalization Architect & Unicode CLDR TC Chairman
IBM Software Group
Internet: emmo at us.ibm.com

From richard.wordingham at ntlworld.com Thu May 1 18:10:40 2014
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 2 May 2014 00:10:40 +0100
Subject: [icu-design] CLDR/ICU proposal: collation rules for import only
In-Reply-To:
References:

Message-ID: <20140502001040.68fd2a38@JRWUBU2> On Wed, 23 Apr 2014 09:01:40 -0700 Markus Scherer wrote: > In CLDR team discussion today we settled on a more obvious, less > "ugly" naming convention, using a two-part type that turns into two > language subtags. > > In CLDR data: > > ... > > > > ... > > which in ICU would turn into > [import ja-u-co-private-kana] I've been struggling to follow this, but should the key word be 'partial' or 'component' rather than 'private'. There's nothing private about the rules one intends to share. Richard. From srl at icu-project.org Thu May 1 18:19:37 2014 From: srl at icu-project.org (Steven R. Loomis) Date: Thu, 01 May 2014 16:19:37 -0700 Subject: [icu-design] CLDR/ICU proposal: collation rules for import only In-Reply-To: <20140502001040.68fd2a38@JRWUBU2> References:

<20140502001040.68fd2a38@JRWUBU2> Message-ID: <5362D689.2000108@icu-project.org> On 01/05/14 16:10, Richard Wordingham wrote: > On Wed, 23 Apr 2014 09:01:40 -0700 > Markus Scherer wrote: > >> In CLDR team discussion today we settled on a more obvious, less >> "ugly" naming convention, using a two-part type that turns into two >> language subtags. >> >> In CLDR data: >> >> ... >> >> >> >> ... >> >> which in ICU would turn into >> [import ja-u-co-private-kana] > I've been struggling to follow this, but should the key word be > 'partial' or 'component' rather than 'private'. There's nothing private > about the rules one intends to share. But these aren't shared with 'real users'. You don't sit in front of a CLDR-compliant machine and get "ja-u-co-private-kana" in your pop-up list of collators. It's an implementation detail of other collators. -s -- IBMer but all opinions are mine. https://www.ohloh.net/accounts/srl295 // fingerprint @ https://ssl.icu-project.org/trac/wiki/Srl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 555 bytes Desc: OpenPGP digital signature URL: From richard.wordingham at ntlworld.com Thu May 1 20:05:15 2014 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 2 May 2014 02:05:15 +0100 Subject: [icu-design] CLDR/ICU proposal: collation rules for import only In-Reply-To: <5362D689.2000108@icu-project.org> References:

<20140502001040.68fd2a38@JRWUBU2> <5362D689.2000108@icu-project.org> Message-ID: <20140502020515.5d0193e1@JRWUBU2> On Thu, 01 May 2014 16:19:37 -0700 "Steven R. Loomis" wrote: > On 01/05/14 16:10, Richard Wordingham wrote: > > On Wed, 23 Apr 2014 09:01:40 -0700 > > Markus Scherer wrote: > >> which in ICU would turn into > >> [import ja-u-co-private-kana] > > I've been struggling to follow this, but should the key word be > > 'partial' or 'component' rather than 'private'. There's nothing > > private about the rules one intends to share. > But these aren't shared with 'real users'. You don't sit in front of a > CLDR-compliant machine and get "ja-u-co-private-kana" in your pop-up > list of collators. It's an implementation detail of other collators. I was thinking more along the lines of asking for something like de-u-co-rl-akk-private-assyrlog to get Latin script sorted by German rules and cuneiform by Assyriological order, but I see there is nothing like a key 'rl' to import a set of collation rules. Richard. From aleksandr.andreev at gmail.com Fri May 2 02:51:11 2014 From: aleksandr.andreev at gmail.com (Aleksandr Andreev) Date: Fri, 2 May 2014 11:51:11 +0400 Subject: Adding number system Message-ID: Dear list members, Question: how does one add a new numbering system to the CLDR? Should I file a ticket? I also noticed here: http://cldr.unicode.org/translation/numbering-systems that numbering systems may be "algorithmic". Are the algorithms themselves described in the CLDR? Cordially, Aleksandr From mark at macchiato.com Fri May 2 03:14:18 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Fri, 2 May 2014 10:14:18 +0200 Subject: Adding number system In-Reply-To: References: Message-ID: +unicode@ On 2 May 2014 09:51, Aleksandr Andreev wrote: > Dear list members, > > Question: how does one add a new numbering system to the CLDR? Should > I file a ticket? 
> ?Yes, and please provide supporting information about usage and identification (see below). ? > > I also noticed here: > > http://cldr.unicode.org/translation/numbering-systems > > that numbering systems may be "algorithmic". Are the algorithms > themselves described in the CLDR? > ?No, only enough information is provided by CLDR so as to uniquely identify the numbering system, not to provide a complete specification of the exact behavior. (The same is true of calendar systems.) ? > > Cordially, > > Aleksandr > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Fri May 2 10:14:39 2014 From: doug at ewellic.org (Doug Ewell) Date: Fri, 02 May 2014 08:14:39 -0700 Subject: [icu-design] CLDR/ICU proposal: collation rules for import only Message-ID: <20140502081439.665a7a7059d7ee80bb4d670165c8327d.ebfe49b895.wbe@email03.secureserver.net> Richard Wordingham wrote: >> which in ICU would turn into >> [import ja-u-co-private-kana] > > I've been struggling to follow this, but should the key word be > 'partial' or 'component' rather than 'private'. There's nothing > private about the rules one intends to share. In standards like these, "private" doesn't mean "secret." It means "not defined by the standard, but by end users, using a mechanism built into the standard for that purpose." Private agreements can be distributed very publicly. See, for example, the Unicode Private Use Area and the code elements in ISO 15924 that are "reserved for private use." 
-- Doug Ewell | Thornton, CO, USA http://ewellic.org | @DougEwell From richard.wordingham at ntlworld.com Fri May 2 12:03:35 2014 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 2 May 2014 18:03:35 +0100 Subject: [icu-design] CLDR/ICU proposal: collation rules for import only In-Reply-To: <20140502081439.665a7a7059d7ee80bb4d670165c8327d.ebfe49b895.wbe@email03.secureserver.net> References: <20140502081439.665a7a7059d7ee80bb4d670165c8327d.ebfe49b895.wbe@email03.secureserver.net> Message-ID: <20140502180335.3df7ff76@JRWUBU2> On Fri, 02 May 2014 08:14:39 -0700 "Doug Ewell" wrote: > Richard Wordingham wrote: > > >> which in ICU would turn into > >> [import ja-u-co-private-kana] > > > > I've been struggling to follow this, but should the key word be > > 'partial' or 'component' rather than 'private'. There's nothing > > private about the rules one intends to share. > > In standards like these, "private" doesn't mean "secret." It means > "not defined by the standard, but by end users, using a mechanism > built into the standard for that purpose." Private agreements can be > distributed very publicly. Are these then meant to be sets of rules that will not be stable from one issue of CLDR to the next? My example of a German Assyriological collation was not mean to be flippant, though it is seems to be prohibited from being a variant in CLDR - http://cldr.unicode.org/index/cldr-spec/collation-guidelines says, "The CLDR goals are to match the sorting of exemplar letters and common punctuation and leave everything else to the standard UCA ordering". Steven Loomis suggested a very different meaning of 'private' - 'not accessible to end users'. > See, for example, the Unicode Private Use Area and the code elements > in ISO 15924 that are "reserved for private use." End user control of the PUA is generally pretty feeble, though some components provide the ability to implement end user control. Richard. 
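
As a reading aid for this thread: the identifier `ja-u-co-private-kana` quoted above is a language tag carrying a Unicode locale extension, in which `-u-` introduces two-letter keys (here `co`, for collation), each followed by its type subtags. The Python sketch below illustrates only that split. It is not ICU's or CLDR's parser; the function name `parse_u_extension` is hypothetical, and the sketch assumes a simple tag with no attributes before the first key and no `-x-` private-use section.

```python
def parse_u_extension(tag):
    """Split a language tag into base subtags and -u- extension keywords.

    Simplified sketch: assumes the only singleton of interest is "u" and
    ignores attributes and any private-use ("x") section.
    """
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return subtags, {}

    start = subtags.index("u")
    base = subtags[:start]
    keywords = {}
    key = None
    for sub in subtags[start + 1:]:
        if len(sub) == 1:
            # Another singleton begins a different extension; stop here.
            break
        if len(sub) == 2:
            # Two-character subtags are keys, e.g. "co" for collation.
            key = sub
            keywords[key] = []
        elif key is not None:
            # Longer subtags are type subtags belonging to the current key.
            keywords[key].append(sub)
    return base, keywords


if __name__ == "__main__":
    base, keywords = parse_u_extension("ja-u-co-private-kana")
    print(base)      # ['ja']
    print(keywords)  # {'co': ['private', 'kana']}
```

Read this way, the "two-part type" in the proposal is simply a `co` keyword whose type consists of the two subtags `private` and `kana`.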
From markus.icu at gmail.com Fri May 2 13:10:57 2014 From: markus.icu at gmail.com (Markus Scherer) Date: Fri, 2 May 2014 11:10:57 -0700 Subject: [icu-design] CLDR/ICU proposal: collation rules for import only In-Reply-To: <20140502180335.3df7ff76@JRWUBU2> References: <20140502081439.665a7a7059d7ee80bb4d670165c8327d.ebfe49b895.wbe@email03.secureserver.net> <20140502180335.3df7ff76@JRWUBU2> Message-ID: On Fri, May 2, 2014 at 10:03 AM, Richard Wordingham < richard.wordingham at ntlworld.com> wrote: > Are these then meant to be sets of rules that will not be stable from > one issue of CLDR to the next? Unrelated. We simply want to import partial rules into multiple real tailorings, (a) to avoid duplication and simplify maintenance and (b) save a bit of space in the rule strings, for implementations that store them. For example, http://unicode.org/cldr/trac/browser/trunk/common/collation/ja.xml has two tailorings with the same rules for Kana. We want to pull the Kana rules out into a "private" set of rules that is then imported into those two tailorings. markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From duerst at it.aoyama.ac.jp Sat May 3 03:27:02 2014 From: duerst at it.aoyama.ac.jp (=?UTF-8?B?Ik1hcnRpbiBKLiBEw7xyc3Qi?=) Date: Sat, 03 May 2014 17:27:02 +0900 Subject: [icu-design] CLDR/ICU proposal: collation rules for import only In-Reply-To: References: <20140502081439.665a7a7059d7ee80bb4d670165c8327d.ebfe49b895.wbe@email03.secureserver.net> <20140502180335.3df7ff76@JRWUBU2> Message-ID: <5364A856.3000302@it.aoyama.ac.jp> If the rules are partial, why not call them partial, rather than private? Regards, Martin. On 2014/05/03 03:10, Markus Scherer wrote: > On Fri, May 2, 2014 at 10:03 AM, Richard Wordingham < > richard.wordingham at ntlworld.com> wrote: > >> Are these then meant to be sets of rules that will not be stable from >> one issue of CLDR to the next? > > > Unrelated. 
> > We simply want to import partial rules into multiple real tailorings, (a) > to avoid duplication and simplify maintenance and (b) save a bit of space > in the rule strings, for implementations that store them. > > For example, > http://unicode.org/cldr/trac/browser/trunk/common/collation/ja.xml has two > tailorings with the same rules for Kana. We want to pull the Kana rules out > into a "private" set of rules that is then imported into those two > tailorings. > > markus > > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > From kent.karlsson14 at telia.com Sat May 3 11:59:04 2014 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Sat, 03 May 2014 18:59:04 +0200 Subject: [icu-design] CLDR/ICU proposal: collation rules for import only In-Reply-To: <20140502081439.665a7a7059d7ee80bb4d670165c8327d.ebfe49b895.wbe@email03.secureserver.net> Message-ID: Den 2014-05-02 17:14, skrev "Doug Ewell" : > In standards like these, "private" doesn't mean "secret." It means "not > defined by the standard, but by end users, using a mechanism built into > the standard for that purpose." Private agreements can be distributed > very publicly. > > See, for example, the Unicode Private Use Area and the code elements in > ISO 15924 that are "reserved for private use." Well, not in this case. For the collation rules (as suggested in this thread) or the RBNF rules, "private" means "internal" in about the same way as "methods" can be "private" (internal to a class) in C++, Java and several other programming languages. They are still not "secret", but is is not "private" in the sense as used in various "tagging" standards (where it is "private-use" rather than "private"). 
/Kent K

From ga_murr at yahoo.com Thu May 8 11:34:26 2014
From: ga_murr at yahoo.com (Georges MURR)
Date: Thu, 8 May 2014 09:34:26 -0700 (PDT)
Subject: currency format for ar locale
Message-ID: <1399566866.81327.YahooMailNeo@web140202.mail.bf1.yahoo.com>

Hi,

The currency format for ar locale is defined as ¤ #,##0.00 which is in visual order. Usually applications take the input in logical order and do the necessary processing before rendering the data. For example, if I pass a string to the browser based on this pattern, the output is incorrect. The currency symbol is displayed to the right of the number instead of the left.

Any special reason why the pattern is defined this way?

Thanks.

From emmo at us.ibm.com Mon May 12 22:02:44 2014
From: emmo at us.ibm.com (John Emmons)
Date: Mon, 12 May 2014 22:02:44 -0500
Subject: CLDR v26 open for data submission.
Message-ID:

The Unicode CLDR Technical Committee is pleased to announce the opening of the CLDR Survey Tool for data submission for Version 26 of CLDR, on May 13, 2014. We plan to allow data submission until June 19, and vetting of the submitted data until July 3. Version 26 is scheduled to be released in September 2014.

CLDR provides key building blocks for software to support the world's languages, and is used by much of the world's software.

Highlights for the CLDR 26 release are:

- Microsoft has joined the CLDR project as a major contributing partner.
- The survey tool user interface has undergone a major overhaul, thanks to the hard work of our friends at Apple.
- Google and IBM have focused on performance of the software, and we've also upgraded our hardware.
- New types of fields and structure are added, including many additional types of units.
- The new characters in the Unicode encoding standard (Version 7.0, due for release in July, 2014) are supported.
The CLDR survey tool can be reached by going to http://st.unicode.org/cldr-apps/survey. To view known issues with the tool, see the Known Issues page at http://cldr.unicode.org/index/survey-tool/known-bugs. For example, we are still putting some finishing touches on some of the survey tool documentation. Anyone is welcome to try out the tool, although only those with accounts will be able to make changes. To get an account, or if you have forgotten your login ID or password, please contact your CLDR TC representative. If you don't belong to a Unicode member organization, and are a native speaker of a language other than American English, you can obtain a guest account. Any bugs with the tool can be reported to the CLDR committee by opening a New CLDR Ticket at http://unicode.org/cldr/trac/newticket Thanks in advance for your participation in the Unicode CLDR project! Regards, John C. Emmons Globalization Architect & Unicode CLDR TC Chairman IBM Software Group Internet: emmo at us.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From franklinwhale at hotmail.com Thu May 15 06:32:11 2014 From: franklinwhale at hotmail.com (Franklin Tse) Date: Thu, 15 May 2014 19:32:11 +0800 Subject: Gregorian Date & Time Formats of en-HK Message-ID: Hi all, I am writing to propose some changes to the "Approved" Gregorian date and time formats in the English (Hong Kong SAR) [en-HK] locale: 1. Short Date Format Currently, the approved short date format is "d/M/yy", which has 2-digit year format. However, 2-digit year format is not common in Hong Kong after Year 2000. We use 4-digit year instead. Therefore, I suggest that be changed to "d/M/y". 2. Comma between Year and English Month Currently, the approved values of medium, long and full date formats have a comma between Year (Y) and the English month (MMM/MMMM). This is not common in Hong Kong. We simply use "d MMM y" or "d MMMM y". I suggest those values be changed too. 3. 
Flexible Date Formats and Intervals Date Formats I suggest that the values in the flexible date formats and intervals date formats be modified to match with the changes in #1 and #2. I have already added the values to the Survey Tool at http://st.unicode.org/cldr-apps/v#/en_HK/Gregorian/22b38b49476d5bfd and hope that people in Hong Kong or familiar with the locale can help to vote for the changes. Thanks! Regards, Franklin Tse From fios at foramnagaidhlig.net Thu May 15 10:17:57 2014 From: fios at foramnagaidhlig.net (=?ISO-8859-1?Q?F=F2ram_na_G=E0idhlig?=) Date: Thu, 15 May 2014 16:17:57 +0100 Subject: Getting entries approved for minority languages Message-ID: <5374DAA5.9040103@foramnagaidhlig.net> Hi all, I have started setting up a locale for Scottish Gaelic (gd) and have been busy on the Survey Tool this week. Whenever I change a term, the Survey Tool tells me "Changes to this item require 4 votes." I now fear that all my work might just go down the drain, because I don't know where to get 4 more voters from. It is good QA for big languages to require 5 people to approve a term, but a huge problem for us, because we won't be getting any changes in this way, ever. There are only 2 people in the world who localize pro bono into Scottish Gaelic, and I can't expect the other localizer to proofread the complete locale. So, even if we had 100% of available localizers working on this, we would still be 3 votes short. My fellow localizer has actually written an article about this problem: http://akerbeltzalba.wordpress.com/2014/01/29/when-peer-review-goes-pear-shaped/ So, could we please make an exception for my language and simply get any winning items in, even if they only have 1 vote from me? Does anybody else on this list have the same problem, and how did you manage to solve it? Moreover, I have found a few already approved entries that will be even harder to change. 
We do need to change some of them though; I have already found a grammar error in one of them that I can't imagine would have slipped past 5 reviewers. So, no idea how those got in. From emmo at us.ibm.com Thu May 15 11:36:29 2014 From: emmo at us.ibm.com (John Emmons) Date: Thu, 15 May 2014 11:36:29 -0500 Subject: Getting entries approved for minority languages In-Reply-To: <5374DAA5.9040103@foramnagaidhlig.net> References: <5374DAA5.9040103@foramnagaidhlig.net> Message-ID: People in this situation can contact me directly to request moving their voting status from "guest" to "vetter". I will review these on a case by case basis... Regards, John C. Emmons Globalization Architect & Unicode CLDR TC Chairman IBM Software Group Internet: emmo at us.ibm.com From: F?ram na G?idhlig To: cldr-users at unicode.org, Date: 05/15/2014 11:10 AM Subject: Getting entries approved for minority languages Sent by: "CLDR-Users" Hi all, I have started setting up a locale for Scottish Gaelic (gd) and have been busy on the Survey Tool this week. Whenever I change a term, the Survey Tool tells me "Changes to this item require 4 votes." I now fear that all my work might just go down the drain, because I don't know where to get 4 more voters from. It is good QA for big languages to require 5 people to approve a term, but a huge problem for us, because we won't be getting any changes in this way, ever. There are only 2 people in the world who localize pro bono into Scottish Gaelic, and I can't expect the other localizer to proofread the complete locale. So, even if we had 100% of available localizers working on this, we would still be 3 votes short. My fellow localizer has actually written an article about this problem: http://akerbeltzalba.wordpress.com/2014/01/29/when-peer-review-goes-pear-shaped/ So, could we please make an exception for my language and simply get any winning items in, even if they only have 1 vote from me? 
Does anybody else on this list have the same problem, and how did you manage to solve it? Moreover, I have found a few already approved entries that will be even harder to change. We do need to change some of them though; I have already found a grammar error in one of them that I can't imagine would have slipped past 5 reviewers. So, no idea how those got in. _______________________________________________ CLDR-Users mailing list CLDR-Users at unicode.org http://unicode.org/mailman/listinfo/cldr-users -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mark at macchiato.com Thu May 15 12:07:28 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Thu, 15 May 2014 10:07:28 -0700 Subject: Getting entries approved for minority languages In-Reply-To: <5374DAA5.9040103@foramnagaidhlig.net> References: <5374DAA5.9040103@foramnagaidhlig.net> Message-ID: We realize that the process for "long-tail" languages is a problem; the committee is engaged at looking into the issue so that we can get the data confirmed. There are several ways to handle it, and we are scheduled to work through it at next week's meeting. Mark *? Il meglio ? l?inimico del bene ?* On Thu, May 15, 2014 at 8:17 AM, F?ram na G?idhlig wrote: > Hi all, > > I have started setting up a locale for Scottish Gaelic (gd) and have > been busy on the Survey Tool this week. Whenever I change a term, the > Survey Tool tells me "Changes to this item require 4 votes." I now fear > that all my work might just go down the drain, because I don't know > where to get 4 more voters from. > > It is good QA for big languages to require 5 people to approve a term, > but a huge problem for us, because we won't be getting any changes in > this way, ever. 
There are only 2 people in the world who localize pro > bono into Scottish Gaelic, and I can't expect the other localizer to > proofread the complete locale. So, even if we had 100% of available > localizers working on this, we would still be 3 votes short. > > My fellow localizer has actually written an article about this problem: > > > http://akerbeltzalba.wordpress.com/2014/01/29/when-peer-review-goes-pear-shaped/ > > So, could we please make an exception for my language and simply get any > winning items in, even if they only have 1 vote from me? > > Does anybody else on this list have the same problem, and how did you > manage to solve it? > > Moreover, I have found a few already approved entries that will be even > harder to change. We do need to change some of them though; I have > already found a grammar error in one of them that I can't imagine would > have slipped past 5 reviewers. So, no idea how those got in. > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dzo at bisharat.net Thu May 15 12:32:21 2014 From: dzo at bisharat.net (dzo at bisharat.net) Date: Thu, 15 May 2014 17:32:21 +0000 Subject: Getting entries approved for minority languages In-Reply-To: References: <5374DAA5.9040103@foramnagaidhlig.net> Message-ID: <1914071177-1400175144-cardhu_decombobulator_blackberry.rim.net-218298030-@b2.c10.bise6.blackberry> I haven't been watching locale issues closely for a while, but this sort of situation seems very relevant to a lot of languages in Africa, and more broadly, a lot of "less-resourced" and less widely spoken languages worldwide. BTW, we lack a good term for these languages, but "long tail" languages seems useful. 
Don Osborn Sent via BlackBerry by AT&T -----Original Message----- From: Mark Davis Sender: "CLDR-Users" Date: Thu, 15 May 2014 10:07:28 To: F??ram na G??idhlig Cc: cldr-users at unicode.org Subject: Re: Getting entries approved for minority languages _______________________________________________ CLDR-Users mailing list CLDR-Users at unicode.org http://unicode.org/mailman/listinfo/cldr-users From mark at macchiato.com Thu May 15 12:38:05 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Thu, 15 May 2014 10:38:05 -0700 Subject: Getting entries approved for minority languages In-Reply-To: <1914071177-1400175144-cardhu_decombobulator_blackberry.rim.net-218298030-@b2.c10.bise6.blackberry> References: <5374DAA5.9040103@foramnagaidhlig.net> <1914071177-1400175144-cardhu_decombobulator_blackberry.rim.net-218298030-@b2.c10.bise6.blackberry> Message-ID: > this sort of situation seems very relevant to a lot of languages in Africa, and more broadly, a lot of "less-resourced" and less widely spoken languages worldwide. Yes, it is; and we do want to encourage the development of resources for these languages, which can involve a somewhat different process than for "short-tail" languages. > BTW, we lack a good term for these languages, but "long tail" languages seems useful. We've started using that, because it is difficult to come up with another good, short, descriptive name. Mark *? Il meglio ? l?inimico del bene ?* On Thu, May 15, 2014 at 10:32 AM, wrote: > I haven't been watching locale issues closely for a while, but this sort > of situation seems very relevant to a lot of languages in Africa, and more > broadly, a lot of "less-resourced" and less widely spoken languages > worldwide. > > BTW, we lack a good term for these languages, but "long tail" languages > seems useful. > > Don Osborn > > > Sent via BlackBerry by AT&T > > -----Original Message----- > From: Mark Davis > Sender: "CLDR-Users" Date: Thu, 15 May > 2014 10:07:28 > To: F??ram na G? 
idhlig > Cc: cldr-users at unicode.org > Subject: Re: Getting entries approved for minority languages > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at macchiato.com Thu May 15 12:46:24 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Thu, 15 May 2014 10:46:24 -0700 Subject: Gregorian Date & Time Formats of en-HK In-Reply-To: References: Message-ID: On Thu, May 15, 2014 at 4:32 AM, Franklin Tse wrote: > I am writing to propose some changes to the "Approved" Gregorian date and > time formats in the English (Hong Kong SAR) [en-HK] locale: > > 1. Short Date Format > > Currently, the approved short date format is "d/M/yy", which has 2-digit > year format. However, 2-digit year format is not common in Hong Kong after > Year 2000. We use 4-digit year instead. Therefore, I suggest that be > changed to "d/M/y". > > 2. Comma between Year and English Month > > Currently, the approved values of medium, long and full date formats have > a comma between Year (Y) and the English month (MMM/MMMM). This is not > common in Hong Kong. We simply use "d MMM y" or "d MMMM y". I suggest those > values be changed too. > > 3. Flexible Date Formats and Intervals Date Formats > > I suggest that the values in the flexible date formats and intervals date > formats be modified to match with the changes in #1 and #2. > > I have already added the values to the Survey Tool at > http://st.unicode.org/cldr-apps/v#/en_HK/Gregorian/22b38b49476d5bfd and > hope that people in Hong Kong or familiar with the locale can help to vote > for the changes. 
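
To make the patterns in the quoted proposal concrete, the Python sketch below renders the handful of date-pattern letters that appear in it (`d`, `M`, `MMM`, `MMMM`, `yy`, `y`). It is a toy renderer with hard-coded English month names, not ICU's formatter; `format_date` is a hypothetical helper, and `"d MMM, y"` is only an assumed illustration of the comma style the proposal wants to drop.

```python
import re
from datetime import date

MONTHS_WIDE = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_date(pattern, d):
    """Render a pattern made of d / M / y fields and literal punctuation.

    Simplified sketch: no quoting, no other pattern letters, English only.
    """
    def render(match):
        field = match.group(0)
        letter, width = field[0], len(field)
        if letter == "d":
            # "d" is unpadded, "dd" is zero-padded to two digits.
            return f"{d.day:0{width}d}"
        if letter == "M":
            if width <= 2:
                return f"{d.month:0{width}d}"
            name = MONTHS_WIDE[d.month - 1]
            # "MMM" is the abbreviated name; wider fields use the full name.
            return name[:3] if width == 3 else name
        # letter == "y": "yy" is the two-digit year, "y" the full year.
        if width == 2:
            return f"{d.year % 100:02d}"
        return f"{d.year:0{width}d}"

    return re.sub(r"d+|M+|y+", render, pattern)


if __name__ == "__main__":
    day = date(2014, 5, 15)
    print(format_date("d/M/yy", day))   # 15/5/14
    print(format_date("d/M/y", day))    # 15/5/2014
    print(format_date("d MMM y", day))  # 15 May 2014
```

The only difference between the current and proposed short formats is the width of the year field: `yy` truncates to two digits, while a single `y` prints the full year.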
> Best to put requests like this into the forum in the Survey tool, rather than on this mailing list.? You only want to do them here if you don't get responses on the forum. Mark *? Il meglio ? l?inimico del bene ?* -------------- next part -------------- An HTML attachment was scrubbed... URL: From skeet at pobox.com Thu May 15 12:49:52 2014 From: skeet at pobox.com (Jon Skeet) Date: Thu, 15 May 2014 18:49:52 +0100 Subject: Should yeartype be a distinguishing attribute? Message-ID: I'm fairly new to CLDR and LDML in general, but I'm investigating the Hebrew calendar for the sake of interest. My understanding is that any attribute not mentioned in any distinguishingItems element is *not* distinguishing. Therefore, in CLDRv25, yeartype is not a distinguishing attribute. However, given its *usage*, I'd expect it to be a distinguishing attribute. For example, in root.xml, under //ldml/dates/calendars/calendar[@type='hebrew']/months/monthContext[@type='format']/monthWidth[@type='wide'] we have: Adar Adar II If yeartype is non-distinguishing, that would lead to two values with the same element chain, which is forbidden. Please could someone enlighten me as to whether this is a data problem (e.g. fixed by adding yeartype to the list of distinguishing attributes) or a problem with my understanding of distinguishing attributes? Many thanks, Jon Skeet -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Thu May 15 13:45:56 2014 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 15 May 2014 20:45:56 +0200 Subject: Should yeartype be a distinguishing attribute? 
In-Reply-To:
References:
Message-ID:

The problem is that the "year type" is the same during the two distinct months Adar and Adar II because they occur in the *same year*: for me it should better be or "7b" to exhibit that it is effectively a distinct month, if you don't want to change the month numbering after it.

The alternative being to only include "Adar" as a single month, and adjusting this in the day number. Or create a separate field between the month and the day number, for the occurrence number (empty, 1, or 2): in most calendars and most months of calendars needing it, this field will have an empty value.

2014-05-15 19:49 GMT+02:00 Jon Skeet :

> I'm fairly new to CLDR and LDML in general, but I'm investigating the
> Hebrew calendar for the sake of interest.
>
> My understanding is that any attribute not mentioned in any
> distinguishingItems element is *not* distinguishing. Therefore, in
> CLDRv25, yeartype is not a distinguishing attribute.
>
> However, given its *usage*, I'd expect it to be a distinguishing
> attribute. For example, in root.xml, under
> //ldml/dates/calendars/calendar[@type='hebrew']/months/monthContext[@type='format']/monthWidth[@type='wide']
> we have:
>
> Adar
> Adar II
>
> If yeartype is non-distinguishing, that would lead to two values with the
> same element chain, which is forbidden.
>
> Please could someone enlighten me as to whether this is a data problem
> (e.g. fixed by adding yeartype to the list of distinguishing attributes) or
> a problem with my understanding of distinguishing attributes?
>
> Many thanks,
> Jon Skeet
>
>
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From mark at macchiato.com Thu May 15 18:25:28 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Thu, 15 May 2014 16:25:28 -0700 Subject: Should yeartype be a distinguishing attribute? In-Reply-To: References: Message-ID: "yearType" is already a distinguishing element, in the internal tooling. The problem is that the internal tooling (in CLDRFile.java) is more complete than the supplementalMetadata.xml file, which only has a very small subset of the real items. Jon, I suggest that you file a bug to move the data for the distinguishing elements from CLDRFile to supplementalMetadata.xml. Mark *? Il meglio ? l?inimico del bene ?* On Thu, May 15, 2014 at 11:45 AM, Philippe Verdy wrote: > The problem is that the "year type" is the same during the two distinct > month Adar and Adar II because they occur in the *same year*: > for me it should better be or "7b" to exhivit that it > is effectively a distinct month, if you don't want to change the month > numbering after it > > The alternative being to only include "Adar" as a single month, and > adjusting this in the day number. Or create a separate field between the > month and the day number, for the occurence number (empty, 1, or 2): in > most calendars and most months of calendars needing it, this field will > have an empty value. > > > > > 2014-05-15 19:49 GMT+02:00 Jon Skeet : > >> I'm fairly new to CLDR and LDML in general, but I'm investigating the >> Hebrew calendar for the sake of interest. >> >> My understanding is that any attribute not mentioned in any >> distinguishingItems element is *not* distinguishing. Therefore, in >> CLDRv25, yeartype is not a distinguishing attribute. >> >> However, given its *usage*, I'd expect it to be a distinguishing >> attribute. 
For example, in root.xml, under >> //ldml/dates/calendars/calendar[@type='hebrew']/months/monthContext[@type='format']/monthWidth[@type='wide'] >> we have: >> >> Adar >> Adar II >> >> If yeartype is non-distinguishing, that would lead to two values with the >> same element chain, which is forbidden. >> >> Please could someone enlighten me as to whether this is a data problem >> (e.g. fixed by adding yeartype to the list of distinguishing attributes) or >> a problem with my understanding of distinguishing attributes? >> >> Many thanks, >> Jon Skeet >> >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> >> > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From verdy_p at wanadoo.fr Thu May 15 22:19:33 2014 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Fri, 16 May 2014 05:19:33 +0200 Subject: Getting entries approved for minority languages In-Reply-To: <1914071177-1400175144-cardhu_decombobulator_blackberry.rim.net-218298030-@b2.c10.bise6.blackberry> References: <5374DAA5.9040103@foramnagaidhlig.net> <1914071177-1400175144-cardhu_decombobulator_blackberry.rim.net-218298030-@b2.c10.bise6.blackberry> Message-ID: May be it's possible to - adjust the voting threshold according to the number of participants - reduce the vetting score for major companies (like Google, IBM, Apple, Oracle, SAP, Microsoft/Nokia, Facebook, Twitter, Mozilla Foundation, Launchpad, Wikimedia Translate.net, the FSF translators list, Samsung, HTC..., or even national linguistic institutes and libraries and national standard bodies, or gaming developement companies, or manufacturers of various automated domestic appliances), that still have not enough time to inverst in those minority languages with a 
confirmed interest and activity to these languages, even if they are full CLDR TC members. Note also that their interest may not be on the whole comprehensive dataset, but only on some core data (or just the "basic" or "modern" coverages; for example they will not need to include all possible calendars and onlya subset of date and number formats). This way those languages can have a possible start even with small participation (this won't hurt the business of CLDR TC members that have still no specific interest in those languages, they are not required to provide these CLDR data wit htheir products, or can provide them provisionally by a specific installation option). If there are errors that need correction, more people will join the program to paraticipate in the next release. This will help bootstart these languages, increase the number of users of the published data, and finally will increase the level of particpation of "major players" that will add some more of them in their monitored data, and when this will occur, the betting thresholds will be raised a bit. 2014-05-15 19:32 GMT+02:00 : > I haven't been watching locale issues closely for a while, but this sort > of situation seems very relevant to a lot of languages in Africa, and more > broadly, a lot of "less-resourced" and less widely spoken languages > worldwide. > > BTW, we lack a good term for these languages, but "long tail" languages > seems useful. > > Don Osborn > > > Sent via BlackBerry by AT&T > > -----Original Message----- > From: Mark Davis > Sender: "CLDR-Users" Date: Thu, 15 May > 2014 10:07:28 > To: F??ram na G? 
idhlig > Cc: cldr-users at unicode.org > Subject: Re: Getting entries approved for minority languages > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srl at icu-project.org Thu May 15 23:53:30 2014 From: srl at icu-project.org (Steven R. Loomis) Date: Thu, 15 May 2014 21:53:30 -0700 Subject: Getting entries approved for minority languages In-Reply-To: References: <5374DAA5.9040103@foramnagaidhlig.net> <1914071177-1400175144-cardhu_decombobulator_blackberry.rim.net-218298030-@b2.c10.bise6.blackberry> Message-ID: <537599CA.700@icu-project.org> On 05/15/2014 08:19 PM, Philippe Verdy wrote: > May be it's possible to > - adjust the voting threshold according to the number of participants I think that's basically what's done on a manual basis such as in the case of Scottish Gaelic. I don't think it would be done on an automated basis. > - reduce the vetting score for major companies (like Google, IBM, > Apple, Oracle, SAP, Microsoft/Nokia, Facebook, Twitter, Mozilla > Foundation, Launchpad, Wikimedia Translate.net, the FSF translators > list, Samsung, HTC..., or even national linguistic institutes and > libraries and national standard bodies, or gaming developement > companies, or manufacturers of various automated domestic appliances), > that still have not enough time to inverst in those minority languages > with a confirmed interest and activity to these languages, even if > they are full CLDR TC members. 
Note also that their interest may not > be on the whole comprehensive dataset, but only on some core data (or > just the "basic" or "modern" coverages; for example they will not need > to include all possible calendars and onlya subset of date and number > formats). > > This way those languages can have a possible start even with small > participation (this won't hurt the business of CLDR TC members that > have still no specific interest in those languages, they are not > required to provide these CLDR data wit htheir products, or can > provide them provisionally by a specific installation option). > > If there are errors that need correction, more people will join the > program to paraticipate in the next release. This will help bootstart > these languages, increase the number of users of the published data, > and finally will increase the level of particpation of "major players" > that will add some more of them in their monitored data, and when this > will occur, the betting thresholds will be raised a bit. I don't see why any such changes need to be done preemptively, though. As long as it is understood that votes don't go to waste, just log in and vote as much as you can. Saying, "See, I've contributed this data and I need X" makes more sense than changing the rules ahead of time, without knowing what the participation actually will be. Perhaps something such as, the first time you cast a vote that doesn't win or the first time you encounter ("Changes to this item require 4 votes." ) a message comes up that explains the process, explains why signing up more people doesn't help, and basically says, send us a mail/file a bug if you are stuck (just as F?ram did.) And F?ram, by the way, "needs 4 votes" doesn't mean it needs four other people, it's a vote weighting. 
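To make the "vote weighting" point concrete, here is a minimal Python sketch. The weights and level names are invented for illustration and are not the Survey Tool's actual values; the only point taken from the message above is that "requires 4 votes" refers to a summed weight, not to a head count.

```python
# Hypothetical weights, chosen only to illustrate the idea; the real
# Survey Tool levels and values may differ.
VOTE_WEIGHT = {"guest": 1, "vetter": 4}


def total_weight(voter_levels):
    """Sum the weights of everyone who voted for the same value."""
    return sum(VOTE_WEIGHT[level] for level in voter_levels)


def meets_threshold(voter_levels, required=4):
    """True if the combined weight reaches the required number of votes."""
    return total_weight(voter_levels) >= required


# A single voter whose vote carries weight 4 is enough on their own.
print(meets_threshold(["vetter"]))           # True
# Two voters with weight 1 each are not.
print(meets_threshold(["guest", "guest"]))   # False
```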
Steven > > 2014-05-15 19:32 GMT+02:00 >: > > I haven't been watching locale issues closely for a while, but > this sort of situation seems very relevant to a lot of languages > in Africa, and more broadly, a lot of "less-resourced" and less > widely spoken languages worldwide. > > BTW, we lack a good term for these languages, but "long tail" > languages seems useful. > > Don Osborn > > > Sent via BlackBerry by AT&T > > -----Original Message----- > From: Mark Davis > > Sender: "CLDR-Users"

>Date: Thu, 15 May 2014 > 10:07:28 > To: Fòram na Gàidhlig > > Cc: cldr-users at unicode.org >

> > Subject: Re: Getting entries approved for minority languages > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users -- IBMer but all opinions are mine. https://www.ohloh.net/accounts/srl295 // fingerprint @ https://ssl.icu-project.org/trac/wiki/Srl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 555 bytes Desc: OpenPGP digital signature URL: From fios at foramnagaidhlig.net Fri May 16 00:41:29 2014 From: fios at foramnagaidhlig.net (=?ISO-8859-1?Q?F=F2ram_na_G=E0idhlig?=) Date: Fri, 16 May 2014 06:41:29 +0100 Subject: Getting entries approved for minority languages In-Reply-To: <537599CA.700@icu-project.org> References: <5374DAA5.9040103@foramnagaidhlig.net> <1914071177-1400175144-cardhu_decombobulator_blackberry.rim.net-218298030-@b2.c10.bise6.blackberry> <537599CA.700@icu-project.org> Message-ID: <5375A509.400@foramnagaidhlig.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > On 05/15/2014 08:19 PM, Philippe Verdy wrote: >> May be it's possible to - adjust the voting threshold according >> to the number of participants > I think that's basically what's done on a manual basis such as in > the case of Scottish Gaelic. I don't think it would be done on an > automated basis. As long as you have enough man power to do so, manual is the ticket. After I asked, I got pointed to this mailing list, and after posting my problem got resolved really fast. 
>> - reduce the vetting score for major companies (like Google, IBM, >> Apple, Oracle, SAP, Microsoft/Nokia, Facebook, Twitter, Mozilla >> Foundation, Launchpad, Wikimedia Translate.net, the FSF >> translators list, Samsung, HTC..., or even national linguistic >> institutes and libraries and national standard bodies, or gaming >> developement companies, or manufacturers of various automated >> domestic appliances), that still have not enough time to inverst >> in those minority languages with a confirmed interest and >> activity to these languages, even if they are full CLDR TC >> members. Note also that their interest may not be on the whole >> comprehensive dataset, but only on some core data (or just the >> "basic" or "modern" coverages; for example they will not need to >> include all possible calendars and onlya subset of date and >> number formats). >> >> This way those languages can have a possible start even with >> small participation (this won't hurt the business of CLDR TC >> members that have still no specific interest in those languages, >> they are not required to provide these CLDR data wit htheir >> products, or can provide them provisionally by a specific >> installation option). >> >> If there are errors that need correction, more people will join >> the program to paraticipate in the next release. This will help >> bootstart these languages, increase the number of users of the >> published data, and finally will increase the level of >> particpation of "major players" that will add some more of them >> in their monitored data, and when this will occur, the betting >> thresholds will be raised a bit. More major player don't necessarily means more localizers. For my language, no matter which translation agency you contact, things will eventually end up wth our team, because we generally don't nave enough translators to go around, and localizing is a special skill set as well. So, there are no more localizers available. 
I think it will be similar for many long tail languages, because the speakers tend to live in economically deprived areas and/or to have relatively few speakers. > I don't see why any such changes need to be done preemptively, > though. As long as it is understood that votes don't go to waste, > just log in and vote as much as you can. Saying, "See, I've > contributed this data and I need X" makes more sense than changing > the rules ahead of time, without knowing what the participation > actually will be. > > Perhaps something such as, the first time you cast a vote that > doesn't win or the first time you encounter ("Changes to this item > require 4 votes." ) a message comes up that explains the process, > explains why signing up more people doesn't help, and basically > says, send us a mail/file a bug if you are stuck (just as F?ram > did.) I think it's a good idea to have a message, or to put it in the instructions somewhere. Seeing the person has already contributed is also a good thing, so you get an idea that they are willing and able to deliver something usable. Maybe we could have an official threshold that should be completed first for new locales, say, the minimal data set? For locales already worked on like mine, such a threshold wouldn't be easi to define - we could have a more loose criterium here - - if we see you're putting in an effort, ask us for more voting power. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (MingW32) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJTdaUIAAoJEFBz9PVwa++TbwIP/iqtJ/32iNBkBwuSShYgRibd zMfBjjikYwTTvL3UvzdpayKLG/TeeyCjemnR7hLY3FKKAPswtrgZ+FmeWjCl1Zmq jMXnEDapwPVJ24xyXkWaYBClXyXK0N4IiPdLPHqA6R8BxSXNUHeq/rzRMf6FbXdQ VKa2VT/4yEqGxk6FbCLkMAqzJUzWxAxQ4ReU2+ABpdoi3ANCK7AqlH04v3XRjpJO 5t0bmMixfgLrzdCTVEgdPUSBRUw9amlJfXescagHP6ihCaxG8V+bHX+wkoRBpipG 7K67vVmVuRsncdcs/BDJldAaNcShJpAaWElYpxjwga8CsI7pyvQ7lUx40dQIiFuR 4A1lulqupv6ZD0ibal4BcpxHCYe2C9C6dRHerp6xUibocVZnN1EZBvkHuK/G8Sfr svBvJ/4O+XYxl8z0AcCHVSRaZcKPjqwRGn1MsbBeVwr1RT58v3PoUVEYxjt4bWlY FItS1xHfUfF/b+SLgBcvUWIvHX9wbeqyG69pfAMFtHnT/zDS7EjwKQxWPyX9an4a AuUUEcZJ/2798nkcWTU4V46z84pmzlBRZnYqjwh2DY8EpfE8qKAqUnVsojEVHQxo A3496wg98F1u5y4Gbu0zVGvVKVGwg4PFJJ48twTSMUwaAPr+DxgXhSeDtVu4zC5S GdAHK8pwFr5ObAlmM4IA =luTl -----END PGP SIGNATURE----- From franklinwhale at hotmail.com Fri May 16 06:38:58 2014 From: franklinwhale at hotmail.com (Franklin Tse) Date: Fri, 16 May 2014 19:38:58 +0800 Subject: Gregorian Date & Time Formats of en-HK In-Reply-To: References:

Message-ID:

Thanks for the useful tip. I have posted my requests in the forum.

-----Original Message-----
From: Mark Davis ☕
Date: Friday, 16 May, 2014 01:46
To: Franklin Tse
Cc: cldr-users at unicode.org
Subject: Re: Gregorian Date & Time Formats of en-HK

On Thu, May 15, 2014 at 4:32 AM, Franklin Tse wrote:

I am writing to propose some changes to the "Approved" Gregorian date and time formats in the English (Hong Kong SAR) [en-HK] locale:

1. Short Date Format
Currently, the approved short date format is "d/M/yy", which has 2-digit year format. However, 2-digit year format is not common in Hong Kong after Year 2000. We use 4-digit year instead. Therefore, I suggest that be changed to "d/M/y".

2. Comma between Year and English Month
Currently, the approved values of medium, long and full date formats have a comma between Year (Y) and the English month (MMM/MMMM). This is not common in Hong Kong. We simply use "d MMM y" or "d MMMM y". I suggest those values be changed too.

3. Flexible Date Formats and Intervals Date Formats
I suggest that the values in the flexible date formats and intervals date formats be modified to match with the changes in #1 and #2.

I have already added the values to the Survey Tool at http://st.unicode.org/cldr-apps/v#/en_HK/Gregorian/22b38b49476d5bfd and hope that people in Hong Kong or familiar with the locale can help to vote for the changes.

Best to put requests like this into the forum in the Survey tool, rather than on this mailing list. You only want to do them here if you don't get responses on the forum.

Mark

Il meglio è l'inimico del bene
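The difference between the approved and proposed patterns is easy to see with a small Python sketch. The `render` helper below is a hand-rolled illustration that understands only the handful of pattern fields mentioned in the message (d, M, MMM, MMMM, yy, y); it is not a CLDR or ICU API. The "d MMM, y" pattern is an assumed form of the current medium format, based only on the description of "a comma between year and month", and the three-letter month abbreviation is a simplification.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def render(pattern, d):
    """Render a tiny subset of date pattern fields for an English locale.

    Supported fields: d, M, MMM, MMMM, yy, y. Any other letter run
    raises KeyError; non-letters are copied through unchanged.
    """
    fields = {
        "d": str(d.day),
        "M": str(d.month),
        "MMM": MONTHS[d.month - 1][:3],   # simplification: first 3 letters
        "MMMM": MONTHS[d.month - 1],
        "yy": f"{d.year % 100:02d}",
        "y": str(d.year),
    }
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch.isalpha():
            # Consume a run of the same pattern letter, e.g. "MMM".
            j = i
            while j < len(pattern) and pattern[j] == ch:
                j += 1
            out.append(fields[pattern[i:j]])
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = date(2014, 5, 16)
print(render("d/M/yy", sample))    # 16/5/14      (approved short format)
print(render("d/M/y", sample))     # 16/5/2014    (proposed short format)
print(render("d MMM, y", sample))  # 16 May, 2014 (assumed current medium)
print(render("d MMM y", sample))   # 16 May 2014  (proposed medium format)
```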
From fios at foramnagaidhlig.net Fri May 16 14:58:30 2014
From: fios at foramnagaidhlig.net (=?ISO-8859-1?Q?F=F2ram_na_G=E0idhlig?=)
Date: Fri, 16 May 2014 20:58:30 +0100
Subject: Getting entries approved for minority languages
In-Reply-To: References: <5374DAA5.9040103@foramnagaidhlig.net> <1914071177-1400175144-cardhu_decombobulator_blackberry.rim.net-218298030-@b2.c10.bise6.blackberry> <537599CA.700@icu-project.org> <5375A509.400@foramnagaidhlig.net>
Message-ID: <53766DE6.90107@foramnagaidhlig.net>

Hi Agustin,

nice to bypass the layers of agencies and actually e-meet my employer :)

I am actually one of the proofreaders on your team for Scottish Gaelic and it has indeed been an important step for us that Microsoft has been willing to pay for localization into our language.

I have met with my fellow localizer today, who is the person who provided the original locale data. At the time he provided the original data, our localizing efforts were still in their infancy. So, with 4 more years of experience under our belts, we plan to extend the locale data and to do some minor fixes on what had already been submitted - e.g. making narrow entries more narrow. Also, our terminology took some time to stabilize.

So, please do take on the changes we are making in the Survey tool; they are coming from the same people, and we will keep consistency with anything you send our way in the future :)

On 16/05/2014 17:20, Agustin Da Fieno Delucchi wrote:
> Hi all,
>
> I just wanted to provide my feedback on behalf of Microsoft.
>
> Scottish Gaelic is a language for which Microsoft has been providing support for a good number of years. In doing so, we have engaged with local language institutions.
>
> We do have locale data for this language, as well as extensive terminology and translation memories, so we certainly care about this language and locale.
> > I am sure that the locale information that Microsoft has will differ much from what the F?ram na G?idhlig is providing, but we (Microsoft) would prefer that our vetting score is not reduced for this and other locales that we are adding to CLDR v 26. > > Please let me know if you have further questions or comment. > > Thanks, > > Agust?n > > > -----Original Message----- > From: CLDR-Users [mailto:cldr-users-bounces at unicode.org] On Behalf Of F?ram na G?idhlig > Sent: Thursday, May 15, 2014 10:41 PM > To: cldr-users at unicode.org > Subject: Re: Getting entries approved for minority languages > >> On 05/15/2014 08:19 PM, Philippe Verdy wrote: >>> May be it's possible to - adjust the voting threshold according to >>> the number of participants >> I think that's basically what's done on a manual basis such as in the >> case of Scottish Gaelic. I don't think it would be done on an >> automated basis. > > As long as you have enough man power to do so, manual is the ticket. > After I asked, I got pointed to this mailing list, and after posting my problem got resolved really fast. > > >>> - reduce the vetting score for major companies (like Google, IBM, >>> Apple, Oracle, SAP, Microsoft/Nokia, Facebook, Twitter, Mozilla >>> Foundation, Launchpad, Wikimedia Translate.net, the FSF translators >>> list, Samsung, HTC..., or even national linguistic institutes and >>> libraries and national standard bodies, or gaming developement >>> companies, or manufacturers of various automated domestic >>> appliances), that still have not enough time to inverst in those >>> minority languages with a confirmed interest and activity to these >>> languages, even if they are full CLDR TC members. Note also that >>> their interest may not be on the whole comprehensive dataset, but >>> only on some core data (or just the "basic" or "modern" coverages; >>> for example they will not need to include all possible calendars and >>> onlya subset of date and number formats). 
>>> >>> This way those languages can have a possible start even with small >>> participation (this won't hurt the business of CLDR TC members that >>> have still no specific interest in those languages, they are not >>> required to provide these CLDR data wit htheir products, or can >>> provide them provisionally by a specific installation option). >>> >>> If there are errors that need correction, more people will join the >>> program to paraticipate in the next release. This will help bootstart >>> these languages, increase the number of users of the published data, >>> and finally will increase the level of particpation of "major >>> players" that will add some more of them in their monitored data, and >>> when this will occur, the betting thresholds will be raised a bit. > > More major player don't necessarily means more localizers. For my language, no matter which translation agency you contact, things will eventually end up wth our team, because we generally don't nave enough translators to go around, and localizing is a special skill set as well. So, there are no more localizers available. > > I think it will be similar for many long tail languages, because the speakers tend to live in economically deprived areas and/or to have relatively few speakers. > > > >> I don't see why any such changes need to be done preemptively, though. >> As long as it is understood that votes don't go to waste, just log in >> and vote as much as you can. Saying, "See, I've contributed this data >> and I need X" makes more sense than changing the rules ahead of time, >> without knowing what the participation actually will be. > >> Perhaps something such as, the first time you cast a vote that doesn't >> win or the first time you encounter ("Changes to this item require 4 >> votes." ) a message comes up that explains the process, explains why >> signing up more people doesn't help, and basically says, send us a >> mail/file a bug if you are stuck (just as F?ram >> did.) 
> > I think it's a good idea to have a message, or to put it in the instructions somewhere. Seeing the person has already contributed is also a good thing, so you get an idea that they are willing and able to deliver something usable. Maybe we could have an official threshold that should be completed first for new locales, say, the minimal data set? For locales already worked on like mine, such a threshold wouldn't be easi to define - we could have a more loose criterium here > - if we see you're putting in an effort, ask us for more voting power. > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > > ----- > No virus found in this message. > Checked by AVG - www.avg.com > Version: 2014.0.4570 / Virus Database: 3950/7503 - Release Date: 05/16/14 > > > From skeet at pobox.com Mon May 19 11:31:07 2014 From: skeet at pobox.com (Jon Skeet) Date: Mon, 19 May 2014 17:31:07 +0100 Subject: Clarifying the LDML data model Message-ID: Hi folks, I'm trying to get my head round the LDML data model in as clear a way as possible, and I have a few questions - basically around how to interpret TR-35 part 1. For the moment I'm only interested in non-blocking elements (although at some point I'm going to need to get my head round the exact meaning of serialElements and blockingItems...) 1. Are nondistinguishing attributes ever valid for non-end-nodes? (I can imagine the draft attribute being one exception to this.) It makes life easier if we can think of the "value" for a node as being the non-distinguishing attributes of just the *deepest* element in the chain, along with the text content of that element. 2. Is it valid for a nondistinguishing attribute to occur on an element whose content is an element? If so, do the nondistinguishing attributes of that element override those in the target of the alias? 
As an example, consider: text value If I ask for the nondistinguishing attribute "bar" on //foo/element[@type='x'] would I get bar-on-x or bar-on-y? 3. Is there any way to tell the difference between an end-node with an empty text value and a node which *could* have child elements, but happens not to for a specific locale? As an aside, while the spec talks about a locale data file as being a *list* of element-chain/value pairs, I'm finding it hard to shake the idea of it being a tree (or possibly a forest). If anyone has any feedback about whether that's likely to cause me problems later, I'd welcome it. Jon -------------- next part -------------- An HTML attachment was scrubbed... URL: From srl at icu-project.org Mon May 19 12:11:47 2014 From: srl at icu-project.org (Steven R. Loomis) Date: Mon, 19 May 2014 10:11:47 -0700 Subject: Clarifying the LDML data model In-Reply-To: References: Message-ID: <537A3B53.8090802@icu-project.org> On 05/19/2014 09:31 AM, Jon Skeet wrote: > Hi folks, > > I'm trying to get my head round the LDML data model in as clear a way > as possible, and I have a few questions - basically around how to > interpret TR-35 part 1. For the moment I'm only interested in > non-blocking elements (although at some point I'm going to need to get > my head round the exact meaning of serialElements and blockingItems...) Diagrams might help in explaining serialElements and blockingItems. > 1. Are nondistinguishing attributes ever valid for non-end-nodes? (I > can imagine the draft attribute being one exception to this.) Yes, "draft" as you noted, and for example "references" on . Some normalization happens as part of the CLDR release, however, the question here is what is valid for LDML. > It makes life easier if we can think of the "value" for a node as > being the non-distinguishing attributes of just the /deepest/ element > in the chain, along with the text content of that element. 
I don't know what the context of your processing is, but you might not want to consider the non-distinguishing attributes at all, they are for informative purposes only. Or, perform your processing ignoring non-distinguishing attributes, and then look up the non-distinguishing attributes on an as needed basis. > 2.Is it valid for a nondistinguishing attribute to occur on an element > whose content is an element? If so, do the nondistinguishing > attributes of that element override those in the target of the alias? > As an example, consider: > > > > > > text value > > > > If I ask for the nondistinguishing attribute "bar" on > //foo/element[@type='x'] would I get bar-on-x or bar-on-y? This seems at first glance to not be defined by the spec. > 3. Is there any way to tell the difference between an end-node with an > empty text value and a node which /could/ have child elements, but > happens not to for a specific locale? > > As an aside, while the spec talks about a locale data file as being a > /list/ of element-chain/value pairs, I'm finding it hard to shake the > idea of it being a tree (or possibly a forest). If anyone has any > feedback about whether that's likely to cause me problems later, I'd > welcome it. > Those are actually related questions. Quoting 4.2.1 definitions http://www.unicode.org/reports/tr35/#Definitions/ - "An LDML file can be thought of as an ordered list of //element pairs//: , where the element chains are all the chains for the end-nodes. (This works because of restrictions on the structure of LDML, including that it does not allow mixed content.) The ordering is the ordering that the element chains are found in the file, and thus determined by the DTD."/ //No, there's no way to tell the difference by inspecting the XML, BUT the DTD, and especially the supplemental metadata, will tell you what is valid at that level. So, .. is valid by the DTD, but Foo .. is not. 
So if you were to represent the first (valid) example as you would completely omit the "" - it has no value as LDML, basically, and I think that CLDR tools would strip it away completely. Hope this helps. -- IBMer but all opinions are mine. https://www.ohloh.net/accounts/srl295 // fingerprint @ https://ssl.icu-project.org/trac/wiki/Srl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 555 bytes Desc: OpenPGP digital signature URL: From srl at icu-project.org Mon May 19 12:18:37 2014 From: srl at icu-project.org (Steven R. Loomis) Date: Mon, 19 May 2014 10:18:37 -0700 Subject: Clarifying the LDML data model In-Reply-To: <537A3B53.8090802@icu-project.org> References: <537A3B53.8090802@icu-project.org> Message-ID: <537A3CED.3040804@icu-project.org> On 05/19/2014 10:11 AM, Steven R. Loomis wrote: > On 05/19/2014 09:31 AM, Jon Skeet wrote: >> Hi folks, >> >> I'm trying to get my head round the LDML data model in as clear a way >> as possible, and I have a few questions - basically around how to >> interpret TR-35 part 1. For the moment I'm only interested in >> non-blocking elements (although at some point I'm going to need to get >> my head round the exact meaning of serialElements and blockingItems...) > Diagrams might help in explaining serialElements and blockingItems. Not a diagram, but an example (assume only blockingItem is blocking). aa.xml: Foo Bar Baz Bat aa_BB.xml: Quux Quux Resolved aa_BB.xml: Quux Baz Bat Quux -- IBMer but all opinions are mine. https://www.ohloh.net/accounts/srl295 // fingerprint @ https://ssl.icu-project.org/trac/wiki/Srl -------------- next part -------------- A non-text attachment was scrubbed... 
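The "ordered list of element-chain/value pairs" view quoted from TR35 earlier in this thread can be sketched in Python as follows. This is a simplified illustration, not how the CLDR tools work: the sample XML is made up, the set of distinguishing attributes is an assumption, and the "value" follows Jon's simplification (text plus the non-distinguishing attributes of the deepest element only), which, as noted in the replies, ignores non-distinguishing attributes on non-end-nodes.

```python
import xml.etree.ElementTree as ET

# Made-up sample data for illustration only.
SAMPLE = """
<ldml>
  <localeDisplayNames>
    <languages>
      <language type="en">English</language>
      <language type="fr" draft="unconfirmed">French</language>
    </languages>
  </localeDisplayNames>
</ldml>
"""

# Assumption for this sketch: only "type" is distinguishing.
DISTINGUISHING = {"type"}


def element_pairs(xml_text):
    """Flatten an XML tree into an ordered list of (chain, value) pairs,
    one pair per end-node, in document order."""
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(node, chain):
        dist = "".join(
            f"[@{name}='{value}']"
            for name, value in sorted(node.attrib.items())
            if name in DISTINGUISHING
        )
        chain = chain + [node.tag + dist]
        children = list(node)
        if children:
            for child in children:
                walk(child, chain)
        else:
            nondist = {
                name: value
                for name, value in node.attrib.items()
                if name not in DISTINGUISHING
            }
            value = {"text": (node.text or "").strip(), "attrs": nondist}
            pairs.append(("//" + "/".join(chain), value))

    walk(root, [])
    return pairs


for chain, value in element_pairs(SAMPLE):
    print(chain, value)
```

Seen this way, the tree is just a compact spelling of the list: intermediate elements contribute steps to the chain but carry no value of their own in this simplified model.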
Name: signature.asc Type: application/pgp-signature Size: 555 bytes Desc: OpenPGP digital signature URL: From mark at macchiato.com Mon May 19 13:51:58 2014 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Mon, 19 May 2014 20:51:58 +0200 Subject: Clarifying the LDML data model In-Reply-To: <537A3B53.8090802@icu-project.org> References: <537A3B53.8090802@icu-project.org> Message-ID: A couple of quick notes Mark *? Il meglio ? l?inimico del bene ?* On Mon, May 19, 2014 at 7:11 PM, Steven R. Loomis wrote: > On 05/19/2014 09:31 AM, Jon Skeet wrote: > > Hi folks, > > > > I'm trying to get my head round the LDML data model in as clear a way > > as possible, and I have a few questions - basically around how to > > interpret TR-35 part 1. For the moment I'm only interested in > > non-blocking elements (although at some point I'm going to need to get > > my head round the exact meaning of serialElements and blockingItems...) > > Diagrams might help in explaining serialElements and blockingItems. > > > 1. Are nondistinguishing attributes ever valid for non-end-nodes? (I > > can imagine the draft attribute being one exception to this.) > > Yes, "draft" as you noted, and for example "references" on > . Some normalization happens as part of the CLDR release, > however, the question here is what is valid for LDML. > > > It makes life easier if we can think of the "value" for a node as > > being the non-distinguishing attributes of just the /deepest/ element > > in the chain, along with the text content of that element. > > I don't know what the context of your processing is, but you might not > want to consider the non-distinguishing attributes at all, they are for > informative purposes only. ?Be careful here. While the non-distinguishing attributes are "mostly" informative for common/main, they are vital for essentially all? other files, like supplemental. 
> Or, perform your processing ignoring
> non-distinguishing attributes, and then look up the non-distinguishing
> attributes on an as needed basis.
>

Again, mostly valid only for common/main.

>
> > 2. Is it valid for a nondistinguishing attribute to occur on an element
> > whose content is an element? If so, do the nondistinguishing
> > attributes of that element override those in the target of the alias?
> > As an example, consider:
> >
> > text value
> >
> > If I ask for the nondistinguishing attribute "bar" on
> > //foo/element[@type='x'] would I get bar-on-x or bar-on-y?
> This seems at first glance to not be defined by the spec.
> > 3. Is there any way to tell the difference between an end-node with an
> > empty text value and a node which /could/ have child elements, but
> > happens not to for a specific locale?
> >
> > As an aside, while the spec talks about a locale data file as being a
> > /list/ of element-chain/value pairs, I'm finding it hard to shake the
> > idea of it being a tree (or possibly a forest). If anyone has any
> > feedback about whether that's likely to cause me problems later, I'd
> > welcome it.
> >
>
> Those are actually related questions. Quoting 4.2.1 definitions
> http://www.unicode.org/reports/tr35/#Definitions/ - "An LDML file can
> be thought of as an ordered list of element pairs: <element chain,
> data>, where the element chains are all the chains for the end-nodes.
> (This works because of restrictions on the structure of LDML, including
> that it does not allow mixed content.) The ordering is the ordering that
> the element chains are found in the file, and thus determined by the DTD."
>
> No, there's no way to tell the difference by inspecting the XML, BUT
> the DTD, and especially the supplemental metadata, will tell you what is
> valid at that level.
>
> So,
>
> .. is valid by the DTD, but
>
> Foo
>
> .. is not.
>
> So if you were to represent the first (valid) example as <element chain,
> data> you would completely omit the "" - it has no
> value as LDML, basically, and I think that CLDR tools would strip it
> away completely.
>
> Hope this helps.
>
> --
>
> IBMer but all opinions are mine.
> https://www.ohloh.net/accounts/srl295 // fingerprint @
> https://ssl.icu-project.org/trac/wiki/Srl
>
>
>
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>
>

From sdaoden at yandex.com Mon May 26 06:10:57 2014
From: sdaoden at yandex.com (Steffen Nurpmeso)
Date: Mon, 26 May 2014 13:10:57 +0200
Subject: character.jsp says that U+00E5 isUppercase
Message-ID: <20140526121057.qZx2a+T9%sdaoden@yandex.com>

Happy Monday and hello,

I stumbled over that on Saturday while following Karl Williamson's advice to deal with DerivedCoreProperties.txt (many thanks for perl(1) again) and getting bored over that [1]. Shouldn't isUppercase map to

D140 isUppercase(X):
isUppercase(X) is true when toUppercase(Y) = Y.

[1]

Ciao from a tea drinker (though I like the Esperança fair trade coffee pretty much; pretty smooth),

--steffen

From jkorpela at cs.tut.fi Mon May 26 08:08:08 2014
From: jkorpela at cs.tut.fi (Jukka K. Korpela)
Date: Mon, 26 May 2014 16:08:08 +0300
Subject: character.jsp says that U+00E5 isUppercase
In-Reply-To: <20140526121057.qZx2a+T9%sdaoden@yandex.com>
References: <20140526121057.qZx2a+T9%sdaoden@yandex.com>
Message-ID: <53833CB8.4010301@cs.tut.fi>

2014-05-26 14:10, Steffen Nurpmeso wrote:
> I stumbled over that on Saturday

Stumbled on what? Especially in a medium like e-mail, you should not imply the heading (the Subject line) in the body. So are you saying that “character.jsp says that U+00E5 isUppercase”? And by “character.jsp” you mean ?

Well, when I checked it said:

isUppercase false

However, now that I check again, it says

isUppercase Yes

using different notation and different value! It also says that isLowercase is true (Yes). Following the link “Yes”, pointing to http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:isUppercase=Yes:] shows that quite a few other letters have been incorrectly classified as being uppercase.

So there is surely something wrong with the information.

Yucca

From sdaoden at yandex.com Mon May 26 09:36:55 2014
From: sdaoden at yandex.com (Steffen Nurpmeso)
Date: Mon, 26 May 2014 16:36:55 +0200
Subject: character.jsp says that U+00E5 isUppercase
In-Reply-To: <53833CB8.4010301@cs.tut.fi>
References: <20140526121057.qZx2a+T9%sdaoden@yandex.com> <53833CB8.4010301@cs.tut.fi>
Message-ID: <20140526153655.jNFDSM6i%sdaoden@yandex.com>

huhu,

"Jukka K. Korpela" wrote:
|So there is surely something wrong with the information.

something is wrong, heh? Well, terrible weather here, very low gray clouds, sorry if that brings me down. In Finland that'd surely be covered by overall darkness, but here it is hand-in-hand with a lot of concrete :)

--steffen

From mark at macchiato.com Mon May 26 10:38:59 2014
From: mark at macchiato.com (Mark Davis ☕️)
Date: Mon, 26 May 2014 17:38:59 +0200
Subject: character.jsp says that U+00E5 isUppercase
In-Reply-To: <20140526121057.qZx2a+T9%sdaoden@yandex.com>
References: <20140526121057.qZx2a+T9%sdaoden@yandex.com>
Message-ID:

Whoops, bug in the program.

Mark

*Il meglio è l’inimico del bene*

On Mon, May 26, 2014 at 1:10 PM, Steffen Nurpmeso wrote:
> Happy Monday and hello,
>
> I stumbled over that on Saturday while following Karl Williamson's
> advice to deal with DerivedCoreProperties.txt (many thanks for
> perl(1) again) and getting bored over that [1].
> Shouldn't isUppercase map to
>
> D140 isUppercase(X):
> isUppercase(X) is true when toUppercase(Y) = Y.
>
> [1]
>
> Ciao from a tea drinker (though I like the Esperança fair trade
> coffee pretty much; pretty smooth),
>
> --steffen
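
For anyone who wants to check the D140 definition quoted in this thread by hand, here is a minimal sketch. It assumes that Y in the definition stands for NFD(X), and it uses Python's built-in str.upper() as a stand-in for toUppercase; the function name is_uppercase is made up for this illustration and is not what character.jsp runs.

```python
import unicodedata


def is_uppercase(x: str) -> bool:
    """Sketch of D140: true when toUppercase(Y) == Y, assuming Y = NFD(X)."""
    # Assumption: Y is the canonical decomposition (NFD) of X.
    y = unicodedata.normalize("NFD", x)
    # str.upper() is Python's Unicode case mapping, used here in place of
    # the standard's toUppercase operation.
    return y.upper() == y


if __name__ == "__main__":
    for ch in ("\u00e5", "\u00c5"):
        print(f"U+{ord(ch):04X} isUppercase: {is_uppercase(ch)}")
```

Under these assumptions U+00E5 comes out as not uppercase and U+00C5 as uppercase, which agrees with the "false" value reported earlier in the thread. Note that by this definition a caseless string such as "1" also counts as isUppercase, since uppercasing leaves it unchanged.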