From me at erik.tw Thu May 14 09:11:16 2020
From: me at erik.tw (Erik Williamson)
Date: Thu, 14 May 2020 10:11:16 -0400
Subject: Connecting with other voting members of my locale?

Hello,

I am new to the CLDR voting process. I recently submitted a locale bug report and applied for guest access to the CLDR Survey Tool. I'm looking to connect with other participants in es-CL and es-MX to discuss my proposed change to currency formatting. I have already tried to contact the national language academies of Chile and Mexico, but so far I have not been able to establish communication.

Is there a way for me to connect with other members, or to view organizational membership by locale?

Thanks,
--
Erik Williamson
me at erik.tw

From mark at macchiato.com Thu May 14 15:03:35 2020
From: mark at macchiato.com (Mark Davis ☕️)
Date: Thu, 14 May 2020 13:03:35 -0700
Subject: Connecting with other voting members of my locale?

Once the tool opens for submission, there is a forum for discussing issues among people working on the same locale.

_______________________________________________
CLDR-Users mailing list
CLDR-Users at corp.unicode.org
https://corp.unicode.org/mailman/listinfo/cldr-users

From kipcole9 at gmail.com Sat May 16 02:26:05 2020
From: kipcole9 at gmail.com (Kip Cole)
Date: Sat, 16 May 2020 15:26:05 +0800
Subject: Merging unit skeletons for output - a better way?

Congratulations to those who implemented the new unit conversion and preference data in CLDR 37. It's been a joy to implement on top of the data, though not without a few challenges :-)

One area that appears to be undocumented, and that is quite tricky to implement, is merging unit skeletons when outputting a string representation. I will use some examples to illustrate. All examples use a unit value of "3" unless otherwise indicated, and all are in the "en" locale.

## Basic question

Is there a better heuristic, or some algorithm I'm missing, that would improve this? Since this is a new part of CLDR, it's totally OK if the answer is to work around it with heuristics. I'm just after the community's view of the best approach to take.

## Outputting a translatable unit (meaning it has a single skeleton in CLDR)

"Kilometer-per-hour" => "{0} kilometers per hour"

This is a simple case, and merging the value into the skeleton is deterministic. No issues, just simple substitution.

My implementation produces "3 kilometres per hour"

## Outputting a compound unit (no direct translation, composing is required)

"Kilometer per second" => "{0} kilometers", " per " and "{0} second"

Now we have three skeletons that need to be merged. Here are the issues as I see them:

1. In order to resolve the skeleton for the denominator "second", I take the plural value for "1" (i.e. always the singular form)
2. Ignore the placeholder in the denominator, so "{0} second" becomes " second"
3. String-join the three skeletons
4. Merge the number value into the placeholder "{0}"
5. Replace the double space between "per" and "second" that arises because there is a trailing space in the " per " skeleton and a leading space in the " second" skeleton

All of this is heuristic, and I'm not at all sure it carries over to all other locales.

My implementation produces "3 kilometres per second"

## Outputting with an SI prefix (and/or square and cubic prefix)

This is the case where the applied SI prefix has no direct translation and we are composing the translation.

"Millifurlong" => "milli{0}", "{0} furlongs"

The heuristic I currently apply is:

1. Since the prefix skeleton has the placeholder after the text, it is merged in front of the unit
2. The placeholder of the prefix skeleton is deleted => "milli", "{0} furlongs"
3. The prefix is merged to the front of the text in the unit skeleton => "{0} millifurlongs"
4. Merge the number value into the placeholder

The heuristic of merging the SI (or other) prefix into the unit skeleton is unlikely to be correct for all locales.

My implementation produces "3 millifurlongs"

## Outputting a compound unit

This is where we have a unit that uses the "times" skeleton.

"Furlong light year" => "{0} light years", "⋅", "{0} furlongs"

1. The order of the skeletons is determined by the canonical sort order in Units.xml
2. The "times" skeleton is introduced between the two units
3. The current heuristic is to omit the placeholder on all but the first skeleton (there may be n skeletons)
4. String-join the skeletons
5. Replace duplicate whitespace

This has issues similar to the previous "prefix" example: collapsing duplicate whitespace is required. It also needs a heuristic to decide when a sub-unit takes the plural form and when it takes the singular form.

My implementation produces: "3 light years⋅furlongs"

It uses the same plural form for all sub-units. It's not "correct" English, and it's just as likely to be the wrong strategy for most locales (this is a guess).
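Below is a minimal Python sketch of the prefix and times heuristics above. It works by placeholder substitution instead of string joining followed by whitespace cleanup. The function names are hypothetical, and the patterns are the English-style strings quoted above. The times pattern is assumed to be "{0}⋅{1}". The sketch does not decide plural forms: the caller chooses which form of each unit to pass in. Like the heuristics it mirrors, it is not expected to be correct for every locale.

```python
import re
from typing import Sequence

PLACEHOLDER = re.compile(r"\{(\d+)\}")


def substitute(pattern: str, *args: str) -> str:
    """Fill {0}, {1}, ... in a single pass, so inserted text is never re-scanned."""
    return PLACEHOLDER.sub(lambda m: args[int(m.group(1))], pattern)


def strip_placeholder(pattern: str) -> str:
    """Drop {0} and the whitespace around it: '{0} furlongs' -> 'furlongs'."""
    return re.sub(r"\s*\{0\}\s*", "", pattern)


def apply_prefix(prefix_pattern: str, unit_pattern: str) -> str:
    """Merge a prefix pattern ('milli{0}') into a unit pattern ('{0} furlongs').

    The bare unit name is substituted into the prefix pattern, and the result
    replaces the unit name inside the unit pattern, which keeps its {0}.
    """
    unit_name = strip_placeholder(unit_pattern)          # 'furlongs'
    prefixed = substitute(prefix_pattern, unit_name)     # 'millifurlongs'
    return unit_pattern.replace(unit_name, prefixed, 1)  # '{0} millifurlongs'


def compose_times(times_pattern: str, unit_patterns: Sequence[str]) -> str:
    """Chain unit patterns with a times pattern such as '{0}⋅{1}'.

    Only the first unit keeps its number placeholder; the others contribute
    their bare names. Plural-form selection is left to the caller.
    """
    result = unit_patterns[0]
    for pattern in unit_patterns[1:]:
        result = substitute(times_pattern, result, strip_placeholder(pattern))
    return result


if __name__ == "__main__":
    print(substitute(apply_prefix("milli{0}", "{0} furlongs"), "3"))
    # 3 millifurlongs
    print(substitute(compose_times("{0}⋅{1}", ["{0} light years", "{0} furlongs"]), "3"))
    # 3 light years⋅furlongs
```

Each step substitutes into a pattern rather than concatenating strings, so no duplicate-whitespace pass is needed. A pattern such as "square {0}" also keeps its own space. Whether a prefix can simply be glued onto the unit name, as "milli{0}" suggests for English, remains language-specific.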
Many thanks for any help or suggestions,

Kip

PS: In case anyone gets this far, the implementation is in the Elixir language at https://github.com/elixir-cldr/cldr_units

From mark at macchiato.com Sat May 16 15:00:00 2020
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sat, 16 May 2020 13:00:00 -0700
Subject: Merging unit skeletons for output - a better way?

Thanks for the detailed report. Can you file this as a ticket?

The biggest problem is that the spec for constructing the fallback compound names is missing details for the times pattern, the power patterns, and the prefix patterns. The per pattern appears to be complete (https://unicode.org/reports/tr35/tr35-general.html#perUnitPatterns). Some of what is described there also applies to the other complex names. For example, fallback name construction may not work well for languages with inflections, and the "remove the placeholder" step does require removing the spaces around the {0}. Note that "square" does not work right for gendered languages, because it often needs to agree with the base unit.

First, we need to add full descriptions for all the complex fallback names. Second, we need to add a test file that constructs some more complicated names.

As to the details, the heuristics do have to take the plurals into account, including having all but the last unit in a times sequence use the singular.

We are in the process of gathering information for including gender and case, and will need heuristics for those as well. For example, my current draft has:

1. Prefixes & powers: the gender of the whole is the same as the gender of the operand. In pseudocode:
   1. gender(square, meter) = gender(meter)
   2. gender(kilo, meter) = gender(meter)
2. Per: the gender of the whole is the gender of the numerator
   1. gender(gram per meter) = gender(gram)
3. Times: the gender of the whole is the gender of the last operand
   1. gender(gram-meter) = gender(meter)

NOTE: I'm sure that we will find languages that have different strategies for dealing with plural, gender, and case in the complex cases, so we'll undoubtedly need to refine these as we go along.

Mark

From richard.wordingham at ntlworld.com Sat May 16 15:03:42 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 16 May 2020 21:03:42 +0100
Subject: Merging unit skeletons for output - a better way?
Message-ID: <20200516210342.0463a032@JRWUBU2>

On Sat, 16 May 2020 15:26:05 +0800
Kip Cole via CLDR-Users wrote:

> ## Outputting a compound unit (no direct translation, composing is
> required)
>
> "Kilometer per second" => "{0} kilometers", " per " and "{0} second"
>
> Now we have three skeletons that need to be merged. Here are the
> issues as I see them:
>
> 1. In order to resolve the skeleton for the denominator "second" I
> take the plural value for "1" (ie always singular form)
> 2. Ignore the placeholder in the denominator so "{0} second" becomes " second"
> 3. String join the three skeletons
> 4. Merge the number value into the placeholder "{0}"
> 5. Replace the double space between "per" and "second" that arises because
> there is a trailing space in the "per" skeleton and a leading space in the
> " second" skeleton

Are we supposed to be able to see how the three elements are joined together? The Thai word for 'per' I learnt (??) puts the numerator and denominator the opposite way round to English, but I see there is now a word (???) that allows the same order as English. Chinese ????? has the syntactic order (source: Google Translate)

You haven't accounted for the case of the denominator unit. It's accusative in Russian and Polish, e.g. "trzy kilometry na sekundę" according to Google Translate. According to the same source, Finnish doesn't use a joining preposition, just the inessive singular on its own.

> PS: In case anyone get this far, the implementation is in the Elixir
> language at https://github.com/elixir-cldr/cldr_units

Would you give us the file names for the generation of the strings? I'm not sure it's obvious even if one uses Elixir.

Richard.

From richard.wordingham at ntlworld.com Sun May 17 03:30:33 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 17 May 2020 09:30:33 +0100
Subject: Merging unit skeletons for output - a better way?
Message-ID: <20200517093033.3ea5a1ad@JRWUBU2>

On Sat, 16 May 2020 15:26:05 +0800
Kip Cole via CLDR-Users wrote:

> 1. In order to resolve the skeleton for the denominator "second" I
> take the plural value for "1" (ie always singular form)
> 2. Ignore the placeholder in the denominator so "{0} second" becomes
> " second"
> 3. String join the three skeletons
> 4. Merge the number value into the placeholder "{0}"
> 5. Replace the double space between "per" and
> "second" that arises because there is a trailing space in the "per"
> skeleton and a leading space in the " second" skeleton

So, even where the compoundUnit pattern should work, your space-stripping algorithm seems to be wrong. (For Russian, for example, the pattern itself should only go wrong with feminine denominator units.) The rules for inserting spaces can be complicated. I don't know of anything more complicated than Thai, but it may exist.

At step 2, you should strip the placeholder together with its surrounding spaces; that makes step 5 redundant. Step 3 is a substitution, not a concatenation. Perhaps that is what you mean by 'join' - I couldn't find the code that performs this step.
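Below is a minimal Python sketch of the per-pattern approach described here, under stated assumptions. The per pattern is taken to be "{0} per {1}", and the unit forms are English-style strings keyed by plural category. `english_plural` is a hypothetical stand-in for real CLDR plural rules. The sketch does three things:

- It strips the placeholder together with its surrounding spaces, so no whitespace cleanup is needed.
- It formats the number into the numerator before substituting.
- It treats the per pattern as a substitution, not a concatenation.

It does not handle the grammatical case of the denominator, which Russian and Polish require as noted above.

```python
import re
from typing import Callable, Dict

PLACEHOLDER = re.compile(r"\{(\d+)\}")


def substitute(pattern: str, *args: str) -> str:
    """Single-pass substitution of {0}, {1}, ... into a pattern."""
    return PLACEHOLDER.sub(lambda m: args[int(m.group(1))], pattern)


def bare_name(pattern: str) -> str:
    """Remove {0} together with its surrounding spaces: '{0} second' -> 'second'."""
    return re.sub(r"\s*\{0\}\s*", "", pattern)


def english_plural(n: int) -> str:
    """Hypothetical stand-in for CLDR plural rules, valid for English integers only."""
    return "one" if n == 1 else "other"


def format_per(
    value: int,
    numerator_forms: Dict[str, str],
    denominator_forms: Dict[str, str],
    per_pattern: str,
    plural_category: Callable[[int], str],
) -> str:
    # Format the number into the numerator first, falling back to "other"
    # when the language has no form for the selected plural category.
    category = plural_category(value)
    numerator = substitute(numerator_forms.get(category, numerator_forms["other"]), str(value))
    # Denominator: the singular ("one") form if the language has one, else "other",
    # with the placeholder and its spaces stripped, so no double spaces arise.
    denominator = bare_name(denominator_forms.get("one", denominator_forms["other"]))
    # Substitute both parts into the per pattern instead of joining strings.
    return substitute(per_pattern, numerator, denominator)


km = {"one": "{0} kilometre", "other": "{0} kilometres"}
sec = {"one": "{0} second", "other": "{0} seconds"}

if __name__ == "__main__":
    print(format_per(3, km, sec, "{0} per {1}", english_plural))  # 3 kilometres per second
```

The same function also works with a short per pattern such as "{0}/{1}". Word order is not an issue either, because it lives entirely in the pattern. Languages that put the denominator first only need a different pattern, not different code.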
Not only can the denominator unit precede the numerator unit, but a quick glance at translations indicates that the denominator can occur multiple times, as though a fairly literal translation were "3 kilometres second by second". (Hawaiian is the example I have in mind.)

I think step 4 should be done before step 3.

Does the compound skeleton inherit the plural rules?

Richard.

From dewi.kerbrat at opab.bzh Wed May 20 07:08:37 2020
From: dewi.kerbrat at opab.bzh (Dewi Kerbrat)
Date: Wed, 20 May 2020 14:08:37 +0200
Subject: collation test
Message-ID: <65fc0c35-4536-7c6e-33f3-4ed300778a4e@opab.bzh>

Hello,

I work at the Public Office for the Breton Language. We contribute Breton locale data through the CLDR Survey Tool. We would like to add a collation for the Breton language, but I have a few questions.

I have read the collation guidelines at http://cldr.unicode.org/index/cldr-spec/collation-guidelines, but the links to the demos seem to be dead, so I can't test the rules I've built.

The alphabetical order in Breton would be:
a, b, c, ch, c'h, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z

So I built these rules:

&C
&n<

Is there a way to test them? If we don't add rules for accented characters, will they be treated as equal to the unaccented character, or as if there were a secondary-level rule (for example, a<< ? ?)?

Thank you for your help,
Best regards
--
Dewi Kerbrat
Penn Raktres Yezh ha Neveziñ Niverel / Chef de Projet Langue et Innovation Numérique (Language and Digital Innovation Project Manager)
Ofis Publik ar Brezhoneg
8 plasenn ar Marichal Juin / place du Maréchal Juin
35000 ROAZHON / RENNES
dewi.kerbrat at opab.bzh
www.brezhoneg.bzh

From mark at macchiato.com Wed May 20 13:05:57 2020
From: mark at macchiato.com (Mark Davis ☕️)
Date: Wed, 20 May 2020 11:05:57 -0700
Subject: collation test
In-Reply-To: <65fc0c35-4536-7c6e-33f3-4ed300778a4e@opab.bzh>
References: <65fc0c35-4536-7c6e-33f3-4ed300778a4e@opab.bzh>

The demo appears to be working, with the link redirecting to https://icu4c-demos-7hxm2n5zgq-uc.a.run.app/icu-bin/collation.html

Mark

From kipcole9 at gmail.com Sat May 23 22:18:59 2020
From: kipcole9 at gmail.com (Kip Cole)
Date: Sun, 24 May 2020 11:18:59 +0800
Subject: Util to produce final merged locale.xml file?
Message-ID: <11707FAA-1E62-4F5F-BF05-1CD6270B923B@gmail.com>

My current CLDR-based implementation is based on the JSON content. This is primarily because I'm not confident implementing the inheritance rules, which are quite complex.

I'd like to move to using the canonical XML data from CLDR 38, and I'm hoping there is a utility that will implement the inheritance rules and export a fully merged main/locale.xml file.

Any chance such a tool exists?

Many thanks,
Kip

From mark at macchiato.com Tue May 26 20:03:26 2020
From: mark at macchiato.com (Mark Davis ☕️)
Date: Tue, 26 May 2020 18:03:26 -0700
Subject: Util to produce final merged locale.xml file?
In-Reply-To: <11707FAA-1E62-4F5F-BF05-1CD6270B923B@gmail.com>
References: <11707FAA-1E62-4F5F-BF05-1CD6270B923B@gmail.com>

There is tooling to do that. The workhorse is CLDRFile. You create a file by getting a factory (see CLDRConfig.java), then calling make(yourlocale,true) to get a resolved version. You can then call CLDRFile's write(PrintWriter) to write it out to a file.

Sorry I don't have more time to go into detail.

Mark
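The resolved version Mark mentions comes from the Java CLDRFile tooling. To show what "resolved" means conceptually, here is a minimal Python sketch of the core idea only:

- Walk the parent chain: use an explicit parent override if there is one, otherwise truncate the last subtag, and end at root.
- Merge values from root down, so the most specific locale wins.

The data is a toy. The `es_MX` → `es_419` override is written in the style of CLDR's parentLocales data. Aliases, lateral inheritance, and the other resolution details in TR35 are deliberately ignored, so this is not a substitute for CLDRFile's resolved output.

```python
from typing import Dict, List, Optional

# Toy data: each locale maps a path-like key to a value. Real CLDR files also
# use aliases and lateral inheritance, which this sketch ignores.
LocaleData = Dict[str, Dict[str, str]]

# One explicit parent override, in the style of CLDR's parentLocales data.
PARENT_OVERRIDES = {"es_MX": "es_419"}


def parent_of(locale: str) -> Optional[str]:
    """Explicit override first, then truncate the last subtag, then root."""
    if locale == "root":
        return None
    if locale in PARENT_OVERRIDES:
        return PARENT_OVERRIDES[locale]
    if "_" in locale:
        return locale.rsplit("_", 1)[0]
    return "root"


def inheritance_chain(locale: str) -> List[str]:
    """List the locale and all its ancestors, most specific first."""
    chain: List[str] = []
    current: Optional[str] = locale
    while current is not None:
        chain.append(current)
        current = parent_of(current)
    return chain


def resolve(locale: str, data: LocaleData) -> Dict[str, str]:
    """Merge from root down so the most specific locale's values win."""
    resolved: Dict[str, str] = {}
    for loc in reversed(inheritance_chain(locale)):
        resolved.update(data.get(loc, {}))
    return resolved


toy: LocaleData = {
    "root": {"a": "root-a", "b": "root-b", "c": "root-c"},
    "es": {"b": "es-b"},
    "es_419": {"c": "es_419-c"},
    "es_MX": {},
}

if __name__ == "__main__":
    print(inheritance_chain("es_MX"))  # ['es_MX', 'es_419', 'es', 'root']
    print(resolve("es_MX", toy))       # {'a': 'root-a', 'b': 'es-b', 'c': 'es_419-c'}
```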