From me at erik.tw  Thu May 14 09:11:16 2020
From: me at erik.tw (Erik Williamson)
Date: Thu, 14 May 2020 10:11:16 -0400
Subject: Connecting with other voting members of my locale?
Message-ID: <CAJ=dQEjqwS45hC1s2D4L3-gFux+hgOnVVNi+viPa_WfdhkjBaQ@mail.gmail.com>

Hello,

I am new to the CLDR voting process, and recently submitted a locale bug
report and applied for CLDR Survey Tool guest access. I'm looking to
connect with other participants in es-CL and es-MX to discuss my proposed
change to currency formatting. I have already attempted to contact the
national language academies of Chile and Mexico, but so far have not
established communication.

Is there a way for me to connect with other members, or view organizational
membership by locale?

Thanks,

-- 
Erik Williamson
me at erik.tw

From mark at macchiato.com  Thu May 14 15:03:35 2020
From: mark at macchiato.com (Mark Davis ☕️)
Date: Thu, 14 May 2020 13:03:35 -0700
Subject: Connecting with other voting members of my locale?
In-Reply-To: <CAJ=dQEjqwS45hC1s2D4L3-gFux+hgOnVVNi+viPa_WfdhkjBaQ@mail.gmail.com>
References: <CAJ=dQEjqwS45hC1s2D4L3-gFux+hgOnVVNi+viPa_WfdhkjBaQ@mail.gmail.com>
Message-ID: <CAJ2xs_EW1MVKitM84JHM6jPy2VurveLvwEZqkZ0p+GmAcr6TQQ@mail.gmail.com>

Once the tool opens submission, there is a forum for discussing issues
among people working on the same locale.

On Thu, May 14, 2020, 07:55 Erik Williamson via CLDR-Users <
cldr-users at unicode.org> wrote:

> Hello,
>
> I am new to the CLDR voting process, and recently submitted a locale bug
> report and applied for a CLDR Survey Tool guest access. I'm looking to
> connect with other participants in es-CL and es-MX to discuss my proposed
> change to currency formatting. I have already attempted to contact the
> national language academies of Chile and Mexico, but so far have not
> established communication.
>
> Is there a way for me to connect with other members, or view
> organizational membership by locale?
>
> Thanks,
>
> --
> Erik Williamson
> me at erik.tw
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at corp.unicode.org
> https://corp.unicode.org/mailman/listinfo/cldr-users
>

From kipcole9 at gmail.com  Sat May 16 02:26:05 2020
From: kipcole9 at gmail.com (Kip Cole)
Date: Sat, 16 May 2020 15:26:05 +0800
Subject: Merging unit skeletons for output - a better way?
Message-ID: <A9A43191-4C7C-4C82-8225-46C99E8A9728@gmail.com>

Congratulations to those who implemented the new Unit conversion and preference data in CLDR 37. It's been a joy to implement on top of the data, and not without a few challenges :-)

One area that appears undocumented, and one that is quite tricky to implement, is merging unit skeletons when outputting a string representation. I will use some examples to illustrate. All examples are using a unit value of "3" unless otherwise indicated, and all in the "en" locale.

## Basic question

Is there a better heuristic or some algorithm I'm missing that would improve this? It's totally OK that this is a new part of CLDR, and working around it with some heuristics is also fine. I'm just after the community's view of the best approach to take.

## Outputting a translatable unit (meaning it has a single skeleton in CLDR)

"Kilometer-per-hour" => "{0} kilometers per hour"

This is a simple case and the merging of the value into the skeleton is deterministic.
No issues, simple substitutions.

My implementation produces "3 kilometres per hour"

## Outputting a compound unit (no direct translation, composing is required)

"Kilometer per second" => "{0} kilometers", " per " and "{0} second"

Now we have three skeletons that need to be merged. Here are the issues as I see them:

1. In order to resolve the skeleton for the denominator "second" I take the plural value for "1" (ie always singular form)
2. Ignore the placeholder in the denominator so "{0} second" becomes " second"
3. String join the three skeletons
4. Merge the number value into the placeholder "{0}"
5. Replace the double space between "per" and "second" that arises because there is a trailing space in the "per" skeleton and a leading space in the " second" skeleton

All of this is a heuristic and I'm not at all sure it transfers to all other locales.

My implementation produces "3 kilometres per second"
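
The five steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Elixir of the actual implementation; the skeleton tables and the `plural_category` helper are hypothetical stand-ins for real CLDR data, and only English is considered.

```python
import re

# Hypothetical English skeletons; a real implementation would load these from CLDR.
NUMERATOR = {"one": "{0} kilometer", "other": "{0} kilometers"}
DENOMINATOR = {"one": "{0} second", "other": "{0} seconds"}
PER = " per "


def plural_category(n):
    """Simplified English plural rule (assumption: integer values only)."""
    return "one" if n == 1 else "other"


def format_per(n):
    # Step 1: the denominator always takes the singular ("one") form.
    denominator = DENOMINATOR["one"]
    # Step 2: ignore its placeholder, so "{0} second" becomes " second".
    denominator = denominator.replace("{0}", "")
    # Step 3: string-join the three skeletons.
    joined = NUMERATOR[plural_category(n)] + PER + denominator
    # Step 4: merge the number into the remaining placeholder.
    merged = joined.replace("{0}", str(n))
    # Step 5: collapse the double space between "per" and "second".
    return re.sub(r" {2,}", " ", merged)


print(format_per(3))
```

With the US spellings in this made-up table, `format_per(3)` yields "3 kilometers per second". Nothing here addresses word order, case or spacing rules in other locales.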

## Outputting with an SI prefix (and/or square and cubic prefix)

This is the case when the applied SI prefix has no direct translation and we are composing the translation.

"Millifurlong" => "milli{0}", "{0} furlongs"

The heuristic I currently apply is:

1. Since the prefix skeleton has the placeholder after the text it is merged in front of the unit
2. The placeholder of the prefix skeleton is deleted => "milli", "{0} furlongs"
3. The prefix is merged to the front of the text in the unit skeleton => "{0} millifurlongs"
4. Merge the number value into the placeholder
4. Merge the number value into the placeholder

The heuristic of merging the SI (or other) prefix into the unit skeleton is unlikely to be correct for all locales.

My implementation produces "3 millifurlongs"
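
A rough Python sketch of this prefix heuristic follows. The function names `merge_prefix` and `format_prefixed` are invented for illustration, and the sketch assumes the unit skeleton starts with its placeholder followed by the unit text, as in the English example; it is not taken from the Elixir implementation.

```python
import re


def merge_prefix(prefix_skeleton, unit_skeleton):
    """Merge a prefix skeleton such as "milli{0}" into a unit skeleton."""
    # Step 1: only the "placeholder after the text" layout is handled here.
    if not prefix_skeleton.endswith("{0}"):
        raise NotImplementedError("prefix placeholder is not at the end")
    # Step 2: delete the placeholder of the prefix skeleton.
    prefix = prefix_skeleton[: -len("{0}")]
    # Step 3: put the prefix in front of the unit text, which is assumed to
    # follow the unit skeleton's own placeholder and any white space after it.
    return re.sub(r"\{0\}\s*", lambda m: m.group(0) + prefix, unit_skeleton, count=1)


def format_prefixed(n, prefix_skeleton, unit_skeleton):
    # Step 4: merge the number value into the placeholder.
    return merge_prefix(prefix_skeleton, unit_skeleton).replace("{0}", str(n))


print(format_prefixed(3, "milli{0}", "{0} furlongs"))
```

For the example skeletons this gives "3 millifurlongs". A locale whose unit skeleton puts the placeholder after the unit text would need a different insertion point, which is one reason the heuristic is unlikely to hold everywhere.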

## Outputting a compound unit

This is where we have a unit leveraging the "times" skeleton.

"Furlong light year" => "{0} light years", "⋅", " {0} furlongs"

1. The order of the skeletons is determined by the canonical sort order in Units.xml
2. The ?times? skeleton is introduced between the two units
3. Current heuristic is to omit the placeholder on all but the first skeleton (there may be n skeletons)
4. String join skeletons
5. Replace duplicate whitespace

This has similar issues to the previous "prefix" example - collapsing duplicate whitespace is required.
It also has the heuristic of determining when to use the plural form for a sub-unit or the singular form.

My implementation produces: "3 light years⋅furlongs"

It uses the same plural form for all sub units. It's not "correct" English and it's just as likely to be the wrong strategy for most locales (this is a guess).
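
The "times" steps can be sketched in Python along these lines. Two things are assumptions rather than facts from the implementation: the separator is taken to be U+22C5 DOT OPERATOR, and the white space left behind by a removed placeholder is trimmed, which is what the output quoted above seems to require. The skeletons are expected to arrive already in canonical order (step 1).

```python
import re

TIMES = "\u22c5"  # assumed "times" separator (DOT OPERATOR)


def format_times(n, skeletons, times=TIMES):
    """Join already-ordered unit skeletons with the times separator."""
    first, *rest = skeletons
    # Step 3: keep the placeholder on the first skeleton only; trimming the
    # space that surrounded a removed placeholder is an added assumption.
    stripped = [s.replace("{0}", "").strip() for s in rest]
    # Steps 2 and 4: put the times separator between the units and join.
    joined = times.join([first] + stripped)
    # Step 5: replace any duplicate white space that remains.
    collapsed = re.sub(r"\s{2,}", " ", joined)
    # Finally merge the number into the single remaining placeholder.
    return collapsed.replace("{0}", str(n))


print(format_times(3, ["{0} light years", "{0} furlongs"]))
```

As in the description, every sub-unit keeps the plural form it was given, so the result is "3 light years⋅furlongs" rather than idiomatic English.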

Many thanks for any help or suggestions,

-Kip

PS: In case anyone gets this far, the implementation is in the Elixir language at https://github.com/elixir-cldr/cldr_units


From mark at macchiato.com  Sat May 16 15:00:00 2020
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sat, 16 May 2020 13:00:00 -0700
Subject: Merging unit skeletons for output - a better way?
In-Reply-To: <A9A43191-4C7C-4C82-8225-46C99E8A9728@gmail.com>
References: <A9A43191-4C7C-4C82-8225-46C99E8A9728@gmail.com>
Message-ID: <CAJ2xs_Hmm5N4bcRhJv+nPyPDS-NdXWj=QR9J8DM68Jt50BLOwQ@mail.gmail.com>

Thanks for the detailed report. Can you file this as a ticket?

The biggest problem is that the spec for constructing the fallback compound
names is missing details for the times pattern, the power patterns, and the
prefix patterns. The per pattern appears to be complete,
https://unicode.org/reports/tr35/tr35-general.html#perUnitPatterns, and
some of what is described there also applies to the other complex names,
such as that the fallback name construction may not work well for languages
with inflections, and that the "remove the placeholder" step does require
removing spaces around the {0}. Note that "square" doesn't work right for
gendered languages because it often needs to agree with the base unit.

First, we need to add full descriptions for all the complex fallback names.
Second, we need to add a test file with construction of some more
complicated names.

As to the details, the heuristics do have to play with the plurals,
including having all but the last in a times sequence use the singular.
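
As a small illustration of that plural rule, here is a hedged Python sketch in which every operand of a times sequence except the last uses the singular form. The `UNIT_NAMES` table, the hyphen separator and the English-only `plural_category` helper are all hypothetical; none of them comes from CLDR data or the spec.

```python
# Hypothetical English unit names, keyed by plural category.
UNIT_NAMES = {
    "kilowatt": {"one": "kilowatt", "other": "kilowatts"},
    "hour": {"one": "hour", "other": "hours"},
}


def plural_category(n):
    """Simplified English plural rule (assumption: integer values only)."""
    return "one" if n == 1 else "other"


def times_name(n, units, separator="-"):
    *leading, last = units
    # All but the last operand use the singular form.
    parts = [UNIT_NAMES[u]["one"] for u in leading]
    # Only the last operand follows the plural category of the number.
    parts.append(UNIT_NAMES[last][plural_category(n)])
    # Putting the number first is an English-specific simplification.
    return f"{n} " + separator.join(parts)


print(times_name(3, ["kilowatt", "hour"]))
```

This produces "3 kilowatt-hours" and, for a value of 1, "1 kilowatt-hour".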

We are in the process of gathering information for including gender and
case, and will need heuristics for those as well. For example, my current
draft has:


   1. Prefixes & powers: the gender of the whole is the same as the gender of
      the operand. In pseudocode:
      1. gender(square, meter) = gender(meter)
      2. gender(kilo, meter) = gender(meter)
   2. Per: the gender of the whole is the gender of the numerator
      1. gender(gram per meter) = gender(gram)
   3. Times: the gender of the whole is the gender of the last operand
      1. gender(gram-meter) = gender(meter)


NOTE: I'm sure that we will find cases of languages that have different
strategies for dealing with the plural, gender, and case in the complex
cases; so we'll undoubtedly need to refine as we go along.
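
The three draft rules could be expressed as a small recursive function, sketched here in Python. The tuple encoding of compound units and the `BASE_GENDER` table (with illustrative, German-like genders) are assumptions made for this example only; for "times" the sketch follows the rule as worded, taking the gender of the last operand.

```python
# Illustrative genders for base units; real values would come from locale data.
BASE_GENDER = {"meter": "masculine", "gram": "neuter", "second": "feminine"}


def gender(unit):
    """Return the gender of a simple or compound unit.

    Compound units use a hypothetical encoding:
      ("prefix", "kilo", operand), ("power", "square", operand),
      ("per", numerator, denominator), ("times", operand, operand, ...)
    """
    if isinstance(unit, str):
        return BASE_GENDER[unit]
    kind, *operands = unit
    if kind in ("prefix", "power"):
        # Prefixes and powers: same gender as the operand.
        return gender(operands[1])
    if kind == "per":
        # Per: gender of the numerator.
        return gender(operands[0])
    if kind == "times":
        # Times: gender of the last operand.
        return gender(operands[-1])
    raise ValueError(f"unknown compound kind: {kind!r}")


print(gender(("per", ("prefix", "kilo", "gram"), ("power", "square", "meter"))))
```

Because the function recurses, nested compounds such as kilogram per square meter resolve through the numerator down to the base unit.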

Mark



From richard.wordingham at ntlworld.com  Sat May 16 15:03:42 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 16 May 2020 21:03:42 +0100
Subject: Merging unit skeletons for output - a better way?
In-Reply-To: <A9A43191-4C7C-4C82-8225-46C99E8A9728@gmail.com>
References: <A9A43191-4C7C-4C82-8225-46C99E8A9728@gmail.com>
Message-ID: <20200516210342.0463a032@JRWUBU2>

On Sat, 16 May 2020 15:26:05 +0800
Kip Cole via CLDR-Users <cldr-users at unicode.org> wrote:

> ## Outputting a compound unit (no direct translation, composing is
> required)
> 
> "Kilometer per second" => "{0} kilometers", " per " and "{0} second"
> 
> Now we have three skeletons that need to be merged. Here are the 
> Issues as I see them:
> 
> 1. In order to resolve the skeleton for the denominator "second" I
>    take the plural value for "1" (ie always singular form)
> 2. Ignore the placeholder in the denominator so "{0} second" becomes
>    " second"
> 3. String join the three skeletons
> 4. Merge the number value into the placeholder "{0}"
> 5. Replace the double space between "per" and "second" that arises
>    because there is a trailing space in the "per" skeleton and a
>    leading space in the " second" skeleton

Are we supposed to be able to see how the three elements are joined
together?  The Thai word for 'per' I learnt (ละ) has the numerator and
denominator the opposite way round to English, but I see there is now
a word (ต่อ) that allows the same order as English. Chinese 每秒三公里
has the syntactic order <per><second><three><kilometer> (source:
Google Translate) 

You haven't accounted for the case of the denominator unit.  It's
accusative in Russian and Polish, e.g. "trzy kilometry na sekundę"
according to Google Translate.  From the same source, Finnish doesn't
use a joining preposition, just the inessive singular on its own.

> PS: In case anyone get this far, the implementation is in the Elixir
> language at https://github.com/elixir-cldr/cldr_units

Would you give us the file names for the generation of the strings?  I'm
not sure it's obvious even if one uses Elixir.

Richard.


From richard.wordingham at ntlworld.com  Sun May 17 03:30:33 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 17 May 2020 09:30:33 +0100
Subject: Merging unit skeletons for output - a better way?
In-Reply-To: <A9A43191-4C7C-4C82-8225-46C99E8A9728@gmail.com>
References: <A9A43191-4C7C-4C82-8225-46C99E8A9728@gmail.com>
Message-ID: <20200517093033.3ea5a1ad@JRWUBU2>

On Sat, 16 May 2020 15:26:05 +0800
Kip Cole via CLDR-Users <cldr-users at unicode.org> wrote:


> 1. In order to resolve the skeleton for the denominator "second" I
> take the plural value for "1" (ie always singular form)

> 2. Ignore the placeholder in the denominator so "{0} second" becomes
> " second"

> 3. String join the three skeletons

> 4. Merge the number value into the placeholder "{0}"

> 5. Replace the double space between "per" and
> "second" that arises because there is a trailing space in the "per"
> skeleton and a leading space in the " second" skeleton

So, even where the compoundUnit pattern should work, your space
stripping algorithm seems to be wrong.  (For example, it should only go
wrong for Russian with feminine denominator units.)  The rules for
inserting spaces can be complicated. I don't know of anything more
complicated than Thai, but it may exist.

At step 2, you strip the placeholder and its surrounding spaces.  This
makes step 5 redundant.

Step 3 is a substitution, not a concatenation.  Perhaps that is what
you mean by 'join' - I couldn't find the code that performs this step.
Not only can the denominator unit precede the numerator unit, but a quick
glance at translations indicates that the denominator can occur
multiple times, as though a fairly literal translation were "3
kilometres second by second". (Hawaiian is the example I have in mind.)

I think step 4 should be done before step 3.  Does the compound
skeleton inherit the plural rules?
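
To make the difference between substitution and concatenation concrete, here is a hedged Python sketch. It formats the numerator first (step 4 before step 3), strips the denominator's placeholder together with the adjacent space (step 2), and then substitutes both into a per pattern of the shape "{0} per {1}". The function names and the second, reordered pattern are made up for illustration.

```python
import re


def strip_placeholder(skeleton):
    """Drop "{0}" and any white space left at the edges of the skeleton."""
    return skeleton.replace("{0}", "").strip()


def format_per(n, numerator_skeleton, denominator_skeleton, per_pattern="{0} per {1}"):
    # Step 4 first: merge the number into the numerator skeleton.
    numerator = numerator_skeleton.replace("{0}", str(n))
    # Step 2: remove the denominator's placeholder and its surrounding space,
    # so no separate white space clean-up (step 5) is needed afterwards.
    denominator = strip_placeholder(denominator_skeleton)
    # Step 3 as a substitution: the pattern decides order and repetition.
    parts = (numerator, denominator)
    return re.sub(r"\{([01])\}", lambda m: parts[int(m.group(1))], per_pattern)


print(format_per(3, "{0} kilometers", "{0} second"))
print(format_per(3, "{0} km", "{0} s", per_pattern="{1}: {0}"))
```

Because the pattern controls placement, a locale can put the denominator first or mention it more than once without any change to the code. Grammatical case on the denominator is still not handled.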

Richard.


From dewi.kerbrat at opab.bzh  Wed May 20 07:08:37 2020
From: dewi.kerbrat at opab.bzh (Dewi Kerbrat)
Date: Wed, 20 May 2020 14:08:37 +0200
Subject: collation test
Message-ID: <65fc0c35-4536-7c6e-33f3-4ed300778a4e@opab.bzh>

Hello,

I work at the Public Office for the Breton Language. We take part in the CLDR 
Survey Tool for Breton locale data.
We would like to add a collation for the Breton language, but I have a few 
questions. I have read the collation guidelines here 
http://cldr.unicode.org/index/cldr-spec/collation-guidelines, but the 
links to the demos seem to be dead, so I can't test the rules I've built.
The alphabetical order in Breton would be:
a, b, c, ch, c'h, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, 
v, w, x, y, z

So I built these rules:

&C<ch<<<Ch<<<CH<c’h<<<C’h<<<C’H
&n<<ñ

Is there a way to test them? If we don't add rules for accented 
characters, will they be considered equal to the unaccented 
character, or distinguished by a secondary-level rule? (for example, a<<? ? ?)
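
As a quick sanity check of the intended letter order, independent of any collation engine, a plain Python sort key can treat "ch" and "c'h" as single letters. This is only a primary-level, case-insensitive stand-in written for illustration; it is not the CLDR tailoring syntax, it does not use ICU, and it does not model accented letters beyond placing unknown characters after the alphabet.

```python
# Breton letter order as listed above; "ch" and "c'h" count as single letters.
ALPHABET = ["a", "b", "c", "ch", "c'h", "d", "e", "f", "g", "h", "i", "j",
            "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}


def breton_key(word):
    """Primary-level sort key: ignores case, no accent handling."""
    # Treat typographic apostrophes like the ASCII one.
    s = word.lower().replace("\u2019", "'").replace("\u02bc", "'")
    key = []
    i = 0
    while i < len(s):
        # Try the longest letter first: "c'h", then "ch", then single letters.
        for length in (3, 2, 1):
            chunk = s[i:i + length]
            if chunk in RANK:
                key.append(RANK[chunk])
                i += len(chunk)
                break
        else:
            # Unknown character: place it after the alphabet, by code point.
            key.append(len(ALPHABET) + ord(s[i]))
            i += 1
    return key


print(sorted(["dour", "c'hoar", "chom", "bara"], key=breton_key))
```

The sample words sort as bara, chom, c'hoar, dour, whereas a plain code-point sort would put c'hoar before chom.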

Thank you for your help,
Best regards

-- 

*Dewi Kerbrat*
Penn Raktres Yezh ha Neveziñ Niverel - Chef de Projet Langue et 
Innovation Numérique

*Ofis Publik Ar Brezhoneg*
8 plasenn ar Marichal Juin - place du Maréchal Juin
35000 ROAZHON - RENNES
dewi.kerbrat at opab.bzh
www.brezhoneg.bzh


From mark at macchiato.com  Wed May 20 13:05:57 2020
From: mark at macchiato.com (Mark Davis ☕️)
Date: Wed, 20 May 2020 11:05:57 -0700
Subject: collation test
In-Reply-To: <65fc0c35-4536-7c6e-33f3-4ed300778a4e@opab.bzh>
References: <65fc0c35-4536-7c6e-33f3-4ed300778a4e@opab.bzh>
Message-ID: <CAJ2xs_G3jfxkVL5DMc1_airkJMY5a2ddv3dHH+tZnAY-4BspwQ@mail.gmail.com>

The demo appears to be working, with the link redirecting to
https://icu4c-demos-7hxm2n5zgq-uc.a.run.app/icu-bin/collation.html


Mark



From kipcole9 at gmail.com  Sat May 23 22:18:59 2020
From: kipcole9 at gmail.com (Kip Cole)
Date: Sun, 24 May 2020 11:18:59 +0800
Subject: Util to produce final merged locale.xml file?
Message-ID: <11707FAA-1E62-4F5F-BF05-1CD6270B923B@gmail.com>

My current CLDR-based implementation is based upon the JSON content. This is primarily because I'm not confident implementing the inheritance rules, which are quite complex.

I'd like to move to using the canonical XML data from CLDR-38 and I'm hoping there is a util that will implement the inheritance rules and export a fully merged main/locale.xml file.

Any chance such a tool exists?

Many thanks, -Kip


From mark at macchiato.com  Tue May 26 20:03:26 2020
From: mark at macchiato.com (Mark Davis ☕️)
Date: Tue, 26 May 2020 18:03:26 -0700
Subject: Util to produce final merged locale.xml file?
In-Reply-To: <11707FAA-1E62-4F5F-BF05-1CD6270B923B@gmail.com>
References: <11707FAA-1E62-4F5F-BF05-1CD6270B923B@gmail.com>
Message-ID: <CAJ2xs_Guik9=1RKUpi-yOa6Ve4NFUnjXt4YG=5rJAKpSkgAHGQ@mail.gmail.com>

There is tooling to do that. The workhorse is CLDRFile. You create a file
by getting a factory for the file (see CLDRConfig.java), then calling
make(yourlocale,true) to get a resolved version. You can then call
CLDRFile's write(PrintWriter) to write out to a file. Sorry I don't have
more time to go into detail.

Mark

