Locale bringup and barriers for entry

Marcel Schneider via CLDR-Users cldr-users at unicode.org
Tue Sep 25 00:54:25 CDT 2018


On 24/09/18 22:18 Mark Davis ☕️ wrote:
>
> The focus for plurals in CLDR is "what would change if I change a number to another number in a placeholder".
> So if I have a message with a masculine noun, I have two versions:
>
> one: "{number} libro è selezionato"
> other: "{number} libri sono selezionati"
>
> vs also 2 versions with a feminine noun.
>
> one: "{number} nota è selezionata"
> other: "{number} note sono selezionate"

I have been unable to find plural rules in the LDML tree, apart from some plural and ordinal minimal pairs.

Also, the current DTD does not seem to contain what is found in the LDML spec at:

https://www.unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules

The DTD only has:

<!ELEMENT minimalPairs ( alias | ( pluralMinimalPairs*, ordinalMinimalPairs*, special* ) ) >
<!ATTLIST minimalPairs alt NMTOKENS #IMPLIED >
<!ATTLIST minimalPairs draft (approved | contributed | provisional | unconfirmed) #IMPLIED >

<!ELEMENT pluralMinimalPairs ( #PCDATA ) >
<!ATTLIST pluralMinimalPairs count NMTOKEN #IMPLIED >
<!ATTLIST pluralMinimalPairs alt NMTOKENS #IMPLIED >
<!ATTLIST pluralMinimalPairs draft (approved | contributed | provisional | unconfirmed) #IMPLIED >

That suggests that plural support is still partly under construction,
which probably explains the stress put upon it in Steven’s posting.

Consistent with that, at the locale level, e.g. for Italian, common/main/it.xml only has:

<minimalPairs>
<pluralMinimalPairs count="one">{0} giorno</pluralMinimalPairs>
<pluralMinimalPairs count="other">{0} giorni</pluralMinimalPairs>
<ordinalMinimalPairs ordinal="many">Prendi l’{0}° a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other">Prendi la {0}° a destra.</ordinalMinimalPairs>
</minimalPairs>

That is what I meant when complaining about gender support.
Following your example data, we should have additional data, and I can see no structure
to accommodate additional forms:

<pluralMinimalPairs count="one">{0} libro è selezionato</pluralMinimalPairs>
<pluralMinimalPairs count="other">{0} libri sono selezionati</pluralMinimalPairs>
<pluralMinimalPairs count="one">{0} nota è selezionata</pluralMinimalPairs>
<pluralMinimalPairs count="other">{0} note sono selezionate</pluralMinimalPairs>

The resulting apparent redundancy could be resolved by adding a gender attribute:

<pluralMinimalPairs gender="masculine" count="one">{0} libro è selezionato</pluralMinimalPairs>
<pluralMinimalPairs gender="masculine" count="other">{0} libri sono selezionati</pluralMinimalPairs>
<pluralMinimalPairs gender="feminine" count="one">{0} nota è selezionata</pluralMinimalPairs>
<pluralMinimalPairs gender="feminine" count="other">{0} note sono selezionate</pluralMinimalPairs>
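
To illustrate how such data could be consumed, here is a minimal Python sketch. It assumes the hypothetical gender attribute proposed above, which is not in the current DTD, and it wraps the four lines in a minimalPairs element. The function name index_minimal_pairs is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the gender attribute is only a proposal,
# it is not part of the current DTD.
SAMPLE = """
<minimalPairs>
<pluralMinimalPairs gender="masculine" count="one">{0} libro è selezionato</pluralMinimalPairs>
<pluralMinimalPairs gender="masculine" count="other">{0} libri sono selezionati</pluralMinimalPairs>
<pluralMinimalPairs gender="feminine" count="one">{0} nota è selezionata</pluralMinimalPairs>
<pluralMinimalPairs gender="feminine" count="other">{0} note sono selezionate</pluralMinimalPairs>
</minimalPairs>
"""


def index_minimal_pairs(xml_text):
    """Map (gender, count) to the pattern text; gender is None when absent."""
    root = ET.fromstring(xml_text.strip())
    pairs = {}
    for elem in root.iter("pluralMinimalPairs"):
        pairs[(elem.get("gender"), elem.get("count"))] = elem.text
    return pairs


pairs = index_minimal_pairs(SAMPLE)
print(pairs[("feminine", "other")].format(3))
```

Without the gender attribute, the two count="one" entries would map to the same key and the second would silently overwrite the first, which is the redundancy mentioned above.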

The case is also striking when considering ordinal minimal pairs.
To start, I can find no clear definition of what "few" and "many" are meant to represent.
Hence I’m unable to make sense of the following, although that may be due to my poor command of
Italian, and to my not using Google Translate right now to enlighten me (although I used it heavily elsewhere):

<ordinalMinimalPairs ordinal="many">Prendi l’{0}° a destra.</ordinalMinimalPairs>

When making a case for gender here, taking something like "via" for feminine, and "cammino" for
masculine, and "prima"/"primo" for "one" vs "terza"/"terzo" for "other", the data above would
IMO expand to:

<ordinalMinimalPairs gender="feminine" ordinal="one">Prendi la {0}ª a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="feminine" ordinal="many">???</ordinalMinimalPairs>
<ordinalMinimalPairs gender="feminine" ordinal="other">Prendi la {0}ª a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="one">Prendi il {0}º a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="many">???</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="other">Prendi il {0}º a destra.</ordinalMinimalPairs>

Assuming that "many" stands for "8" (which should be defined somewhere) and collapsing redundant
definitions, the result would be akin to the original data (although with proper ordinal indicators):

<ordinalMinimalPairs gender="feminine" ordinal="many">Prendi l’{0}ª a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="feminine" ordinal="other">Prendi la {0}ª a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="many">Prendi l’{0}º a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="other">Prendi il {0}º a destra.</ordinalMinimalPairs>
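
A minimal Python sketch of how the four collapsed patterns above could be applied. The set of numbers treated as "many" (8, 11, 80, 800, i.e. those whose Italian ordinal word starts with a vowel) is my assumption for this example and is not taken from the locale file. The gender attribute and the function names are likewise hypothetical.

```python
# Assumption: "many" covers the numbers whose Italian ordinal starts
# with a vowel (ottavo, undicesimo, ...), so the article elides to l’.
MANY = {8, 11, 80, 800}

# The four collapsed patterns, keyed by (gender, ordinal category).
PATTERNS = {
    ("feminine", "many"): "Prendi l’{0}ª a destra.",
    ("feminine", "other"): "Prendi la {0}ª a destra.",
    ("masculine", "many"): "Prendi l’{0}º a destra.",
    ("masculine", "other"): "Prendi il {0}º a destra.",
}


def ordinal_category(n):
    """Return "many" for the assumed set above, "other" for everything else."""
    return "many" if n in MANY else "other"


def turn_instruction(gender, n):
    """Fill the pattern selected by the supplied gender and the ordinal category."""
    return PATTERNS[(gender, ordinal_category(n))].format(n)


print(turn_instruction("feminine", 8))
print(turn_instruction("masculine", 3))
```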

Perhaps ticket #11393 is related to this topic.


> Now, there are some languages (eg Russian) that only exhibit differences
> for one of the plural categories if there is certain gender involved.
> So the plural categories themselves need to be the maximal partition
> across the possible genders, cases, and other features.

Perhaps I’m being silly, but I’m still unable to figure out how "minimal pairs" can represent a "maximal partition".

> What is NOT in scope for CLDR at this time is to both change gender and number.
> Typically that requires many other changes in the rest of the text.

What I mean is not that CLDR should show how to transform content across genders.
What I mean is that CLDR should provide support for both feminine and masculine
patterns. Currently, gender support seems to be limited to whatever the English examples suggest as a translation,
be it masculine when "day" translates to "giorno", or feminine when "street" translates to "via".
That is what I think is insufficient.


> one: "{number} {thing} è selezionata"
> ...
>  
> ICU has a mechanism for doing a SELECT using gender, but there the gender has to be supplied
> as a parameter, and a sub-message supplied for each of the (say) 3 genders x 4 plural-categories.
>
> Actually detecting the gender of nouns and modifying sentences on that basis is out of scope
> (and a very tricky problem in general).

That seems OK to me as long as CLDR actually helps developers with data for any case they may
encounter when setting up the values. Otherwise they may prefer to just consult a dictionary and a grammar
of the target language to find out for themselves which cases they have to consider.
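
The parameterized selection described in the quote can be sketched in a few lines of plain Python. This is not the ICU API, only an imitation of the idea: one sub-message per gender and plural category, with the gender supplied by the caller. The plural rule is a simplified assumption for Italian integers, and the function names are made up for this example.

```python
# Sub-messages keyed by gender, then by plural category,
# taken from the example sentences earlier in this thread.
MESSAGES = {
    "masculine": {
        "one": "{number} libro è selezionato",
        "other": "{number} libri sono selezionati",
    },
    "feminine": {
        "one": "{number} nota è selezionata",
        "other": "{number} note sono selezionate",
    },
}


def plural_category(n):
    """Simplified assumption for Italian integers: 1 is "one", all else "other"."""
    return "one" if n == 1 else "other"


def select_message(gender, number):
    """Pick the sub-message; the gender is a parameter, nothing detects it."""
    by_category = MESSAGES[gender]
    # Fall back to "other" when a category has no dedicated sub-message.
    pattern = by_category.get(plural_category(number), by_category["other"])
    return pattern.format(number=number)


print(select_message("feminine", 1))
print(select_message("masculine", 5))
```

The table grows as genders times plural categories, so someone still has to know which combinations the target language requires before writing the sub-messages.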

> > > Perhaps any data item that depends on plurals ( currency category, compact decimal category, etc. )
> > > would be 'locked' until it is unlocked by the input of plural data.
> >  
> > Provided that “locking” an item won’t cause a blank or another sort of bug. 
> > When a user sees an item not pluralized where it is expected to be plural, 
> > then simply inferring that pluralization isn’t ready might be straightforward.
> > There will surely be some IF in the code to prevent the app from crashing.
>
> What we have considered (there is a ticket for this somewhere) is disallowing any data/votes
> to be entered in a row with a "count" or "ordinal" attribute until the rules (resp. plural or ordinal)
> are supplied. The row would either be grayed out or just omitted.
> So data could be entered in the locale for other fields, but the locale couldn't reach moderate
> or modern coverage without the rules. So applications not requiring that coverage level could
> include the locale, but those requiring that coverage level would omit it.

Sorry, I misunderstood the scope. Thanks for explaining.

Perhaps the ticket is #11061.


Indeed, that makes for clean data and ensures the reliability of CLDR.
If so much plural rule data is missing that CLDR must make a special case for it,
that may result from the difficulties that non-expert vetters like me are experiencing
with the topic. Now that CLDR plural rules are reported to work well in practice, I’m
wondering how all that interconnects. E.g. some rules obviously work well,
especially when matching some frequent use cases. But the point as I see it is
whether CLDR covers *all* use cases, except possibly very rare ones.


Thanks.

Regards,

Marcel


