Locale bringup and barriers for entry

Steven R. Loomis via CLDR-Users cldr-users at unicode.org
Mon Sep 24 11:52:25 CDT 2018


Marcel and Philippe,
I see some interesting discussion, though some of it was (as noted in
later emails) recapping existing bugs.
However, please note how I began this discussion:

On Sat, Sep 22, 2018 at 1:15 AM Marcel Schneider via CLDR-Users <
cldr-users at unicode.org> wrote:

> > At the IUC conference last week, a few of us discussed around lunch some
> issues around getting new locales into CLDR, and barriers to entry.
>

The key word here is “new”: locales not currently in CLDR. For example,
emoji category names are not part of the CLDR minimal data, and new
locales will not face issues around inheritance.

> 1. The English template may not be internally consistent, eg emoji category
> names may be singular or plural (plural throughout seems correct);
> 2. The English template may not be up-to-date, eg. still including ASCII
> quotes in exemplar punctuation though these have been ruled out;
> 3. The target data sets may not be comprehensively specified, eg the
> definition of exemplar punctuation does include an exclusion clause for
> math symbols only, while the clause about not including symbols on a
> programmatic usage basis such as # @ _ is still missing;
> 4. The English template may not be kept in synchrony with the
> specifications, eg emoji keywords not to include emoji name or name starter;
>

There are continuous improvements to the English-side data. I don't think
the items above are necessarily barriers to initial entry.


> 5. Numerous bugs affecting markup of inherited values (but these have been
> reported and are about to be fixed in the SurveyTool code).
>

Right.


> > http://cldr.unicode.org/index/cldr-spec/minimaldata - especially
> pluralization data
>
> The scope of pluralization seems unclear and biased by the English
> paradigm of genderlessness, while in other languages grammatical gender
> is a determining parameter for pluralization, so that even extensions to
> the DTD seem to be required for providing out-of-the-box pluralization
> rules.
>

I'm not sure what is meant by 'extensions to the DTD'.  In any event, CLDR
pluralization has proven to be largely successful in practice.
Do you have any specific concern about CLDR plurals? Is there a bug filed?


> > - what about fonts?
>


> > keyboards?
>
> I see fonts and keyboards actually as the two missing components of the
> stack that you designed, because though being part of locale data, input
> methods are a precondition of efficient submission of locale data. The
> full stack would thus expand to:
>




> > - what are the best ways to coordinate efforts between the language
> users and different technical experts?
> > Ideas:
> > - a web app to take in new locale data?
>
> I think CLDR has already its web app, ie SurveyTool. A full-time engineer
> is actually redeveloping and debugging several or all parts of it.
>

Again, the scope here is data for a completely new locale that is not
currently in CLDR. The idea would be an application just for taking in the
data listed at http://cldr.unicode.org/index/cldr-spec/minimaldata


> > - a web app to debug/explore plurals?
>
> Before including this functionality in SurveyTool, where it belongs in, I
> think that the spec should be redesigned, and the documentation updated
> accordingly. That could eventually result in extended language support by
> CLDR/ICU, which would do no harm but only raise the product value.
>

Redesigned how? Again - do you have any specific concern about CLDR
plurals? Is there a bug filed?


> > - allowing some locales to 'get started' without plural rules?
>
> I think that any locale may get started in CLDR when providing date and
> time formats, while correctly displaying a reminder of a shopping cart
> may be left over for a later stage.
>

That's the general idea. (And a good way to put it, as a 'shopping cart'.)
Perhaps any data item that depends on plurals (currency category, compact
decimal category, etc.) would be 'locked' until it is unlocked by the
input of plural data.
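
To make the 'locked' idea concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: the LocaleDraft class, the
category names, and the rule that supplying the "other" plural category counts
as "plural data has been entered" are all hypothetical, and none of this
reflects how SurveyTool actually works.

```python
from dataclasses import dataclass, field

# Hypothetical names for data categories that depend on plural rules
# (the two examples mentioned above); not actual CLDR identifiers.
PLURAL_DEPENDENT = {"currency", "compact_decimal"}


@dataclass
class LocaleDraft:
    """Hypothetical in-progress data for a locale not yet in CLDR."""
    locale_id: str
    # Maps a plural category name (e.g. "one", "other") to its rule text.
    plural_rules: dict = field(default_factory=dict)

    def has_plurals(self) -> bool:
        # Assumption for this sketch: plural data counts as entered once
        # the "other" category is present.
        return "other" in self.plural_rules

    def is_locked(self, category: str) -> bool:
        # A category is locked only if it depends on plurals and no
        # plural data has been supplied yet.
        return category in PLURAL_DEPENDENT and not self.has_plurals()
```

With this sketch, a draft with no plural rules reports "currency" as locked
while a non-plural category such as a date format stays open; adding an
"other" entry to plural_rules unlocks the plural-dependent categories.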