Locale bringup and barriers for entry

Marcel Schneider via CLDR-Users cldr-users at unicode.org
Sat Sep 22 03:15:09 CDT 2018


Thank you, Steven, for sharing these useful resources and for the effort you and others put
into popularizing what CLDR is, what locale data is, and how to bring the two together.
 
To start the discussion, here are a few thoughts based on my experience of the past survey round:

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
> 
> Hello, and welcome to the new cldr-users members.

Thanks.

> For discussion:
>
> At the IUC conference last week, a few of us discussed around lunch some issues around getting new locales into CLDR, and barriers to entry.
> Barriers:
> - we discussed that it could be confusing or difficult to collect all of the data needed for a minimal locale:

The main sources of confusion seem to me to be:
1. The English template may not be internally consistent, e.g. emoji category names may be singular or plural (plural throughout seems correct);
2. The English template may not be up to date, e.g. it still includes ASCII quotes in exemplar punctuation though these have been ruled out;
3. The target data sets may not be comprehensively specified, e.g. the definition of exemplar punctuation includes an exclusion clause for math
     symbols only, while the clause about not including symbols merely on the basis of programmatic usage, such as # @ _, is still missing;
4. The English template may not be kept in sync with the specifications, e.g. the rule that emoji keywords should not include the emoji name or name starter;
5. Numerous bugs affect the markup of inherited values (but these have been reported and are about to be fixed in the SurveyTool code).

> http://cldr.unicode.org/index/cldr-spec/minimaldata - especially pluralization data

The scope of pluralization seems unclear and biased by the English paradigm of genderlessness, while in other languages grammatical gender
is a determining parameter for pluralization, so that even extensions to the DTD seem to be required for providing out-of-the-box pluralization rules.
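
To illustrate the point, here is a small Python sketch. It is only an assumption-laden illustration: the rule in plural_category() is a simplified, integer-only stand-in rather than actual CLDR data, and the MESSAGES table and function names are hypothetical. It shows that a count-based category alone cannot select the right French string once the wording also agrees in gender:

```python
def plural_category(n: int) -> str:
    # Simplified, integer-only stand-in for a count-based plural rule
    # (an assumption for illustration, not taken from CLDR data files).
    return "one" if n in (0, 1) else "other"

# Hypothetical message table: the key needs BOTH the plural category and
# the grammatical gender of the counted noun, because the participle
# agrees in number and in gender.
MESSAGES = {
    ("one", "m"): "{n} élément sélectionné",
    ("one", "f"): "{n} page sélectionnée",
    ("other", "m"): "{n} éléments sélectionnés",
    ("other", "f"): "{n} pages sélectionnées",
}

def format_selected(n: int, gender: str) -> str:
    # Look up the pattern by (category, gender), then fill in the count.
    return MESSAGES[(plural_category(n), gender)].format(n=n)

print(format_selected(1, "f"))
print(format_selected(3, "m"))
```

With the count as the only key, the "m" and "f" rows would collapse into one entry, so the gender has to come from somewhere outside the plural rule.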

> - what about fonts?

Invisibles and confusables should be visualized and distinguished throughout, i.e. both in SurveyTool and in the Charts. While SurveyTool already shows
U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK, confusables like spaces and apostrophes are still hard or impossible to distinguish.
That is in the nature of the characters concerned, e.g. U+00A0 NO-BREAK SPACE is defined as being like U+0020 SPACE except for line-break behavior,
and the preferred glyph of U+02BC MODIFIER LETTER APOSTROPHE is the same as that of U+2019 RIGHT SINGLE QUOTATION MARK.
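
As a rough illustration of the kind of visualization meant here, the following Python sketch replaces hard-to-see characters with a bracketed code point and character name. The function name reveal() and the CONFUSABLE set are my own assumptions for illustration; this is not how SurveyTool does or would do it.

```python
import unicodedata

# Illustrative choice only: characters whose glyphs are easily mistaken
# for another character's. Not an official or complete list.
CONFUSABLE = {
    "\u02BC",  # MODIFIER LETTER APOSTROPHE
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
}

def reveal(text: str) -> str:
    """Replace invisible or confusable characters with a visible label."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        is_format = category == "Cf"                   # e.g. U+200E, U+200F
        is_odd_space = category == "Zs" and ch != " "  # e.g. U+00A0
        if is_format or is_odd_space or ch in CONFUSABLE:
            name = unicodedata.name(ch, "UNNAMED")
            out.append(f"[U+{ord(ch):04X} {name}]")
        else:
            out.append(ch)
    return "".join(out)

print(reveal("10\u00A0km"))  # prints: 10[U+00A0 NO-BREAK SPACE]km
```

An ordinary U+0020 SPACE is left alone, so only the characters a reviewer could misread get labeled.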

> keyboards?

I see fonts and keyboards as the two missing components of the stack that you designed because, although they are part of locale data, input
methods are a precondition for the efficient submission of locale data. The full stack would thus expand to:

1. Encoding
2. Fonts
3. Input methods
4. Locale data

> - what are the best ways to coordinate efforts between the language users and different technical experts?
> Ideas:
> - a web app to take in new locale data?

I think CLDR already has its web app, i.e. SurveyTool. A full-time engineer is currently redeveloping and debugging several or all parts of it.

> - a web app to debug/explore plurals?

Before this functionality is included in SurveyTool, where it belongs, I think that the spec should be redesigned and the documentation updated
accordingly. That could eventually result in extended language support by CLDR/ICU, which would do no harm but only raise the product value.

> - allowing some locales to 'get started' without plural rules?

I think that any locale may get started in CLDR by providing date and time formats, while correctly displaying a shopping cart reminder
may be left for a later stage.

> Links for discussion:
> - Elnaz and Steven's prez from (last) Monday: https://goo.gl/sN7biw
> - My "full stack" blog post: https://srl295.github.io/2017/06/06/full-stack-enablement/

Thanks. I have read them and commented above along the lines you suggested.

Regards,

Marcel



