Locale bringup and barriers for entry

Marcel Schneider via CLDR-Users cldr-users at unicode.org
Sat Sep 22 06:07:29 CDT 2018


I did not set out to do what I ended up doing, i.e. summing up a bunch of tickets already being processed, 
in a thread launched to welcome newcomers and to show ways of expanding CLDR support to all of 
the world’s locales.

Indeed, caught up in the details, I forgot my initial thoughts:

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
[…]
> - what are the best ways to coordinate efforts between the language users and different technical experts?

I can only encourage everyone to first form their own picture by taking a close look at the latest Charts, 
especially (when it comes to learning how to include *new* locales) at the By-Type overviews of the locales 
that have already made it into CLDR:

http://cldr.unicode.org/index/downloads

http://www.unicode.org/cldr/charts/latest/

https://www.unicode.org/cldr/charts/latest/by_type/index.html

https://www.unicode.org/cldr/charts/latest/by_type/core_data.alphabetic_information.html

https://www.unicode.org/cldr/charts/latest/by_type/core_data.alphabetic_information.main.html

https://www.unicode.org/cldr/charts/latest/by_type/core_data.alphabetic_information.punctuation.html

and so on.

Another important step is to read through the Information Hub for Linguists, the main documentation resource:

http://cldr.unicode.org/translation

from which we can access the detailed pages that are also linked from the information pane in SurveyTool, 
e.g. about plurals:

http://cldr.unicode.org/translation/plurals

I have at times started uninformed discussions before noticing that the documentation already provided 
sufficient instructions, or before sorting out what was already covered and what clarifications I needed…

Another good way to prepare, if not already done, is to learn XML and more specifically LDML, the 
Unicode Locale Data Markup Language, in order to be able to read and submit data in that format:

http://cldr.unicode.org/index/cldr-spec

linking:

http://www.unicode.org/reports/tr35/

E.g. to understand how inheritance works:

http://www.unicode.org/reports/tr35/#Locale_Inheritance

That is key knowledge for understanding what we see when working in SurveyTool, 
and for detecting possible inheritance display bugs, although these are unlikely to occur anymore.
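
To make the idea concrete, here is a minimal sketch of inheritance by truncation: a value missing 
in fr_CA is looked up in fr, then in root. It deliberately ignores the explicit parent overrides that 
LDML also defines (parentLocales), and the function names, paths and values are made up for illustration:

```python
def inheritance_chain(locale_id):
    """Return the lookup chain obtained by truncating the locale ID, ending in root."""
    current = locale_id.replace("-", "_")
    chain = []
    while current and current != "root":
        chain.append(current)
        # Drop the last subtag; stop once no separator is left.
        head, sep, _tail = current.rpartition("_")
        current = head if sep else ""
    chain.append("root")
    return chain


def resolve(path, locale_id, data):
    """Return (value, locale that supplied it), or (None, None) if nothing is found."""
    for loc in inheritance_chain(locale_id):
        value = data.get(loc, {}).get(path)
        if value is not None:
            return value, loc
    return None, None


# Toy data only; these are not real CLDR paths or values.
TOY_DATA = {
    "root": {"greeting": "root value"},
    "fr": {"greeting": "fr value", "farewell": "fr farewell"},
    "fr_CA": {"farewell": "fr_CA farewell"},
}

print(inheritance_chain("fr_CA"))
print(resolve("greeting", "fr_CA", TOY_DATA))
```

When SurveyTool shows a value as inherited, this is roughly the kind of lookup behind it.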

Now we’re ready to look at the raw data, as downloaded or found in the online repository:

http://www.unicode.org/repos/cldr/tags/latest/
https://www.unicode.org/repos/cldr/tags/latest/common/
https://www.unicode.org/repos/cldr/tags/latest/common/main/

where we may wish to pick the locale that is closest to our new data, or that we know best among 
the precursors, or simply English for reference:

https://www.unicode.org/repos/cldr/tags/latest/common/main/en.xml

(Emoji-related data are in a separate directory:
https://www.unicode.org/repos/cldr/tags/latest/common/annotations/en.xml
)
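
For those who prefer a script to an editor, here is a rough sketch of reading such a file with 
Python's standard library. The inline sample is a heavily trimmed stand-in for en.xml, reduced to 
two language display names; the function name is only for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for common/main/en.xml; the real file is far larger.
SAMPLE_LDML = """<ldml>
  <identity>
    <language type="en"/>
  </identity>
  <localeDisplayNames>
    <languages>
      <language type="de">German</language>
      <language type="fr">French</language>
    </languages>
  </localeDisplayNames>
</ldml>
"""


def language_names(ldml_text):
    """Map language codes to the display names found in an LDML document."""
    root = ET.fromstring(ldml_text)
    names = {}
    for element in root.findall("./localeDisplayNames/languages/language"):
        names[element.get("type")] = element.text
    return names


print(language_names(SAMPLE_LDML))
```

For a downloaded file, ET.parse(path).getroot() would take the place of ET.fromstring.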

I think it is best to download a whole data set as a zipped folder; the latest as of now is in:

http://www.unicode.org/Public/cldr/33.1/

and then open the relevant files in a text editor with syntax highlighting and an XML syntax checker.
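
If no such editor is at hand, a few lines of Python can stand in for the syntax check. This sketch 
only tests well-formedness (matching tags and the like), not validity against the LDML DTD, and the 
function name is my own:

```python
import xml.etree.ElementTree as ET


def xml_wellformedness_error(text):
    """Return None if the text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


print(xml_wellformedness_error("<ldml><identity/></ldml>"))
print(xml_wellformedness_error("<ldml><identity></ldml>"))
```
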

Here, finally, is my answer to the quoted question about how to coordinate efforts between users and experts: 
all interested people may communicate by any available means all year round, given that the SurveyTool forums 
have limited access and accept posts only during surveys, while being read-only for accredited people the rest 
of the time. Likewise, SurveyTool submission forms are read-only except during relatively short windows of 
opportunity lasting 4 to 7 weeks, twice a year.

Results of discussions may then be committed to a file in LDML/XML format. The easiest way is to take 
the English files, cut out any parts not yet reviewed, and replace the English content with locale content.
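
As a rough sketch of that workflow, restricted to language display names and to a trimmed stand-in 
for the English file: entries with an agreed translation are replaced, the others are cut out. 
The function name and the sample translation are only illustrative, and note that ElementTree drops 
the DOCTYPE line of a real file on output, so it would have to be added back by hand:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the English file used as a template.
ENGLISH_TEMPLATE = """<ldml>
  <identity>
    <language type="en"/>
  </identity>
  <localeDisplayNames>
    <languages>
      <language type="de">German</language>
      <language type="fr">French</language>
    </languages>
  </localeDisplayNames>
</ldml>
"""


def localize_template(english_ldml, locale_code, translations):
    """Swap English names for reviewed translations and drop everything else."""
    root = ET.fromstring(english_ldml)
    identity_language = root.find("./identity/language")
    if identity_language is not None:
        identity_language.set("type", locale_code)
    languages = root.find("./localeDisplayNames/languages")
    if languages is not None:
        for element in list(languages):  # copy, since we remove while iterating
            key = element.get("type")
            if key in translations:
                element.text = translations[key]
            else:
                languages.remove(element)  # not reviewed yet: cut it out
    return ET.tostring(root, encoding="unicode")


print(localize_template(ENGLISH_TEMPLATE, "fr", {"de": "allemand"}))
```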

The resulting files may then be submitted individually by each coordinated vetter using 
the SurveyTool bulk data upload feature:

http://cldr.unicode.org/index/survey-tool
http://cldr.unicode.org/index/survey-tool/guide
http://cldr.unicode.org/index/survey-tool/guide#TOC-Advanced-Features
http://cldr.unicode.org/index/survey-tool/upload

I think we’ll see whether to try this out for French / fr-FR when the next rush starts on December 1ˢᵗ.

Good luck!

Marcel


