Dataset for all ISO639 code sorted by country/territory?

Mark Davis ☕️ mark at macchiato.com
Sun Nov 20 16:20:40 CST 2016


The way we are set up now in CLDR, people can always provide additional
information on language population via tickets, such as
http://unicode.org/cldr/trac/ticket/9856, and on the status (official, etc.)
of each language. There is already a ticket to allow the addition of
subdivisions (http://unicode.org/cldr/trac/ticket/9897).

We've had probably about 300 such tickets, and there are others slated for
the current release. The process is far from as simple as you state, since
we need to have accessible, authoritative references for the data. And
often when we look into those sources, we find that the figures stated in
the ticket are simply wrong and need to be corrected, or that the figures
cited in the source are themselves out of date.

So any willing parties, such as you, can do the research and supply more
data.

As for changes over time: the data is stated in terms of percentages of the
country's population. So if the language growth is roughly the same as the
overall country's population growth, then that is reflected in the figures
going forward. Of course, where the growth (or decrease) varies from the
country's (which can clearly happen over time, or in case of upheavals or
population movements), then people should file tickets to correct the
values.
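
To make the percentage point concrete, here is a minimal sketch in Python.
The function name and all of the numbers are hypothetical and are not taken
from CLDR; it only shows that a fixed percentage scales with the country's
population, so the estimate stays reasonable as long as the language grows
at about the same rate as the country.

```python
def estimated_speakers(country_population: int, percent: float) -> int:
    """Estimate a speaker count from a country population and a percentage."""
    return round(country_population * percent / 100)


# Hypothetical figures, for illustration only.
share = 15.0                # percent of the country's population
population_then = 10_000_000
population_now = 12_000_000

speakers_then = estimated_speakers(population_then, share)
speakers_now = estimated_speakers(population_now, share)

print(speakers_then, speakers_now)
```

If the real share drifts away from the stored percentage, the second
estimate is off, and that is the case where a ticket is needed.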

Mark

BTW, in an ideal world, for each country we'd be able to collect a set of
language tuples, each covering the people who are functional in every
language in the tuple, with the percentage of the population that each
tuple applies to, e.g.:

75% {English}
15% {English, Spanish}
7.5% {Spanish}
...

Some countries collect and make available data that is roughly at that
level in each census, but most do not. Thus we are not able to provide that
kind of data (which would be very useful).
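
As a rough sketch of why tuple data would be useful, the Python below
stores only the three example rows above (the elided rows are left out) and
derives a per-language share by summing every tuple that contains the
language. The variable and function names are hypothetical, and this is not
a format that CLDR defines.

```python
from collections import defaultdict

# (percent of population, set of languages those people are functional in)
# Only the three example rows are included; the rest of the list is elided.
language_tuples = [
    (75.0, frozenset({"English"})),
    (15.0, frozenset({"English", "Spanish"})),
    (7.5, frozenset({"Spanish"})),
]


def per_language_share(rows):
    """Sum the percentages of all tuples that contain each language."""
    totals = defaultdict(float)
    for percent, languages in rows:
        for language in languages:
            totals[language] += percent
    return dict(totals)


shares = per_language_share(language_tuples)
for language, percent in sorted(shares.items()):
    print(f"{language}: {percent}%")
```

The per-language totals can be derived from the tuples, but not the other
way around: the totals alone do not say how much of the population is
bilingual.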

Mark

On Sun, Nov 20, 2016 at 12:35 PM, Mats Blakstad <mats.gbproject at gmail.com>
wrote:

> I understand it would take a lot of time to collect the full data, but it
> also depends on how much engagement you manage to create for the work.
>
> On the other side: simply allowing users to start providing the data is
> the first step in the process, and it would take very little time to do!
>
> On 20 November 2016 at 19:54, Doug Ewell <doug at ewellic.org> wrote:
>
>> Mats,
>>
>> I think you are genuinely underestimating the time and effort that this
>> project would take.
>>
>> --
>> Doug Ewell | Thornton, CO, US | ewellic.org
>>
>>