Dataset for all ISO639 code sorted by country/territory?
doug at ewellic.org
Thu Nov 10 11:56:58 CST 2016
Mats Blakstad wrote:
> For myself I was not actually considering the amount of speakers in
> each country, but to map languages with countries/territories where
> the language originated or have been spoken traditionally.
And that is where I think you'll have disagreement on the details.
> So I guess what matters is which language people mostly expect to find
> under the country/territory.
Yep, that's the challenge.
> Would it be possible to extend this dataset to all languages and start
> build an open source data set for language-territory mapping?
That's a good question for the CLDR folks, who have their own mailing list.
Keep in mind that the CLDR table documents 675 of the world's best-known
languages, counting variants such as three different orthographies of
Uzbek. While anything is possible, extending this to "all languages,"
e.g. the other 6,300 lesser-known living languages, might require a bit
of time and money.
There is also a resource in the "UDHR in Unicode" project that might be
worth investigating, though it too is an imperfect match with what you
seem to be looking for.
Doug Ewell | Thornton, CO, US | ewellic.org