Dataset for all ISO 639 codes sorted by country/territory?
mats.gbproject at gmail.com
Thu Nov 10 05:47:41 CST 2016
On 20 September 2016 at 18:34, Doug Ewell <doug at ewellic.org> wrote:
> > Is there any dataset that contains all languages in the world sorted
> > by country/territory?
> As others have pointed out, be careful about how slippery this slope can
> get. Everyone has his or her own opinion about how many speakers of
> Language X in country Y need to be identified, estimated, or conjectured
> in order to say that "language X is spoken in country Y."
For my part, I was not actually thinking about the number of speakers in each
country. I wanted to map languages to the countries/territories where they
originated or have traditionally been spoken.
For instance, in Norway we have many immigrants from Pakistan, but I
doubt any of them would expect to see Urdu listed under Norway, even though
many people in Norway speak Urdu.
They would expect to see it under Pakistan, their heritage country. I guess
this is largely an identity issue as well.
I do understand that it is not easy to get a perfect language-country
mapping, and I guess the right mapping also depends on the use case.
For my part, I want people to be able to sort languages by
country/territory to make it easier to build lists of translations. I
think letting people browse by territory works better than presenting
one very long list of languages.
So I guess what matters is which languages people would most expect to find
under each country/territory.
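To make the "heritage territory" idea concrete, here is a minimal sketch of the kind of grouping I have in mind. The sample rows, the `HERITAGE_PAIRS` name and the `group_by_territory` helper are hypothetical and only illustrative, not taken from Ethnologue or any other dataset. Each row pairs an ISO 639-3 code with an ISO 3166-1 alpha-2 territory where the language originated or is traditionally spoken, so Urdu appears under Pakistan (and India) but not under Norway:

```python
from collections import defaultdict

# Hypothetical, hand-picked sample rows: (ISO 639-3 code, language name, ISO 3166-1 alpha-2).
# Each row links a language to a territory where it originated or is traditionally
# spoken, NOT to every territory with a large immigrant speaker community,
# which is why there is no ("urd", "Urdu", "NO") row.
HERITAGE_PAIRS = [
    ("nob", "Norwegian Bokmål", "NO"),
    ("nno", "Norwegian Nynorsk", "NO"),
    ("sme", "Northern Sami", "NO"),
    ("sme", "Northern Sami", "SE"),
    ("urd", "Urdu", "PK"),
    ("urd", "Urdu", "IN"),
    ("pnb", "Western Panjabi", "PK"),
]


def group_by_territory(pairs):
    """Return {territory: sorted language names}, with territories in sorted order.

    A language may appear under several territories (e.g. Northern Sami).
    """
    grouped = defaultdict(set)
    for _code, name, territory in pairs:
        grouped[territory].add(name)
    return {territory: sorted(names) for territory, names in sorted(grouped.items())}


if __name__ == "__main__":
    for territory, names in group_by_territory(HERITAGE_PAIRS).items():
        print(territory, ", ".join(names))
```

A translation picker could then show one short list per territory instead of one flat list. The hard part is of course the data itself, i.e. deciding which rows belong in the table.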
> > I manage to find a dataset on the website of Ethnologue, though it
> > doesn't look like open source, need to check with them exactly how I'm
> > allowed to use it:
> > http://www.ethnologue.com/codes/download-code-tables
> The readme file included in the downloadable zip file makes SIL's terms
> very clear. Basically you need to credit SIL as the source of the data,
> not change it, and not make the data directly available for others to
> download. It's best not to get caught up in "open source" as if any
> other terms would make the data totally unusable.
I agree that a dataset is not unusable just because it is not open source,
but in my case I actually need a downloadable file!
I tried contacting SIL, but they will only sell the dataset for a fee and will
not grant an open source license.
Would it be possible to extend this dataset to all languages and start
building an open source dataset for language-territory mapping?
More information about the Unicode