UDHR in Unicode: 400 translations in text form!

Eric Muller eric.muller at efele.net
Mon Jun 29 08:49:22 CDT 2015


On 6/28/2015 12:20 PM, Philippe Verdy wrote:
> Note: The marker icons showing languages in the Leaflet component 
> (over the OSM map) are not working (broken links)

Fixed, I believe.

> Also the locations assigned to some international languages are strange:
>
> Esperanto ... Picard ... Standard French

The locations for those come from http://glottolog.org. Unless those 
locations are obviously wrong, I'd prefer to keep them aligned.

> But in fact I would have placed those international languages 
> somewhere in the middle of an ocean, just aligned vertically in a list 
> along a meridian (across the Atlantic or Pacific for example)

A few are already in Antarctica. I'll move Esperanto and Interlingua there.

>
> Some languages do have an ISO 639-3 code. E.g.
> - Tetum, official in Timor-Leste, is currently "coded" as "010" 
> (mapped to "und" in ISO 639-3), it should be "tet".

In general, identifying the language of the translations is not 
trivial. I have learned not to rely only on the names provided with the 
translations.

For this one, there is another translation, [tet], which most likely is 
tet/Tetun. [010] looks like a fairly different language and it is not 
clear to me that it is Tetun. I'd rather have some informed 
recommendation before assigning a language to [010]. It does not help 
that the source site does not seem accessible right now.


> - Forro (Saotomense) is a Portuguese-based creole in Sao Tome, 
> currently "coded" as "007" (mapped to "und"), it should use "cri".

The OHCHR site warns "not to confuse Crioulo Santomense with Santomense 
(a variety and dialect of Portuguese in São Tomé and Príncipe)". Again, 
I'd prefer some informed recommendation.


> - Kimbundu should also use "kmb" and not "009"
> - Umbundo (Umbundu) should also use "umb" and not "011"

According to the Ethnologue, both Kimbundu and Umbundu are used both as 
language names and as family names. Given that I don't really trust the 
sources of those names, I'd prefer some informed recommendation.

Thanks,
Eric.


