names, addresses, phone numbers

Edwin Hoogerbeets ehoogerbeets at gmail.com
Thu Apr 21 18:34:35 CDT 2016


Chris, you can see the data at:

https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/

Under there are the 
https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/und/<countrycode> 
directories, which contain the phone files for 22 countries. The phone 
files are phonefmt.json for the progressive formats, which are used to 
format partial and full numbers as digits are dialed in a phone UI; 
numplan.json for the basic numbering plan information; states.json, which 
is a generated trie used for parsing area codes; and area.json, which 
maps area codes to geolocations. A special case is the North American 
Numbering Plan (NANP) countries (Canada, US, Bermuda, and many Caribbean 
nations), which are all configured together in the 
https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/und/US 
directory for convenience.
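
To make the trie idea concrete, here is a minimal sketch in Python of 
longest-prefix matching of area codes with a trie. It is only an 
illustration of the technique: the nested-dict layout, the function names 
build_trie and match_area_code, and the sample table are assumptions made 
for this example, and they do not reflect the actual layout of 
states.json or area.json.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sample data; the real area.json files are laid out differently.
SAMPLE_AREA_CODES = {
    "415": "San Francisco, CA",
    "212": "New York, NY",
}


def build_trie(codes: Dict[str, str]) -> dict:
    """Build a nested-dict trie keyed by single digits."""
    trie: dict = {}
    for code, region in codes.items():
        node = trie
        for digit in code:
            node = node.setdefault(digit, {})
        # Mark the end of a complete area code with its region.
        node["region"] = region
    return trie


def match_area_code(trie: dict, digits: str) -> Optional[Tuple[str, str]]:
    """Return (area_code, region) for the longest matching prefix, or None."""
    node = trie
    best = None
    for i, digit in enumerate(digits):
        node = node.get(digit)
        if node is None:
            break  # no area code continues with this digit
        if "region" in node:
            best = (digits[: i + 1], node["region"])
    return best


trie = build_trie(SAMPLE_AREA_CODES)
result = match_area_code(trie, "4155551234")
```

Walking the trie one digit at a time means the parser can decide where the 
area code ends as soon as enough digits have been typed, without scanning 
the whole table.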

Mike M, I can imagine that the area codes and geolocations change 
regularly, but the formats do not. For example, "(XXX) XXX-XXXX" has been 
the de facto standard American format for many, many years. ilib 
contains multiple styles of format as well, since the format is often a 
matter of user preference rather than government mandate. See 
https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/und/DE/phonefmt.json 
for a country with 5 different possible styles.
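
As an illustration of what formatting "partial and full numbers while 
dialing" means, here is a small Python sketch that applies a template 
such as "(XXX) XXX-XXXX" to however many digits have been typed so far. 
The function name format_partial and the template-string approach are 
assumptions for this example only; phonefmt.json uses its own structure, 
which this does not reproduce.

```python
def format_partial(digits: str, template: str = "(XXX) XXX-XXXX") -> str:
    """Fill the template's X slots with digits, stopping when digits run out."""
    out = []
    remaining = iter(digits)
    pending = ""  # literal characters held back until another digit arrives
    for ch in template:
        if ch == "X":
            digit = next(remaining, None)
            if digit is None:
                break  # no more digits typed yet
            out.append(pending + digit)
            pending = ""
        else:
            pending += ch
    # Digits beyond the end of the template are appended unformatted.
    out.append("".join(remaining))
    return "".join(out)


steps = [format_partial(d) for d in ("4", "415", "4155", "4155551234")]
```

Holding back the punctuation until the next digit arrives keeps a 
half-typed number from ending in a dangling ")" or "-".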

Also under 
https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/und/<countrycode> 
are the address.json files. These contain meta-information plus a list of 
regular expressions and hard-coded lists used to parse the addresses. The 
parser doesn't get it right all the time (the US one has problems with 
two-word localities like "San Francisco", for example), but it gets 
reasonably close, and pretty much every country in the world is covered.
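
The two-word locality problem can be reproduced with a deliberately naive 
Python sketch. The regular expression, the KNOWN_LOCALITIES list, and the 
function names below are all hypothetical and are not taken from 
address.json; they only show why a pattern that treats the last word 
before the state as the city goes wrong, and how a hard-coded list can 
help.

```python
import re

# Naive pattern: the locality is assumed to be one word before the state code.
NAIVE = re.compile(
    r"^(?P<street>.+)\s+(?P<locality>\S+)\s+(?P<region>[A-Z]{2})\s+(?P<zip>\d{5})$"
)

# Hypothetical hard-coded list of multi-word localities.
KNOWN_LOCALITIES = ["San Francisco", "New York", "Los Angeles"]


def parse_naive(line: str):
    """Parse a one-line US address with the naive pattern only."""
    m = NAIVE.match(line)
    return m.groupdict() if m else None


def parse_with_list(line: str):
    """Parse naively, then repair the split using the known-locality list."""
    fields = parse_naive(line)
    if fields is None:
        return None
    head = fields["street"] + " " + fields["locality"]
    for name in KNOWN_LOCALITIES:
        if head.endswith(" " + name):
            fields["street"] = head[: -len(name)].rstrip()
            fields["locality"] = name
            break
    return fields


address = "123 Main St San Francisco CA 94105"
naive = parse_naive(address)      # locality comes out as "Francisco"
fixed = parse_with_list(address)  # locality comes out as "San Francisco"
```

A list like this only helps for the names it contains, which is one 
reason local knowledge from contributors would be valuable.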

Under 55 of the locale dirs are the name.json files which configure the 
name formats and settings for those languages. The top level contains a 
western-centric fall-back file used when the language doesn't have its 
own parser: 
https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/name.json. 
An example of Asian formats: 
https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/ja/name.json

Almost all of the phone data was gleaned either from the documents on 
the International Telecommunication Union site, which has the officially 
published numbering plan documents for many countries, or from 
Wikipedia, which has information about the formats. The address and name 
information is gleaned almost exclusively from Wikipedia.

Edwin


On 04/20/2016 11:27 PM, Chris Leonard wrote:
> On Thu, Apr 21, 2016 at 1:34 AM, Edwin Hoogerbeets
> <ehoogerbeets at gmail.com> wrote:
>> I heard talk 2 or 3 years ago about a proposal to add name, address, and
>> phone number formats to CLDR. What ever happened to those efforts? I don't
>> really see data in CLDR 29 about those.
>>
>> In my i18n library for JS called "ilib", I have data about the address
>> formats for practically every country in the world, as well as the phone
>> formats and name formats for many countries. I would love to contribute this
>> data to CLDR and then later leverage other people's local knowledge to fill
>> in the gaps where my data is lacking...
>>
>> Can someone direct me to the folks who are working on these? Thanks,
>>
>
>
> Dear Edwin.
>
>
> I'd be interested in comparing your data to that in the glibc locales.
>
> Is there a link to your repo you can provide?
>
> cjl


