From beng at fb.com Mon Nov 2 12:25:11 2015
From: beng at fb.com (Ben Hamilton)
Date: Mon, 2 Nov 2015 10:25:11 -0800
Subject: Feedback on CLDR JSON and encoding crucial data only in keys
Message-ID: <5637AA87.2060404@fb.com>

Hi folks,

I'm working on a server to allow arbitrary queries of slices of CLDR data using the GraphQL protocol (https://facebook.github.io/graphql/).

While working with the fully resolved CLDR JSON data, I noticed a few design decisions that complicate building a structured object model (required by GraphQL) to represent it:

1) Crucial LDML data is often encoded only in JSON keys, requiring clients to parse the keys to extract it

For example, number formats (e.g. from main/root/numbers.json) require parsing the keys to know the range of values to which the format should be applied:

"decimalFormat": {
  "1000-count-other": "0K",
  "10000-count-other": "00K",
  "100000-count-other": "000K",
  "1000000-count-other": "0M",
  (snip)
}

If I wanted to build an object model to represent this, I'd need to know that the keys of this dictionary encode three pieces of data separated by "-", and write a parser which understands the meaning of each section.

This becomes much more complicated when dealing with dateFields.json, which includes keys with particularly complex encodings. From main/root/dateFields.json:

"sat-narrow": {
  "relative-type--1": "last Sa",
  "relative-type-0": "this Sa",
  "relative-type-1": "next Sa"
},
"dayperiod": {
  "displayName": "AM/PM",
  "displayName-alt-variant": "am/pm"
},

For this, I need to know that the "-" separators have multiple meanings, might or might not be present, and can act either as a field separator or as a negation sign in front of a number.

I think we can keep the keys as-is as opaque unique identifiers, but the values should be more structured. A map with separate fields for the meanings of each item in the key (plus the original value) would be great. The original XML format does this pretty well; I think we can do that in the JSON without too much trouble.

2) Much of the LDML data is represented as serialized UTS #35 UnicodeSet objects, which requires deserializing them to understand the underlying meaning

For example, main/root/characters.json includes:

"characters": {
  "exemplarCharacters": "[]",
  "auxiliary": "[]",
  "punctuation": "[\\\\- , ; \\\\: ! ? . ( ) \\\\[ \\\\] \\\\{ \\\\}]",
  (snip)
}

This means every program which wants to interact with this data needs to include a UTS #35 UnicodeSet deserializer (or forward the raw patterns on to the client with the assumption that it will include a UnicodeSet deserializer).

For many languages, including JavaScript / ECMAScript, I don't think such a deserializer exists today; please let me know if I'm wrong!

Ben
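To make the key-parsing burden concrete, here is a minimal TypeScript sketch of the kind of parser Ben describes. The interface, the function names, and the regular expressions are invented for illustration; they are not part of any CLDR specification, and a real parser would need to cover many more key shapes.

// Hypothetical shape for a parsed decimalFormat key such as "1000-count-other".
// The field names here are my own, not CLDR's.
interface DecimalFormatKey {
  magnitude: number; // e.g. 1000
  count: string;     // plural category, e.g. "other"
}

function parseDecimalFormatKey(key: string): DecimalFormatKey {
  const match = /^(\d+)-count-(\w+)$/.exec(key);
  if (!match) throw new Error(`Unrecognized decimalFormat key: ${key}`);
  return { magnitude: Number(match[1]), count: match[2] };
}

// "relative-type--1" shows the ambiguity Ben mentions: the last "-" is a
// minus sign, not a field separator, so the pattern must allow for it.
function parseRelativeTypeKey(key: string): number {
  const match = /^relative-type-(-?\d+)$/.exec(key);
  if (!match) throw new Error(`Unrecognized relative key: ${key}`);
  return Number(match[1]); // -1 = last, 0 = this, 1 = next
}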
From mark at macchiato.com Tue Nov 3 00:32:14 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Mon, 2 Nov 2015 22:32:14 -0800
Subject: Feedback on CLDR JSON and encoding crucial data only in keys
In-Reply-To: <5637AA87.2060404@fb.com>
References: <5637AA87.2060404@fb.com>
Message-ID:

I suggest that you file this as a bug, and we can discuss it in the meeting.

For #1, the knottiest issue is

"dayperiod": {
  "displayName": "AM/PM",
  "displayName-alt-variant": "am/pm"
},

We've wrestled with this. As I recall, we considered fleshing it out to be something like:

"dayperiod": {
  "displayName": {
    "plain": "AM/PM",
    "variant": "am/pm"
  },
},

But because 'alt' could potentially go on every leaf node, that would require adding a level (and "plain") for essentially every leaf node. (And where alt can go on non-leaf nodes we'd have to work that in also.) But we could explore some ideas.

For #2, we could probably go to a simpler format for JSON. We could look at space-delimited strings, maybe with a special sequence for ranges, that would be easy to parse.

Mark

On Mon, Nov 2, 2015 at 10:25 AM, Ben Hamilton wrote:
> Hi folks,
>
> I'm working on a server to allow arbitrary queries of slices of CLDR data
> using the GraphQL protocol (https://facebook.github.io/graphql/).
> [...]
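Mark's space-delimited idea is not specified anywhere, but a toy parser shows how little code such a format would demand from clients. The syntax assumed below (items separated by spaces, "x-z" marking an inclusive range, braces around multi-character sequences) is purely an illustration of the suggestion, not anything CLDR publishes.

// Toy TypeScript parser for a *hypothetical* space-delimited exemplar format.
function parseExemplars(input: string): Set<string> {
  const out = new Set<string>();
  for (const token of input.split(/\s+/).filter(Boolean)) {
    const range = /^(.)-(.)$/u.exec(token);
    if (range) {
      const from = range[1].codePointAt(0)!;
      const to = range[2].codePointAt(0)!;
      for (let cp = from; cp <= to; cp++) out.add(String.fromCodePoint(cp));
    } else {
      out.add(token.replace(/^\{|\}$/g, "")); // "{ch}" -> "ch"
    }
  }
  return out;
}

// parseExemplars("a b c x-z {ch}") -> Set {"a","b","c","x","y","z","ch"}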
From beng at fb.com Tue Nov 3 10:36:09 2015
From: beng at fb.com (Ben Hamilton)
Date: Tue, 3 Nov 2015 16:36:09 +0000
Subject: Feedback on CLDR JSON and encoding crucial data only in keys
In-Reply-To:
References: <5637AA87.2060404@fb.com>
Message-ID:

Will do, thanks! I'll file two separate issues, since they're unrelated.

Outlook

On Mon, Nov 2, 2015 at 10:32 PM -0800, "Mark Davis ☕️" wrote:

I suggest that you file this as a bug, and we can discuss it in the meeting.
[...]
From beng at fb.com Tue Nov 3 11:12:32 2015
From: beng at fb.com (Ben Hamilton)
Date: Tue, 3 Nov 2015 09:12:32 -0800
Subject: Feedback on CLDR JSON and encoding crucial data only in keys
In-Reply-To:
References: <5637AA87.2060404@fb.com>
Message-ID: <5638EB00.6000208@fb.com>

Filed http://unicode.org/cldr/trac/ticket/9061 and http://unicode.org/cldr/trac/ticket/9062.

Ben

> Ben Hamilton
> November 3, 2015 at 8:36 AM
> Will do, thanks! I'll file two separate issues, since they're unrelated.
> [...]
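While Ben's two tickets work their way through Trac, implementers who only need the simplest exemplar patterns can get surprisingly far with a toy deserializer. The sketch below handles only literals, backslash escapes, "a-z" ranges, and "{multi}" sequences; it is an assumption-laden stand-in, not a conforming UTS #35 UnicodeSet parser (no properties, negation, or nested sets).

// Toy TypeScript deserializer for *simple* UnicodeSet patterns like
// "[\- , ; \: ! ? . ( ) \[ \] \{ \}]" or "[a-c {ch}]".
function parseUnicodeSet(pattern: string): Set<string> {
  if (!pattern.startsWith("[") || !pattern.endsWith("]")) {
    throw new Error("not a bracketed UnicodeSet");
  }
  const body = pattern.slice(1, -1);
  const out = new Set<string>();
  let i = 0;
  const next = (): string => {
    if (body[i] === "\\") { i++; return body[i++]; } // escaped literal
    if (body[i] === "{") {                           // multi-character sequence
      const end = body.indexOf("}", i);
      const seq = body.slice(i + 1, end);
      i = end + 1;
      return seq;
    }
    return body[i++];
  };
  while (i < body.length) {
    if (body[i] === " ") { i++; continue; }          // spaces separate items
    const item = next();
    if (body[i] === "-" && body[i + 1] !== undefined && body[i + 1] !== " ") {
      i++; // consume the range separator
      const to = next();
      for (let cp = item.codePointAt(0)!; cp <= to.codePointAt(0)!; cp++) {
        out.add(String.fromCodePoint(cp));
      }
    } else {
      out.add(item);
    }
  }
  return out;
}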
From mats.gbproject at gmail.com Sun Nov 8 17:59:18 2015
From: mats.gbproject at gmail.com (Mats Blakstad)
Date: Mon, 9 Nov 2015 00:59:18 +0100
Subject: #9066: Territory code request for Abkhazia
Message-ID:

I filed this issue yesterday:
http://unicode.org/cldr/trac/ticket/9066

What criteria do you have for assigning codes to new territories? Should I provide localization data now, and do you prefer to get a patch or just a listing of the data in the issue summary?

Or should I first wait and see the reaction to the issue?

From mats.gbproject at gmail.com Sun Nov 8 18:05:00 2015
From: mats.gbproject at gmail.com (Mats Blakstad)
Date: Mon, 9 Nov 2015 01:05:00 +0100
Subject: "Svalbard" and "Jan Mayen" subdivisions of Norway
Message-ID:

Check out this page. Some subdivisions, like "Hong Kong", also have a territory code. They are marked, like CN-91 = HK (Hong Kong SAR China):
http://www.unicode.org/cldr/charts/latest/supplemental/territory_subdivisions.html

However, I can't find this type of mapping in the CLDR core; is it there? If not, it would be great to have it there!

I noticed that Norway has "Svalbard" and "Jan Mayen" listed as subdivisions. They also have their own territory codes, but this is not marked on the page I linked to.
From dzo at bisharat.net Mon Nov 9 11:11:08 2015
From: dzo at bisharat.net (Don Osborn)
Date: Mon, 9 Nov 2015 12:11:08 -0500
Subject: UN M.49 in language tags & locales
Message-ID: <5640D3AC.6040404@bisharat.net>

Trying to catch up on what the current rules are on the use of 3-digit region codes in language tags, and got lost in the wording of RFC 5646. Also interested to know if these UN M.49 codes can still be used in locales. This is for general info for the benefit of African localizers, not for a specific localization project. Is there any quick guide that covers this particular issue?

BTW, it looks like the Territory Containment table at http://www.unicode.org/cldr/charts/latest/supplemental/territory_containment_un_m_49.html needs to be updated to show South Sudan in Eastern Africa (per http://unstats.un.org/unsd/methods/m49/m49regin.htm). I'll file a bug report on that.

Don Osborn

From doug at ewellic.org Mon Nov 9 12:36:16 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 09 Nov 2015 11:36:16 -0700
Subject: UN M.49 in language tags & locales
Message-ID: <20151109113616.665a7a7059d7ee80bb4d670165c8327d.ee8bd3f200.wbe@email03.secureserver.net>

Don Osborn wrote:

> Trying to catch up on what the current rules are on the use of 3-digit
> region codes in language tags, and got lost in the wording of RFC 5646.
> Also interested to know if these UN M.49 codes can still be used in
> locales.

The rules were set out in RFC 4645, Section 2.

In short, the code elements for macro-geographical regions are in; those for individual countries (because they already have two-letter code elements from ISO 3166-1) and for non-geographical categories like "economic groupings" are out.

You can get an up-to-date list by looking in the Language Subtag Registry [1]. The three-digit region subtags are listed immediately after the two-letter ones.

[1] http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

--
Doug Ewell | http://ewellic.org | Thornton, CO

From addison at lab126.com Mon Nov 9 12:41:55 2015
From: addison at lab126.com (Phillips, Addison)
Date: Mon, 9 Nov 2015 18:41:55 +0000
Subject: UN M.49 in language tags & locales
In-Reply-To: <20151109113616.665a7a7059d7ee80bb4d670165c8327d.ee8bd3f200.wbe@email03.secureserver.net>
References: <20151109113616.665a7a7059d7ee80bb4d670165c8327d.ee8bd3f200.wbe@email03.secureserver.net>
Message-ID:

> > Also interested to know if these UN M.49 codes can still be used in
> > locales.

Also: they can be, and are actively used in locales, if by locales you mean CLDR or ICU (and implementations that have adopted these). The most well-known of them is "es-419".

Addison

> -----Original Message-----
> From: Doug Ewell
> [...]
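For locale identifiers specifically, the macro-region subtags Doug and Addison describe ride along in a BCP 47 tag just like a two-letter region. As one hedged illustration, using the Intl.Locale API (which postdates this 2015 thread; what the formatters produce depends on the engine's CLDR data):

const latinAmerica = new Intl.Locale("es-419");
console.log(latinAmerica.language); // "es"
console.log(latinAmerica.region);   // "419"

// Formatters accept the same tag; es-419 number symbols differ from es-ES.
console.log(new Intl.NumberFormat("es-419").format(1234567.89));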
From doug at ewellic.org Mon Nov 9 13:22:36 2015
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 09 Nov 2015 12:22:36 -0700
Subject: UN M.49 in language tags & locales
Message-ID: <20151109122236.665a7a7059d7ee80bb4d670165c8327d.d5e52c607b.wbe@email03.secureserver.net>

John Cowan wrote:

> The Official Doug keeps track of additions to the UN list and gets
> them added to the Registry when required, but I don't think this has
> happened lately, maybe not ever.

There have been no changes to M.49 that affected the Registry since the publication of RFCs 4645 and 4646 in 2006. A year earlier, 062 "South-Central Asia" was withdrawn and replaced by 034 "Southern Asia" and 143 "Central Asia". This affected pre-planning versions of the LSR, but not the RFC 4645 initial version.

We did exceptionally add 003 "North America" in 2010, which covers all of the Americas except South America (as opposed to 021 "Northern America", which is just Bermuda, Canada, Greenland, St. Pierre and Miquelon, and the US). 003 was originally omitted from the Registry since it was just a footnote to 021 in the UN standard, not part of the containment/composition tree (which, as John noted, doesn't affect BCP 47 anyway). I understand 003 is useful for some locale purposes.

No changes at all have been made to M.49 since the addition of South Sudan in 2011.

--
Doug Ewell | http://ewellic.org | Thornton, CO

From cameron at lumoslabs.com Sat Nov 14 14:06:48 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Sat, 14 Nov 2015 14:06:48 -0600
Subject: Test Data Gone?
Message-ID:

Hey CLDR users,

The CLDR-based library I maintain used test data available in a directory called test/ that seems to no longer exist. I believe v21 was the last release to contain it (here's a link). We're currently using these data to test collation tailoring, but are blocked from upgrading our tailoring rules by the fact that this directory has disappeared from every subsequent release. Has it been moved somewhere?

Thanks!

-Cameron

From markus.icu at gmail.com Sat Nov 14 17:23:16 2015
From: markus.icu at gmail.com (Markus Scherer)
Date: Sat, 14 Nov 2015 15:23:16 -0800
Subject: Test Data Gone?
In-Reply-To:
References:
Message-ID:

Digging around in Trac, I see
http://unicode.org/cldr/trac/changeset/6638 cldrbug 4370: Clean up posix and test directories
-> http://unicode.org/cldr/trac/ticket/4370 "Clean up mentions of CLDR test data" (CLDR 22)
-> has a comment "remove posix and test directories - update download page"
but I don't see it mentioned on the CLDR 22 download page.

markus

From cameron at lumoslabs.com Sat Nov 14 17:45:50 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Sat, 14 Nov 2015 17:45:50 -0600
Subject: Test Data Gone?
In-Reply-To:
References:
Message-ID:

Markus, thanks for looking into this.
I just found this bit of text on the download page:

"Note: Beginning with CLDR v21, the CLDR project will no longer publish the conformance test files for CLDR data. These files were intended to be used to validate behavior for certain fields, but have proven to be difficult to maintain and of limited usefulness."

Looks like the test data is no longer published, which is really a shame. Oh well. I guess we'll have to figure out some other way of validating our collation implementation. Any ideas?

-Cameron

On Saturday, November 14, 2015, Markus Scherer wrote:
> Digging around in Trac, I see
> http://unicode.org/cldr/trac/changeset/6638 cldrbug 4370: Clean up posix
> and test directories
> [...]

From srl at icu-project.org Sat Nov 14 20:52:23 2015
From: srl at icu-project.org (Steven R. Loomis)
Date: Sat, 14 Nov 2015 18:52:23 -0800
Subject: Test Data Gone?
In-Reply-To:
References:
Message-ID:

Nova and/or Rafael at IUC were asking for this. I asked someone to file a bug.

Sent from our iPhone.

> On Nov 14, 2015, at 3:45 PM, Cameron Dutro wrote:
> Markus, thanks for looking into this. I just found this bit of text on
> the download page:
> [...]

From cameron at lumoslabs.com Sat Nov 14 21:21:57 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Sat, 14 Nov 2015 21:21:57 -0600
Subject: Test Data Gone?
In-Reply-To:
References:
Message-ID:

I looked back through some old GitHub issues and found a link Steven provided to the collation test file ICU uses. Is this something I should consider using?

-Cameron

On Sat, Nov 14, 2015 at 8:52 PM, Steven R. Loomis wrote:
> Nova and/or Rafael at IUC were asking for this. I asked someone to file
> a bug.
> [...]
From markus.icu at gmail.com Sun Nov 15 12:53:22 2015
From: markus.icu at gmail.com (Markus Scherer)
Date: Sun, 15 Nov 2015 10:53:22 -0800
Subject: Test Data Gone?
In-Reply-To:
References:
Message-ID:

On Sat, Nov 14, 2015 at 7:21 PM, Cameron Dutro wrote:
> I looked back through some old GitHub issues and found a link Steven
> provided to the collation test file ICU uses. Is this something I should
> consider using?

For collation, that should be useful. Also http://www.unicode.org/Public/UCA/latest/CollationTest.html

I have not looked at the old CLDR test data that has been removed, so I don't know how that compares with any other data.

markus

From cameron at lumoslabs.com Sun Nov 15 12:56:42 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Sun, 15 Nov 2015 12:56:42 -0600
Subject: Test Data Gone?
In-Reply-To:
References:
Message-ID:

Great, thanks Markus. Having these files is wonderful, and we're using them to test our implementation already. It is my understanding, however, that they do not test individual locale tailorings; is that correct?

-Cameron

On Sun, Nov 15, 2015 at 12:53 PM, Markus Scherer wrote:
> For collation, that should be useful. Also
> http://www.unicode.org/Public/UCA/latest/CollationTest.html
> [...]

From dzo at bisharat.net Sun Nov 15 21:07:35 2015
From: dzo at bisharat.net (Don Osborn)
Date: Sun, 15 Nov 2015 22:07:35 -0500
Subject: UN M.49 in language tags & locales
In-Reply-To:
References: <20151109113616.665a7a7059d7ee80bb4d670165c8327d.ee8bd3f200.wbe@email03.secureserver.net>
Message-ID: <56494877.1090000@bisharat.net>

Thanks all, this info is helpful. The reason for asking is to verify info on the PanAfrican Localisation wiki, which I'm working on bringing back.

All the best,

Don

On 11/9/2015 1:41 PM, Phillips, Addison wrote:
>>> Also interested to know if these UN M.49 codes can still be used in
>>> locales.
> Also: they can be, and are actively used in locales, if by locales you
> mean CLDR or ICU (and implementations that have adopted these). The most
> well-known of them is "es-419".
>
> Addison
> [...]

From markus.icu at gmail.com Sun Nov 15 23:31:35 2015
From: markus.icu at gmail.com (Markus Scherer)
Date: Sun, 15 Nov 2015 21:31:35 -0800
Subject: Test Data Gone?
In-Reply-To:
References:
Message-ID:

On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro wrote:
> Great, thanks Markus. Having these files is wonderful, and we're using
> them to test our implementation already. It is my understanding, however,
> that they do not test individual locale tailorings; is that correct?
The UCA test file is only for the DUCET, corresponding to what we call the "root locale". Actually, since CLDR tailors the default sort order, and ICU implements that, CLDR has modified versions of those test files:
http://unicode.org/cldr/trac/browser/trunk/common/uca/

The ICU test file has a number of test cases for various locales, as indicated in the test data. They assume CLDR collation data. More often, I tried to make minimal assumptions about the collation data, and copied relevant parts of rules into the test data -- so some of the test cases require a from-rules builder. As a result, this file might be too specific for other implementations.

markus
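For readers who want to use the DUCET/root files Markus points to, the conformance check is mechanical: each non-comment line of a CollationTest file is a sequence of hex code points, and each line's string must not sort before the previous line's. A rough TypeScript reader, assuming that simple line format (a sketch, not a validated tool; `compare` is whatever collation function you are trying to validate):

import { readFileSync } from "fs";

function checkCollationTest(
  path: string,
  compare: (a: string, b: string) => number
): number {
  let failures = 0;
  let prev: string | undefined;
  for (const raw of readFileSync(path, "utf8").split("\n")) {
    const line = raw.split("#")[0].trim(); // strip trailing comments
    if (!line) continue;
    const s = String.fromCodePoint(
      ...line.split(/\s+/).map((hex) => parseInt(hex, 16))
    );
    if (prev !== undefined && compare(prev, s) > 0) failures++;
    prev = s;
  }
  return failures;
}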
In-Reply-To: <5649A950.50409@it.aoyama.ac.jp> References: <5649A950.50409@it.aoyama.ac.jp> Message-ID: At the time we retracted it, it didn't appear that there was a lot of usage, and you really get a much more thorough test by comparing to ICU's implementation. The data we previously had was mechanically generated from the data, not curated. It was created by generating concatenations of some chosen primary/secondary/tertiary characters together with the tailored+exemplar characters for each language. Mark On Mon, Nov 16, 2015 at 11:00 AM, Martin J. D?rst wrote: > On 2015/11/16 15:30, Mark Davis ?? wrote: > >> Probably the most thorough test you could use would be one that tests >> semi-random strings to see if you get the same results as ICU. >> > > Good idea. For tailorings, one thing to do is to extract the characters > used in the tailoring and to bias the semi-random strings heavily towards > using these characters. > > Based on my experience with testing data for normalization (NFC and > friends), I can say that having a good set of test data is extremely useful > for implementers. I strongly encourage the Unicode Consortium to curate > such data, and implementers at all levels to contribute to it. > > Regards, Martin. > > > > On Nov 16, 2015 06:32, "Markus Scherer" wrote: >> >> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro >>> wrote: >>> >>> Great, thanks Markus. Having these files is wonderful, and we're using >>>> them to test our implementation already. It is my understanding however >>>> that they do not test individual locale tailorings, is that correct? >>>> >>>> >>> The UCA test file is only for the DUCET, corresponding to what we call >>> the >>> "root locale". Actually, since CLDR tailors the default sort order, and >>> ICU >>> implements that, CLDR has modified versions of those test files: >>> http://unicode.org/cldr/trac/browser/trunk/common/uca/ >>> >>> The ICU test file has a number of test cases for various locales, as >>> indicated in the test data. They assume CLDR collation data. More often, >>> I >>> tried to make minimal assumption about the collation data, and copied >>> relevant parts of rules into the test data -- so some of the test cases >>> require a from-rules builder. As a result, this file might be too >>> specific >>> for other implementations. >>> >>> markus >>> >>> _______________________________________________ >>> CLDR-Users mailing list >>> CLDR-Users at unicode.org >>> http://unicode.org/mailman/listinfo/cldr-users >>> >>> >>> >> >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From srl at icu-project.org Mon Nov 16 08:32:49 2015 From: srl at icu-project.org (Steven R. Loomis) Date: Mon, 16 Nov 2015 06:32:49 -0800 Subject: Test Data Gone? In-Reply-To: References: <5649A950.50409@it.aoyama.ac.jp> Message-ID: <53D05C3E-2CE8-4CB5-ACB7-8C857AD935E6@icu-project.org> Enviado desde nuestro iPhone. > El 16 nov 2015, a las 4:44 AM, Mark Davis ?? escribi?: > > At the time we retracted it, it didn't appear that there was a lot of usage, and you really get a much more thorough test by comparing to ICU's implementation. Right. 
An idea at IUC was rather than trying to scope test data as cldr conformance test data, to have a new effort that simply and explicitly records ICU's result for a certain Icu/cldr version somewhere for certain input values and certain formatting routines. People are doing this already, just combine efforts. Maybe the results would be an Icu-maintained file instead of cldr, like a sample app. > > The data we previously had was mechanically generated from the data, not curated. It was created by generating concatenations of some chosen primary/secondary/tertiary characters together with the tailored+exemplar characters for each language. > > Mark > >> On Mon, Nov 16, 2015 at 11:00 AM, Martin J. D?rst wrote: >>> On 2015/11/16 15:30, Mark Davis ?? wrote: >>> Probably the most thorough test you could use would be one that tests >>> semi-random strings to see if you get the same results as ICU. >> >> Good idea. For tailorings, one thing to do is to extract the characters used in the tailoring and to bias the semi-random strings heavily towards using these characters. >> >> Based on my experience with testing data for normalization (NFC and friends), I can say that having a good set of test data is extremely useful for implementers. I strongly encourage the Unicode Consortium to curate such data, and implementers at all levels to contribute to it. >> >> Regards, Martin. >> >> >> >>> On Nov 16, 2015 06:32, "Markus Scherer" wrote: >>> >>>> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro >>>> wrote: >>>> >>>>> Great, thanks Markus. Having these files is wonderful, and we're using >>>>> them to test our implementation already. It is my understanding however >>>>> that they do not test individual locale tailorings, is that correct? >>>> >>>> The UCA test file is only for the DUCET, corresponding to what we call the >>>> "root locale". Actually, since CLDR tailors the default sort order, and ICU >>>> implements that, CLDR has modified versions of those test files: >>>> http://unicode.org/cldr/trac/browser/trunk/common/uca/ >>>> >>>> The ICU test file has a number of test cases for various locales, as >>>> indicated in the test data. They assume CLDR collation data. More often, I >>>> tried to make minimal assumption about the collation data, and copied >>>> relevant parts of rules into the test data -- so some of the test cases >>>> require a from-rules builder. As a result, this file might be too specific >>>> for other implementations. >>>> >>>> markus >>>> >>>> _______________________________________________ >>>> CLDR-Users mailing list >>>> CLDR-Users at unicode.org >>>> http://unicode.org/mailman/listinfo/cldr-users >>> >>> >>> >>> _______________________________________________ >>> CLDR-Users mailing list >>> CLDR-Users at unicode.org >>> http://unicode.org/mailman/listinfo/cldr-users > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From cameron at lumoslabs.com Mon Nov 16 20:15:07 2015 From: cameron at lumoslabs.com (Cameron Dutro) Date: Mon, 16 Nov 2015 20:15:07 -0600 Subject: Test Data Gone? In-Reply-To: <53D05C3E-2CE8-4CB5-ACB7-8C857AD935E6@icu-project.org> References: <5649A950.50409@it.aoyama.ac.jp> <53D05C3E-2CE8-4CB5-ACB7-8C857AD935E6@icu-project.org> Message-ID: Mark and Martin, interleaving the tailoring characters with control characters seems like a totally valid approach, I'll give it a shot. 
There was some mention that such combinations were mechanically generated for CLDR before v22, does that code still exist somewhere? If I'm successful generating combinations I can then sort them with ICU and compare the order against our implementation. Steven, I like the idea of a maintained file that records conformance data, perhaps generated by ICU, although it seems a bit odd to keep it alongside ICU instead of CLDR. Does ICU contain a lot of customizations that cause it to sort in a different order, and if so, why are those not reflected in CLDR data? -Cameron On Mon, Nov 16, 2015 at 8:32 AM, Steven R. Loomis wrote: > > > Enviado desde nuestro iPhone. > > El 16 nov 2015, a las 4:44 AM, Mark Davis ?? > escribi?: > > At the time we retracted it, it didn't appear that there was a lot of > usage, and you really get a much more thorough test by comparing to ICU's > implementation. > > > Right. An idea at IUC was rather than trying to scope test data as cldr > conformance test data, to have a new effort that simply and explicitly > records ICU's result for a certain Icu/cldr version somewhere for certain > input values and certain formatting routines. People are doing this > already, just combine efforts. > > Maybe the results would be an Icu-maintained file instead of cldr, like a > sample app. > > > The data we previously had was mechanically generated from the data, not > curated. It was created by generating concatenations of some chosen > primary/secondary/tertiary characters together with the tailored+exemplar > characters for each language. > > Mark > > On Mon, Nov 16, 2015 at 11:00 AM, Martin J. D?rst > wrote: > >> On 2015/11/16 15:30, Mark Davis ?? wrote: >> >>> Probably the most thorough test you could use would be one that tests >>> semi-random strings to see if you get the same results as ICU. >>> >> >> Good idea. For tailorings, one thing to do is to extract the characters >> used in the tailoring and to bias the semi-random strings heavily towards >> using these characters. >> >> Based on my experience with testing data for normalization (NFC and >> friends), I can say that having a good set of test data is extremely useful >> for implementers. I strongly encourage the Unicode Consortium to curate >> such data, and implementers at all levels to contribute to it. >> >> Regards, Martin. >> >> >> >> On Nov 16, 2015 06:32, "Markus Scherer" wrote: >>> >>> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro >>>> wrote: >>>> >>>> Great, thanks Markus. Having these files is wonderful, and we're using >>>>> them to test our implementation already. It is my understanding however >>>>> that they do not test individual locale tailorings, is that correct? >>>>> >>>>> >>>> The UCA test file is only for the DUCET, corresponding to what we call >>>> the >>>> "root locale". Actually, since CLDR tailors the default sort order, and >>>> ICU >>>> implements that, CLDR has modified versions of those test files: >>>> http://unicode.org/cldr/trac/browser/trunk/common/uca/ >>>> >>>> The ICU test file has a number of test cases for various locales, as >>>> indicated in the test data. They assume CLDR collation data. More >>>> often, I >>>> tried to make minimal assumption about the collation data, and copied >>>> relevant parts of rules into the test data -- so some of the test cases >>>> require a from-rules builder. As a result, this file might be too >>>> specific >>>> for other implementations. 
>>>> >>>> markus >>>> >>>> _______________________________________________ >>>> CLDR-Users mailing list >>>> CLDR-Users at unicode.org >>>> http://unicode.org/mailman/listinfo/cldr-users >>>> >>>> >>>> >>> >>> >>> _______________________________________________ >>> CLDR-Users mailing list >>> CLDR-Users at unicode.org >>> http://unicode.org/mailman/listinfo/cldr-users >>> >>> > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srloomis at us.ibm.com Mon Nov 16 21:42:06 2015 From: srloomis at us.ibm.com (Steven R Loomis) Date: Tue, 17 Nov 2015 03:42:06 +0000 Subject: Test Data Gone? In-Reply-To: References: , <5649A950.50409@it.aoyama.ac.jp> <53D05C3E-2CE8-4CB5-ACB7-8C857AD935E6@icu-project.org> Message-ID: <201511170342.tAH3gChY010343@d03av04.boulder.ibm.com> An HTML attachment was scrubbed... URL: From doug at ewellic.org Wed Nov 18 13:16:43 2015 From: doug at ewellic.org (Doug Ewell) Date: Wed, 18 Nov 2015 12:16:43 -0700 Subject: Test Data =?UTF-8?Q?Gone=3F?= Message-ID: <20151118121643.665a7a7059d7ee80bb4d670165c8327d.aa664bdecc.wbe@email03.secureserver.net> Steven R. Loomis wrote: >> At the time we retracted it, it didn't appear that there was a lot of >> usage, and you really get a much more thorough test by comparing to >> ICU's implementation. > > Right. An idea at IUC was rather than trying to scope test data as > cldr conformance test data, to have a new effort that simply and > explicitly records ICU's result for a certain Icu/cldr version > somewhere for certain input values and certain formatting routines. > People are doing this already, just combine efforts. > > Maybe the results would be an Icu-maintained file instead of cldr, > like a sample app. I've always been uncomfortable with the idea, which has been expressed from time to time, that CLDR is merely a data format for ICU, instead of being generally usable by applications and libraries that have nothing to do with ICU. Mark and Markus will recall that I expressed similar discomfort back in 2002, when BOCU-1 was specified in terms of ICU-based "sample" code instead of an actual specification, and when the conformance test was "make sure your implementation generates the same output as the sample code." -- Doug Ewell | http://ewellic.org | Thornton, CO ???? From srl at icu-project.org Wed Nov 18 15:16:49 2015 From: srl at icu-project.org (Steven R. Loomis) Date: Wed, 18 Nov 2015 13:16:49 -0800 Subject: Test Data Gone? In-Reply-To: <20151118121643.665a7a7059d7ee80bb4d670165c8327d.aa664bdecc.wbe@email03.secureserver.net> References: <20151118121643.665a7a7059d7ee80bb4d670165c8327d.aa664bdecc.wbe@email03.secureserver.net> Message-ID: <7791D538-8611-43EB-AC02-B221588A674C@icu-project.org> > On Nov 18, 2015, at 11:16 AM, Doug Ewell wrote: > > Steven R. Loomis wrote: > >>> At the time we retracted it, it didn't appear that there was a lot of >>> usage, and you really get a much more thorough test by comparing to >>> ICU's implementation. >> >> Right. An idea at IUC was rather than trying to scope test data as >> cldr conformance test data, to have a new effort that simply and >> explicitly records ICU's result for a certain Icu/cldr version >> somewhere for certain input values and certain formatting routines. >> People are doing this already, just combine efforts. 
>> >> Maybe the results would be an Icu-maintained file instead of cldr, >> like a sample app. > > I've always been uncomfortable with the idea, which has been expressed > from time to time, that CLDR is merely a data format for ICU, instead of > being generally usable by applications and libraries that have nothing > to do with ICU. Doug, I am uncomfortable with that as well, and always glad for non-ICU users of CLDR. It was (non-ICU) implementors of CLDR which asked for the data. CLDR had not had a set of maintained conformance data, so the idea I mentioned was to simply make available ?ICU?s results? in a non-normative way. The first preference is of course to improve the documentation so that it is usable as is. However, the question came up, ?why not just dump ICU?s output so people have some examples to look at?. My comments above reflect a struggle to find a home for such data. By the way, that reminds me of the implementer?s guide which was started here - http://goo.gl/zAfDt Do you have any other recommendations on how to make CLDR more generally usable? > Mark and Markus will recall that I expressed similar discomfort back in > 2002, when BOCU-1 was specified in terms of ICU-based "sample" code > instead of an actual specification, and when the conformance test was > "make sure your implementation generates the same output as the sample > code.? Right. ICU must not become the spec for CLDR. -s -------------- next part -------------- An HTML attachment was scrubbed... URL: From mats.gbproject at gmail.com Wed Nov 18 16:25:05 2015 From: mats.gbproject at gmail.com (Mats Blakstad) Date: Wed, 18 Nov 2015 23:25:05 +0100 Subject: #9066: Territory code request for Abkhazia In-Reply-To: References: Message-ID: Hi again I'm wondering how we can proceed with this ticket: http://unicode.org/cldr/trac/ticket/9066 I want to add localization data for Abkhazia, even though it do not - like many other territories - have an ISO code. If people prefer to work on a list matching ISO they can easily extract territory information for only ISO codes. However, I don't understand why that should prevent other people from collecting and maintaining localization data for new territories. Where can I add my patch and how is the process to move forward with this ticket? Thanks in advance. 2015-11-09 0:59 GMT+01:00 Mats Blakstad : > I field this issue yesterday: > http://unicode.org/cldr/trac/ticket/9066 > > What criteria do you have for assigning codes to new territories? Should I > provide localization data now, and do you prefer to get a patch or just to > list up data in the issue summary? > > Or should I first wait and see reaction on the issue? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cameron at lumoslabs.com Wed Nov 18 17:05:50 2015 From: cameron at lumoslabs.com (Cameron Dutro) Date: Wed, 18 Nov 2015 17:05:50 -0600 Subject: Test Data Gone? In-Reply-To: <7791D538-8611-43EB-AC02-B221588A674C@icu-project.org> References: <20151118121643.665a7a7059d7ee80bb4d670165c8327d.aa664bdecc.wbe@email03.secureserver.net> <7791D538-8611-43EB-AC02-B221588A674C@icu-project.org> Message-ID: Steven, thanks for the reminder about that implementer's guide, I hope that will become a full-fledged resource at some point. I haven't really contributed to it because I feel like such a CLDR/ICU n00b even after working with the data for almost 5 years. It's a document I would love to read today as a CLDR implementer, but I don't feel qualified to contribute to it. 
Surely the members of the ICU team have insight to share here?

-Cameron

On Wed, Nov 18, 2015 at 3:16 PM, Steven R. Loomis wrote:
> By the way, that reminds me of the implementer's guide which was started
> here: http://goo.gl/zAfDt
> [...]

From doug at ewellic.org Wed Nov 18 17:46:46 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 18 Nov 2015 16:46:46 -0700
Subject: #9066: Territory code request for Abkhazia
Message-ID: <20151118164646.665a7a7059d7ee80bb4d670165c8327d.464d36fee6.wbe@email03.secureserver.net>

Mats Blakstad wrote:

> I want to add localization data for Abkhazia, even though it does not -
> like many other territories - have an ISO code. If people prefer to
> work with a list matching ISO, they can easily extract territory
> information for only the ISO codes. However, I don't understand why
> that should prevent other people from collecting and maintaining
> localization data for new territories.
>
> Where can I add my patch, and what is the process to move forward with
> this ticket?
From doug at ewellic.org  Wed Nov 18 17:46:46 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 18 Nov 2015 16:46:46 -0700
Subject: #9066: Territory code request for Abkhazia
Message-ID: <20151118164646.665a7a7059d7ee80bb4d670165c8327d.464d36fee6.wbe@email03.secureserver.net>

Mats Blakstad wrote:

> I want to add localization data for Abkhazia, even though it does not -
> like many other territories - have an ISO code. If people prefer to
> work on a list matching ISO, they can easily extract territory
> information for only ISO codes. However, I don't understand why that
> should prevent other people from collecting and maintaining
> localization data for new territories.
>
> Where can I add my patch, and what is the process to move forward with
> this ticket?

You'll have to wait for a reply from an actual CLDR team member, but my
guess is that the reply will be something along these lines:

ISO 3166-1 code elements are based on United Nations criteria: to be
included, a state must be either a UN member state, a member of one of
its specialized agencies (such as UNESCO), or a party to the Statute of
the International Court of Justice. Kosovo is not included although it
is a member of the IMF and the World Bank Group.

CLDR, like many standards and data sets, uses ISO 3166-1 to identify
"countries" or "regions." The main reason for delegating this to ISO is
to avoid getting embroiled in political arguments over what constitutes
a country.

Abkhazia and South Ossetia are certainly two of the most politically
controversial "countries" on the planet. They are recognized as
independent only by Russia and a small handful of other nations.
Identifying them as separate nations would generally be regarded as a
statement about Georgian sovereignty.

CLDR does make an exception in the case of Kosovo, which is recognized
by roughly half of UN member states.

--
Doug Ewell | http://ewellic.org | Thornton, CO

From verdy_p at wanadoo.fr  Wed Nov 18 18:16:09 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 19 Nov 2015 01:16:09 +0100
Subject: "Svalbard" and "Jan Mayen" subdivisions of Norway
In-Reply-To:
References:
Message-ID:

Similar issue for Clipperton Island (CP, reserved in ISO 3166-1), which is
also an overseas dependency and a subdivision of France (FR-CP assigned in
ISO 3166-2). Both codes are listed, but still not as equivalent aliases.

Note also that in France the metropolitan régions will merge from 22 to 13
(several régions will not be affected, notably Brittany, Pays de la Loire,
Île-de-France and Corsica). Their new names are still not formally decided,
but their composition is known. These names will be decided by the new
regional assemblies elected next month, following the ongoing public
consultation (the one région whose composition did not change has already
had its name changed, and that change is already effective). Their new
codes in ISO 3166-2 are not decided.

However, one current région is already special because one of its
départements was split into two separate parts: the Métropole de Lyon,
with a special status, and the new smaller département of Rhône. Their
union is no longer a département but a special entity without local
government, the "Circonscription départementale du Rhône" (used only for
the state-controlled "préfecture", which supervises the two entities and
provides some state-controlled functions, not for any local government or
elected body, and still using the former département code "69" for both
entities, i.e. FR-69 in ISO 3166-2). Currently, ISO 3166-2 has not
changed, so the code listed for Rhône still designates this former union,
but its unqualified name is now ambiguous, and no code is listed for the
two newer local subdivisions.

Note also that postal codes will not change and will keep the "69" prefix
(this has long been the case anyway with the two separate départements in
Corsica, coded "2A" and "2B" but still sharing the prefix "20" for the
5-digit postal codes); anyway, postal codes are not relevant in ISO 3166-2,
where "FR-20" is also not used, Corsica as a whole using a 1-letter region
code like the other metropolitan régions.

2015-11-09 1:05 GMT+01:00 Mats Blakstad :

> Check out this page. Some subdivisions, like "Hong Kong", also have a
> territory code.
They are marked, like CN-91 = HK (Hong Kong SAR China):
> http://www.unicode.org/cldr/charts/latest/supplemental/territory_subdivisions.html
>
> However, I can't find this type of mapping in the CLDR core; is it
> there? If not, it would be great to have it there!
>
> I noticed that Norway has "Svalbard" and "Jan Mayen" listed as
> subdivisions. They also have their own territory codes, but this is not
> marked on the page I linked to.

From mats.gbproject at gmail.com  Wed Nov 18 18:28:11 2015
From: mats.gbproject at gmail.com (Mats Blakstad)
Date: Thu, 19 Nov 2015 01:28:11 +0100
Subject: #9066: Territory code request for Abkhazia
In-Reply-To: <20151118164646.665a7a7059d7ee80bb4d670165c8327d.464d36fee6.wbe@email03.secureserver.net>
References: <20151118164646.665a7a7059d7ee80bb4d670165c8327d.464d36fee6.wbe@email03.secureserver.net>
Message-ID:

As far as I know, CLDR does not follow the ISO 3166-1 code elements in
several cases other than Kosovo:

Ascension Island (AC)
Clipperton Island (CP)
Diego Garcia (DG)
Ceuta and Melilla (EA)
Canary Islands (IC)
Tristan da Cunha (TA)

These codes each have their own territory code within CLDR, but as far as
I can see they are not valid ISO 3166-1 codes. Several of them have both a
territory code and a subdivision code, so it seems other regions already
have both types of codes.

The request is not about anything political, like asking to recognize
Abkhazia as a country. CLDR does not create "country codes"; it is a list
of territories, and as I understand it, the territory list mostly reflects
that for some areas there is a need for special consideration when you
want to localize your project there effectively. So the guiding star in
the question of whether a new territory code should be created or not
should be whether there is a valid need to collect localization data.

What I ask for is to provide localization data for Abkhazia, which people
can then choose to use or not. I don't understand why political
controversies in the UN should decide whether we are allowed to collect
localization data for a new territory within CLDR. Collecting localization
data will actually help create more effective communication between people
living in these marginalized regions and the wider international
community, and I guess that is needed a lot.

In Abkhazia there are real people living who, by all practical measures of
everyday life, do not live in Georgia. For example, they use their own
currency. So I guess it would make a lot of sense to be able to give them
monetary values in their local currency. That a region operates with its
own local currency is, to me, by itself a valid reason to create a new
territory code in CLDR. As long as there is a need for localization data
to create effective communication with the people living in an area, I
don't understand why there should be a problem creating a new territory
code.

If we want to follow ISO 3166-1 strictly, we could of course just delete
the above-mentioned codes that are not part of ISO 3166-1. If not, I guess
whether a territory has an ISO 3166-1 code is not decisive for CLDR
territory codes.

2015-11-19 0:46 GMT+01:00 Doug Ewell :

> Mats Blakstad wrote:
>
> > I want to add localization data for Abkhazia, even though it does not -
> > like many other territories - have an ISO code. If people prefer to
> > work on a list matching ISO, they can easily extract territory
> > information for only ISO codes. However, I don't understand why that
> > should prevent other people from collecting and maintaining
> > localization data for new territories.
> >
> > Where can I add my patch, and what is the process to move forward with
> > this ticket?
>
> You'll have to wait for a reply from an actual CLDR team member, but my
> guess is that the reply will be something along these lines:
>
> ISO 3166-1 code elements are based on United Nations criteria: to be
> included, a state must be either a UN member state, a member of one of
> its specialized agencies (such as UNESCO), or a party to the Statute of
> the International Court of Justice. Kosovo is not included although it
> is a member of the IMF and the World Bank Group.
>
> CLDR, like many standards and data sets, uses ISO 3166-1 to identify
> "countries" or "regions." The main reason for delegating this to ISO is
> to avoid getting embroiled in political arguments over what constitutes
> a country.
>
> Abkhazia and South Ossetia are certainly two of the most politically
> controversial "countries" on the planet. They are recognized as
> independent only by Russia and a small handful of other nations.
> Identifying them as separate nations would generally be regarded as a
> statement about Georgian sovereignty.
>
> CLDR does make an exception in the case of Kosovo, which is recognized
> by roughly half of UN member states.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
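Mats' currency point can be made concrete. In CLDR's supplemental
currencyData, the current legal tender is keyed by territory code, which
is exactly why the code matters here. A minimal sketch, assuming the Babel
library (a non-ICU consumer of CLDR data, if I recall its API correctly);
Abkhazia and 'RUB' appear only as illustration:

# Currency defaults are looked up by territory code, so without a code
# for a territory there is no key under which a different default
# (e.g. 'RUB') could even be recorded.
from babel.numbers import format_currency, get_territory_currencies

print(get_territory_currencies("GE"))  # expected ['GEL'], the Georgian lari

# Formatting for any explicitly given currency works regardless:
print(format_currency(1234.5, "RUB", locale="ru"))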
From srl at icu-project.org  Wed Nov 18 18:45:52 2015
From: srl at icu-project.org (Steven R. Loomis)
Date: Wed, 18 Nov 2015 16:45:52 -0800
Subject: #9066: Territory code request for Abkhazia
In-Reply-To:
References:
Message-ID: <84D3670E-3DA3-4DD4-9635-37C1C8A43E54@icu-project.org>

Please just be patient. CLDR is working through its tickets.

Sent from our iPhone.

> On 18 Nov 2015, at 2:25 PM, Mats Blakstad wrote:
>
> Hi again
>
> I'm wondering how we can proceed with this ticket:
> http://unicode.org/cldr/trac/ticket/9066
>
> I want to add localization data for Abkhazia, even though it does not - like many other territories - have an ISO code. If people prefer to work on a list matching ISO, they can easily extract territory information for only ISO codes. However, I don't understand why that should prevent other people from collecting and maintaining localization data for new territories.
>
> Where can I add my patch, and what is the process to move forward with this ticket?
>
> Thanks in advance.
>
> 2015-11-09 0:59 GMT+01:00 Mats Blakstad :
>> I filed this issue yesterday:
>> http://unicode.org/cldr/trac/ticket/9066
>>
>> What criteria do you have for assigning codes to new territories? Should I provide localization data now, and do you prefer to get a patch or just to list up the data in the issue summary?
>>
>> Or should I first wait and see the reaction on the issue?
From verdy_p at wanadoo.fr  Wed Nov 18 19:36:59 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 19 Nov 2015 02:36:59 +0100
Subject: #9066: Territory code request for Abkhazia
In-Reply-To:
References: <20151118164646.665a7a7059d7ee80bb4d670165c8327d.464d36fee6.wbe@email03.secureserver.net>
Message-ID:

2015-11-19 1:28 GMT+01:00 Mats Blakstad :

> As far as I know, CLDR does not follow the ISO 3166-1 code elements in
> several cases other than Kosovo:
>
> Ascension Island (AC)
> Clipperton Island (CP)
> Diego Garcia (DG)
> Ceuta and Melilla (EA)
> Canary Islands (IC)
> Tristan da Cunha (TA)
>
> These codes each have their own territory code within CLDR, but as far
> as I can see they are not valid ISO 3166-1 codes.

However, these codes have an exceptional reservation in ISO 3166-1 because
they were requested by other international standards (notably postal and
telecommunication standards from the UPU and ITU, or by WIPO). These codes
are listed in an annex of ISO 3166-1 and are "de facto" standardized, but
their usage is limited in time and normally applicable only to the domains
covered by those other standards, which want to interoperate with
ISO 3166-1 using specialized exceptions instead of the formal ISO 3166-1
codes (GB for AC; FR for CP; IO for DG; SH for TA; ES for EA and IC).

The same remark applies to EU (European Union), frequently used as well by
other economic unions in Europe when they are partners to the same
treaties or cooperations and apply the same economic or legal framework;
for those domains, the EU code is also used in Switzerland, Liechtenstein,
Norway and Iceland even though they are not formal members of the EU. It
is even sometimes extended, in more limited cases, to Monaco, Andorra, the
Vatican and San Marino, or to EU candidate countries that have chosen to
rule their economic interests in cooperation with the EU or that are
preparing their possible future integration into the EU.

Kosovo remains special (even if it has been recognized by most EU
members), but only because it locally uses the euro and not the Serbian
currency (under a ratified international treaty to which the EU and Serbia
are both parties, along with the UN mission and the local government of
Kosovo). Kosovo is still formally part of Serbia (which is still not an EU
member but a candidate) and not a full member of the Eurozone.

IMHO, for most localization purposes, the EU code should not be extended
and should cover only the actual member states. But note that some parts
of these member countries are not part of the EU territory (e.g. the
French "Overseas Collectivities" are not part of the EU, except
Saint-Barthélemy and Saint-Martin, which were separated from the French
"Overseas Department" of Guadeloupe but chose to remain in the EU), even
if all their inhabitants are EU citizens and can vote in European
elections (e.g. Northern Cyprus, the British Sovereign Bases, Greenland).
Gibraltar is frequently included in the EU even though it is not formally
part of UK territory, but it is generally considered part of the EU
territory as it is in the same customs union (Gibraltar citizens are also
full citizens of the UK; they vote in the European elections with Southern
England).
But for most uses (including localization needs) the EU code includes
those remote parts of any one of the member countries; if those parts have
an ISO 3166-1-assigned code (or one of the exceptional codes you list
above), you can still create exceptions to this default inclusion in your
data by specializing locale tags for them.

From doug at ewellic.org  Wed Nov 18 22:29:43 2015
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 18 Nov 2015 21:29:43 -0700
Subject: #9066: Territory code request for Abkhazia
In-Reply-To:
References:
Message-ID: <3312D83022FC42109F82631FF9D27450@DougEwell>

Mats Blakstad wrote:

> What I ask for is to provide localization data for Abkhazia, which
> people can then choose to use or not. I don't understand why political
> controversies in the UN should decide whether we are allowed to collect
> localization data for a new territory within CLDR.

If there's no intent to make a statement, then you can use the standard
coding "GE-AB" for the Autonomous Republic of Abkhazia, which is
coterminous with the disputed state.

For example, a CLDR resource file containing resources in "the Abkhaz
language as spoken in Abkhazia" (redundant, but used here for
illustration) might be named "ab_GE_u_sd_geab.xml".

--
Doug Ewell | http://ewellic.org | Thornton, CO

From srl at icu-project.org  Wed Nov 18 22:58:03 2015
From: srl at icu-project.org (Steven R. Loomis)
Date: Wed, 18 Nov 2015 20:58:03 -0800
Subject: #9066: Territory code request for Abkhazia
In-Reply-To: <3312D83022FC42109F82631FF9D27450@DougEwell>
References: <20151118164646.665a7a7059d7ee80bb4d670165c8327d.464d36fee6.wbe@email03.secureserver.net> <3312D83022FC42109F82631FF9D27450@DougEwell>
Message-ID:

Last I checked, the ticket didn't mention language. Please do so.

Yes, you can collect the material and determine the locale ID later. It's
not so important for collecting data.

S

Sent from our iPhone.

> On 18 Nov 2015, at 8:29 PM, Doug Ewell wrote:
>
> Mats Blakstad wrote:
>
>> What I ask for is to provide localization data for Abkhazia, which
>> people can then choose to use or not. I don't understand why political
>> controversies in the UN should decide whether we are allowed to collect
>> localization data for a new territory within CLDR.
>
> If there's no intent to make a statement, then you can use the standard
> coding "GE-AB" for the Autonomous Republic of Abkhazia, which is
> coterminous with the disputed state.
>
> For example, a CLDR resource file containing resources in "the Abkhaz
> language as spoken in Abkhazia" (redundant, but used here for
> illustration) might be named "ab_GE_u_sd_geab.xml".
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
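Doug's file name follows the UTS #35 subdivision mechanism, which is worth
spelling out. A minimal sketch; the helper function is hypothetical, but
the tag shape comes from UTS #35 (the subdivision identifier under the
-u-sd- key is the region code plus the ISO 3166-2 suffix, lowercased, so
"GE-AB" becomes "geab"):

# Build a BCP 47 tag that pins a locale to a subdivision rather than
# to a separate territory code.
def locale_with_subdivision(language, region, subdivision_suffix):
    sd = (region + subdivision_suffix).lower()
    return "%s-%s-u-sd-%s" % (language, region, sd)

print(locale_with_subdivision("ab", "GE", "AB"))  # ab-GE-u-sd-geab

The underscore form of the same tag, ab_GE_u_sd_geab, is what names the
resource file in Doug's example.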
From srl at icu-project.org  Wed Nov 18 23:39:14 2015
From: srl at icu-project.org (Steven R. Loomis)
Date: Wed, 18 Nov 2015 21:39:14 -0800
Subject: "Svalbard" and "Jan Mayen" subdivisions of Norway
In-Reply-To:
References:
Message-ID: <401B35EB-0DCC-44D6-A2E5-4F8B75C22D1D@icu-project.org>

Please file a ticket.

Sent from our iPhone.

> On 8 Nov 2015, at 4:05 PM, Mats Blakstad wrote:
>
> Check out this page. Some subdivisions, like "Hong Kong", also have a
> territory code. They are marked, like CN-91 = HK (Hong Kong SAR China):
> http://www.unicode.org/cldr/charts/latest/supplemental/territory_subdivisions.html
>
> However, I can't find this type of mapping in the CLDR core; is it
> there? If not, it would be great to have it there!
>
> I noticed that Norway has "Svalbard" and "Jan Mayen" listed as
> subdivisions. They also have their own territory codes, but this is not
> marked on the page I linked to.

From verdy_p at wanadoo.fr  Thu Nov 19 04:51:18 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 19 Nov 2015 11:51:18 +0100
Subject: "Svalbard" and "Jan Mayen" subdivisions of Norway
In-Reply-To: <401B35EB-0DCC-44D6-A2E5-4F8B75C22D1D@icu-project.org>
References: <401B35EB-0DCC-44D6-A2E5-4F8B75C22D1D@icu-project.org>
Message-ID:

Sorry for the typos; the message was initially composed on my smartphone,
which mixed the French and English languages and auto-"corrected" some
words.

2015-11-19 6:39 GMT+01:00 Steven R. Loomis :

> Please file a ticket.
>
> Sent from our iPhone.
>
> On 8 Nov 2015, at 4:05 PM, Mats Blakstad wrote:
>
> Check out this page. Some subdivisions, like "Hong Kong", also have a
> territory code. They are marked, like CN-91 = HK (Hong Kong SAR China):
> http://www.unicode.org/cldr/charts/latest/supplemental/territory_subdivisions.html
>
> However, I can't find this type of mapping in the CLDR core; is it
> there? If not, it would be great to have it there!
>
> I noticed that Norway has "Svalbard" and "Jan Mayen" listed as
> subdivisions. They also have their own territory codes, but this is not
> marked on the page I linked to.

From mats.gbproject at gmail.com  Mon Nov 23 02:51:35 2015
From: mats.gbproject at gmail.com (Mats Blakstad)
Date: Mon, 23 Nov 2015 09:51:35 +0100
Subject: #9066: Territory code request for Abkhazia
In-Reply-To:
References: <20151118164646.665a7a7059d7ee80bb4d670165c8327d.464d36fee6.wbe@email03.secureserver.net> <3312D83022FC42109F82631FF9D27450@DougEwell>
Message-ID:

I've added more info now:
http://unicode.org/cldr/trac/ticket/9066#comment:6

Notice that Abkhazia uses a different *currency*, a different *time zone*,
and different *telephone codes* than Georgia. The way I see it, saving
this data under "GE-AB" would be exactly an "intent to make a statement",
guided by aims other than developing CLDR in a consistent and coherent
way. Or is it possible to add currency information to a subdivision in
CLDR? Or maybe we should add the Russian ruble to Georgia, as a part of
its territory now actually uses this currency?

No, I can't use "GE-AB" in an alpha-2 coding system for territories. In
general, a solution like this complicates things a lot and is completely
unnecessary. In a CMS like Drupal, the code would not be valid. I mean,
why not just make a territory code, so that Abkhazia can be treated both
as a territory and as a subdivision, like several other regions already
are?
And no, I'm not asking how I can make a private solution. I'm asking *how
to add a new territory code to CLDR*. Why? Well... why don't people just
make their own private locale data systems for Spain, Norway, India or any
territory? Because sharing and maintaining localization data together is
so much better: it is easier for everyone to get better-quality data. I
guess that is the philosophy behind the 'C' in the name CLDR, standing for
'Common': "*Common* Locale Data Repository" :)

Thanks, Philippe, for clearing up the exceptional reservations in
ISO 3166-1; I guess it makes more sense now. However, it seems to me that
CLDR has the ability to be pragmatic, as these territories and Kosovo have
been included even though they're not officially part of ISO 3166-1. The
issue could be solved by assigning a private-use tag like 'XA' to
Abkhazia.

Again: I think the development of territory codes within the "Common
Locale Data Repository" should be guided by exactly whether localization
data is needed, not by what goes on in the UN. Let's try to have some
institutional independence.

2015-11-19 5:58 GMT+01:00 Steven R. Loomis :

> Last I checked, the ticket didn't mention language. Please do so.
>
> Yes, you can collect the material and determine the locale ID later.
> It's not so important for collecting data.
>
> S
>
> Sent from our iPhone.
>
> On 18 Nov 2015, at 8:29 PM, Doug Ewell wrote:
>
> Mats Blakstad wrote:
>
> What I ask for is to provide localization data for Abkhazia, which
> people can then choose to use or not. I don't understand why political
> controversies in the UN should decide whether we are allowed to collect
> localization data for a new territory within CLDR.
>
>
> If there's no intent to make a statement, then you can use the standard
> coding "GE-AB" for the Autonomous Republic of Abkhazia, which is
> coterminous with the disputed state.
>
> For example, a CLDR resource file containing resources in "the Abkhaz
> language as spoken in Abkhazia" (redundant, but used here for
> illustration) might be named "ab_GE_u_sd_geab.xml".
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO

From mats.gbproject at gmail.com  Mon Nov 23 03:06:20 2015
From: mats.gbproject at gmail.com (Mats Blakstad)
Date: Mon, 23 Nov 2015 10:06:20 +0100
Subject: "Svalbard" and "Jan Mayen" subdivisions of Norway
In-Reply-To:
References: <401B35EB-0DCC-44D6-A2E5-4F8B75C22D1D@icu-project.org>
Message-ID:

I've opened a ticket here:
http://unicode.org/cldr/trac/ticket/9081
Philippe, can you add your info there too?

2015-11-19 11:51 GMT+01:00 Philippe Verdy :

> Sorry for the typos; the message was initially composed on my smartphone,
> which mixed the French and English languages and auto-"corrected" some
> words.
>
> 2015-11-19 6:39 GMT+01:00 Steven R. Loomis :
>
>> Please file a ticket.
>>
>> Sent from our iPhone.
>>
>> On 8 Nov 2015, at 4:05 PM, Mats Blakstad wrote:
>>
>> Check out this page. Some subdivisions, like "Hong Kong", also have a
>> territory code. They are marked, like CN-91 = HK (Hong Kong SAR China):
>> http://www.unicode.org/cldr/charts/latest/supplemental/territory_subdivisions.html
>>
>> However, I can't find this type of mapping in the CLDR core; is it
>> there? If not, it would be great to have it there!
>>
>> I noticed that Norway has "Svalbard" and "Jan Mayen" listed as
>> subdivisions. They also have their own territory codes, but this is not
>> marked on the page I linked to.

From lc at faktum.co  Tue Nov 24 05:03:07 2015
From: lc at faktum.co (Lars Corneliussen)
Date: Tue, 24 Nov 2015 11:03:07 +0000
Subject: likely subtag for 'und-GB' to 'en-GB'
Message-ID:

Hi

Resending this; I'm not sure it was posted, since I sent it before
subscribing my mail address.

I was wondering why there is no mapping from the undefined language plus
country GB or UK to UK/GB English. Same for US.

Is there another way to get that information?

- Lars
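For what it's worth, the usual answer is that no explicit und_GB entry is
needed: the "Add Likely Subtags" (maximize) algorithm of UTS #35 falls
back through progressively shorter lookup keys, and only the undefined
fields of the input are filled from the match, so the concrete region GB
survives. A simplified sketch; the tiny LIKELY table and the helper are
illustrative only, not the CLDR data format:

# Resolve und-GB -> en-Latn-GB via the bare "und" entry: the lookup
# falls back to "und" (-> en-Latn-US), then fills only the fields the
# input left undefined, keeping the region GB.
LIKELY = {
    "und": ("en", "Latn", "US"),
    "und-DE": ("de", "Latn", "DE"),  # sample explicit region entry
}

def add_likely_subtags(lang, script, region):
    # Most specific key first, then progressively shorter ones.
    for key in ("%s-%s-%s" % (lang, script, region),
                "%s-%s" % (lang, region),
                "%s-%s" % (lang, script),
                lang):
        match = LIKELY.get(key)
        if match:
            return (lang if lang != "und" else match[0],
                    script if script != "Zzzz" else match[1],
                    region if region != "ZZ" else match[2])
    return (lang, script, region)

print(add_likely_subtags("und", "Zzzz", "GB"))  # ('en', 'Latn', 'GB')

So an implementation that runs the full algorithm, rather than doing a
plain table lookup, already maps und-GB to en-GB (and und-US to en-US).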