From srl at icu-project.org Fri Apr 8 14:40:12 2016
From: srl at icu-project.org (Steven Loomis)
Date: Fri, 08 Apr 2016 12:40:12 -0700
Subject: Duration Lists?
In-Reply-To:
References:
Message-ID: <882318EC-81E2-42BE-B86A-40056FFEAF00@icu-project.org>

Nothing heard, so filed http://unicode.org/cldr/trac/ticket/9361

On 3/31/16 10:07 AM, "CLDR-Users on behalf of Steven Loomis" wrote:

From path header:

../../tools/java/org/unicode/cldr/util/data/PathHeader.txt://ldml/listPatterns/listPattern[@type="unit-short"]/listPatternPart[@type="%A"] ; Misc ; Displaying Lists ; Short Duration Lists ; &listOrder($1)

So the type "unit-short" is named "Short Duration List".

What is this list and what is it for? It's not documented.

I'm at TC39. The question is how CLDR lists map to the following four use cases:

* separating numbers: "4, 5, 6" or "4;5;6"
* separating units: "5months 3days" or "5 lb 8 oz"
* separating "regular" things: "Monday, Tuesday, and Wednesday"
* separating "regular" things in short form: "Monday, Tuesday, Wednesday"

Thanks.

_______________________________________________
CLDR-Users mailing list
CLDR-Users at unicode.org
http://unicode.org/mailman/listinfo/cldr-users

From mark at macchiato.com Fri Apr 8 14:51:21 2016
From: mark at macchiato.com (Mark Davis ☕️)
Date: Fri, 8 Apr 2016 21:51:21 +0200
Subject: Duration Lists?
In-Reply-To: <882318EC-81E2-42BE-B86A-40056FFEAF00@icu-project.org>
References: <882318EC-81E2-42BE-B86A-40056FFEAF00@icu-project.org>
Message-ID:

Sorry, didn't see the original. Added comments to the bug.

Mark

From ehoogerbeets at gmail.com Thu Apr 21 00:34:43 2016
From: ehoogerbeets at gmail.com (Edwin Hoogerbeets)
Date: Wed, 20 Apr 2016 22:34:43 -0700
Subject: names, addresses, phone numbers
Message-ID: <57186673.3060408@gmail.com>

I heard talk 2 or 3 years ago about a proposal to add name, address, and phone number formats to CLDR. What ever happened to those efforts? I don't really see data in CLDR 29 about those.

In my i18n library for JS called "ilib", I have data about the address formats for practically every country in the world, as well as the phone formats and name formats for many countries.
I would love to contribute this data to CLDR and then later leverage other people's local knowledge to fill in the gaps where my data is lacking...

Can someone direct me to the folks who are working on these? Thanks,

Edwin

From cjl at sugarlabs.org Thu Apr 21 01:27:29 2016
From: cjl at sugarlabs.org (Chris Leonard)
Date: Thu, 21 Apr 2016 02:27:29 -0400
Subject: names, addresses, phone numbers
In-Reply-To: <57186673.3060408@gmail.com>
References: <57186673.3060408@gmail.com>
Message-ID:

Dear Edwin,

I'd be interested in comparing your data to that in the glibc locales.

Is there a link to your repo you can provide?

cjl

From mimckenna at paypal.com Thu Apr 21 17:13:39 2016
From: mimckenna at paypal.com (Mckenna, Mike)
Date: Thu, 21 Apr 2016 22:13:39 +0000
Subject: names, addresses, phone numbers
In-Reply-To:
References: <57186673.3060408@gmail.com>
Message-ID: <39D869D9-47DE-4609-8E70-62A08107A174@paypal.com>

I compare mostly with Google's libphonenumber and libaddressinput / i18napis libraries and data. Phone, especially, is a very dynamic animal, with changes happening every day in the Google library.

Mike McKenna
Internationalization Technology Product Owner
PayPal

From ehoogerbeets at gmail.com Thu Apr 21 18:34:35 2016
From: ehoogerbeets at gmail.com (Edwin Hoogerbeets)
Date: Thu, 21 Apr 2016 16:34:35 -0700
Subject: names, addresses, phone numbers
In-Reply-To:
References: <57186673.3060408@gmail.com>
Message-ID: <5719638B.8090602@gmail.com>

Chris, you can see the data at:

https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/

Under there are the https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/und/ directories, which contain the phone files for 22 countries. The phone files are phonefmt.json for the progressive formats, designed to be used to format partial and full numbers while dialing digits in a phone UI; numplan.json for the basic numbering plan information; states.json, which is a generated trie used for parsing area codes; and area.json, which maps area codes to geolocations.

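As a rough illustration of the two ideas described above (progressive formatting of a partially dialed number, and a trie for recognizing area codes), here is a minimal Python sketch. It does not reflect the actual layout of ilib's phonefmt.json or states.json: the template string, the function names, and the sample area codes are all assumptions made up for this example.

```python
def format_partial(digits: str, template: str = "(XXX) XXX-XXXX") -> str:
    """Fill the 'X' placeholders of a (hypothetical) template with the digits typed so far."""
    out = []
    remaining = iter(digits)
    pending = ""  # literal template characters held back until the next digit arrives
    for ch in template:
        if ch == "X":
            digit = next(remaining, None)
            if digit is None:
                break  # no more digits: stop before emitting trailing punctuation
            out.append(pending + digit)
            pending = ""
        else:
            pending += ch
    # Digits beyond the end of the template are appended unformatted.
    out.append("".join(remaining))
    return "".join(out)


def build_trie(area_codes):
    """Build a nested-dict trie; the '$' key marks the end of a complete area code."""
    trie = {}
    for code in area_codes:
        node = trie
        for digit in code:
            node = node.setdefault(digit, {})
        node["$"] = code
    return trie


def match_area_code(trie, number: str):
    """Return the longest area code in the trie that prefixes `number`, or None."""
    node, best = trie, None
    for digit in number:
        node = node.get(digit)
        if node is None:
            break
        best = node.get("$", best)
    return best


# Illustrative sample data only, not taken from ilib.
sample_trie = build_trie(["30", "89", "6151"])
print(format_partial("4155"))                   # (415) 5
print(match_area_code(sample_trie, "615112"))   # 6151
```

Holding back the literal characters until the next digit arrives keeps the display from showing a dangling ")" or "-" while the user is still typing, which is the main point of a progressive format.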
A special case is the North American Numbering Plan (NANP) countries (Canada, US, Bermuda, and many Caribbean nations), which are all configured together in the https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/und/US directory for convenience.

Mike M, I can imagine that the area codes and geolocations change very regularly, but the formats do not. "(XXX) XXX-XXXX" has been the de facto standard American format for many, many years, for example. Ilib contains multiple styles of format as well, since the format is often a matter of user preference instead of government mandate. See https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/und/DE/phonefmt.json for a country with 5 different possible styles.

Also under https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/und/ are the address.json files. These are meta-information plus a list of regular expressions and hard-coded lists used to parse the addresses. The parser doesn't get it right all the time (the US one has problems with two-word localities like "San Francisco", for example), but it gets reasonably close, and pretty much every country in the world is covered.

Under 55 of the locale dirs are the name.json files, which configure the name formats and settings for those languages. The top level contains a western-centric fall-back file used when the language doesn't have its own parser:
https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/name.json
An example of Asian formats:
https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/ja/name.json

Almost all of the phone data was gleaned either from the International Telecommunication Union site, which has the officially published numbering plan documents for many countries, or from Wikipedia, which has information about the formats. The address and name information is gleaned almost exclusively from Wikipedia.

Edwin

From cameron at lumoslabs.com Thu Apr 21 19:02:12 2016
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Thu, 21 Apr 2016 17:02:12 -0700
Subject: names, addresses, phone numbers
In-Reply-To: <5719638B.8090602@gmail.com>
References: <57186673.3060408@gmail.com> <5719638B.8090602@gmail.com>
Message-ID:

I remember some fine folks from PayPal talking about something like this at IUC a few years ago. Does anyone remember who spoke and perhaps how to get in touch with them?

-Cameron

On Thu, Apr 21, 2016 at 4:34 PM, Edwin Hoogerbeets wrote:

> Chris, you can see the data at:
>
> https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/
>
> Under there is
> https://sourceforge.net/p/i18nlib/code/HEAD/tree/trunk/js/data/locale/und/
> directories which contain the phone files for 22 countries.
From mimckenna at paypal.com Thu Apr 21 20:11:07 2016
From: mimckenna at paypal.com (Mckenna, Mike)
Date: Fri, 22 Apr 2016 01:11:07 +0000
Subject: names, addresses, phone numbers
In-Reply-To:
References: <57186673.3060408@gmail.com> <5719638B.8090602@gmail.com>
Message-ID: <87B5B64D-E6AC-4186-8094-6D3783556590@paypal.com>

That would be me, and Erwin Hom (erwin.hom at gmail.com), who spoke at IUC.

My latest work is coalescing the schemas used by Google's address validation metadata, HTML5.1 autofill fields, AddressDoctor, and some geocoding standards into a portable address format. The goal is a format that works well across countries, can be adapted easily to common open source libraries like the Google code, and uses generic terms, as HTML does, so as not to confuse state/province/prefecture/country or suburb/district/ward/neighborhood with colloquially named fields using US nomenclature.

The meta-metadata I need for each country (my next project) is:

* The input format: what order is expected on input, what fields are required, and regex if known
* Variations by local or international script(s)
* Order change if address lookup using postcode is used
* The output or display format: what order, punctuation, and case-mapping
* Variations for multi-line, single-line
* Variations for local or international address
* Cross-walk mapping between the portable address schema and HTML5.1, i18napis, hCard, AddressDoctor

Name is pretty straightforward, and what we do is nowhere near as complete or elegant as what Edwin has in his code, but we do add character-range regexes, because for us valid legal names have to be composed of characters that are allowed in identity or financial documents. Organization name gets more punctuation, but the character range is limited as it is for personal name. The range limitation is an extension of the CLDR exemplar characters and, combined with normalization, helps reduce spoofing and confusables.

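To make the "character-range regex plus normalization" idea above concrete, here is a minimal Python sketch. It is not PayPal's implementation: the allowed ranges (basic Latin letters, Latin-1 Supplement letters, Latin Extended-A, plus a few punctuation marks) and the function names are assumptions chosen for illustration, whereas a real allow-list would be derived per locale, for example from CLDR exemplar characters.

```python
import re
import unicodedata

# Hypothetical allow-list: Latin letters (skipping the multiplication and
# division signs at U+00D7 and U+00F7), plus space, apostrophe, period, hyphen.
ALLOWED_NAME = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F '.\-]+")


def normalize_name(raw: str) -> str:
    """Apply NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", raw).split())


def is_valid_name(raw: str) -> bool:
    """True if the normalized name is non-empty and uses only allowed characters."""
    name = normalize_name(raw)
    return bool(name) and ALLOWED_NAME.fullmatch(name) is not None


# "José" typed with a combining acute accent passes, because NFC composes it
# into a single code point that falls inside the allowed range.
print(is_valid_name("Jose\u0301"))    # True
# A Cyrillic capital A (U+0410) that merely looks like Latin "A" is rejected.
print(is_valid_name("\u0410nna"))     # False
```

Normalizing before matching matters here: without it, the decomposed form of an accented letter would be rejected even though the composed form is allowed.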
For an interesting read on names, take a look at the name restrictions for the UK Deed Poll.

For phone, we just punted and use the Google phone lib. The big help there is the phone validation. Edwin is correct that the formats do not change much, but we like that for display, the Google lib chooses the correct format, e.g. for the many prefix formats for Germany.

Mike McKenna
Internationalization Technology Product Owner
+1-408-967-3631 (desk), +1-510-332-7820 (mobile)
PayPal
2211 N. First Street, San Jose CA 95131 - USA

From: CLDR-Users on behalf of Cameron Dutro
Date: Thursday, April 21, 2016 at 5:02 PM
To: Edwin Hoogerbeets
Cc: "cldr-users at unicode.org", Chris Leonard
Subject: Re: names, addresses, phone numbers

I remember some fine folks from PayPal talking about something like this at IUC a few years ago. Does anyone remember who spoke and perhaps how to get in touch with them?

-Cameron

From alolita.sharma at gmail.com Thu Apr 21 19:09:01 2016
From: alolita.sharma at gmail.com (Alolita Sharma)
Date: Thu, 21 Apr 2016 17:09:01 -0700
Subject: names, addresses, phone numbers
In-Reply-To:
References: <57186673.3060408@gmail.com> <5719638B.8090602@gmail.com>
Message-ID:

The i18n team at PayPal has been doing a lot of work on name, address, and postal formats over the past couple of years. Mike McKenna would be the best person to reach out to. He must be subscribed to this list :-)

It would be great to collaborate and get this info into CLDR.

Best,
Alolita

On Thu, Apr 21, 2016 at 5:02 PM, Cameron Dutro wrote:

> I remember some fine folks from PayPal talking about something like this
> at IUC a few years ago. Does anyone remember who spoke and perhaps how to
> get in touch with them?
>
> -Cameron

From patrick.chew at gmail.com Thu Apr 21 19:09:13 2016
From: patrick.chew at gmail.com (Patrick Chew)
Date: Thu, 21 Apr 2016 17:09:13 -0700
Subject: names, addresses, phone numbers
In-Reply-To:
References: <57186673.3060408@gmail.com> <5719638B.8090602@gmail.com>
Message-ID:

Erwin Hom and Mike McKenna were the two presenters.

cheers,
- Patrick

On Thu, Apr 21, 2016 at 5:02 PM, Cameron Dutro wrote:

> I remember some fine folks from PayPal talking about something like this
> at IUC a few years ago. Does anyone remember who spoke and perhaps how to
> get in touch with them?
>
> -Cameron

From cameron at lumoslabs.com Fri Apr 22 00:14:35 2016
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Thu, 21 Apr 2016 22:14:35 -0700
Subject: Trac Login
Message-ID:

Hey guys,

I used to be able to browse the CLDR SVN repository by visiting http://www.unicode.org/cldr/trac/, but now it asks me for a username and password. Why the lockdown?

-Cameron

From mark at macchiato.com Fri Apr 22 07:53:18 2016
From: mark at macchiato.com (Mark Davis ☕️)
Date: Fri, 22 Apr 2016 14:53:18 +0200
Subject: Trac Login
In-Reply-To:
References:
Message-ID:

We had a DoS attack, and had to restrict access until we figured out what was going on.

Mark

From cameron at lumoslabs.com Fri Apr 22 10:18:35 2016
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Fri, 22 Apr 2016 08:18:35 -0700
Subject: Trac Login
In-Reply-To:
References:
Message-ID:

Understood, thanks Mark.

-Cameron

From mark at macchiato.com Tue Apr 26 04:01:37 2016
From: mark at macchiato.com (Mark Davis ☕️)
Date: Tue, 26 Apr 2016 11:01:37 +0200
Subject: Plurals
Message-ID:

I was working on http://unicode.org/cldr/trac/ticket/9258. There are two languages there that we don't support (to, fo), so I changed the test for now so that those are excluded.

I think they are languages that you have coverage for. If you want to add the data, I augmented the tests to print out some information with failures so that it can be gathered.

Mark

URL: From eik at iki.fi Wed Apr 27 07:07:00 2016 From: eik at iki.fi (Erkki I Kolehmainen) Date: Wed, 27 Apr 2016 15:07:00 +0300 Subject: Time Zones Message-ID: <000001d1a07d$4de69bd0$e9b3d370$@fi> There are three versions of time-zone names: generic, standard, and daylight. The generic name has now been placed as the name for the standard time. Wouldn't it be more logical to keep the generic name and finally fill the standard and daylight slots automatically with it when specific data is redundant? Erkki I. Kolehmainen Tilkankatu 12 A 3, 00300 Helsinki, Finland Mob: +358400825943, Tel: +358943682643, Fax: +35813318116 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at macchiato.com Wed Apr 27 07:09:07 2016 From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=) Date: Wed, 27 Apr 2016 14:09:07 +0200 Subject: names, addresses, phone numbers In-Reply-To: <57186673.3060408@gmail.com> References: <57186673.3060408@gmail.com> Message-ID: We considered adding that to CLDR, but decided that there were other OSS solutions out there (google's and others mentioned on this thread), so that we didn't see a need to duplicate those efforts. Mark On Thu, Apr 21, 2016 at 7:34 AM, Edwin Hoogerbeets wrote: > I heard talk 2 or 3 years ago about a proposal to add name, address, and > phone number formats to CLDR. What ever happened to those efforts? I don't > really see data in CLDR 29 about those. > > In my i18n library for JS called "ilib", I have data about the address > formats for practically every country in the world, as well as the phone > formats and name formats for many countries. I would love to contribute > this data to CLDR and then later leverage other people's local knowledge to > fill in the gaps where my data is lacking... > > Can someone direct me to the folks who are working on these? 
Thanks, > > Edwin > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users >

From mimckenna at paypal.com Wed Apr 27 11:25:46 2016
From: mimckenna at paypal.com (Mckenna, Mike)
Date: Wed, 27 Apr 2016 16:25:46 +0000
Subject: names, addresses, phone numbers
In-Reply-To:
References: <57186673.3060408@gmail.com>
Message-ID: <3FB576C7-379B-4BE5-8548-6B690CC7E5B2@paypal.com>

Yes, we saw the update that postal codes were deprecated in CLDR#27 and the note that the postal code regex was being removed in CLDR#28, and so moved our metadata source from CLDR to i18napis and have been contributing back to the Google code. What we have not done is contribute all the auxiliary metadata we have been generating for field names in multiple languages, and the additional layouts needed for multi-line, single-line, and input scenarios, plus crosswalk mappings to HTML5.1 and geocode APIs.

Mike McKenna
Internationalization Technology Product Owner
+1-408-967-3631 (desk), +1-510-332-7820 (mobile)
PayPal 2211 N. First Street, San Jose CA 95131 - USA

From: CLDR-Users on behalf of Mark Davis
Date: Wednesday, April 27, 2016 at 5:09 AM
To: Edwin Hoogerbeets
Cc: "cldr-users at unicode.org"
Subject: Re: names, addresses, phone numbers

We considered adding that to CLDR, but decided that there were other OSS solutions out there (google's and others mentioned on this thread), so that we didn't see a need to duplicate those efforts.

Mark

On Thu, Apr 21, 2016 at 7:34 AM, Edwin Hoogerbeets wrote:

I heard talk 2 or 3 years ago about a proposal to add name, address, and phone number formats to CLDR. What ever happened to those efforts? I don't really see data in CLDR 29 about those.
In my i18n library for JS called "ilib", I have data about the address formats for practically every country in the world, as well as the phone formats and name formats for many countries. I would love to contribute this data to CLDR and then later leverage other people's local knowledge to fill in the gaps where my data is lacking... Can someone direct me to the folks who are working on these? Thanks, Edwin _______________________________________________ CLDR-Users mailing list CLDR-Users at unicode.org http://unicode.org/mailman/listinfo/cldr-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjl at sugarlabs.org Wed Apr 27 23:04:52 2016 From: cjl at sugarlabs.org (Chris Leonard) Date: Thu, 28 Apr 2016 00:04:52 -0400 Subject: Ticket management query Message-ID: Why would a CLDR ticket reviewer (apparently Mark) ask for more information on a ticket and then close it when a general user does not have the privs to re-open and add the requested information? Just trying to understand the thought process here. It's fine to ask for more info, but why close the ticket? It just forces a restart on the whole process and burns ticket numbers (that may have been referenced elsewhere) unnecessarily. cjl This ticket closed http://unicode.org/cldr/trac/ticket/9378 Created new ticket http://unicode.org/cldr/trac/ticket/9398 From pedberg at apple.com Thu Apr 28 17:44:01 2016 From: pedberg at apple.com (Peter Edberg) Date: Thu, 28 Apr 2016 15:44:01 -0700 Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats? Message-ID: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> Dear CLDR users, One of the longstanding challenges in CLDR has been designing number formats (especially currency formats) and short date formats for bidi-language locales (e.g. 
ar, fa, he) so that the formatted text is displayed correctly in various contexts (where there may be no surrounding text, or initial text with strong right-to-left or strong left-to-right characters). With currency formats, the currency symbols themselves may involve characters that are neutral, or strong right-to-left or strong left-to-right.

This is exactly one of the types of problems that was intended to be addressed by the addition of new bidi direction format characters in Unicode 6.3 (Sept. 2013), such as U+2067 RIGHT-TO-LEFT ISOLATE (RLI) and U+2069 POP DIRECTIONAL ISOLATE (PDI). See UAX #9, Unicode Bidirectional Algorithm.

For CLDR 30, we are considering whether to start using some of these characters in some number formats; typically those formats would begin with an RLI and end with a PDI. Two important considerations are:
1. Will the systems on which CLDR 30 data is used implement support for these bidi direction format characters?
2. Do the systems that will be used for CLDR 30 Survey Tool data collection implement support for those characters (e.g. for generating correct examples)?

For cases in which the answers to the above questions are "no", we can address some of the issues as follows:
• For #1, in the tools that generate JSON data and ICU-format data, options can be added to replace any RLI…PDI combination that wraps a number format with an initial RLM (right-to-left mark) instead. This will result in the format having the same display layout when used in isolation, though it may not have the same layout when used in the middle of other text.
• For #2, the Survey Tool example generators can also replace RLI…PDI with an initial RLM, and ensure that the resulting format is displayed by itself in a text cell, in order to produce the same format display layout that will be produced by the RLI…PDI on systems that support them.
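The fallback described for #1 could look roughly like the following. This is only a minimal sketch in Python, not actual CLDR or ICU tooling: the function names and the sample pattern are hypothetical, and it assumes the rule is simply "if one RLI…PDI pair wraps the entire pattern, drop the pair and prepend an RLM".

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
# LRI, RLI and FSI all open an isolate that a later PDI closes.
ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}


def wraps_whole_pattern(pattern: str) -> bool:
    """Return True if the leading RLI is closed by the final PDI."""
    if len(pattern) < 2 or pattern[0] != RLI or pattern[-1] != PDI:
        return False
    depth = 0
    for index, ch in enumerate(pattern):
        if ch in ISOLATE_INITIATORS:
            depth += 1
        elif ch == PDI:
            depth -= 1
            if depth == 0:
                # The leading RLI closes here; it wraps the whole
                # pattern only if this is the last character.
                return index == len(pattern) - 1
    return False


def isolate_to_rlm(pattern: str) -> str:
    """Hypothetical export option: RLI...PDI wrapper -> initial RLM."""
    if wraps_whole_pattern(pattern):
        return RLM + pattern[1:-1]
    return pattern


# Illustrative pattern only; not taken from CLDR data.
example = RLI + "#,##0.00\u00a0\u00a4" + PDI
converted = isolate_to_rlm(example)
```

Patterns that are not wrapped as a whole (for example, two separate isolated runs) are returned unchanged, since an initial RLM would not be equivalent there.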
One remaining concern is the extent to which copy-paste of formats generated by CLDR will correctly include the RLI..PDI characters.

We would appreciate any input from CLDR users on this, thanks!

Peter Edberg, for the CLDR project

From verdy_p at wanadoo.fr Thu Apr 28 20:59:05 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 29 Apr 2016 03:59:05 +0200
Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats?
In-Reply-To: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com>
References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com>
Message-ID:

Those characters are only needed in plain-text documents that have no other solution. In rich-text documents, directional controls should be replaced by styles or equivalent tags (e.g. the "bdi" container element in HTML5).

As we are moving to most applications being developed with an interface in HTML5 (or similar rendering and layout engines), I'd prefer to avoid using those controls, whose support is erratic or frequently conflicts with tagging/styling in non-obvious ways.

RLI/PDI or LRI/PDI would then not be needed: use "bdi" instead (whose CSS styling will map to the necessary "unicode-bidi:" property).

I know that "unicode-bidi:isolate" is still not supported in all browsers, some of which still only have "unicode-bidi:embed", but those same browsers don't support the isolates either and don't recognize RLI/PDI and LRI/PDI, as they still use the older specification of the Bidi algorithm that did not have isolates.
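The replacement of isolate controls by markup suggested above can be sketched as follows. This is an illustrative Python sketch only, with a hypothetical function name; it assumes LRI and RLI map to "bdi" with an explicit "dir" attribute, FSI maps to a plain "bdi", unmatched PDIs are dropped, and unterminated isolates are closed at the end of the string.

```python
import html

# Isolate initiators and the markup each one is replaced with.
OPENERS = {
    "\u2066": '<bdi dir="ltr">',  # LEFT-TO-RIGHT ISOLATE
    "\u2067": '<bdi dir="rtl">',  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "<bdi>",            # FIRST STRONG ISOLATE
}
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolates_to_bdi(text: str) -> str:
    """Replace bidi isolate controls with HTML bdi elements."""
    out = []
    depth = 0
    for ch in text:
        if ch in OPENERS:
            out.append(OPENERS[ch])
            depth += 1
        elif ch == PDI:
            if depth > 0:
                out.append("</bdi>")
                depth -= 1
            # An unmatched PDI has no effect, so it is dropped.
        else:
            out.append(html.escape(ch))
    # Close any isolate that was never terminated.
    out.append("</bdi>" * depth)
    return "".join(out)


# Illustrative input only; not a CLDR pattern.
markup = isolates_to_bdi("\u2067123 $\u2069 & more")
```

All other characters are passed through `html.escape`, so the result can be placed directly in element content.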
Additionally, RLI and LRI set a default direction inside, whereas bdi does not force one, allowing the content to determine its own initial direction: RLI and LRI are in fact equivalent to "bdi" elements with a "dir" attribute, or to the combination of CSS "unicode-bidi:isolate" with "direction:rtl" or "direction:ltr", where bdi alone (without "dir" attribute) sets "direction:" to "initial" (overriding the inheritance of the current CSS direction from the parent to the children, in order to create a true isolate, as if the inner children were in a new separate document, rendered without any previous context).

Using RLM would make things even worse (the isolation would be completely lost): the number will be correct, but any text after the formatted value would inherit the context set by the formatted entity (which is not necessarily in the same language or script).

Note also that for currencies, the currency symbols could use a symbol in the same native script, or an ISO currency symbol (using Latin letters). That symbol may be left on the incorrect side of the currency value depending on context, or could force a direction after it (notably for Latin symbols that have strong LTR direction: the symbol itself may need to be isolated in the whole currency format, combining the formatted value with the effective symbol).

2016-04-29 0:44 GMT+02:00 Peter Edberg: > Dear CLDR users, > > One of the longstanding challenges in CLDR has been designing number > formats (especially currency formats) and short date formats for > bidi-language locales (e.g. ar, fa, he) so that the formatted text is > displayed correctly in various contexts (where there may be no surrounding > text, or initial text with strong right-to-left or strong left-to-right > characters). With currency formats, the currency symbols themselves may > involve characters that are neutral, or strong right-to-left or strong > left-to-right.
> > This is exactly one of the types of problems that was intended to be > addressed by the addition of new bidi direction format characters in > Unicode 6.3 (Sept. 2013), such as U+2067 RIGHT?TO?LEFT ISOLATE (RLI) and > U+2069 POP DIRECTIONAL ISOLATE (PDI). See UAX #14 Unicode Bidirectional > Algorithm . > > For CLDR 30, we are considering whether to start using some of these > characters in some number formats; typically those formats would begin with > a RLI, and end with a PDI. Two important considerations are: > 1. Will the systems on which CLDR 30 data is used implement support for > these bidi direction format characters? > 2. Do the systems that will be used for CLDR 30 Survey Tool data > collection implement support for those characters (e.g. for generating > correct examples)? > > For cases in which the answers to the above questions are ?no?, we can > address some of the issues as follows: > ? For #1, in the tools that generate JSON data and ICU-format data, > options can be added to replace any RLI?PDI combination that wraps a number > format with an initial RLM (right-left mark) instead. This will result in > the format having the same display layout when used in isolation, thought > it may not have the same layout when used in the middle of other text. > ? For #2, the Survey Tool example generators can also replace RLI?PDI with > an initial RLM, and ensure that the resulting format is displayed by itself > in a text cell, in order to produce the same format display layout that > will be produced by the RLI?PDI on systems that support them. > > One remaining concern is the extent to which copy-paste of formats > generated by CLDR will correctly include the RLI..PDI characters. > > We would appreciate any input from CLDR users on this, thanks! 
> > Peter Edberg, for the CLDR project > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Thu Apr 28 21:30:20 2016 From: asmusf at ix.netcom.com (Asmus Freytag (c)) Date: Thu, 28 Apr 2016 19:30:20 -0700 Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats? In-Reply-To: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> Message-ID: On 4/28/2016 3:44 PM, Peter Edberg wrote: > Dear CLDR users, Peter, I think this is where a "one size fits all" solution isn't the answer. Ideally, I'll be able to use CLDR (and formatting tools depending on it) to format date/time/number strings for a variety of consumers. Plain text (pre 6.3), Plain text with isolates support, and plain text for embedding into markup (where I'll supply external markup to isolate and otherwise prep the field). Given that CLDR data should be specifying the desired appearance (not the bidi controls necessary to get to that) it should be possible to provide mechanical conversion between these formats, rather than having to make a single choice for the data base. Not only will "pre 6.3" support be an issue for a long time to come, I am confidently predicting that the need for multiple bidi flavors will continue beyond the adoption of the isolates. Whether a string is part of an (arbitrary) plain text stream or a separate data field (with its scope determined by markup and with it's own bidi styling) will continue to call for somewhat different data. Given the correct choice of internal format for the database, it should be possible to provide all of these flavors mechanically, thus avoiding the full cost of duplication, while freeing users from having to make those format translations themselves. 
A./ From srl at icu-project.org Thu Apr 28 23:37:22 2016 From: srl at icu-project.org (Steven Loomis) Date: Thu, 28 Apr 2016 21:37:22 -0700 Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats? In-Reply-To: References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> Message-ID: Asmus: > Given the correct choice of internal format for the database, The internal format is a Unicode String, specifically, UTF-8. > Given that CLDR data should be specifying the desired appearance But CLDR is text, specifically, XML, and not glyphs? Steven El 4/28/16 7:30 PM, "CLDR-Users en nombre de Asmus Freytag (c)" escribi?: >On 4/28/2016 3:44 PM, Peter Edberg wrote: >> Dear CLDR users, > >Peter, > >I think this is where a "one size fits all" solution isn't the answer. > >Ideally, I'll be able to use CLDR (and formatting tools depending on it) >to format date/time/number strings for a variety of consumers. > >Plain text (pre 6.3), Plain text with isolates support, and plain text >for embedding into markup (where I'll supply external markup to isolate >and otherwise prep the field). > >Given that CLDR data should be specifying the desired appearance (not >the bidi controls necessary to get to that) it should be possible to >provide mechanical conversion between these formats, rather than having >to make a single choice for the data base. > >Not only will "pre 6.3" support be an issue for a long time to come, I >am confidently predicting that the need for multiple bidi flavors will >continue beyond the adoption of the isolates. Whether a string is part >of an (arbitrary) plain text stream or a separate data field (with its >scope determined by markup and with it's own bidi styling) will continue >to call for somewhat different data. 
> >Given the correct choice of internal format for the database, it should >be possible to provide all of these flavors mechanically, thus avoiding >the full cost of duplication, while freeing users from having to make >those format translations themselves. > >A./ >_______________________________________________ >CLDR-Users mailing list >CLDR-Users at unicode.org >http://unicode.org/mailman/listinfo/cldr-users From asmusf at ix.netcom.com Thu Apr 28 23:59:33 2016 From: asmusf at ix.netcom.com (Asmus Freytag (c)) Date: Thu, 28 Apr 2016 21:59:33 -0700 Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats? In-Reply-To: References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> Message-ID: <5c98059b-47c3-a62f-11a1-4370edc526e2@ix.netcom.com> On 4/28/2016 9:37 PM, Steven Loomis wrote: > Asmus: > >> Given the correct choice of internal format for the database, > > The internal format is a Unicode String, specifically, UTF-8. That covers a lot of ground. > >> Given that CLDR data should be specifying the desired appearance > But CLDR is text, specifically, XML, and not glyphs? Sorry, I meant that CLDR should be specified in a way that the user expected "visual ordering" can be determined., not "appearance" as in "glyphs". Just to sidestep a potential misunderstanding: I'm not suggesting that the format be in visual order. Just that there are some assumptions made about the context in which the Unicode string (when bidi processed) will result in the correct visual appearance. For example, if you assume that a string as stored displays correct when it is part of a RTL paragraph, then you should be able to compute what you need to do to get the correct visual order when the text is part of an LTR paragraph, part of an isolated embedding, etc. I haven't looked into the actualities, but I know that while you can convert uniquely between some formats in a given direction, there are some conversions (or directions) that are not unique. 
So the challenge would be for the database to find some format that allows conversions to all the bidi contexts (and capabilities) that are typically encountered. Storing things in visual order is a bad idea, because in the general case, conversion to logical order is not unique. But, instead of picking some "random" logical order (based on an assumption of what "might" be most needed) my suggestion is to carefully pick a "universal" format for the string, one that allows mechanical conversion to all the actual formats that people need, based on what environment they want to embed their strings into, and what sorts of embedding / isolation controls are actually supported. A./ > > Steven > > El 4/28/16 7:30 PM, "CLDR-Users en nombre de Asmus Freytag (c)" escribi?: > >> On 4/28/2016 3:44 PM, Peter Edberg wrote: >>> Dear CLDR users, >> Peter, >> >> I think this is where a "one size fits all" solution isn't the answer. >> >> Ideally, I'll be able to use CLDR (and formatting tools depending on it) >> to format date/time/number strings for a variety of consumers. >> >> Plain text (pre 6.3), Plain text with isolates support, and plain text >> for embedding into markup (where I'll supply external markup to isolate >> and otherwise prep the field). >> >> Given that CLDR data should be specifying the desired appearance (not >> the bidi controls necessary to get to that) it should be possible to >> provide mechanical conversion between these formats, rather than having >> to make a single choice for the data base. >> >> Not only will "pre 6.3" support be an issue for a long time to come, I >> am confidently predicting that the need for multiple bidi flavors will >> continue beyond the adoption of the isolates. Whether a string is part >> of an (arbitrary) plain text stream or a separate data field (with its >> scope determined by markup and with it's own bidi styling) will continue >> to call for somewhat different data. 
>> >> Given the correct choice of internal format for the database, it should >> be possible to provide all of these flavors mechanically, thus avoiding >> the full cost of duplication, while freeing users from having to make >> those format translations themselves. >> >> A./ >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users From verdy_p at wanadoo.fr Fri Apr 29 00:58:44 2016 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Fri, 29 Apr 2016 07:58:44 +0200 Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats? In-Reply-To: References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> Message-ID: 2016-04-29 6:37 GMT+02:00 Steven Loomis : > Asmus: > > > Given the correct choice of internal format for the database, > > The internal format is a Unicode String, specifically, UTF-8. > > > Given that CLDR data should be specifying the desired appearance > > But CLDR is text, specifically, XML, and not glyphs? > My opinion is there are differences only in terms of usage of these translated resources: if they are intended to be used only in plain-text documents (without any form of styling or DOM), then Bidi controls may be needed. For usage within rich-text documents, those BiDi controls are in fact more a nuisance. I don't think it is the role of CLDR to dictate how those generated strings will be embedded in documents. So those resources should remain self-contained independantly of their context of use: we should only expect that the Bidi algorithm will be correct only for the trnalsated item itself, in isolation. If with this isolation we don't need any control, don't insert any one: it's up to the outer context to specify those that will be needed around the translated resource. 
So RLI/PDI or LRI/PDI should never be needed, except to surround only a *part* of the translated resource, excluding the start and/or end of it.

However, if the *whole* translated resource may need bidi controls in some contexts, I think this should be provided only as external metadata, to indicate how it can be safely embedded into another context. As those resources are normally created for a specific locale, they already have an implicit default direction associated with that locale (including root, if ever needed).

The alternative would be to provide two distinct resources, one for use in isolation (rich-text documents providing their own embedding via markup or style), another with surrounding Bidi controls for use in non-isolated contexts such as plain-text documents, but it would be overkill.

My opinion is that it is enough to specify that a translated resource MUST be used in isolation only (this is not strictly the case for currency amounts composed of a formatted number and a currency symbol, or for other formatted numbers with a measurement unit, both normally following the regular order of words in the external language (except in English and similar languages, which put the currency symbol before the amount)).

There are similar issues with formatting more complex numbers: ordering of the positive or negative sign, ordering of the exponential notation, ordering of an additional percent/permille symbol, ordering of additional fractions (when not using the decimal notation but a true fraction separated from the integer part by some additive notation or only by "styling" the fraction itself), ordering of date elements in numeric formats, notably in abbreviated notations (e.g. "29/04/16", which should not be implicitly reordered as "16/04/29" depending on the RTL or LTR context before it): each language has its own interpretation of dates in specific orders, even if spans of characters inside the numeric notation have weak directions.
In all these cases, the resources providing the format should just specify in metadata whether they expect a specific ordering (either "rtl", or "ltr", or "inherited" by default for almost all resources), and whether the resource should also be isolated or not (affecting the order of elements in the context after it, or whether those elements after the embedded resource should restore the direction that was effective before the start of the embedded resource).

This could be just a few optional attributes in resources, with 5 radio buttons in the CLDR interface to define them:
* default/inherited
* rtl
* rtl isolated
* ltr
* ltr isolated
(the 6th option, inherited with isolation, is not needed in my opinion).

The old "embed" style of CSS should be deprecated. And we should never have to use Bidi overrides (RLO/PDF or LRO/PDF, or single marks like LRM and RLM that break everything).

In most translated resources the "default/inherited" option will be used, with no need for any additional attributes in the LDML schema. Otherwise we'll see two optional attributes: isolate="true" (the default being isolate="false"), and dir="ltr" or dir="rtl" (the default being dir="inherited"). In such cases, we'll never need any Bidi control, or they can be generated on the fly by the I18n library for usage in plain-text-only contexts.

From mark at macchiato.com Fri Apr 29 02:24:51 2016
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Fri, 29 Apr 2016 09:24:51 +0200
Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats?
In-Reply-To: <5c98059b-47c3-a62f-11a1-4370edc526e2@ix.netcom.com>
References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> <5c98059b-47c3-a62f-11a1-4370edc526e2@ix.netcom.com>
Message-ID:

The number and currency formats can be used in a variety of contexts and adjacent to a variety of text.
The bidi isolate characters were designed *precisely* to address this kind of need, without forcing people to jump through hoops.

The *only* question we have is whether the major platforms/systems that use CLDR are all up to speed in terms of supporting the "new" (2013) characters in their BIDI algorithms:

U+2066 LEFT-TO-RIGHT ISOLATE
U+2067 RIGHT-TO-LEFT ISOLATE
U+2068 FIRST STRONG ISOLATE
U+2069 POP DIRECTIONAL ISOLATE

Of course, anyone who is using the number formats in a richer format (like HTML) is free to remap characters to markup when processing. That's their choice.

Mark

On Fri, Apr 29, 2016 at 6:59 AM, Asmus Freytag (c) wrote: > On 4/28/2016 9:37 PM, Steven Loomis wrote: > >> Asmus: >> >> Given the correct choice of internal format for the database, >>> >> >> The internal format is a Unicode String, specifically, UTF-8. >> > That covers a lot of ground. > >> >> Given that CLDR data should be specifying the desired appearance >>> >> But CLDR is text, specifically, XML, and not glyphs? >> > > Sorry, I meant that CLDR should be specified in a way that the user > expected "visual ordering" can be determined, not "appearance" as in > "glyphs". > > Just to sidestep a potential misunderstanding: I'm not suggesting that the > format be in visual order. Just that there are some assumptions made about > the context in which the Unicode string (when bidi processed) will result > in the correct visual appearance. > > For example, if you assume that a string as stored displays correct when > it is part of a RTL paragraph, then you should be able to compute what you > need to do to get the correct visual order when the text is part of an LTR > paragraph, part of an isolated embedding, etc. > > I haven't looked into the actualities, but I know that while you can > convert uniquely between some formats in a given direction, there are some > conversions (or directions) that are not unique.
So the challenge would be > for the database to find some format that allows conversions to all the > bidi contexts (and capabilities) that are typically encountered. > > Storing things in visual order is a bad idea, because in the general case, > conversion to logical order is not unique. > > But, instead of picking some "random" logical order (based on an > assumption of what "might" be most needed) my suggestion is to carefully > pick a "universal" format for the string, one that allows mechanical > conversion to all the actual formats that people need, based on what > environment they want to embed their strings into, and what sorts of > embedding / isolation controls are actually supported. > > A./ > > > >> Steven >> >> El 4/28/16 7:30 PM, "CLDR-Users en nombre de Asmus Freytag (c)" < >> cldr-users-bounces at unicode.org en nombre de asmusf at ix.netcom.com> >> escribi?: >> >> On 4/28/2016 3:44 PM, Peter Edberg wrote: >>> >>>> Dear CLDR users, >>>> >>> Peter, >>> >>> I think this is where a "one size fits all" solution isn't the answer. >>> >>> Ideally, I'll be able to use CLDR (and formatting tools depending on it) >>> to format date/time/number strings for a variety of consumers. >>> >>> Plain text (pre 6.3), Plain text with isolates support, and plain text >>> for embedding into markup (where I'll supply external markup to isolate >>> and otherwise prep the field). >>> >>> Given that CLDR data should be specifying the desired appearance (not >>> the bidi controls necessary to get to that) it should be possible to >>> provide mechanical conversion between these formats, rather than having >>> to make a single choice for the data base. >>> >>> Not only will "pre 6.3" support be an issue for a long time to come, I >>> am confidently predicting that the need for multiple bidi flavors will >>> continue beyond the adoption of the isolates. 
Whether a string is part >>> of an (arbitrary) plain text stream or a separate data field (with its >>> scope determined by markup and with it's own bidi styling) will continue >>> to call for somewhat different data. >>> >>> Given the correct choice of internal format for the database, it should >>> be possible to provide all of these flavors mechanically, thus avoiding >>> the full cost of duplication, while freeing users from having to make >>> those format translations themselves. >>> >>> A./ >>> _______________________________________________ >>> CLDR-Users mailing list >>> CLDR-Users at unicode.org >>> http://unicode.org/mailman/listinfo/cldr-users >>> >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users >

From verdy_p at wanadoo.fr Fri Apr 29 02:56:11 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 29 Apr 2016 09:56:11 +0200
Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats?
In-Reply-To:
References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> <5c98059b-47c3-a62f-11a1-4370edc526e2@ix.netcom.com>
Message-ID:

Yes, but this is unnecessarily complex to edit in surveys, even if the XML or JSON exports insert these characters themselves, and even if libraries using the data can detect those characters (when they are properly paired, which is not possible for RLM and LRM and overly complex for LRO/PDF and RLO/PDF) and replace them with markup or style (possible for LRI/PDI, RLI/PDI and FSI/PDI, the last being probably the best mapping for the HTML "bdi" element without dir="ltr/rtl").

Do you expect that the survey will allow entering those controls easily? Can't there be helpers?
2016-04-29 9:24 GMT+02:00 Mark Davis ☕️:

> The number and currency formats can be used in a variety of contexts and
> adjacent to a variety of text. The bidi isolate characters were designed
> *precisely* to address this kind of need, without forcing people to jump
> through hoops.
>
> The *only* question we have is whether the major platforms/systems that
> use CLDR are all up to speed in terms of supporting the "new" (2013)
> characters in their BIDI algorithms:
>
> U+2066 LEFT-TO-RIGHT ISOLATE
> U+2067 RIGHT-TO-LEFT ISOLATE
> U+2068 FIRST STRONG ISOLATE
> U+2069 POP DIRECTIONAL ISOLATE
>
> Of course, anyone who is using the number formats in a richer format (like
> HTML) is free to remap characters to markup when processing. That's their
> choice.
>
> Mark
>
> On Fri, Apr 29, 2016 at 6:59 AM, Asmus Freytag (c) wrote:
>
>> On 4/28/2016 9:37 PM, Steven Loomis wrote:
>>
>>> Asmus:
>>>
>>>> Given the correct choice of internal format for the database,
>>>
>>> The internal format is a Unicode String, specifically, UTF-8.
>>
>> That covers a lot of ground.
>>
>>>> Given that CLDR data should be specifying the desired appearance
>>>
>>> But CLDR is text, specifically, XML, and not glyphs?
>>
>> Sorry, I meant that CLDR should be specified in a way that the
>> user-expected "visual ordering" can be determined, not "appearance" as in
>> "glyphs".
>>
>> Just to sidestep a potential misunderstanding: I'm not suggesting that
>> the format be in visual order. Just that there are some assumptions made
>> about the context in which the Unicode string (when bidi processed) will
>> result in the correct visual appearance.
>>
>> For example, if you assume that a string as stored displays correctly when
>> it is part of a RTL paragraph, then you should be able to compute what you
>> need to do to get the correct visual order when the text is part of an LTR
>> paragraph, part of an isolated embedding, etc.
>>
>> I haven't looked into the actualities, but I know that while you can
>> convert uniquely between some formats in a given direction, there are some
>> conversions (or directions) that are not unique. So the challenge would be
>> for the database to find some format that allows conversions to all the
>> bidi contexts (and capabilities) that are typically encountered.
>>
>> Storing things in visual order is a bad idea, because in the general
>> case, conversion to logical order is not unique.
>>
>> But, instead of picking some "random" logical order (based on an
>> assumption of what "might" be most needed) my suggestion is to carefully
>> pick a "universal" format for the string, one that allows mechanical
>> conversion to all the actual formats that people need, based on what
>> environment they want to embed their strings into, and what sorts of
>> embedding / isolation controls are actually supported.
>>
>> A./

From mark at macchiato.com Fri Apr 29 03:01:07 2016
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Fri, 29 Apr 2016 10:01:07 +0200
Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats?
In-Reply-To:
References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> <5c98059b-47c3-a62f-11a1-4370edc526e2@ix.netcom.com>
Message-ID:

I think the plan was to enter those automatically, so translators wouldn't need to worry about that.
(We can have a mechanism to transform any XML field for display in the Survey Tool, and to transform what the user writes for storage in XML.)

Mark

From verdy_p at wanadoo.fr Fri Apr 29 03:01:33 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 29 Apr 2016 10:01:33 +0200
Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats?
In-Reply-To:
References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com> <5c98059b-47c3-a62f-11a1-4370edc526e2@ix.netcom.com>
Message-ID:

Also, apparently Mozilla browsers still have issues with those characters (as well as with the CSS "isolate" value, which is still not supported except in recent versions, and then only with the "-moz-" prefix). On Android and Chrome, the "-webkit-" prefix is no longer necessary for CSS in recent versions; however, I don't think many versions support isolates yet at the character level in Bidi processing.

For now, most software still recognizes only embeddings, overrides, and the single-character marks (and browsers still map "bdi" elements only as embeddings, not as isolates). Directly inserting RLI/LRI/FSI or PDI will just produce ignored characters, without even a minimal remapping to the embedding style.
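The "minimum remapping to the embedding style" mentioned above could look roughly like the following Python sketch. This is an assumption-laden illustration, not something CLDR or any browser is known to do: the function names are hypothetical, and the conversion is lossy by nature, because an embedding (LRE/RLE ... PDF) does not shield the surrounding text from its contents the way an isolate does. For FSI, the direction is guessed from the first strong character found before the matching PDI, skipping nested isolates, with left-to-right as the fallback.

```python
import unicodedata

LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def _first_strong_is_rtl(text: str, start: int) -> bool:
    """Return True if the first strong character from `start` is R or AL.

    Characters inside nested isolates are skipped, and the scan stops at
    the PDI that matches the FSI preceding `start`. If no strong character
    is found, the direction is treated as left-to-right.
    """
    depth = 0
    for ch in text[start:]:
        if ch in (LRI, RLI, FSI):
            depth += 1
        elif ch == PDI:
            if depth == 0:
                break
            depth -= 1
        elif depth == 0:
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class == "L":
                return False
            if bidi_class in ("R", "AL"):
                return True
    return False


def isolates_to_embeddings(text: str) -> str:
    """Approximate isolates with embeddings (hypothetical, lossy fallback)."""
    out = []
    depth = 0  # number of embeddings currently open
    for i, ch in enumerate(text):
        if ch == LRI:
            out.append(LRE)
            depth += 1
        elif ch == RLI:
            out.append(RLE)
            depth += 1
        elif ch == FSI:
            out.append(RLE if _first_strong_is_rtl(text, i + 1) else LRE)
            depth += 1
        elif ch == PDI:
            if depth > 0:  # an unmatched PDI is dropped
                out.append(PDF)
                depth -= 1
        else:
            out.append(ch)
    out.append(PDF * depth)  # close anything left open
    return "".join(out)
```

A consumer that knows its target environment predates Unicode 6.3 could run such a step on formatted output. Whether the result displays acceptably still depends on the neighbouring text, which is exactly the limitation isolates were introduced to remove.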
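On the Survey Tool side of this thread (transforming a field for display, and transforming what the user types back for storage), a round-trip of that kind might be sketched as below. Everything here is an assumption made for illustration: the bracketed placeholder notation such as `[LRI]`, the set of controls covered, and the function names are hypothetical and do not describe the Survey Tool's actual behaviour.

```python
import re

# Invisible bidi controls and the visible names shown to the translator.
_NAMES = {
    "\u200e": "LRM",
    "\u200f": "RLM",
    "\u2066": "LRI",
    "\u2067": "RLI",
    "\u2068": "FSI",
    "\u2069": "PDI",
}
_CHARS = {name: ch for ch, name in _NAMES.items()}

_CONTROL = re.compile("[" + "".join(_NAMES) + "]")
_TOKEN = re.compile(r"\[(" + "|".join(_CHARS) + r")\]")


def to_display(stored: str) -> str:
    """Show each invisible control as a visible placeholder like [LRI]."""
    return _CONTROL.sub(lambda m: "[" + _NAMES[m.group()] + "]", stored)


def to_storage(typed: str) -> str:
    """Turn placeholders typed or kept by the user back into controls."""
    return _TOKEN.sub(lambda m: _CHARS[m.group(1)], typed)


stored = "\u2066#,##0.00\u2069\u200f"
shown = to_display(stored)
print(shown)
assert to_storage(shown) == stored
```

One obvious limitation of this notation is that a pattern containing the literal text "[LRI]" would be converted on storage; a real helper would need an escaping rule or a notation that cannot occur in the data.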