From verdy_p at wanadoo.fr Sun Mar 1 08:10:59 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 1 Mar 2015 15:10:59 +0100
Subject: Why $ appears on the left side of value? (was: Re: BIDI percentage sign)

2015-02-27 22:29 GMT+01:00 Shervin Afshar :

> As far as common knowledge goes[1], this is purely a matter of convention.
>
> But in some banking contexts, I've seen currency values written as follows:
>
> $200 USD...
> $200 CAD
> $200 AUD
> 200$00 CVE
> €200 EUR
> ₪200 ILS

And just in English... This inversion is not used in most other languages.

So why couldn't that also be a question of convention for writing percents (or per-thousands)?

The notation of numbers with numeric figures does not obey the same rules as the natural language or its script. This is already the case for Arabic numbers (whose digits are ordered left to right, with the leftmost digits also pronounced first when numbers are read aloud).

The percent symbol is also part of the numeric notation, and I don't see why it would not be ordered exactly like the digits, i.e. to the right, even if it is pronounced after the number. If one wants to write it the way it is pronounced, the "%" symbol should not be used, but rather the plain Arabic word. Some writers may not follow this convention and will want to order the symbol as if it were a natural word *detached* from the numeric figures.

Note that the percent symbol is normally also attached typographically to the numeric figures.
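One concrete input to James's UAX #9 question is the Bidi_Class that each character carries. The sketch below (not part of the original thread) uses Python's standard `unicodedata` module to list the classes of an ASCII digit, an Arabic-Indic digit, an Extended Arabic-Indic digit and three percent-like signs; `bidi_report` and `SAMPLES` are illustrative names only. The percent signs come out as ET (European Terminator), and UAX #9 rule W5 turns a run of ET adjacent to European Numbers (EN) into EN, so the sign is treated as part of the number, which fits Philippe's point. W5 does not apply next to Arabic Numbers (AN).

```python
import unicodedata

# Characters relevant to the thread: ASCII, Arabic-Indic and Extended
# Arabic-Indic digit one, plus the percent, Arabic percent and per-mille signs.
SAMPLES = ["1", "\u0661", "\u06F1", "%", "\u066A", "\u2030"]


def bidi_report(chars):
    """Return (code point, Bidi_Class, character name) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.bidirectional(c), unicodedata.name(c))
        for c in chars
    ]


if __name__ == "__main__":
    for code_point, bidi_class, name in bidi_report(SAMPLES):
        print(code_point, bidi_class, name)
```

The Arabic-Indic digits (U+0660..U+0669) are AN, while the Extended Arabic-Indic digits used for Persian and Urdu (U+06F0..U+06F9) are EN, so the two "Eastern Arabic" digit sets do not behave identically under the algorithm.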
If there's spacing, it is **non-breaking**: in French, the recommended spacing between the numeric figures and the percent symbol is a non-breaking thin space ("espace fine insécable"), best represented in Unicode by NNBSP (U+202F), though many sites still use NBSP (U+00A0), even though it is too wide. The alternative, a "lazy", poor and non-recommended way, is no spacing at all, which is only acceptable in very compact tables with many data columns, in order to fit the page without reducing font sizes (it is preferable to suppressing the digit-group separators).

> My take on this is that, here redundancy is used to avoid ambiguity.
>
> [1]:
> http://english.stackexchange.com/questions/11326/what-is-the-difference-between-20-and-20
>
> Shervin
>
> On Fri, Feb 27, 2015 at 10:51 AM, Philippe Verdy
> wrote:
>
>> And why are Americans putting the currency unit symbol on the left? It
>> is still read *after* the amount...
>> The only reason I see is to prevent an extra initial digit from being
>> added when the amount is written in a blank space. Adding a digit at the
>> end is already impossible, because you also write the decimal separator
>> and subunits, or write these subunits as a small fraction or in
>> superscript. My feeling is that this is a purely typographical tradition,
>> unrelated to the way you read the amount aloud.
>> For other measurement units, the unit symbol is placed after the number,
>> not before. This has nothing to do with Bidi ordering: the symbol keeps
>> its position, whether the writer places it after or before according to
>> his perception of tradition. Number figures use a different system from
>> the rest of the text.
>>
>> 2015-02-27 19:14 GMT+01:00 James Lin :
>>
>>> Hi
>>> I looked through Unicode Standard Annex #9 and was unable to find out
>>> whether the percent sign "%" should reside on the LEFT of the numeric
>>> characters or on the RIGHT.
>>>
>>> My understanding is that if the numerals are Latin or Western Arabic
>>> digits (1 2 3 4 5 6 7 8 9 0), the "%" sign should be on the RIGHT: 12%,
>>> 54%; for Eastern Arabic digits, the "%" sign should be on the LEFT: %???
>>>
>>> Is this correct?
>>>
>>> Thank you
>>> -James
>>>
>>> _______________________________________________
>>> CLDR-Users mailing list
>>> CLDR-Users at unicode.org
>>> http://unicode.org/mailman/listinfo/cldr-users

From eik at iki.fi Sun Mar 1 08:52:17 2015
From: eik at iki.fi (Erkki I Kolehmainen)
Date: Sun, 1 Mar 2015 16:52:17 +0200
Subject: VS: Why $ appears on the left side of value? (was: Re: BIDI percentage sign)
Message-ID: <000001d0542f$50184bb0$f048e310$@fi>

Philippe.

Without getting deeper into this, the floating currency sign has been used in many countries and languages, e.g. for Dutch guilders in the Netherlands. In some countries, e.g. Portugal, the currency symbol has been (and in some countries still is) used as a decimal separator. To me this is natural (as is the use of Fahrenheit degrees, inches, feet and miles where appropriate); in fact, I still prefer calories to joules.

Erkki I. Kolehmainen

From verdy_p at wanadoo.fr Sun Mar 1 20:37:34 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 2 Mar 2015 03:37:34 +0100
Subject: Why $ appears on the left side of value? (was: Re: BIDI percentage sign)
In-Reply-To: <000001d0542f$50184bb0$f048e310$@fi>
References: <000001d0542f$50184bb0$f048e310$@fi>

This has been true in France as well, with the franc (using the letter F as the symbol), and it is still true with the euro. This was always an informal usage.
For formal usage, the symbol has always been placed after the numeric figures, with the standard decimal separator (comma), with or without a separator (a thin non-breaking space) for grouping digits.

From zeppieri at gmail.com Tue Mar 10 23:27:36 2015
From: zeppieri at gmail.com (Jon Zeppieri)
Date: Wed, 11 Mar 2015 00:27:36 -0400
Subject: Time zones: the localized GMT formats

It's not clear to me how to determine the concrete format to use for the localized GMT time zone formats. The description of these formats begins:

===
Localized GMT format: A constant, specific offset from GMT (or UTC), which may be in a translated form. There are two styles for this. The first is used when there is an explicit non-zero offset from GMT; this style is specified by the element and element. The long format always uses 2-digit hours field and minutes field, with optional 2-digit seconds field. The short format is intended for the shortest representation and uses hour fields without leading zero, with optional 2-digit minutes and seconds fields.
The digits used for hours, minutes and seconds fields in this format are the locale's default decimal digits:
===

While this is suggestive, it leaves a lot unsaid. The for "en" is "+HH:mm;-HH:mm". This looks appropriate to use for the long localized format -- except that it's not obvious what to do with the "optional 2-digit seconds field" mentioned above.

More troubling, however, is that in the general case, I don't know how to generate the short form using this data. While it's easy to strip out the "mm" portion of this when it's not needed, I don't know, in general, how to deal with separators or possibly literal portions of this pattern that should be removed along with the "mm." For example, in this particular case, I know that I'd have to remove the colon before the minute pattern, but I imagine that a locale could use something like: "HH 'hours', mm 'minutes'," and I would not know to remove the entirety of ", mm 'minutes'" without treating it as a special case.

Is there a way to generate the short format for an arbitrary locale? Also, how are optional seconds supposed to be handled?

- Jon

From zeppieri at gmail.com Thu Mar 12 19:22:15 2015
From: zeppieri at gmail.com (Jon Zeppieri)
Date: Thu, 12 Mar 2015 20:22:15 -0400
Subject: Time zones: the localized GMT formats

Is there a better place to direct questions like this?

-J

From srloomis at us.ibm.com Thu Mar 12 19:36:05 2015
From: srloomis at us.ibm.com (Steven R Loomis)
Date: Thu, 12 Mar 2015 17:36:05 -0700
Subject: Time zones: the localized GMT formats

This is a great place, but not a great time. All of the CLDR team is finishing up a release; please be patient.
From zeppieri at gmail.com Thu Mar 12 19:44:24 2015
From: zeppieri at gmail.com (Jon Zeppieri)
Date: Thu, 12 Mar 2015 20:44:24 -0400
Subject: Time zones: the localized GMT formats

On Thu, Mar 12, 2015 at 8:36 PM, Steven R Loomis wrote:

> This is a great place, but not a great time. All of the CLDR team is
> finishing up a release, please be patient.

Got it -- thanks for the reply.

-J

From doug at ewellic.org Fri Mar 13 09:52:38 2015
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 13 Mar 2015 07:52:38 -0700
Subject: Time zones: the localized GMT formats
Message-ID: <20150313075238.665a7a7059d7ee80bb4d670165c8327d.a24dafca3a.wbe@email03.secureserver.net>

Jon Zeppieri wrote:

> More troubling, however, is that in the general case, I don't know how
> to generate the short form using this data. While it's easy to strip
> out the "mm" portion of this when it's not needed, I don't know, in
> general, how to deal with separators or possibly literal portions of
> this pattern that should be removed along with the "mm." For example,
> in this particular case, I know that I'd have to remove the colon
> before the minute pattern, but I imagine that a locale could use
> something like: "HH 'hours', mm 'minutes'," and I would not know to
> remove the entirety of ", mm 'minutes'" without treating it as a
> special case.

This is entirely up to you, but I'm personally having a hard time seeing the value of a "short" time format with just hours and no minutes.
Many people would see that as shortened to the point of being unusable.

I realize this is orthogonal to your question.

--
Doug Ewell | http://ewellic.org | Thornton, CO

From shervinafshar at gmail.com Fri Mar 13 21:24:38 2015
From: shervinafshar at gmail.com (Shervin Afshar)
Date: Fri, 13 Mar 2015 19:24:38 -0700
Subject: Time zones: the localized GMT formats
In-Reply-To: <20150313075238.665a7a7059d7ee80bb4d670165c8327d.a24dafca3a.wbe@email03.secureserver.net>
References: <20150313075238.665a7a7059d7ee80bb4d670165c8327d.a24dafca3a.wbe@email03.secureserver.net>

> This is entirely up to you, but I'm personally having a hard time seeing
> the value of a "short" time format with just hours and no minutes.

Not to mention the confusion this would cause with time-zones with fractions of an hour; e.g. Tehran UTC+3:30 != Moscow UTC+3.

Shervin
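Shervin's Tehran/Moscow objection is easy to demonstrate. The sketch below uses the standard library's `datetime.timezone`, whose default `tzname()` renders a fixed offset as `UTC±HH:MM`, and compares it with `hours_only`, a hypothetical helper (not a CLDR or library API) that keeps only the hour field. Only fixed offsets are modeled; real zones such as Tehran also carry daylight-saving and historical rules that this ignores.

```python
from datetime import timedelta, timezone

# Fixed offsets only; real zones also have DST and historical changes.
TEHRAN = timezone(timedelta(hours=3, minutes=30))
MOSCOW = timezone(timedelta(hours=3))


def hours_only(tz: timezone) -> str:
    """Hypothetical lossy 'short' form: keep the hour field, drop the minutes."""
    total = int(tz.utcoffset(None).total_seconds())
    sign = "-" if total < 0 else "+"
    return f"UTC{sign}{abs(total) // 3600}"


if __name__ == "__main__":
    print(TEHRAN.tzname(None), MOSCOW.tzname(None))  # UTC+03:30 UTC+03:00
    print(hours_only(TEHRAN), hours_only(MOSCOW))    # UTC+3 UTC+3
```

The long forms stay distinct, while the hours-only rendering maps both offsets to the same string.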
From verdy_p at wanadoo.fr Sat Mar 14 23:09:44 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 15 Mar 2015 05:09:44 +0100
Subject: Time zones: the localized GMT formats
References: <20150313075238.665a7a7059d7ee80bb4d670165c8327d.a24dafca3a.wbe@email03.secureserver.net>

I suppose that the "short" form will differ from the non-short form only by stripping zeroes.

So "UTC+3" is the short form of "UTC+03:00", and the "short" form of "UTC+03:30" is ONLY "UTC+3:30".

(The extra remark about possible precision in seconds seems unnecessary for standard time zones, and it is needlessly verbose: the normal form uses BOTH the hours AND the minutes, preferably in fixed format with leading zeroes and the appropriate localized separator between hours and minutes.)
From shervinafshar at gmail.com Sat Mar 14 23:43:31 2015
From: shervinafshar at gmail.com (Shervin Afshar)
Date: Sat, 14 Mar 2015 21:43:31 -0700
Subject: Time zones: the localized GMT formats
References: <20150313075238.665a7a7059d7ee80bb4d670165c8327d.a24dafca3a.wbe@email03.secureserver.net>

On Sat, Mar 14, 2015 at 9:09 PM, Philippe Verdy wrote:

> I suppose that the "short" form will differentiate from the non short
> form, only by stripping zeroes

That's an option to shorten these strings, but there seem to be other requirements, according to the initial poster:

On Tue, Mar 10, 2015 at 9:27 PM, Jon Zeppieri wrote:

> The short format is intended for the shortest representation and uses
> hour fields without leading zero, with optional 2-digit minutes and
> seconds fields.
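Philippe's zero-stripping reading of the spec text Jon quotes can be sketched as follows. This is a minimal illustration under stated assumptions, not CLDR's algorithm: `localized_gmt` is a hypothetical helper, the "GMT" prefix, the "+"/"-" signs and the ":" separator are hard-coded in the style of the "en" pattern "+HH:mm;-HH:mm", and a zero offset is rendered as plain "GMT". It does not answer Jon's real question of how to derive the short form from an arbitrary locale's pattern.

```python
def localized_gmt(offset_seconds: int, short: bool = False) -> str:
    """Render a UTC offset in an en-style localized GMT format (hypothetical helper).

    Assumption: prefix, signs and ':' separator are hard-coded rather than
    read from locale data.
    """
    if offset_seconds == 0:
        return "GMT"  # assumption: zero offset shown without digits
    sign = "-" if offset_seconds < 0 else "+"
    hours, rest = divmod(abs(offset_seconds), 3600)
    minutes, seconds = divmod(rest, 60)
    if short:
        text = f"{hours}"                  # hours without leading zero
        if minutes or seconds:
            text += f":{minutes:02d}"      # minutes only when needed
        if seconds:
            text += f":{seconds:02d}"      # seconds only when needed
    else:
        text = f"{hours:02d}:{minutes:02d}"  # always 2-digit hours and minutes
        if seconds:
            text += f":{seconds:02d}"        # optional 2-digit seconds
    return f"GMT{sign}{text}"


if __name__ == "__main__":
    for off in (3 * 3600, 3 * 3600 + 30 * 60, -(4 * 3600 + 30 * 60)):
        print(localized_gmt(off), localized_gmt(off, short=True))
```

Per the quoted spec, a real implementation would also emit the locale's default decimal digits; this sketch always uses ASCII digits.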
On Sat, Mar 14, 2015 at 9:09 PM, Philippe Verdy wrote:

> (the extra comment about the possibility of precision in seconds seems not
> needed for standard timezones, it is unnecessarily verbose : the normal
> form uses BOTH the hours AND minutes, preferably in fixed format with extra
> zeroes and the appropriate localized separator between hours and minutes)

Depends on the usage. Some might require up to seconds precision even if verbose.

Shervin
From zeppieri at gmail.com Sun Mar 15 00:23:19 2015
From: zeppieri at gmail.com (Jon Zeppieri)
Date: Sun, 15 Mar 2015 01:23:19 -0400
Subject: Time zones: the localized GMT formats
References: <20150313075238.665a7a7059d7ee80bb4d670165c8327d.a24dafca3a.wbe@email03.secureserver.net>

On Sun, Mar 15, 2015 at 12:09 AM, Philippe Verdy wrote:

> I suppose that the "short" form will differentiate from the non short form,
> only by stripping zeroes

Unless the value of is syntactically constrained in ways not mentioned in the documentation, this isn't enough, as my example about possible literal strings in demonstrates. Here's a more realistic example: The pl locale's is "+H.mm;-H.mm". Note that it uses a literal '.' as the time separator, rather than the pattern variable ':'. If you were going to strip out the mm field here, you'd also want to strip out the '.'.
But unless you know that '.' represents a separator, rather than some literal portion of the pattern, you really can't. And even the fact that '.' is the for pl doesn't prove that it's being used that way in the pattern. My guess is that *is* syntactically constrained -- that it's not allowed to use the full pattern syntax -- because if that's not true then it seems impossible to implement the short form as specified. So, really, I'm just looking for some confirmation about what can and cannot appear in . -Jon From isaac.jurado at roiback.com Mon Mar 16 07:47:26 2015 From: isaac.jurado at roiback.com (Isaac Jurado) Date: Mon, 16 Mar 2015 13:47:26 +0100 Subject: Significant digits corner cases Message-ID: Hello, I'm adding some fixes to the Babel [1] Python package and I found a number formatting case that I would like to verify. The case in point is when formatting 0.0001 using the "@@@" pattern. In the current version, the result is the string "0.000100", whereas I would have expected something like "0.0001" or even "0". As I'm unable to find an equivalent "official" example [2], I thought someone from this list may provide a quick hint. Thank you. [1] http://babel.pocoo.org [2] http://unicode.org/reports/tr35/tr35-numbers.html#Number_Format_Patterns -- Isaac Jurado The information contained in this message and/or attachment(s), sent from GLOBAL OBI SL, is confidential/privileged and is intended to be read only by the person(s) to whom it is addressed. We remind you that your data have been incorporated into a file and that, provided the requirements of the applicable regulations are met, you may exercise your rights of access, rectification, cancellation and objection before our organization. If you read this message and are not the intended recipient, or the employee or agent responsible for delivering the message to the recipient, or if you have received this communication in error, please be advised that any disclosure, distribution or reproduction of this communication is strictly prohibited and may be illegal; we ask that you notify us immediately and return the original message to the address above. Thank you. From rxaviers at gmail.com Mon Mar 16 07:59:02 2015 From: rxaviers at gmail.com (Rafael Xavier) Date: Mon, 16 Mar 2015 09:59:02 -0300 Subject: Significant digits corner cases In-Reply-To: References: Message-ID: 2015-03-16 9:47 GMT-03:00 Isaac Jurado : > Hello, > > I'm adding some fixes to the Babel [1] Python package and I found a > number formatting case that I would like to verify. > > The case in point is when formatting 0.0001 using the "@@@" pattern. In > the current version, the result is the string "0.000100", whereas I > would have expected something like "0.0001" or even "0". > Why do you expect the latter? "@@@" means minimum significant digits = 3 and maximum significant digits = 3... http://www.unicode.org/reports/tr35/tr35-numbers.html#sigdig > > As I'm unable to find an equivalent "official" example [2], I thought > someone from this list may provide a quick hint. > > Thank you. > > [1] http://babel.pocoo.org > [2] > http://unicode.org/reports/tr35/tr35-numbers.html#Number_Format_Patterns > > -- > Isaac Jurado > -- +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers http://rafael.xavier.blog.br From isaac.jurado at roiback.com Mon Mar 16 08:02:33 2015 From: isaac.jurado at roiback.com (Isaac Jurado) Date: Mon, 16 Mar 2015 14:02:33 +0100 Subject: Significant digits corner cases In-Reply-To: References: Message-ID: 2015-03-16 13:59 GMT+01:00 Rafael Xavier : > > > 2015-03-16 9:47 GMT-03:00 Isaac Jurado : >> >> Hello, >> >> I'm adding some fixes to the Babel [1] Python package and I found a >> number formatting case that I would like to verify. >> >> The case in point is when formatting 0.0001 using the "@@@" pattern. In >> the current version, the result is the string "0.000100", whereas I >> would have expected something like "0.0001" or even "0". > > Why do you expect the latter? "@@@" means minimum significant digits = 3 and > maximum significant digits = 3... > http://www.unicode.org/reports/tr35/tr35-numbers.html#sigdig Well, as all examples seem to count leading decimal zeroes as significant digits, I understood that "0.0001" already has four, which would already be greater than the maximum specified by the pattern. Hence my confusion. Best regards. -- Isaac Jurado From rxaviers at gmail.com Mon Mar 16 09:56:55 2015 From: rxaviers at gmail.com (Rafael Xavier) Date: Mon, 16 Mar 2015 11:56:55 -0300 Subject: Significant digits corner cases In-Reply-To: References: Message-ID: 2015-03-16 10:02 GMT-03:00 Isaac Jurado : > 2015-03-16 13:59 GMT+01:00 Rafael Xavier : > > > > > > 2015-03-16 9:47 GMT-03:00 Isaac Jurado : > >> > >> Hello, > >> > >> I'm adding some fixes to the Babel [1] Python package and I found a > >> number formatting case that I would like to verify. > >> > >> The case in point is when formatting 0.0001 using the "@@@" pattern. In > >> the current version, the result is the string "0.000100", whereas I > >> would have expected something like "0.0001" or even "0". > > > > Why do you expect the latter? "@@@" means minimum significant digits = 3 > and > > maximum significant digits = 3... > > http://www.unicode.org/reports/tr35/tr35-numbers.html#sigdig > > Well, as all examples seem to count leading decimal zeroes as > significant digits, I understood that "0.0001" already has four, which > would already be greater than the maximum specified by the pattern. > Hence my confusion. > Nope, "0.0001" has only 1 significant digit.
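Rafael's point can be checked with a short Python sketch built on the standard `decimal` module. It assumes the reading of the spec given in this thread: significant digits are counted from the first non-zero digit, the value is rounded to the maximum count (half-even, CLDR's default rounding mode), and trailing zeros are then trimmed only down to the minimum count. `format_significant` is a hypothetical name, not Babel's API, and zero is rejected to keep the sketch short.

```python
from decimal import ROUND_HALF_EVEN, Decimal


def format_significant(value: str, min_sig: int, max_sig: int) -> str:
    """Hypothetical helper: format a non-zero number with min_sig..max_sig significant digits."""
    number = Decimal(value)
    if number.is_zero():
        raise ValueError("this sketch handles non-zero values only")

    def round_to(d: Decimal, digits: int) -> Decimal:
        # adjusted() is the exponent of the first non-zero digit: 0.0001 -> -4, 123.4 -> 2.
        quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
        return d.quantize(quantum, rounding=ROUND_HALF_EVEN)

    rounded = round_to(number, max_sig)
    if rounded.adjusted() != number.adjusted():
        # Rounding carried into a new leading digit (e.g. 9.995 -> 10.00); re-round.
        rounded = round_to(rounded, max_sig)

    # Drop trailing zeros, then pad back up to the minimum number of significant digits.
    trimmed = rounded.normalize()
    if len(trimmed.as_tuple().digits) < min_sig:
        trimmed = round_to(trimmed, min_sig)
    return format(trimmed, "f")
```

With "@@@" (minimum = maximum = 3), `format_significant("0.0001", 3, 3)` gives "0.000100", the same string Babel produced; under this sketch a pattern such as "@##" (minimum 1, maximum 3) would give "0.0001".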
For more info http://en.wikipedia.org/wiki/Significant_figures Anyway, I still think you can express your thoughts in case you think this is poorly documented in http://www.unicode.org/reports/tr35/tr35-numbers.html#sigdig. > Best regards. > > -- > Isaac Jurado From isaac.jurado at roiback.com Mon Mar 16 11:21:43 2015 From: isaac.jurado at roiback.com (Isaac Jurado) Date: Mon, 16 Mar 2015 17:21:43 +0100 Subject: Significant digits corner cases In-Reply-To: References: Message-ID: 2015-03-16 15:56 GMT+01:00 Rafael Xavier : > > > 2015-03-16 10:02 GMT-03:00 Isaac Jurado : >> >> 2015-03-16 13:59 GMT+01:00 Rafael Xavier : >> > >> > >> > 2015-03-16 9:47 GMT-03:00 Isaac Jurado : >> >> >> >> Hello, >> >> >> >> I'm adding some fixes to the Babel [1] Python package and I found a >> >> number formatting case that I would like to verify.
>> >> >> >> The case in point is when formatting 0.0001 using the "@@@" pattern. >> >> In >> >> the current version, the result is the string "0.000100", whereas I >> >> would have expected something like "0.0001" or even "0". >> > >> > Why do you expect the latter? "@@@" means minimum significant digits = 3 >> > and >> > maximum significant digits = 3... >> > http://www.unicode.org/reports/tr35/tr35-numbers.html#sigdig >> >> Well, as all examples seem to count leading decimal zeroes as >> significant digits, I understood that "0.0001" already has four, which >> would already be greater than the maximum specified by the pattern. >> Hence my confusion. > > Nope, "0.0001" has only 1 significant digit. For more info > http://en.wikipedia.org/wiki/Significant_figures Right. Makes more sense that way O:-] > Anyway, I still think you can express your thoughts in case you think > this is poorly documented in > http://www.unicode.org/reports/tr35/tr35-numbers.html#sigdig. I already saw that. What I was missing is the knowledge of the basic concepts (e.g. significant digit). Thanks for the information. Best regards. -- Isaac Jurado From rxaviers at gmail.com Thu Mar 19 09:45:27 2015 From: rxaviers at gmail.com (Rafael Xavier) Date: Thu, 19 Mar 2015 11:45:27 -0300 Subject: Time zones: the localized GMT formats In-Reply-To: References: <20150313075238.665a7a7059d7ee80bb4d670165c8327d.a24dafca3a.wbe@email03.secureserver.net> Message-ID: I highly encourage that the documentation be updated for clarification. I completely agree with Jon Zeppieri that there are many nebulous aspects of tz formatting. 1: Patterns O and OOOO are defined, respectively, as "The *short localized GMT format*" and "The *long localized GMT format*". Both (short and long) localized GMT formats are defined by: 7.1 Time Zone Format Terminology > Localized GMT format: A constant, specific offset from GMT (or UTC), which > may be in a translated form. There are two styles for this. The first is > used when there is an *explicit non-zero offset* from GMT; this style *is > specified by the element and element*. The *long > format* always uses *2-digit hours* field and *minutes* field, with *optional > 2-digit seconds* field. The *short format* is intended for the shortest > representation and uses *hour* fields* without leading zero*, with *optional > 2-digit minutes and seconds* fields. The digits used for hours, minutes > and seconds fields in this format are the locale's default decimal digits: > - "GMT+03:30" (long) > - "GMT+3:30" (short) > - "UTC-03.00" (long) > - "UTC-3" (short) > - "????????+03:30" (long) > > At [ http://www.unicode.org/reports/tr35/tr35-dates.html#Time_Zone_Format_Terminology ]. Q1: Which format does define, the short or the long? E.g., the "en" locale defines *"+HH:mm;-HH:mm"*, which suggests, as Jon has pointed out, the long format.
But "cs" (or, similarly, "fi") defines "+H:mm;-H:mm", which suggests the short format. If it defines one of them, where is the other? Should implementations (e.g., ICU) be able to use the above and extract the other forms from it? If so, is there any specification for this algorithm? Q2: How should the optional seconds be generated? This is somewhat related to the above question, but it raises additional questions, for example: which timeSeparator should be used? It's not reliable to use the information from numbers data given, for example, the "am" language, where the timeSeparator is ":", but hourFormat is "+HHmm;-HHmm", suggesting no time separator should be used. Q3: How should the short format be generated? Again, this is somewhat related to the above question, but it has different complications. An algorithm should be able to drop the minutes field as well as the time separator. As Jon has pointed out, there are locales that use time separators other than ":" in their hourFormats ("da", "id", "am" as further examples). Also, as Jon has pointed out, the is not always the same as used in hourFormats ("ar" as another example, where its timeSeparator is "?", but its hourFormat is "+HH:mm;-HH:mm"). 2: Pattern x is defined as "The ISO8601 basic format with hours field and optional minutes field". ISO8601 is defined by: ISO 8601 time zone formats: The formats based on the ISO 8601 local time > difference from UTC, or the UTC indicator ("Z" - only when the local time > offset is 0 and the specifier X* is used). The ISO 8601 basic format does > not use a separator character between hours and minutes field, while the > extended format uses colon (':') as the separator. The ISO 8601 basic > format with hours and minutes fields is equivalent to RFC 822 zone format. > > - "-0800" (basic) > - "-08" (basic - short) > - "-08:00" (extended) > - "Z" (UTC) > > Note: This specification extends the original ISO 8601 formats and some > format specifiers append seconds field when necessary.
> At [ http://www.unicode.org/reports/tr35/tr35-dates.html#Time_Zone_Format_Terminology ]. Q1: How should offset zero be formatted: "+0000" or "-0000"? Wikipedia says to use "+0000", because "-0000" is forbidden according to clause 3.4.2 in the 2004 edition of the standard. However, it is allowed by RFC 3339. Q2: Is there more information on ISO 8601 somewhere else in the UTS TR? Or does the UTS TR recommend going to external sources to find out more about it (e.g., the ISO_8601 Wikipedia entry, or iso.org (available for purchase only))? From rxaviers at gmail.com Thu Mar 19 10:07:36 2015 From: rxaviers at gmail.com (Rafael Xavier) Date: Thu, 19 Mar 2015 12:07:36 -0300 Subject: Time zones: the localized GMT formats In-Reply-To: References: <20150313075238.665a7a7059d7ee80bb4d670165c8327d.a24dafca3a.wbe@email03.secureserver.net> Message-ID: I took the liberty of filing two tickets: - http://unicode.org/cldr/trac/ticket/8293 - http://unicode.org/cldr/trac/ticket/8294 Please feel free to correct me or add more information to them. From doug at ewellic.org Thu Mar 19 10:53:19 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 19 Mar 2015 08:53:19 -0700 Subject: Breaking changes to data file names Message-ID: <20150319085319.665a7a7059d7ee80bb4d670165c8327d.6c60437d77.wbe@email03.secureserver.net> The data files at http://www.unicode.org/Public/cldr/27/ are named: cldr-common-27.0.zip cldr-keyboards-27.0.zip cldr-tools-27.0.zip Is this a permanent change in the naming scheme, or a development artifact that leaked out by accident? If it's permanent, then it breaks the statements in RFCs 6067 (sections 2.1 and 2.2) and 6497 (sections 2.4 and 2.9) that BCP 47 extension data is located in core.zip. At least the registration records in language-tag-extensions-registry, and probably the RFCs themselves, will have to be updated promptly. -- Doug Ewell | http://ewellic.org | Thornton, CO
From srloomis at us.ibm.com Thu Mar 19 11:03:12 2015 From: srloomis at us.ibm.com (Steven R Loomis) Date: Thu, 19 Mar 2015 09:03:12 -0700 Subject: Breaking changes to data file names In-Reply-To: <20150319085319.665a7a7059d7ee80bb4d670165c8327d.6c60437d77.wbe@email03.secureserver.net> References: <20150319085319.665a7a7059d7ee80bb4d670165c8327d.6c60437d77.wbe@email03.secureserver.net> Message-ID: they are intended to be permanent as per http://unicode.org/cldr/trac/ticket/8031 But this is a good point - https://www.rfc-editor.org/rfc/rfc6067.txt and http://www.rfc-editor.org/rfc/rfc6497.txt Perhaps core.zip could be a redirect (or symlink ) to the actual file? -s From emmo at us.ibm.com Thu Mar 19 11:15:31 2015 From: emmo at us.ibm.com (John Emmons) Date: Thu, 19 Mar 2015 11:15:31 -0500 Subject: Breaking changes to data file names In-Reply-To: References: <20150319085319.665a7a7059d7ee80bb4d670165c8327d.6c60437d77.wbe@email03.secureserver.net> Message-ID: That's what I was thinking, Steven ( i.e. just symlink to the real file name ). Regards, John C. Emmons Globalization Architect & Unicode CLDR TC Chairman IBM Software Group Internet: emmo at us.ibm.com From emmo at us.ibm.com Thu Mar 19 11:15:31 2015 From: emmo at us.ibm.com (John Emmons) Date: Thu, 19 Mar 2015 11:15:31 -0500 Subject: Breaking changes to data file names In-Reply-To: <20150319085319.665a7a7059d7ee80bb4d670165c8327d.6c60437d77.wbe@email03.secureserver.net> References: <20150319085319.665a7a7059d7ee80bb4d670165c8327d.6c60437d77.wbe@email03.secureserver.net> Message-ID: It's a permanent change. See http://unicode.org/cldr/trac/ticket/8031 for the rationale. Also the change in name from "core" to "common" was done by agreement of the CLDR TC yesterday, in order to avoid confusion with the cldr-core package being done for the JSON. Regards, John C. Emmons From mark at macchiato.com Thu Mar 19 11:14:44 2015 From: mark at macchiato.com (Mark Davis ☕️) Date: Thu, 19 Mar 2015 17:14:44 +0100 Subject: Breaking changes to data file names In-Reply-To: References: <20150319085319.665a7a7059d7ee80bb4d670165c8327d.6c60437d77.wbe@email03.secureserver.net> Message-ID: Good catch, Doug. Steven, yes, a symlink/redirect sounds good. Mark *Il meglio è l'inimico del bene* ("the best is the enemy of the good") From doug at ewellic.org Thu Mar 19 11:25:55 2015 From: doug at ewellic.org (Doug Ewell) Date: Thu, 19 Mar 2015 09:25:55 -0700 Subject: Breaking changes to data file names Message-ID: <20150319092555.665a7a7059d7ee80bb4d670165c8327d.76bc66d0d8.wbe@email03.secureserver.net> Steven R Loomis wrote: > they are intended to be permanent as per > http://unicode.org/cldr/trac/ticket/8031 It might not have been quite so bad if the name had been changed to "cldr-XX-core.zip" as originally proposed in the ticket. An application would have no imaginable way to map "core.zip" to "cldr-common-XX.X.zip" on its own. I see from the ticket that the change from "core" to "common" was approved literally yesterday (March 18). That's really not a lot of time before a release to do the due diligence of finding out whether external references, such as RFCs and IANA registries, are affected. It looks like only internal references were considered. > Perhaps core.zip could be a redirect (or symlink ) to the actual file? That would work, since the location of the files within the archive (common\bcp47) has not changed. -- Doug Ewell | http://ewellic.org | Thornton, CO
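Doug's two observations (the archive name changed, while the bcp47 directory inside it did not) suggest how a client could cope with either release layout. The Python sketch below is hypothetical: `candidate_archive_names` only encodes the two names mentioned in this thread (the new `cldr-common-<version>.zip` and the legacy `core.zip`), and `bcp47_members` assumes the archive stores that directory as `common/bcp47/` with forward slashes, as ZIP entry names normally do.

```python
import io
import zipfile


def candidate_archive_names(version: str) -> list[str]:
    """Hypothetical: names to try, newest scheme first, then the legacy name cited by RFC 6067/6497."""
    return [f"cldr-common-{version}.zip", "core.zip"]


def bcp47_members(archive_bytes: bytes) -> list[str]:
    """List the BCP 47 extension data files, assuming they live under common/bcp47/."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.startswith("common/bcp47/") and name.endswith(".xml")
        )
```

Because the lookup inside the archive depends only on the member path, the same code works whichever file name the download resolves to, which is why a `core.zip` redirect would be enough for such a client.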
From srloomis at us.ibm.com Thu Mar 19 11:59:33 2015
From: srloomis at us.ibm.com (Steven R Loomis)
Date: Thu, 19 Mar 2015 09:59:33 -0700
Subject: Breaking changes to data file names
In-Reply-To: <20150319092555.665a7a7059d7ee80bb4d670165c8327d.76bc66d0d8.wbe@email03.secureserver.net>
References: <20150319092555.665a7a7059d7ee80bb4d670165c8327d.76bc66d0d8.wbe@email03.secureserver.net>
Message-ID:

"CLDR-Users" wrote on 03/19/2015 09:25:55 AM:

> From: "Doug Ewell"
> ...
> I see from the ticket that the change from "core" to "common" was
> approved literally yesterday (March 18). That's really not a lot of time
> before a release to do the due diligence of finding out whether external
> references, such as RFCs and IANA registries, are affected. It looks
> like only internal references were considered.
>
> > Perhaps core.zip could be a redirect (or symlink) to the actual file?
>
> That would work, since the location of the files within the archive
> (common\bcp47) has not changed.

Perhaps the CLDR-TC should have a line item for releases to verify that files referenced by the external specs have not changed. It shouldn't be too large of a list. At least one of the above RFCs does have a specific URL in it.

-s

From doug at ewellic.org Thu Mar 19 12:07:40 2015
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 19 Mar 2015 10:07:40 -0700
Subject: Breaking changes to data file names
Message-ID: <20150319100740.665a7a7059d7ee80bb4d670165c8327d.3d2e629748.wbe@email03.secureserver.net>

Steven R Loomis wrote:

> Perhaps the CLDR-TC should have a line item for releases to verify
> that files referenced by the external specs have not changed. It
> shouldn't be too large of a list. At least one of the above RFCs does
> have a specific URL in it.

Both do.

I wondered the same thing: do any other RFCs, I-Ds, or other specifications reference specific CLDR files by name?
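The release line item Steven proposes amounts to checking a short, hand-maintained list of file names cited by external specifications against what a release actually publishes. A minimal Python sketch, assuming such a list exists: the `EXTERNAL_REFERENCES` table and the `missing_references` helper are hypothetical, the two entries come from Doug's first message, and any other RFCs, I-Ds, or specifications would have to be found and added by hand (which is exactly the open question in the thread).

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical checklist: external specification -> CLDR file names it cites.
# Only the two entries named in this thread are included.
EXTERNAL_REFERENCES: Dict[str, Tuple[str, ...]] = {
    "RFC 6067 (sections 2.1, 2.2)": ("core.zip",),
    "RFC 6497 (sections 2.4, 2.9)": ("core.zip",),
}


def missing_references(published: Iterable[str],
                       refs: Dict[str, Tuple[str, ...]] = EXTERNAL_REFERENCES,
                       ) -> List[Tuple[str, str]]:
    """Return (spec, file name) pairs whose file is absent from the release listing."""
    available = set(published)
    return [(spec, name)
            for spec, names in refs.items()
            for name in names
            if name not in available]


if __name__ == "__main__":
    release_27 = ["cldr-common-27.0.zip", "cldr-keyboards-27.0.zip", "cldr-tools-27.0.zip"]
    for spec, name in missing_references(release_27):
        print(f"{spec}: {name} is not published")
```

Run against the three CLDR 27 file names, both RFC entries are reported; once a core.zip alias appears in the listing, the function returns an empty list.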
I'm not aware of a search engine that would answer that.

--
Doug Ewell | http://ewellic.org | Thornton, CO

From naoto.sato at oracle.com Tue Mar 31 19:03:19 2015
From: naoto.sato at oracle.com (Naoto Sato)
Date: Tue, 31 Mar 2015 17:03:19 -0700
Subject: "no inheritance marker"
Message-ID: <551B35C7.5000500@oracle.com>

Hello,

I have a question about the "no inheritance marker" used in the short forms of time zone "metazone" names. The LDML spec reads:

---
If a given short metazone form is known NOT to be understood in a given locale and the parent locale has this value such that it would normally be inherited, the inheritance of this value can be explicitly disabled by use of the 'no inheritance marker' as the value, which is 3 simultaneous empty set characters (U+2205). [1]
---

So if an app tries to display a short name that resolves to this marker, what should actually be displayed? For example, for the "en_GB" locale, the lookup of the "America_Pacific" short names ends at the marker "∅∅∅" (U+2205 U+2205 U+2205) in the "en_001" locale, which disables inheriting "PT"/"PST"/"PDT" from "en".

Naoto

[1] http://www.unicode.org/reports/tr35/tr35-39/tr35-dates.html#Metazone_Names
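One way to read the quoted LDML text is that the marker turns an inherited value into "no value": the parent-chain walk stops, and the formatter must fall back to some other format. The Python sketch below assumes that reading. The data tables are hypothetical and heavily simplified (real CLDR data is XML with many more fields). The parent chain en_GB → en_001 → en → root follows Naoto's example. Using a caller-supplied localized GMT string as the fallback is an assumption, since that choice is exactly what the question asks about.

```python
from typing import Dict, Optional, Tuple

# The LDML 'no inheritance marker': three U+2205 EMPTY SET characters.
NO_INHERITANCE_MARKER = "\u2205" * 3

Key = Tuple[str, str]  # (metazone, "generic" | "standard" | "daylight")

# Hypothetical, simplified data modeled on the en_GB example above.
SHORT_NAMES: Dict[str, Dict[Key, str]] = {
    "en": {
        ("America_Pacific", "generic"): "PT",
        ("America_Pacific", "standard"): "PST",
        ("America_Pacific", "daylight"): "PDT",
    },
    "en_001": {
        ("America_Pacific", kind): NO_INHERITANCE_MARKER
        for kind in ("generic", "standard", "daylight")
    },
    "en_GB": {},  # no local value, so lookup continues to en_001
}

# Assumed parent chain for this example; "root" has no parent.
PARENTS: Dict[str, str] = {"en_GB": "en_001", "en_001": "en", "en": "root"}


def resolve_short_name(locale: str, key: Key) -> Optional[str]:
    """Walk the parent chain; return None if nothing is found or inheritance is blocked."""
    current: Optional[str] = locale
    while current is not None:
        value = SHORT_NAMES.get(current, {}).get(key)
        if value == NO_INHERITANCE_MARKER:
            return None  # explicitly blocked: do not consult the parents
        if value is not None:
            return value
        current = PARENTS.get(current)
    return None


def format_short_zone(locale: str, key: Key, gmt_fallback: str) -> str:
    """Use the short name if one resolves, else the caller's (assumed GMT-style) fallback."""
    name = resolve_short_name(locale, key)
    return name if name is not None else gmt_fallback


if __name__ == "__main__":
    pst = ("America_Pacific", "standard")
    print(format_short_zone("en", pst, "GMT-8"))
    print(format_short_zone("en_GB", pst, "GMT-8"))
```

Under these assumptions, "en" yields "PST" while "en_GB" yields the supplied "GMT-8", because the walk stops at en_001 instead of reaching the abbreviations in "en".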