Time zones: the localized GMT formats

Rafael Xavier rxaviers at gmail.com
Thu Mar 19 09:45:27 CDT 2015


I strongly encourage updating the documentation to clarify this. I
completely agree with Jon Zeppieri that many aspects of tz formatting
are nebulous.

1:
Patterns O and OOOO are defined, respectively, as "The *short localized GMT
format*" and "The *long localized GMT format*". Both (short and long)
localized GMT formats are defined by:

7.1 Time Zone Format Terminology
> Localized GMT format: A constant, specific offset from GMT (or UTC), which
> may be in a translated form. There are two styles for this. The first is
> used when there is an *explicit non-zero offset* from GMT; this style *is
> specified by the <gmtFormat> element and <hourFormat> element*. The *long
> format* always uses *2-digit hours* field and *minutes* field, with *optional
> 2-digit seconds* field. The *short format* is intended for the shortest
> representation and uses *hour* fields *without leading zero*, with *optional
> 2-digit minutes and seconds* fields. The digits used for hours, minutes
> and seconds fields in this format are the locale's default decimal digits:
>
>    - "GMT+03:30" (long)
>    - "GMT+3:30" (short)
>    - "UTC-03.00" (long)
>    - "UTC-3" (short)
>    - "Гриинуич+03:30" (long)
>
> At [http://www.unicode.org/reports/tr35/tr35-dates.html#Time_Zone_Format_Terminology].

Q1: Which format does <hourFormat> define, the short or the long? E.g., the
"en" locale defines *"+HH:mm;-HH:mm"*, which suggests, as Jon has pointed
out, the long format. But "cs" (and similarly "fi") defines "+H:mm;-H:mm",
which suggests the short format. If it defines one of them, where is the
other? Should implementations (e.g., ICU) be able to take the above
<hourFormat> and derive the other forms from it? If so, is there any
specification for this algorithm?

Q2: How should the optional seconds be generated? This is somewhat related
to the above question, but it raises additional ones, for example: which
time separator should be used? The <timeSeparator> from the numbers data is
not reliable. In the "am" locale, for example, the timeSeparator is ":", but
the hourFormat is "+HHmm;-HHmm", suggesting that no time separator should be
used.

Q3: How should the short format be generated? Again, this is somewhat
related to the above question, but it has different complications. An
algorithm should be able to drop the minutes field and, with it, the time
separator. As Jon has pointed out, there are locales whose hourFormats use
time separators other than ":" ("da", "id", "am" as more examples). Also as
Jon has pointed out, the <timeSeparator> is not always the same as the one
used in hourFormat ("ar" as another example, where the timeSeparator is "،",
but the hourFormat is "+HH:mm;-HH:mm").


2:

Pattern x is defined by "The ISO8601 basic format with hours field and
optional minutes field".

The ISO 8601 formats are defined by:

> ISO 8601 time zone formats: The formats based on the ISO 8601 local time
> difference from UTC, or the UTC indicator ("Z" - only when the local time
> offset is 0 and the specifier X* is used). The ISO 8601 basic format does
> not use a separator character between hours and minutes field, while the
> extended format uses colon (':') as the separator. The ISO 8601 basic
> format with hours and minutes fields is equivalent to RFC 822 zone format.
>
>    - "-0800" (basic)
>    - "-08" (basic - short)
>    - "-08:00" (extended)
>    - "Z" (UTC)
>
>  Note: This specification extends the original ISO 8601 formats and some
> format specifiers append seconds field when necessary.
>
At [http://www.unicode.org/reports/tr35/tr35-dates.html#Time_Zone_Format_Terminology].

Q1: How should offset zero be formatted: "+0000" or "-0000"? Wikipedia says
to use "+0000", because "-0000" is forbidden according to clause 3.4.2 in the
2004 edition of the standard, although it is allowed by RFC 3339.
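
As an illustration of where the zero-offset question bites, here is a minimal
Python sketch of the ISO 8601 basic format with an hours field and optional
minutes field. The function iso8601_basic and its parameters are
hypothetical. The sign used for offset zero is left as a parameter, and its
"+" default is only an assumption that follows the Wikipedia reading above,
not something the TR states.

```python
def iso8601_basic(offset_minutes, optional_minutes=True, zero_sign="+"):
    """Format a UTC offset in ISO 8601 basic form, e.g. "-0800" or "-08"."""
    if offset_minutes == 0:
        # Open question: "+" or "-" for zero. Defaulting to "+" is an
        # assumption, not something the TR specifies.
        sign = zero_sign
    else:
        sign = "+" if offset_minutes > 0 else "-"
    hours, minutes = divmod(abs(offset_minutes), 60)
    if optional_minutes and minutes == 0:
        # Hours only: the minutes field is optional and would be "00".
        return f"{sign}{hours:02d}"
    return f"{sign}{hours:02d}{minutes:02d}"
```

Calling iso8601_basic(-480) gives "-08", and
iso8601_basic(-480, optional_minutes=False) gives "-0800", matching the
"basic - short" and "basic" examples quoted above.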

Q2: Is there any more information about ISO 8601 elsewhere in the UTS TR? Or
does the UTS TR recommend going to external sources to find out more about it
(e.g., the ISO_8601 Wikipedia entry <http://en.wikipedia.org/wiki/ISO_8601>,
or iso.org (available for purchase only)
<http://www.iso.org/iso/home/standards/iso8601.htm>)?


On Sun, Mar 15, 2015 at 2:23 AM, Jon Zeppieri <zeppieri at gmail.com> wrote:

> On Sun, Mar 15, 2015 at 12:09 AM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> > I suppose that the "short" form will differentiate from the non short form,
> > only by stripping zeroes
> >
>
> Unless the value of <hourFormat> is syntactically constrained in ways
> not mentioned in the documentation, this isn't enough, as my example
> about possible literal strings in <hourFormat> demonstrates. Here's a
> more realistic example:
>
> The pl locale's <hourFormat> is "+H.mm;-H.mm". Note that it uses a
> literal '.' as the time separator, rather than the pattern variable
> ':'. If you were going to strip out the mm field here, you'd also want
> to strip out the '.'. But unless you know that '.' represents a
> separator, rather than some literal portion of the pattern, you really
> can't. And even the fact that '.' is the <timeSeparator> for pl
> doesn't prove that it's being used that way in the pattern.
>
> My guess is that <hourFormat> *is* syntactically constrained -- that
> it's not allowed to use the full pattern syntax -- because if that's
> not true then it seems impossible to implement the short form as
> specified. So, really, I'm just looking for some confirmation about
> what can and cannot appear in <hourFormat>.
>
> -Jon



-- 
+55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
http://rafael.xavier.blog.br