From kazede at google.com Tue Dec 1 18:41:36 2015
From: kazede at google.com (kz)
Date: Tue, 1 Dec 2015 16:41:36 -0800
Subject: Fwd: Data for root for day periods in ICU data

Hi all,

In the CLDR data for day periods, there's an entry for root but the
content is commented out. This results in root having a presence but not
given a rule set in ICU data for day periods.

What should be the proper formatting behavior for the root locale for the
day period character 'B'? Is an empty string acceptable?

Thanks
kz

From mark at macchiato.com Wed Dec 2 04:55:33 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Wed, 2 Dec 2015 11:55:33 +0100
Subject: Data for root for day periods in ICU data

There are day periods in CLDR for root; I see:

AM
PM

If a locale has no variable day periods, then AM/PM is the default. We
should make that clearer in the spec. Would you mind filing a ticket?

For more info, see
http://unicode.org/reports/tr35/tr35-dates.html#Day_Period_Rules

Mark

On Wed, Dec 2, 2015 at 1:41 AM, kz wrote:

> Hi all,
>
> In the CLDR data for day periods, there's an entry for root but the
> content is commented out. This results in root having a presence but not
> given a rule set in ICU data for day periods.
>
> What should be the proper formatting behavior for the root locale for the
> day period character 'B'? Is an empty string acceptable?
>
> Thanks
> kz
>
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
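Mark's point above, that AM/PM is the default when a locale has no variable day periods, could be sketched roughly as follows. This is only an illustration in Python, not ICU's implementation: the `DayPeriodRule` tuple, the `select_day_period` function, and the sample rule values are hypothetical, a missing rule set is assumed to behave like an empty one, and rules are assumed not to wrap past midnight.

```python
from typing import NamedTuple, Optional, Sequence


class DayPeriodRule(NamedTuple):
    """Hypothetical stand-in for one dayPeriodRule covering [start, end)."""
    period: str
    start: int  # minutes since 00:00, inclusive
    end: int    # minutes since 00:00, exclusive


def select_day_period(hour: int, minute: int,
                      rules: Optional[Sequence[DayPeriodRule]]) -> str:
    """Return the variable day period for a time, or am/pm if no rule applies."""
    now = hour * 60 + minute
    for rule in rules or ():  # assumption: a missing rule set counts as empty
        if rule.start <= now < rule.end:
            return rule.period
    # No variable rule matched (e.g. root): fall back to the fixed periods.
    return "am" if hour < 12 else "pm"


# Illustrative rule set only; these values are not taken from CLDR data.
sample_rules = [
    DayPeriodRule("morning1", 6 * 60, 12 * 60),
    DayPeriodRule("afternoon1", 12 * 60, 18 * 60),
]

print(select_day_period(9, 30, sample_rules))  # morning1
print(select_day_period(9, 30, None))          # am
print(select_day_period(15, 0, []))            # pm
```

The fallback lives in the selection function rather than in the data, so a locale with no rules needs no special entry.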
From kazede at google.com Wed Dec 2 12:44:30 2015
From: kazede at google.com (kz)
Date: Wed, 2 Dec 2015 10:44:30 -0800
Subject: Data for root for day periods in ICU data

My mistake; I meant "variable day periods" when I said "day periods".

The lack of localized strings is not the problem; what I'm having trouble
with are the rules for when to use what string. Root has a presence in
that file but no rules are defined. In ICU data, this results in root
pointing to "set1" but set1 doesn't exist.

Is this a mistake, or should I hard code in how to handle root?

Thanks
kz
From mark at macchiato.com Wed Dec 2 13:26:40 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Wed, 2 Dec 2015 20:26:40 +0100
Subject: Data for root for day periods in ICU data

Makes sense now. Root is empty, which is correct. Based on what you said,
I suspect that set1 is being suppressed because it is empty. Probably needs
a tweak to the LDML2ICU converter to have a special value for "no rules" to
show up in ICU. Mind filing a ticket for that?

In the meantime, *as a workaround*, if there is no set1 in the ICU data you
can assume it is an empty rule set.

Mark

From kazede at google.com Wed Dec 2 13:39:25 2015
From: kazede at google.com (kz)
Date: Wed, 2 Dec 2015 11:39:25 -0800
Subject: Data for root for day periods in ICU data

But root shouldn't be empty. TR35 says

> The set of dayPeriodRules need to completely cover the 24 hours in a day
> (from 0:00 before 24:00), with no overlaps between any dayPeriodRules.

Now root is allowed to have *no strings* for the variable day periods
(making it fall back to am/pm -- btw I submitted a ticket for that
clarification), but to conform to the spec, it still needs to state, even
trivially, which day period each moment in time falls in. Or we need to
change the spec to allow empty rule sets.

What do you think?

Thanks
kz
From mark at macchiato.com Wed Dec 2 14:42:19 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Wed, 2 Dec 2015 21:42:19 +0100
Subject: Data for root for day periods in ICU data

No, because if there are no variable rules, there are no variable rules.

What the spec should say is:

    If there are variable day periods specified by a set of
    dayPeriodRules, that set needs to completely....

The AM/PM are fixed day periods, don't require rules, and are always valid
as a fallback.

Mark

From kazede at google.com Wed Dec 2 15:24:08 2015
From: kazede at google.com (kz)
Date: Wed, 2 Dec 2015 13:24:08 -0800
Subject: Data for root for day periods in ICU data

Okay, got it. Empty rules == fall back to am/pm. I'll submit a ticket for
LDML2ICU.

Thanks!
kz

From kazede at google.com Wed Dec 2 20:47:16 2015
From: kazede at google.com (kz)
Date: Wed, 2 Dec 2015 18:47:16 -0800
Subject: Day period rules for locale ee

Locale ee has this rule for night1:

The interval wraps around midnight and continues into the next day. Is this
allowed? The spec doesn't seem to forbid this, although it seems to imply
that when the interval goes over midnight, it's broken into two rules (and
most of the rule sets do this).

From the spec:

> dayPeriodRules with the same type must not be adjacent, except when they
> meet at 24:00/00:00.
> Example:
>

Is this a mistake? Or are both ways acceptable?

Thanks
kz

From cameron at lumoslabs.com Wed Dec 16 18:25:21 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Wed, 16 Dec 2015 16:25:21 -0800
Subject: Transform Rule Syntax

Hey cldr-users,

I'm working with the CLDR transform rules and finding myself flummoxed.
Specifically I'm looking at this rule in the es-es_FONIPA transform rule
set. In this rule, we see what appears to be a Unicode set or character
class from a regular expression: [-\ ]
Either way, this does not appear to be valid syntax. Hyphens are used in
character classes to denote ranges of characters, for example [a-z].
Literal hyphens must be escaped. The hyphen in question is neither part of
a range nor escaped. Why is this? Finally, it appears the character class
contains an escaped space character. Space characters are not required to
be escaped in character classes.

My suspicion is that this syntax is to be treated in a special way since
it is used in the context of transformation rules. Please let me know if
this is the case. I have been unable to find any documentation regarding
the special treatment of hyphens in UTS #35 or other documents.

Thanks!

-Cameron

From verdy_p at wanadoo.fr Wed Dec 16 18:54:31 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 17 Dec 2015 01:54:31 +0100
Subject: Transform Rule Syntax

When a dash-hyphen "-" appears as the first character within an inclusive
(or negative) character class, just after "[" (or after "[^" in a negative
class), it does not denote a range separator, but itself literally as being
part of the inclusive character class (or being excluded from the negative
class).
This is how most regexp engines treat it, and you don't need to escape it
(with a "\").
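The behaviour described above can be checked with a few lines of Python. This is a minimal sketch using only the standard `re` module, which is just one regexp engine among many; the variable names are illustrative, and the last pattern is the class quoted in the question.

```python
import re

# A hyphen placed first in a class is a literal member, not a range operator.
leading_hyphen = re.compile(r"[-a]")
print(bool(leading_hyphen.fullmatch("-")))  # True
print(bool(leading_hyphen.fullmatch("a")))  # True
print(bool(leading_hyphen.fullmatch("b")))  # False

# The same holds right after "^" in a negated class: "-" is excluded.
negated = re.compile(r"[^-a]")
print(bool(negated.fullmatch("-")))         # False
print(bool(negated.fullmatch("b")))         # True

# The class from the question: a hyphen plus an escaped space.
hyphen_or_space = re.compile(r"[-\ ]")
matched = sorted(ch for ch in "a-b c" if hyphen_or_space.fullmatch(ch))
print(matched)                              # [' ', '-']
```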
So "[-\ ]" is the character class containing only the dash-hyphen and the
space (which needs to be escaped in CLDR rules because whitespaces are
relaxed, as you noted), and it has NO range.

From cameron at lumoslabs.com Thu Dec 17 13:19:18 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Thu, 17 Dec 2015 11:19:18 -0800
Subject: Transform Rule Syntax

Ah wonderful, thanks Philippe. That's something about regular expressions
I didn't know, but I was able to verify in several programming languages.

Happy holidays!
-Cameron

From cameron at lumoslabs.com Sat Dec 19 02:23:43 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Sat, 19 Dec 2015 00:23:43 -0800
Subject: Another Transform Rule Syntax Question

Hey cldr-users,

Could someone explain the meaning of the following transform rule?

:: [:Latin:] fullwidth-halfwidth ();

It looks like a named transform rule combined with a filter rule, but I
can't find any mention of this specific syntax in UTS 35.

Thanks!

-Cameron

From mark at macchiato.com Sat Dec 19 03:32:36 2015
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sat, 19 Dec 2015 10:32:36 +0100
Subject: Another Transform Rule Syntax Question

That's an omission from 10.3.6
http://unicode.org/reports/tr35/tr35-general.html#Transform_Rules

:: [:Latin:] fullwidth-halfwidth ();

[:Latin:] is a filter. This has the effect of only applying the rule
"fullwidth-halfwidth" to characters matching [:Latin:]

The text discusses filters and rules, but not the combination. Can you
please file a ticket for this?

Mark
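The filter-plus-rule combination Mark describes above can be illustrated with a small Python sketch. It does not use ICU and is not how ICU implements transforms: `is_latin`, `fullwidth_to_halfwidth`, and `apply_filtered` are hypothetical helpers, the [:Latin:] filter is approximated by looking for "LATIN" in the character name (the standard library exposes no script property), and NFKC folding stands in for the fullwidth-halfwidth rule.

```python
import unicodedata


def is_latin(ch: str) -> bool:
    """Rough stand-in for the [:Latin:] filter, based on the character name."""
    return "LATIN" in unicodedata.name(ch, "")


def fullwidth_to_halfwidth(ch: str) -> str:
    """Stand-in for the fullwidth-halfwidth rule, using NFKC folding."""
    return unicodedata.normalize("NFKC", ch)


def apply_filtered(text: str, char_filter, transform) -> str:
    """Apply `transform` only to the characters accepted by `char_filter`."""
    return "".join(transform(ch) if char_filter(ch) else ch for ch in text)


# Fullwidth "ABC" followed by fullwidth "123".
sample = "\uff21\uff22\uff23\uff11\uff12\uff13"
result = apply_filtered(sample, is_latin, fullwidth_to_halfwidth)

# The letters pass the filter and are narrowed; the digits are left alone.
print(result == "ABC\uff11\uff12\uff13")  # True
```

The fullwidth digits stay untouched because they do not pass the filter, which is the effect of putting a filter in front of the rule name.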
From cameron at lumoslabs.com Sun Dec 20 00:20:54 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Sat, 19 Dec 2015 22:20:54 -0800
Subject: Another Transform Rule Syntax Question

Hey Mark,

Wonderful, thank you for the clarification :)

-Cameron

From dzo at bisharat.net Tue Dec 29 09:59:46 2015
From: dzo at bisharat.net (Don Osborn)
Date: Tue, 29 Dec 2015 10:59:46 -0500
Subject: iPhone's other languages list from CLDR?
Message-ID: <5682ADF2.9000103@bisharat.net>

Greetings,

Does anyone know if Apple relied on CLDR for its long list of "other
languages" (~240 by my estimation) on iPhone 6s (Plus)? Apologies that
this is off-topic (replies offline probably best).
The list of "other languages" - not the "iPhone languages" fully supported
in iOS - is impressive, though looking at some of the 74 African languages*
included (by my count) it seems most are not supported beyond calendars.
Charles Riley suggested offline that some aspects of the list make it
appear that it lists what's on CLDR. However there are some languages one
would expect to see that are not there (Hausa, Amharic, among others).

Really interested to know more about Apple's thinking and methods on this.
TIA for any info or leads.

Best wishes to all for the New Year 2016.

Don Osborn

* http://niamey.blogspot.com/2015/12/list-of-african-languages-on-iphone6s.html

From shervinafshar at gmail.com Tue Dec 29 10:30:25 2015
From: shervinafshar at gmail.com (Shervin Afshar)
Date: Tue, 29 Dec 2015 08:30:25 -0800
Subject: iPhone's other languages list from CLDR?
In-Reply-To: <5682ADF2.9000103@bisharat.net>
References: <5682ADF2.9000103@bisharat.net>

Hello,

Those are names of languages available as part of data for CLDR-supported
locales. The mere fact that CLDR has this data doesn't necessarily mean
that the language is a CLDR locale, i.e. one that has all sorts of other
information (date/time format, numbers, etc.) beyond these names.

Here is the language name for Hausa as appearing in data file for German:
http://unicode.org/cldr/trac/browser/trunk/common/main/de.xml#L228

Hope this helps.

Best Regards,
Shervin

From dzo at bisharat.net Wed Dec 30 13:42:00 2015
From: dzo at bisharat.net (Don Osborn)
Date: Wed, 30 Dec 2015 14:42:00 -0500
Subject: iPhone's other languages list from CLDR?
References: <5682ADF2.9000103@bisharat.net>
Message-ID: <56843388.70604@bisharat.net>

Thank you Shervin and Steven for these responses.

Steven, This list with notes is especially helpful - am still digesting.
Will compare in more detail with the iPhone list. That exercise would be
more rewarding in a cross-operating system comparison - if there were a
clear list of what's on Android.

Also hoping for more info from someone at Apple who is well-placed to
discuss their approach.

Best wishes for the New Year 2016!

Don

On 12/29/2015 2:44 PM, Steven R. Loomis wrote:
> * ICU (which uses CLDR) is noted in iOS and MacOSX's license information
> * ICU open source (for OSX) is linked here -
>   http://opensource.apple.com/source/ICU/ICU-551.41/
> * default calendar information in CLDR is by region and not by language
> * Many of these locales listed are in CLDR. I printed out a list of
>   all locales that are in Africa (002):
>   ( generator source here
>   https://gist.github.com/srl295/f87d06a1405a23e85827 ) .
> I did not correlate this to the iphone 6 list exactly but it seems many,
> but not all, are actually CLDR locales.
>
> So these are locales of Africa which have content in CLDR:
>
> Afar (Djibouti) - aa-DJ [SEED]
> Afar (Eritrea) - aa-ER [SEED]
> Afar (Ethiopia) - aa-ET [SEED]
> Afrikaans (Namibia) - af-NA
> Afrikaans (South Africa) - af-ZA
> Aghem (Cameroon) - agq-CM
> Akan (Ghana) - ak-GH
> Akoose (Cameroon) - bss-CM [SEED]
> Amharic (Ethiopia) - am-ET
> Arabic (Algeria) - ar-DZ
> Arabic (Chad) - ar-TD
> Arabic (Comoros) - ar-KM
> Arabic (Djibouti) - ar-DJ
> Arabic (Egypt) - ar-EG
> Arabic (Eritrea) - ar-ER
> Arabic (Libya) - ar-LY
> Arabic (Mauritania) - ar-MR
> Arabic (Morocco) - ar-MA
> Arabic (Somalia) - ar-SO
> Arabic (South Sudan) - ar-SS
> Arabic (Sudan) - ar-SD
> Arabic (Tunisia) - ar-TN
> Arabic (Western Sahara) - ar-EH
> Asu (Tanzania) - asa-TZ
> Atsam (Nigeria) - cch-NG [SEED]
> Bafia (Cameroon) - ksf-CM
> Bambara (Mali) - bm-ML
> Bambara (N’Ko, Mali) - bm-Nkoo-ML [SEED]
> Basaa (Cameroon) - bas-CM
> Bemba (Zambia) - bem-ZM
> Bena (Tanzania) - bez-TZ
> Blin (Eritrea) - byn-ER [SEED]
> Central Atlas Tamazight (Morocco) - tzm-MA
> Chiga (Uganda) - cgg-UG
> Duala (Cameroon) - dua-CM
> Embu (Kenya) - ebu-KE
> English (Botswana) - en-BW
> English (Burundi) - en-BI
> English (Cameroon) - en-CM
> English (Eritrea) - en-ER
> English (Gambia) - en-GM
> English (Ghana) - en-GH
> English (Kenya) - en-KE
> English (Lesotho) - en-LS
> English (Liberia) - en-LR
> English (Madagascar) - en-MG
> English (Malawi) - en-MW
> English (Mauritius) - en-MU
> English (Namibia) - en-NA
> English (Nigeria) - en-NG
> English (Rwanda) - en-RW
> English (Seychelles) - en-SC
> English (Sierra Leone) - en-SL
> English (South Africa) - en-ZA
> English (South Sudan) - en-SS
> English (St. Helena) - en-SH
> English (Sudan) - en-SD
> English (Swaziland) - en-SZ
> English (Tanzania) - en-TZ
> English (Uganda) - en-UG
> English (Zambia) - en-ZM
> English (Zimbabwe) - en-ZW
> Ewe (Ghana) - ee-GH
> Ewe (Togo) - ee-TG
> Ewondo (Cameroon) - ewo-CM
> French (Algeria) - fr-DZ
> French (Benin) - fr-BJ
> French (Burkina Faso) - fr-BF
> French (Burundi) - fr-BI
> French (Cameroon) - fr-CM
> French (Central African Republic) - fr-CF
> French (Chad) - fr-TD
> French (Comoros) - fr-KM
> French (Congo - Brazzaville) - fr-CG
> French (Congo - Kinshasa) - fr-CD
> French (Côte d’Ivoire) - fr-CI
> French (Djibouti) - fr-DJ
> French (Equatorial Guinea) - fr-GQ
> French (Gabon) - fr-GA
> French (Guinea) - fr-GN
> French (Madagascar) - fr-MG
> French (Mali) - fr-ML
> French (Mauritania) - fr-MR
> French (Mauritius) - fr-MU
> French (Mayotte) - fr-YT
> French (Morocco) - fr-MA
> French (Niger) - fr-NE
> French (Réunion) - fr-RE
> French (Rwanda) - fr-RW
> French (Senegal) - fr-SN
> French (Seychelles) - fr-SC
> French (Togo) - fr-TG
> French (Tunisia) - fr-TN
> Fulah (Cameroon) - ff-CM
> Fulah (Guinea) - ff-GN
> Fulah (Mauritania) - ff-MR
> Fulah (Senegal) - ff-SN
> Ga (Ghana) - gaa-GH [SEED]
> Ganda (Uganda) - lg-UG
> Geez (Eritrea) - gez-ER [SEED]
> Geez (Ethiopia) - gez-ET [SEED]
> Gusii (Kenya) - guz-KE
> Hausa (Arabic, Nigeria) - ha-Arab-NG [SEED]
> Hausa (Arabic, Sudan) - ha-Arab-SD [SEED]
> Hausa (Ghana) - ha-GH
> Hausa (Niger) - ha-NE
> Hausa (Nigeria) - ha-NG
> Igbo (Nigeria) - ig-NG
> Jju (Nigeria) - kaj-NG [SEED]
> Jola-Fonyi (Senegal) - dyo-SN
> Kabuverdianu (Cape Verde) - kea-CV
> Kabyle (Algeria) - kab-DZ
> Kako (Cameroon) - kkj-CM
> Kalenjin (Kenya) - kln-KE
> Kamba (Kenya) - kam-KE
> Kenyang (Cameroon) - ken-CM [SEED]
> Kikuyu (Kenya) - ki-KE
> Kinyarwanda (Rwanda) - rw-RW
> Koyraboro Senni (Mali) - ses-ML
> Koyra Chiini (Mali) - khq-ML
> Kpelle (Guinea) - kpe-GN [SEED]
> Kpelle (Liberia) - kpe-LR [SEED]
> Kwasio (Cameroon) - nmg-CM
> Langi (Tanzania) - lag-TZ
> Lingala (Angola) - ln-AO
> Lingala (Central African Republic) - ln-CF
> Lingala (Congo - Brazzaville) - ln-CG
> Lingala (Congo - Kinshasa) - ln-CD
> Luba-Katanga (Congo - Kinshasa) - lu-CD
> Luo (Kenya) - luo-KE
> Luyia (Kenya) - luy-KE
> Machame (Tanzania) - jmc-TZ
> Makhuwa-Meetto (Mozambique) - mgh-MZ
> Makonde (Tanzania) - kde-TZ
> Malagasy (Madagascar) - mg-MG
> Masai (Kenya) - mas-KE
> Masai (Tanzania) - mas-TZ
> Meru (Kenya) - mer-KE
> Metaʼ (Cameroon) - mgo-CM
> Morisyen (Mauritius) - mfe-MU
> Mundang (Cameroon) - mua-CM
> Nama (Namibia) - naq-NA
> Ngiemboon (Cameroon) - nnh-CM
> Ngomba (Cameroon) - jgo-CM
> Northern Sotho (South Africa) - nso-ZA [SEED]
> North Ndebele (Zimbabwe) - nd-ZW
> Nuer (South Sudan) - nus-SS
> Nyanja (Malawi) - ny-MW [SEED]
> Nyankole (Uganda) - nyn-UG
> N’Ko (Guinea) - nqo-GN [SEED]
> Oromo (Ethiopia) - om-ET
> Oromo (Kenya) - om-KE
> Portuguese (Angola) - pt-AO
> Portuguese (Cape Verde) - pt-CV
> Portuguese (Guinea-Bissau) - pt-GW
> Portuguese (Mozambique) - pt-MZ
> Portuguese (São Tomé & Príncipe) - pt-ST
> Rombo (Tanzania) - rof-TZ
> Rundi (Burundi) - rn-BI
> Rwa (Tanzania) - rwk-TZ
> Saho (Eritrea) - ssy-ER [SEED]
> Samburu (Kenya) - saq-KE
> Sango (Central African Republic) - sg-CF
> Sangu (Tanzania) - sbp-TZ
> Sena (Mozambique) - seh-MZ
> Shambala (Tanzania) - ksb-TZ
> Shona (Zimbabwe) - sn-ZW
> Sidamo (Ethiopia) - sid-ET [SEED]
> Soga (Uganda) - xog-UG
> Somali (Djibouti) - so-DJ
> Somali (Ethiopia) - so-ET
> Somali (Kenya) - so-KE
> Somali (Somalia) - so-SO
> Southern Sotho (Lesotho) - st-LS [SEED]
> Southern Sotho (South Africa) - st-ZA [SEED]
> South Ndebele (South Africa) - nr-ZA [SEED]
> Spanish (Canary Islands) - es-IC
> Spanish (Ceuta & Melilla) - es-EA
> Spanish (Equatorial Guinea) - es-GQ
> Standard Moroccan Tamazight (Morocco) - zgh-MA
> Swahili (Congo - Kinshasa) - sw-CD
> Swahili (Kenya) - sw-KE
> Swahili (Tanzania) - sw-TZ
> Swahili (Uganda) - sw-UG
> Swati (South Africa) - ss-ZA [SEED]
> Swati (Swaziland) - ss-SZ [SEED]
> Tachelhit (Latin, Morocco) - shi-Latn-MA
> Tachelhit (Tifinagh, Morocco) - shi-Tfng-MA
> Taita (Kenya) - dav-KE
> Tasawaq (Niger) - twq-NE
> Teso (Kenya) - teo-KE
> Teso (Uganda) - teo-UG
> Tigre (Eritrea) - tig-ER [SEED]
> Tigrinya (Eritrea) - ti-ER
> Tigrinya (Ethiopia) - ti-ET
> Tsonga (South Africa) - ts-ZA [SEED]
> Tswana (Botswana) - tn-BW [SEED]
> Tswana (South Africa) - tn-ZA [SEED]
> Tyap (Nigeria) - kcg-NG [SEED]
> Vai (Latin, Liberia) - vai-Latn-LR
> Vai (Vai, Liberia) - vai-Vaii-LR
> Venda (South Africa) - ve-ZA [SEED]
> Vunjo (Tanzania) - vun-TZ
> Wolaytta (Ethiopia) - wal-ET [SEED]
> Wolof (Senegal) - wo-SN [SEED]
> Xhosa (South Africa) - xh-ZA [SEED]
> Yangben (Cameroon) - yav-CM
> Yoruba (Benin) - yo-BJ
> Yoruba (Nigeria) - yo-NG
> Zarma (Niger) - dje-NE
> Zulu (South Africa) - zu-ZA
>
>
>
>> On Dec 29, 2015, at 8:30 AM, Shervin Afshar wrote:
>>
>> Hello,
>>
>> Those are names of languages available as part of data for
>> CLDR-supported locales.
>> The mere fact that CLDR has this data doesn't
>> necessarily mean that that language is a CLDR locale; i.e. having all
>> sort of other information (date/time format, numbers, etc.) beyond
>> these names. Here is the language name for Hausa as appearing in data
>> file for German:
>>
>> http://unicode.org/cldr/trac/browser/trunk/common/main/de.xml#L228
>>
>> Hope this helps.
>>
>> Best Regards,
>> Shervin
>>
>> On Dec 29, 2015 8:01 AM, "Don Osborn" wrote:
>>
>> Greetings, Does anyone know if Apple relied on CLDR for its long
>> list of "other languages" (~240 by my estimation) on
>> iPhone6c(plus)? Apologies that this is off-topic (replies offline
>> probably best).
>>
>> The list of "other languages" - not the "iPhone languages" fully
>> supported in iOS - is impressive, though looking at some of the
>> 74 African languages* included (by my count) it seems most are
>> not supported beyond calendars. Charles Riley suggested offline
>> that some aspects of the list make it appear that it lists what's
>> on CLDR. However there are some languages one would expect to
>> see that are not there (Hausa, Amharic, among others).
>>
>> Really interested to know more about Apple's thinking and methods
>> on this. TIA for any info or leads.
>>
>> Best wishes to all for the New Year 2016.
>>
>> Don Osborn
>>
>> *
>> http://niamey.blogspot.com/2015/12/list-of-african-languages-on-iphone6s.html
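
For anyone wanting to compare the Africa locale list in this thread against another list (for example the iPhone one), the entries could be parsed into structured records first. The following is only a minimal sketch, assuming every entry follows the "Name (Region) - tag [SEED]" shape seen in the list; the names `LocaleEntry`, `ENTRY_RE` and `parse_locale_line` are hypothetical and are not part of CLDR, ICU, or the linked generator.

```python
import re
from typing import NamedTuple, Optional


class LocaleEntry(NamedTuple):
    """One parsed line of the list (hypothetical helper type)."""
    name: str   # display name, e.g. "Hausa (Arabic, Nigeria)"
    tag: str    # locale tag as printed in the list, e.g. "ha-Arab-NG"
    seed: bool  # True when the line carries the "[SEED]" marker


# Assumed line shape: "<name> - <tag>[ [SEED]]". The name is matched lazily,
# so names that themselves contain " - " (e.g. "Congo - Brazzaville") work.
ENTRY_RE = re.compile(
    r"^(?P<name>.+?) - "
    r"(?P<tag>[A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)"
    r"(?P<seed> \[SEED\])?$"
)


def parse_locale_line(line: str) -> Optional[LocaleEntry]:
    """Parse one list line; return None for lines that are not entries."""
    text = line.lstrip("> ").strip()  # drop email quote markers
    match = ENTRY_RE.match(text)
    if match is None:
        return None
    return LocaleEntry(
        name=match.group("name"),
        tag=match.group("tag"),
        seed=match.group("seed") is not None,
    )
```

With records like these, the set of tags (or just the language subtag before the first hyphen) can be intersected with another list, and the `seed` flag makes it easy to separate seed locales from the rest.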