From emmo at us.ibm.com Wed Apr 1 08:00:00 2015
From: emmo at us.ibm.com (John Emmons)
Date: Wed, 1 Apr 2015 08:00:00 -0500
Subject: "no inheritance marker"
In-Reply-To: <551B35C7.5000500@oracle.com>
References: <551B35C7.5000500@oracle.com>
Message-ID:

So in the case of your example, ( en_GB for America/Los_Angeles ) - you hit the no-inheritance marker, which means there is no recognized short abbreviation for the metazone in this locale.

So per the LDML specification, the value should default to the localized GMT format ( i.e. "GMT-08:00" during standard time, or "GMT-07:00" during daylight savings ).

Regards,

John C. Emmons
Globalization Architect & Unicode CLDR TC Chairman
IBM Software Group
Internet: emmo at us.ibm.com

From: Naoto Sato
To: cldr-users at unicode.org
Date: 03/31/2015 07:06 PM
Subject: "no inheritance marker"
Sent by: "CLDR-Users"

Hello,

I have a question on this "no inheritance marker", used in the short form of time zone "metazone" names. In LDML spec, it reads:

---
If a given short metazone form is known NOT to be understood in a given locale and the parent locale has this value such that it would normally be inherited, the inheritance of this value can be explicitly disabled by use of the 'no inheritance marker' as the value, which is 3 simultaneous empty set characters ( U+2205 ). [1]
---

So if an app tries to display the short names with this marker, what should they actually be?

For example, in case of "en_GB" locale, lookup for "America_Pacific" short names ends up with this "U+2205U+2205U+2205" marker in "en_001" locale, which disables inheriting "PT"/"PST"/"PDT" in "en".

Naoto

[1] http://www.unicode.org/reports/tr35/tr35-39/tr35-dates.html#Metazone_Names
_______________________________________________
CLDR-Users mailing list
CLDR-Users at unicode.org
http://unicode.org/mailman/listinfo/cldr-users
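A minimal sketch of the lookup John describes, assuming a simplified in-memory table rather than real CLDR data. The table contents, the parent chain and the helper names (`SHORT_NAMES`, `PARENTS`, `gmt_format`, `short_zone_name`) are hypothetical, and the GMT pattern is hard-coded to match the "GMT-08:00" form quoted in the message instead of being read from locale data.

```python
# The "no inheritance marker": three EMPTY SET characters (U+2205).
NO_INHERITANCE = "\u2205\u2205\u2205"

# Hypothetical excerpt of short metazone names, keyed by locale.
SHORT_NAMES = {
    "en": {"America_Pacific": {"generic": "PT", "standard": "PST", "daylight": "PDT"}},
    "en_001": {"America_Pacific": {"generic": NO_INHERITANCE,
                                   "standard": NO_INHERITANCE,
                                   "daylight": NO_INHERITANCE}},
    "en_GB": {},
}

# Assumed parent chain for this example: en_GB -> en_001 -> en.
PARENTS = {"en_GB": "en_001", "en_001": "en", "en": None}


def gmt_format(offset_minutes: int) -> str:
    """Format a UTC offset in the 'GMT-08:00' style used in the message."""
    sign = "+" if offset_minutes >= 0 else "-"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"GMT{sign}{hours:02d}:{minutes:02d}"


def short_zone_name(locale: str, metazone: str, kind: str, offset_minutes: int) -> str:
    """Walk the parent chain; stop inheriting as soon as the marker is found."""
    current = locale
    while current is not None:
        value = SHORT_NAMES.get(current, {}).get(metazone, {}).get(kind)
        if value == NO_INHERITANCE:
            break  # inheritance explicitly disabled: do not look at parents
        if value is not None:
            return value
        current = PARENTS.get(current)
    return gmt_format(offset_minutes)


print(short_zone_name("en_GB", "America_Pacific", "standard", -480))
print(short_zone_name("en", "America_Pacific", "standard", -480))
```

The point of the sketch is that the marker ends the search instead of being displayed: "en_GB" never reaches the "PST" stored in "en".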
From naoto.sato at oracle.com Wed Apr 1 10:27:18 2015
From: naoto.sato at oracle.com (Naoto Sato)
Date: Wed, 01 Apr 2015 08:27:18 -0700
Subject: "no inheritance marker"
In-Reply-To:
References: <551B35C7.5000500@oracle.com>
Message-ID: <551C0E56.5050509@oracle.com>

Thanks, John. That makes sense.

Naoto

From rick at unicode.org Thu Apr 2 14:36:06 2015
From: rick at unicode.org (Rick McGowan)
Date: Thu, 02 Apr 2015 12:36:06 -0700
Subject: CLDR 27.0.1 Maintenance Release
Message-ID: <551D9A26.2070107@unicode.org>

Hello everyone,

Unicode CLDR 27.0.1 is a very small maintenance release that is intended to fix some specific problems that were found shortly after CLDR 27 was published. If you have already downloaded version 27 and are not impacted by any of the specific issues mentioned in the release note, then there is no specific need to upgrade from 27 to 27.0.1. All data in common/main is identical between version 27 and version 27.0.1.

Further information can be found on the release page:
http://cldr.unicode.org/index/downloads/cldr-27#27-0-1

Note: this was finalized late on March 31, but rather than announce on April Fool's day we waited overnight... :-)

From markus.icu at gmail.com Fri Apr 3 15:59:50 2015
From: markus.icu at gmail.com (Markus Scherer)
Date: Fri, 3 Apr 2015 13:59:50 -0700
Subject: CLDR proposal: Move collator CLDR settings into ICU format
Message-ID:

Dear CLDR team & users,

I would like to propose the following spec & data changes for CLDR 28.
Please provide *feedback by next Thursday, 2015-apr-09*.
CLDR ticket: http://unicode.org/cldr/trac/ticket/8289

Proposal:
- Deprecate XML elements under :
  import, settings, suppress_contractions, optimize
  together with their specific attributes
- Change the CLDR collation tailorings data to
  replace the use of these XML elements with equivalent ICU syntax

For example:

[?-? ?-? ? ? ? ? ?]
->

[caseFirst upper]
[import da-u-co-standard]
[suppressContractions [?-? ?-? ? ? ? ? ?]]
[normalization on][alternate shifted][reorder Thai]

Rationale:

The LDML collation spec provides for two ways for parametric settings and special rules in collation tailoring data: via special XML elements, or as part of the ICU syntax rules in . See the underlined elements in the following line copied from the spec:

suppress_contractions?, optimize?*, cr*, special*))

Two ways of doing the same thing lead to inconsistencies.

CLDR tools and tests would not have to convert these elements to ICU syntax any more.

The spec would be simpler.

This change makes it clearer that the settings get *import*ed too, not just the rules.

Note that CLDR 24 deprecated the XML syntax for rules and replaced the XML syntax rules data with equivalent ICU syntax rules.

Sincerely,
markus

From mark at macchiato.com Sat Apr 4 01:05:07 2015
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Sat, 4 Apr 2015 08:05:07 +0200
Subject: CLDR proposal: Move collator CLDR settings into ICU format
In-Reply-To:
References:
Message-ID:

I'm strongly in favor of these changes.

Mark

*« Il meglio è l’inimico del bene »*

From verdy_p at wanadoo.fr Sat Apr 4 02:05:56 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 4 Apr 2015 09:05:56 +0200
Subject: CLDR proposal: Move collator CLDR settings into ICU format
In-Reply-To:
References:
Message-ID:

Maybe there is a way to use (or create) a converter tool that automatically generates an equivalent XML version for at least some versions, to allow a transition. These generated files would be explicitly marked as "derived" (so that they are no longer directly supported as references, only provided for information).

Alternatively, put the sources of such a conversion tool in an open-source repository. It should compile at least on Linux and possibly on Windows too, or be written in a portable, widely used cross-platform language such as JavaScript or Java.
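A rough sketch of what the simplest part of such a converter could look like, going from ICU option syntax back to a derived XML element. It is only an illustration under stated assumptions: the function name `settings_element` is hypothetical, the `settings` attribute names are assumed to match the ICU option names one to one, and only the flat options from the example in the proposal are handled (imports and nested sets such as `[suppressContractions [...]]` are skipped).

```python
import re
import xml.etree.ElementTree as ET

# Flat settings taken from the example in the proposal; a real tool would cover more.
SIMPLE_SETTINGS = {"caseFirst", "normalization", "alternate", "reorder"}

# Matches flat "[name value]" options. Nested sets such as
# "[suppressContractions [...]]" are deliberately not handled in this sketch.
OPTION = re.compile(r"\[(\w+) ([^\[\]]+)\]")


def settings_element(icu_rules: str) -> ET.Element:
    """Collect simple ICU options into one derived <settings> element.

    Assumption: each XML attribute has the same name as the ICU option.
    """
    element = ET.Element("settings")
    for name, value in OPTION.findall(icu_rules):
        if name in SIMPLE_SETTINGS:
            element.set(name, value)
    return element


rules = ("[caseFirst upper][import da-u-co-standard]"
         "[normalization on][alternate shifted][reorder Thai]")
derived = settings_element(rules)
print(ET.tostring(derived, encoding="unicode"))
```

Keeping the tool to a regular expression and the standard library matches the goal of a small, readable companion to the spec rather than an optimized implementation.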
This open-source tool does not need to be optimized (it is a one-shot conversion). It should be demonstrative, so its sources should remain as simple as possible, without many dependencies on external libraries or APIs and without complex data structures. In fact this source can be a useful informative companion to the specification (it is often simpler and faster to read the source than to decipher natural English text and the ambiguities that creep into it so easily).

The source can also give implementers programming hints about how to parse the reference data correctly for their applications, even if they in fact use another internal format for better performance at runtime. Collation is a critical functionality in applications, where performance matters for managing large volumes of text efficiently (for example in plain-text searches or when sorting query result sets), so implementations do not even use the public ICU syntax or the XML syntax internally by running parsers repeatedly.

From verdy_p at wanadoo.fr Wed Apr 15 08:42:35 2015
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 15 Apr 2015 15:42:35 +0200
Subject: alternate formatting data for algorithmic number systems when they fallback to a decimal system
Message-ID:

For now the CLDR data for algorithmic number systems uses RBNF rules where possible, but the last mapping, for when this does not work, uses a specific decimal format (starting with 0 or #).

One problem is that this decimal format is the same regardless of the actual locale (language, or number style in that language) for which the number system has been mapped.

Different locales using the same number system in fact have different rules for formatting numbers when they are forced to fall back to a decimal system.

These fallbacks are currently typically specified as the substitution "=#,##0.00=", which is clearly wrong (e.g. for Traditional Tamil): these formats in fact assume a specific language, and the right format is not the same for all locales using this number system.

I propose deprecating these mappings and instead setting them to the substitution "==", meaning that the rule uses the format of whichever decimal system is selected instead.

Note that when using locale resolution mechanisms to find the appropriate number system for formatting numbers, the lookup will (unless the implementation takes care) map again to the same traditional algorithmic system, so it would recurse infinitely. To avoid this:

- the "==" substitution must look for a mapping for the locale in the *default* decimal number variant,

- but it could also map to the "native" decimal number variant mapped for that locale (replacing the "traditional" variant, which is algorithmic) using the substitution "=-native=", so that native digits are still used (instead of just Latin digits, when these locales use Latin digits by default rather than the native ones).

With this proposal, the CLDR data for number systems would no longer contain any data using "=#...=" or "=0...=" substitutions; the traditional systems would still be able to format all numbers, even those they do not support internally, using the native digits, the appropriate separators (decimal, grouping), and the appropriate grouping.
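
The data change proposed above amounts to a mechanical rewrite of rule text, which can be sketched as follows. This is an illustration only: the function name `drop_fixed_decimal_format` and the sample rule strings are hypothetical, and the regular expression simply treats any "=...=" substitution whose body starts with "#" or "0" as a decimal format.

```python
import re

# An "=...=" substitution whose body starts with "#" or "0", i.e. a decimal format.
DECIMAL_SUBSTITUTION = re.compile(r"=[#0][^=]*=")


def drop_fixed_decimal_format(rule: str) -> str:
    """Replace a hard-coded decimal format substitution with the empty "=="."""
    return DECIMAL_SUBSTITUTION.sub("==", rule)


# Hypothetical rule lines, written in the style of the examples in this thread.
print(drop_fixed_decimal_format("1000: =#,##0.00=;"))
print(drop_fixed_decimal_format("100: =%spellout-numbering=;"))
```

Substitutions that name a ruleset (starting with "%") are left untouched, since only the fixed decimal formats are in question.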

One way to implement this, however, does not require changing the CLDR data: the implementation can detect the "=#...=" or "=0...=" substitution rules found in algorithmic number systems and treat them all as equivalent to just "==". It would first try to map the locale to a "native" decimal variant and use it (note that the "native" variant already falls back to the default decimal variant for all locales: this is the case for most locales other than the Indian ones, which are the only ones with their own "native" mappings).

In summary, the resolution for algorithmic systems would follow this path:

- use the "traditional" rules if they work (this uses the RBNF data);
- on finding a "==" substitution (or any "=0...=" or "=#...=" substitution), find the decimal number system in the "native" variant, format numbers in that system, and use the appropriate separators and groupings;
- if there is no "native" variant mapped for that locale, fall back to the default system (in the CLDR data charts we see that this is the case, because there is an entry mapping "All other locales" to the Latin number system), which will also use the same separators and groupings.

This will be a major improvement for number systems used in many languages (including Latin-written languages), such as the "roman" number system.

One more note:

East Asian locales in their traditional scripts prefer to use their own algorithmic system, which cannot format all numbers. As these are rendered in sinographic squares, the fallback "native" digits should use the "fullwidth" variant: this can be specified using "=-native=" or, more specifically, "=-fullwidth=".
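
The resolution path summarized above can be sketched roughly as below. Everything in the tables is an illustrative assumption rather than CLDR data (in particular the numbering-system assignments for "ta" and "fr"), the names `fallback_decimal_system` and `format_fallback` are hypothetical, and the sketch converts digits only, leaving out the locale's separators and grouping.

```python
# Digits of the decimal systems used in this example.
DIGITS = {
    "latn": "0123456789",
    "tamldec": "".join(chr(0x0BE6 + i) for i in range(10)),  # Tamil digits 0-9
}

# Assumed per-locale numbering-system variants (illustrative only).
NUMBERING = {
    "ta": {"default": "latn", "native": "tamldec", "traditional": "taml"},
    "fr": {"default": "latn"},
}

# Systems that are algorithmic and therefore cannot serve as the fallback.
ALGORITHMIC = {"taml", "roman"}


def fallback_decimal_system(locale: str) -> str:
    """Prefer the "native" variant, then the default one, skipping algorithmic systems."""
    systems = NUMBERING.get(locale, {})
    for variant in ("native", "default"):
        name = systems.get(variant)
        if name and name not in ALGORITHMIC:
            return name
    return "latn"  # stands in for the "All other locales" entry


def format_fallback(value: int, locale: str) -> str:
    """Render a non-negative integer with the digits of the fallback system."""
    digits = DIGITS[fallback_decimal_system(locale)]
    return "".join(digits[int(ch)] for ch in str(value))


print(fallback_decimal_system("ta"), format_fallback(5000, "ta"))
print(fallback_decimal_system("fr"), format_fallback(5000, "fr"))
```

Because the fallback never selects an algorithmic system, the lookup cannot loop back to the "traditional" rules that triggered it.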

Note that for now no "==" substitution rule can start with a minus sign ("-"); its content must be one of:

- a valid ruleset name (starting with % or %%), or
- a decimal format (starting with "0" or "#", which I want to deprecate), or
- empty (but the current implementation in ICU then creates an infinite loop, or only uses Basic Latin decimal digits in a fixed number format, independent of the locale).

So there is absolutely no conflict when we use a "==" substitution rule starting with a minus (-) to mean that it should use another specified number system (such as "native" or "fullwidth" or any specific non-algorithmic number system), which is named just after this minus sign.

----

Alternatively, the standard code of a locale (starting with a letter 'a' to 'z') could be used in these "==" substitutions, for example:

- "=ja=" (it would be used only for spellout number formatters specific to the Japanese locale),
- "=ar-TN=" (for the spellout number formatter in Arabic as spoken in Tunisia, when words cannot be used and the Tunisian Arabic rules should apply; these differ from standard Arabic [ar] in using Latin digits instead of Arabic digits. It would still use the separators and groupings specified for the Tunisian Arabic locale, which also do not use the Arabic comma).

In that case, the standard way to designate another number system (without reference to a specific language) should use the Unicode locale tags for number systems, but without any leading language subtags (i.e. "=-u-ns-native=" instead of just "=-native="), as number formatting rules are not expected in most cases to replace the language itself, just the number system. This is the reason for using the leading minus for such usage (we could also replace only the region code, as in "=-CN=", or only the script code, as in "=-Bopo="). This is different from using "=und-CN=" or "=und-Bopo=", because we don't want to replace the language with an undetermined language, which would use only default digits, default grouping separators and default grouping formats instead of keeping those of the current locale.

-- Philippe.

From cameron at lumoslabs.com Wed Apr 15 12:39:17 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Wed, 15 Apr 2015 10:39:17 -0700
Subject: alternate formatting data for algorithmic number systems when they fallback to a decimal system
In-Reply-To:
References:
Message-ID:

Hey Philippe,

My understanding is that the implementer should just use the number system for the given locale. ICU actually lets you specify the number system, see the docs here: http://www.icu-project.org/apiref/icu4c/classRuleBasedNumberFormat.html (see specifically the icu::RuleBasedNumberFormat::RuleBasedNumberFormat constructor). I understand from your email that converting to a different number system isn't always as straightforward as a 1:1 text replace, but I believe the current CLDR number formatting rules handle these cases, yes? I've noticed that ICU at least formats numbers in RBNF rules using the correct numbering system for the locale.

-Cameron

From cameron at lumoslabs.com Thu Apr 16 11:09:52 2015
From: cameron at lumoslabs.com (Cameron Dutro)
Date: Thu, 16 Apr 2015 09:09:52 -0700
Subject: Fwd: alternate formatting data for algorithmic number systems when they fallback to a decimal system
In-Reply-To:
References:
Message-ID:

---------- Forwarded message ----------
From: Cameron Dutro
Date: Thu, Apr 16, 2015 at 9:09 AM
Subject: Re: alternate formatting data for algorithmic number systems when they fallback to a decimal system
To: Philippe Verdy

Thank you for the clarification Philippe. In my previous email I was not necessarily trying to respond with approval or disapproval of your proposal, but instead to understand the issue better. I am in no position to effect any kind of change in CLDR or ICU.

Having read your second and third emails, I think I agree with you. I'd like to hear what Mark and Markus have to say about this too, however.

-Cameron

On Thu, Apr 16, 2015 at 3:09 AM, Philippe Verdy wrote:

> My proposal in fact concerns all types of number formatters currently supported in CLDR data, all of which could be algorithmic:
> - number systems (cardinals),
> - ordinals,
> - year numbering,
> - month numbering,
> - day numbering,
> - century numbering (in French it uses the roman-lower system with ordinals),
> - millennium numbering (in French it uses the roman-upper system with ordinals),
> - accounting amounts,
> - currency amounts (displayed prices),
> - measurements with units,
> - spellout using translated words for all the usages above...
>
> It also concerns number parsers, which are built to parse and accept all these formatted numbers using the same rulesets, plus a lenient parsing ruleset for accepting numbers not formatted this way (e.g. a "roman-lower" parser will typically contain lenient parsing rules for accepting all numbers formatted with a decimal system, as well as numbers formatted in "roman-upper")...
>
> On 16 Apr 2015 11:27, "Philippe Verdy" wrote:
>
>> No, ICU does NOT handle this case.
>>
>> When using a locale whose number system is algorithmic, yes, it uses that system, as specified in CLDR data, and yes, it uses the associated RBNF rulesets.
>>
>> But the problem lies within these rulesets, when one of the rules specifies a substitution that is neither another ruleset name nor an empty substitution (such as == or << or >>) but a decimal format starting with 0 or #.
>>
>> In that case the decimal format is used blindly and does not use the native decimal digits, the native separators, or the native grouping and decimal formats of that locale.
>>
>> The problem is in fact in the CLDR data, where the rule specifies a substitution like this one in the "roman-lower" system:
>>
>> "5000: =##,##0="
>>
>> which should really be
>>
>> "5000: =="
>>
>> to ignore the specified decimal format and instead select an appropriate decimal format for the locale in ANOTHER number system that is not algorithmic but decimal. By default it would be searched first in the "native" system when that is mapped for the locale (in CLDR data all locales have a mapping of the effective number system to use for the "native" number system alias; this is not the case for the "finance" or "traditio" number system aliases), before the default number system for that locale (in CLDR data, all locales have a decimal system mapped there, which is not necessarily the modern Latin system but can be formatted with ten digits and standard separators and signs, which are still localized).
>>
>> In summary, you have still not understood why this is an issue not just inside ICU but in fact in the CLDR data itself, independently of the ICU implementation. The problem is NOT:
>> - in the mapping of locales to their number systems in several variants (default, native, traditio, finance), possibly also aliased;
>> - in the mapping of a number system to a decimal or algorithmic type;
>> - in the definition of each algorithmic number system by a group of rulesets, including one which is public (not named with a %% prefix) and designated as the main ruleset to use;
>> - in the definition of each ruleset with several rules, each rule being keyed either by a special rule type (proper fraction, improper fraction, or master) or by value (an integer or fraction).
>>
>> The problem is in the definition of an individual RBNF rule, where it uses a substitution to a decimal format starting with 0 or # (such a substitution may be surrounded by == or << or >> to specify how to compute the value to format). This is something that I propose to deprecate, and even to remove completely from CLDR data, as it is clearly wrong or insufficient: it bypasses the per-locale settings of the preferred decimal system when the preferred algorithmic system is not used.
>>
>> However, I keep the role of == or << or >> in computing the value that is passed down to the decimal formatter.
>>
>> So your reply in fact gives absolutely no hint, and even the link to the ICU constructor is inappropriate for this issue (I know what it does, and I had already inspected this code before sending my first email with the proposal). You had clearly not understood the issue, which I have just reformulated here in more explicit detail.
>>
>> On 15 Apr 2015 19:39, "Cameron Dutro" wrote:
>>
>>> Hey Philippe,
>>>
>>> My understanding is that the implementer should just use the number system for the given locale. ICU actually lets you specify the number system, see the docs here:
>>> http://www.icu-project.org/apiref/icu4c/classRuleBasedNumberFormat.html
>>> (see specifically the icu::RuleBasedNumberFormat::RuleBasedNumberFormat constructor). I understand from your email that converting to a different number system isn't always as straightforward as a 1:1 text replace, but I believe the current CLDR number formatting rules handle these cases, yes? I've noticed that ICU at least formats numbers in RBNF rules using the correct numbering system for the locale.
>>>
>>> -Cameron
>>>
>>> On Wed, Apr 15, 2015 at 6:42 AM, Philippe Verdy
>>> wrote:
>>>
>>>> For now the CLDR data for algorithmic number systems uses RBNF
>>>> rules when this is possible, but the last-resort mapping when this does
>>>> not work is to use a specific decimal format (starting with 0 or #).
>>>>
>>>> One problem is that this decimal format is the same independently of
>>>> the actual locale (language or number style in that language) for which the
>>>> number system has been mapped.
>>>>
>>>> Different locales using the same number system have in fact different
>>>> rules for formatting numbers when they are forced to use a fallback to a
>>>> decimal system.
>>>>
>>>> These fallbacks are typically currently specified as the substitution
>>>> "=#,##0.00=", which is clearly wrong (e.g. for Traditional Tamil): these
>>>> formats in fact assume a specific language, and it is not the same
>>>> for all locales using this number system.
>>>>
>>>> I propose deprecating these mappings and instead just setting them to
>>>> the substitution "==", meaning that the format of the fallback decimal
>>>> system will be used instead.
>>>>
>>>> Note that when using locale resolution mechanisms to find the
>>>> appropriate number system to use for formatting numbers, it will (if you
>>>> are not careful) map it again to the same traditional algorithmic
>>>> system, so this would recurse infinitely:
>>>>
>>>> - the "==" substitution must look for a mapping for the locale in the
>>>> *default* decimal number variant,
>>>>
>>>> - but it could also map to the "native" decimal number variant mapped
>>>> for that locale (replacing the "traditional" variant which is
>>>> algorithmic), using the substitution "=-native=", so that native digits
>>>> will still be used (instead of just the Latin digits, when these locales
>>>> are using by default the Latin digits, and not the native ones).
>>>>
>>>> With this proposal, the CLDR data for number systems would no longer
>>>> contain any data using "=#...=" or "=0...=" substitutions; the traditional
>>>> systems would still be able to format all numbers, even those they do not
>>>> support internally, using the native digits, the appropriate separators
>>>> (decimal, grouping), and appropriate grouping.
>>>>
>>>> One way to implement it however does not require changing the CLDR
>>>> data: the implementation can autodetect the "=#...=" or "=0...="
>>>> substitution rules found in algorithmic number systems and consider them
>>>> all equivalent to just "==": it would first try to map the locale to a
>>>> "native" decimal variant, and use it (note that the "native" variant
>>>> already has fallbacks for all locales to use the default decimal variant:
>>>> this is the case for most non-Indian locales, the Indian ones being almost
>>>> alone in having "native" mappings).
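
[Editorial sketch] The autodetection step described above could look roughly like the following. This is a minimal illustration under stated assumptions, not ICU's or CLDR's actual code: the function name normalize_rule is hypothetical, and the regular expression only recognizes substitutions delimited by the ASCII characters =, < or > whose descriptor starts with 0 or #.

```python
import re

# A substitution is delimited by a matching pair of "=", "<" or ">".
# The descriptor between the delimiters counts as a decimal format when it
# starts with "0" or "#". (Assumption: ASCII delimiters only.)
_DECIMAL_SUBSTITUTION = re.compile(r"([=<>])[0#][^=<>]*\1")


def normalize_rule(rule_body: str) -> str:
    """Drop decimal-format descriptors, keeping the substitution delimiters.

    "=#,##0=" becomes "==", "<#,##0<" becomes "<<", and so on. Substitutions
    that name a ruleset (starting with "%") are left untouched.
    """
    return _DECIMAL_SUBSTITUTION.sub(r"\1\1", rule_body)


if __name__ == "__main__":
    print(normalize_rule("5000: =#,##0="))
```

A formatter built on such a step would then treat every empty descriptor uniformly, choosing the decimal system per locale instead of relying on a pattern hard-coded in the rule.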
>>>>
>>>> In summary the resolution for algorithmic systems would use the
>>>> following path:
>>>> - use "traditional" rules if it works (it uses the RBNF data)
>>>> - when it finds a "==" substitution (or any "=0...=" or "=#...="
>>>> substitution), find the decimal number system in the "native" variant,
>>>> format numbers in that system, and use the appropriate separators and
>>>> groupings
>>>> - if there's no "native" variant mapped for that locale, it will
>>>> fall back to the default system (in the CLDR data charts, we see that it
>>>> is the case because there's an entry mapping "All other locales" to the
>>>> Latin number system, which will also use the same separators and
>>>> groupings).
>>>>
>>>> This will be a major improvement for number systems used in lots of
>>>> languages (including Latin-written languages) such as the "roman" number
>>>> system.
>>>>
>>>> One more note:
>>>>
>>>> East Asian locales using traditional scripts prefer to use their own
>>>> algorithmic system which cannot format all numbers. As they are rendered
>>>> using sinographic squares, the fallback "native" digits should use the
>>>> "fullwidth" variant: this can be specified using "=-native=" or more
>>>> specifically "=-fullwidth=".
>>>>
>>>> Note that for now no "==" substitution rule can start with a minus sign
>>>> ("-"), it must only be:
>>>> - a valid ruleset name (starting with % or %%), or
>>>> - a decimal format (starting with "0" or "#", that I want to deprecate),
>>>> or
>>>> - empty (but the current implementation in ICU creates an infinite
>>>> loop, or only uses Basic Latin decimal digits in a fixed number format,
>>>> independent of the locale)
>>>>
>>>> So there is absolutely no conflict when we use a "==" substitution rule
>>>> starting with minus (-) to mean that it should use another specified
>>>> number system (such as "native" or "fullwidth" or any specific
>>>> non-algorithmic number system) which is named just after this minus sign.
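
[Editorial sketch] The resolution path and the descriptor syntax summarized above can be illustrated as follows. Everything here is an assumption made for illustration: the function names are hypothetical, the two mapping tables are small stand-ins rather than real CLDR data, and the leading-minus form is the proposal from this message, not existing RBNF syntax.

```python
from typing import Dict

# Illustrative stand-ins only; a real implementation would read the
# "native" and default numbering systems of each locale from CLDR.
NATIVE_SYSTEMS: Dict[str, str] = {"ta": "tamldec", "hi": "deva"}
DEFAULT_SYSTEMS: Dict[str, str] = {"ta": "latn", "hi": "latn", "fr": "latn"}


def classify_descriptor(descriptor: str) -> str:
    """Classify the text found between the two "=" signs of a substitution."""
    if descriptor == "":
        return "empty"
    if descriptor.startswith("%"):
        return "ruleset"          # covers both "%name" and "%%name"
    if descriptor[0] in "0#":
        return "decimal-format"   # the form proposed for deprecation
    if descriptor.startswith("-"):
        return "number-system"    # proposed form, e.g. "-native"
    raise ValueError(f"unrecognized descriptor: {descriptor!r}")


def fallback_decimal_system(locale: str, descriptor: str = "") -> str:
    """Pick the decimal number system used when the algorithmic rules give up."""
    kind = classify_descriptor(descriptor)
    if kind == "ruleset":
        raise ValueError("a ruleset substitution has no decimal fallback")
    if kind == "number-system" and descriptor != "-native":
        return descriptor[1:]     # explicitly named system, e.g. "fullwidth"
    # Empty, decimal-format or "-native": prefer the locale's native system,
    # then its default system, then Latin digits for all other locales.
    native = NATIVE_SYSTEMS.get(locale)
    if native is not None:
        return native
    return DEFAULT_SYSTEMS.get(locale, "latn")


if __name__ == "__main__":
    print(fallback_decimal_system("ta"))
    print(fallback_decimal_system("fr", "#,##0.00"))
```

Because the fallback never returns an algorithmic system, the infinite recursion mentioned earlier in the message cannot occur in this sketch.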
>>>>
>>>> ----
>>>>
>>>> Alternatively, the standard code of a locale (starting with a letter 'a'
>>>> to 'z') could be used in these "==" substitutions, for example:
>>>> - "=ja=" (it would be used only for spellout number formatters
>>>> specific to the Japanese locale),
>>>> - "=ar-TN=" (for a spellout number formatter in Arabic as spoken in
>>>> Tunisia, when words cannot be used and the Tunisian Arabic rules should
>>>> be used; these differ from standard Arabic [ar], as Tunisian Arabic uses
>>>> Latin digits instead of Arabic digits: it would still use the separators
>>>> and groupings specified for the Tunisian Arabic locale, which also do not
>>>> use the Arabic comma)
>>>>
>>>> In that case, the standard way to designate another number system
>>>> (without reference to a specific language) should use the Unicode locale
>>>> tags for number systems, but without any leading language subtags (i.e.
>>>> "=-u-nu-native=", instead of just "=-native="), as number formatting
>>>> rules are not expected in most cases to replace the language itself, just
>>>> to replace the number system: this is the reason for using the leading
>>>> minus for such usage (but we could also replace the region code only,
>>>> such as "=-CN=", or the script code only, such as "=-Bopo="). This is
>>>> different from using "=und-CN=" or "=und-Bopo=" because we don't want to
>>>> replace the language with an undetermined language, which would use only
>>>> default digits, default grouping separators and default grouping formats
>>>> instead of keeping those of the current locale.
>>>>
>>>>
>>>> -- Philippe.