Bundle Lookup
Mark Davis ☕️
mark at macchiato.com
Fri Dec 12 10:50:44 CST 2014
I also want to be clear that there are two closely related but very
different tasks.
1. *Inherited item lookup.* Given that you have a CLDR resource bundle,
with inheritance, where do I go to get inherited items?
That is specified by CLDR by means of the parentLocale + truncation
algorithm, plus the alias element. (There are a few cases where we have
"Lateral Inheritance" where the specification is in the text of LDML, such
as when looking for an alt variant.)
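Here is a minimal Python sketch of the parentLocale + truncation rule described above. It assumes a hand-copied subset of the parentLocale data (the hypothetical PARENT_LOCALES table, limited to entries mentioned in this thread) and ignores the alias element and lateral inheritance.

```python
# Hypothetical subset of CLDR parentLocale data, limited to this thread.
PARENT_LOCALES = {
    "en_GB": "en_001",
    "zh_Hant": "root",
    "sr_Latn": "root",
}


def parent(locale):
    """Return the parent of a CLDR bundle ID (underscore-separated)."""
    if locale in PARENT_LOCALES:      # an explicit parentLocale wins
        return PARENT_LOCALES[locale]
    if "_" in locale:                 # otherwise truncate the last subtag
        return locale.rsplit("_", 1)[0]
    return "root"                     # a bare language inherits from root


def inheritance_chain(locale):
    """Follow parent() until root is reached."""
    chain = [locale]
    while chain[-1] != "root":
        chain.append(parent(chain[-1]))
    return chain


print(inheritance_chain("en_US"))       # ['en_US', 'en', 'root']
print(inheritance_chain("en_GB"))       # ['en_GB', 'en_001', 'en', 'root']
print(inheritance_chain("zh_Hant_TW"))  # ['zh_Hant_TW', 'zh_Hant', 'root']
```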
So back to Rafael's original question:
1. en-Latn-GB and zh-TW are not CLDR bundles, so this doesn't apply to
them.
2. en-US-u-nu-usd: the u-nu-usd doesn't select within a bundle, but
rather customizes a service that uses information in the bundle. The item
lookup (used by the currency formatting service) would be en-US => en
=> root.
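To illustrate the point that u-nu-usd does not select a bundle, this sketch splits the Unicode extension off the tag and computes the item-lookup chain from what remains. The helper names are hypothetical; the sketch assumes the tag has no other extensions or private-use subtags, and it uses truncation only (no parentLocale overrides are needed for en-US).

```python
def split_unicode_extension(tag):
    """Split 'en-US-u-nu-usd' into ('en-US', 'nu-usd').

    Assumes no other extensions or private-use subtags are present.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "u" in lowered:
        i = lowered.index("u")
        return "-".join(subtags[:i]), "-".join(subtags[i + 1:])
    return tag, ""


def truncation_chain(bundle_id):
    """Truncation only; parentLocale overrides are deliberately omitted."""
    chain = [bundle_id]
    while "_" in chain[-1]:
        chain.append(chain[-1].rsplit("_", 1)[0])
    return chain + ["root"]


base, options = split_unicode_extension("en-US-u-nu-usd")
print(base, options)                             # en-US nu-usd
print(truncation_chain(base.replace("-", "_")))  # ['en_US', 'en', 'root']
```

The options string is handed to the formatting service; it plays no part in choosing which bundles are consulted.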
2. *Bundle lookup.* Given a locale ID, where do I get the best matching
CLDR bundle?
My application has a set of supported locales, and the user comes in with a
set of desired locales. What is the best bundle for that user?
Here we are not as clear as we should be. The recommended process is in
http://www.unicode.org/reports/tr35/#LanguageMatching
So back to Rafael's original question:
1. en-Latn-GB and zh-TW. When these are looked up with Language
Matching, assuming that all the CLDR locales are available, they would
return, respectively, en-GB and zh-Hant-TW.
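The following is a much-simplified sketch of why those two results come out. It is NOT the distance-based algorithm in the LanguageMatching section of TR35. It only "maximizes" the desired and supported IDs with a tiny, hypothetical excerpt of likely subtags and keeps exact matches. The tie-breaker (prefer the supported ID with the most explicit subtags) is this sketch's own assumption.

```python
# Hypothetical excerpt of likelySubtags data (underscore form), thread-relevant only.
LIKELY = {
    "en": "en_Latn_US",
    "zh": "zh_Hans_CN",
    "zh_TW": "zh_Hant_TW",
    "zh_Hant": "zh_Hant_TW",
}

# Hypothetical set of supported bundles.
SUPPORTED = ["en", "en_001", "en_GB", "zh", "zh_Hans", "zh_Hant", "zh_Hant_TW"]


def parse(locale_id):
    """Split 'zh_Hant_TW' or 'zh-Hant-TW' into (language, script, region)."""
    parts = locale_id.replace("-", "_").split("_")
    lang, script, region = parts[0].lower(), None, None
    for p in parts[1:]:
        if len(p) == 4:
            script = p.title()
        else:
            region = p.upper()
    return lang, script, region


def maximize(locale_id):
    """Fill in a missing script/region from the first matching LIKELY key."""
    lang, script, region = parse(locale_id)
    keys = []
    if script and region:
        keys.append(f"{lang}_{script}_{region}")
    if region:
        keys.append(f"{lang}_{region}")
    if script:
        keys.append(f"{lang}_{script}")
    keys.append(lang)
    for key in keys:
        if key in LIKELY:
            _, s, r = parse(LIKELY[key])
            return lang, script or s, region or r
    return lang, script, region


def best_match(desired, supported):
    target = maximize(desired)
    hits = [s for s in supported if maximize(s) == target]
    # Assumed tie-breaker: prefer the most explicit supported ID.
    return max(hits, key=lambda s: s.count("_"), default=None)


print(best_match("en-Latn-GB", SUPPORTED))  # en_GB
print(best_match("zh-TW", SUPPORTED))       # zh_Hant_TW
```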
That being said, often people don't understand language matching, and so we
are in the process of adding more information so that there is a direct
mapping between locale IDs that are always considered to be
"identical" on a deep level, like en-GB and en-Latn-GB.
Mark <https://google.com/+MarkDavis>
*Il meglio è l’inimico del bene* (The best is the enemy of the good.)
On Fri, Dec 12, 2014 at 5:04 PM, John Emmons <emmo at us.ibm.com> wrote:
> Yes, Edwin, there is a very good reason we don't want zh-Hant to inherit
> from zh. Simply put, in situations where you have locale resources that
> aren't 100% populated, allowing zh-Hant to inherit from zh produces a
> mixture of simplified and traditional Chinese, which is acceptable to no
> one. This is what we call "cross script inheritance" in CLDR. While it
> might be acceptable to some in the case of Chinese, it is certainly a
> bigger problem in languages like Serbian, where you have both Latin and
> Cyrillic scripts in use, and you certainly don't ever want a mixture of
> Latin and Cyrillic scripts.
>
> These relationships are documented in CLDR's supplemental data, where you
> have specified:
>
> <parentLocale parent="root" locales="az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt
> ha_Arab mn_Mong ms_Arab pa_Arab shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn
> zh_Hant"/>
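The sketch below reads that parentLocale element with Python's standard xml.etree.ElementTree. It then compares the chain for zh_Hant_TW with and without the override. The element is wrapped in a parentLocales container so that it parses on its own; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The <parentLocale> element quoted above, wrapped so it parses standalone.
SNIPPET = """
<parentLocales>
  <parentLocale parent="root" locales="az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt
    ha_Arab mn_Mong ms_Arab pa_Arab shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn
    zh_Hant"/>
</parentLocales>
"""


def load_parent_locales(xml_text):
    """Map each listed locale to its declared parent."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for el in root.iter("parentLocale"):
        for loc in el.get("locales", "").split():
            mapping[loc] = el.get("parent")
    return mapping


def chain(locale, parents):
    """parentLocale first, else truncation, until root."""
    out = [locale]
    while out[-1] != "root":
        cur = out[-1]
        if cur in parents:
            out.append(parents[cur])
        elif "_" in cur:
            out.append(cur.rsplit("_", 1)[0])
        else:
            out.append("root")
    return out


parents = load_parent_locales(SNIPPET)
print(chain("zh_Hant_TW", parents))  # ['zh_Hant_TW', 'zh_Hant', 'root']
print(chain("zh_Hant_TW", {}))       # ['zh_Hant_TW', 'zh_Hant', 'zh', 'root']
```

Without the override, the second chain passes through zh. That is the cross-script inheritance John describes, because zh holds Simplified data.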
>
>
> Regards,
>
> John C. Emmons
> Globalization Architect & Unicode CLDR TC Chairman
> IBM Software Group
> Internet: emmo at us.ibm.com
>
>
>
> From: Edwin Hoogerbeets <ehoogerbeets at gmail.com>
> To: John Emmons/Austin/IBM at IBMUS, Rafael Xavier <rxaviers at gmail.com>
> Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>, "cldr-users at unicode.org" <
> cldr-users at unicode.org>
> Date: 12/11/2014 07:41 PM
> Subject: Re: Bundle Lookup
> ------------------------------
>
>
>
> Rafael, also take a look at common/supplemental/likelySubtags.xml. If the
> caller has passed you an incompletely specified locale, you can use those
> mappings to see if you can get to a locale for which you do have a string
> bundle. I think that is the source for the "language aliases" to which John
> was referring.
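Edwin's suggestion can be sketched as follows: fill in an incomplete ID from likely subtags, then truncate until a bundle you actually ship is found. Both tables are hypothetical excerpts. The single direct lookup is a simplification of the full "add likely subtags" procedure in TR35.

```python
# Hypothetical excerpt of likelySubtags data, keyed by underscore-form IDs.
LIKELY_SUBTAGS = {
    "zh": "zh_Hans_CN",
    "zh_TW": "zh_Hant_TW",
}

# Hypothetical set of bundles the application ships.
AVAILABLE = {"root", "zh", "zh_Hant"}


def first_available(locale_id, available):
    """Truncate subtags until a shipped bundle is reached (root as last resort)."""
    cur = locale_id
    while cur not in available and "_" in cur:
        cur = cur.rsplit("_", 1)[0]
    return cur if cur in available else "root"


def resolve(tag, use_likely=True):
    locale_id = tag.replace("-", "_")
    if use_likely:
        locale_id = LIKELY_SUBTAGS.get(locale_id, locale_id)
    return first_available(locale_id, AVAILABLE)


print(resolve("zh-TW"))                    # zh_Hant
print(resolve("zh-TW", use_likely=False))  # zh  (Simplified: wrong script for Taiwan)
```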
>
> John, for the last part of your example zh-TW inheritance chain, wouldn't
> you just truncate "zh-Hant" again to "zh" like in the en-GB example before
> inheriting from the root? If not, what is the reasoning there? Is there
> already a document that specifies the inheritance rules in CLDR?
>
> For efficiency, I can imagine you would put the common translations in
> "zh" where there is no difference between traditional and simplified, and
> other translations in "zh-Hant" or "zh-Hans" where there is. That would
> save some disk space and you could leverage linguistic bug fixes at the
> "zh" level. For other locales like "sr-Latn" and "sr-Cyrl" there would be
> nothing in common so the string bundle at the "sr" level would be
> essentially empty, but it should still appear in the inheritance chain just
> in case.
>
> Edwin
>
>
> On 12/11/2014 02:53 PM, John Emmons wrote:
>
>
> #3 is currently a problem, which we are working on. Basically, "Latn"
> needs to be stripped out because it isn't necessary. Then follow the
> normal inheritance:
>
> en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
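The stripping step for en-Latn-GB can be sketched by dropping a script subtag when it equals the language's likely script, and then following the normal chain. The LIKELY_SCRIPT table is a hypothetical one-entry excerpt. This per-language rule is a simplification: for some languages (Chinese, for example) the default script depends on the region, which this sketch ignores.

```python
# Hypothetical excerpts: likely script per language, and parentLocale data.
LIKELY_SCRIPT = {"en": "Latn"}
PARENT_LOCALES = {"en_GB": "en_001"}


def drop_default_script(locale_id):
    """Remove the script subtag if it is the language's likely script."""
    parts = locale_id.replace("-", "_").split("_")
    if len(parts) >= 2 and len(parts[1]) == 4 and LIKELY_SCRIPT.get(parts[0]) == parts[1]:
        del parts[1]
    return "_".join(parts)


def chain(locale_id):
    """parentLocale first, else truncation, until root."""
    out = [locale_id]
    while out[-1] != "root":
        cur = out[-1]
        truncated = cur.rsplit("_", 1)[0] if "_" in cur else "root"
        out.append(PARENT_LOCALES.get(cur) or truncated)
    return out


print(chain(drop_default_script("en-Latn-GB")))  # ['en_GB', 'en_001', 'en', 'root']
```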
>
> #4 - Any Unicode locale extensions are meant to identify particular
> behaviors that are desired in the context of a given locale. Think of them
> like "options". They are not meant to be used in the context of bundle
> lookups.
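To treat the extension as "options", a caller can parse the -u- key/type pairs into a dictionary and pass that to the service, while the bundle is chosen from the rest of the tag. This is a hypothetical helper. It assumes two-character keys, ignores extension attributes, and stops at the next singleton.

```python
def unicode_extension_options(tag):
    """Return {key: type} for the -u- extension of a BCP 47 tag (sketch)."""
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return {}
    options, key = {}, None
    for s in subtags[subtags.index("u") + 1:]:
        if len(s) == 1:          # next singleton ends the -u- extension
            break
        if len(s) == 2:          # a key such as "nu" or "ca"
            key = s
            options[key] = ""
        elif key is not None:    # a type subtag for the current key
            options[key] = s if not options[key] else options[key] + "-" + s
    return options


print(unicode_extension_options("en-US-u-nu-usd"))               # {'nu': 'usd'}
print(unicode_extension_options("ja-JP-u-ca-japanese-nu-jpan"))  # {'ca': 'japanese', 'nu': 'jpan'}
```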
>
> #5 - zh_TW - Now that proper language aliases are in place (see
> http://unicode.org/cldr/trac/ticket/5949):
>
> zh-TW: zh-TW → (languageAlias) zh-Hant-TW → (truncation) zh-Hant →
> (parentLocale) root
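The zh-TW chain in John's message (languageAlias, then truncation, then parentLocale) can be sketched like this. Both tables are hypothetical excerpts that contain only the entries this example needs.

```python
# Hypothetical excerpts of CLDR data, limited to this example.
LANGUAGE_ALIASES = {"zh_TW": "zh_Hant_TW"}   # languageAlias
PARENT_LOCALES = {"zh_Hant": "root"}         # parentLocale


def bundle_chain(tag):
    locale_id = tag.replace("-", "_")
    chain = [locale_id]
    if locale_id in LANGUAGE_ALIASES:        # apply the alias once, up front
        locale_id = LANGUAGE_ALIASES[locale_id]
        chain.append(locale_id)
    while locale_id != "root":
        if locale_id in PARENT_LOCALES:      # an explicit parent overrides truncation
            locale_id = PARENT_LOCALES[locale_id]
        elif "_" in locale_id:
            locale_id = locale_id.rsplit("_", 1)[0]
        else:
            locale_id = "root"
        chain.append(locale_id)
    return chain


print(bundle_chain("zh-TW"))  # ['zh_TW', 'zh_Hant_TW', 'zh_Hant', 'root']
```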
>
> Regards,
>
> John C. Emmons
> Globalization Architect & Unicode CLDR TC Chairman
> IBM Software Group
> Internet: emmo at us.ibm.com
>
>
>
> From: Rafael Xavier <rxaviers at gmail.com>
> To: "cldr-users at unicode.org" <cldr-users at unicode.org>
> Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>
> Date: 12/11/2014 01:02 PM
> Subject: Bundle Lookup
> Sent by: "CLDR-Users" <cldr-users-bounces at unicode.org>
>
> ------------------------------
>
>
>
> Friends,
>
> This is a very basic question. See below. There is a lot of
> documentation about locale inheritance and matching, but it falls short
> for me in some cases.
>
> *Given a locale, what's the procedure to find the bundle lookup chain?*
>
> 1. en-US: en-US → (truncation) en → root
>
> This one is dead simple. No problem.
>
> 2. en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
>
> This one is also dead simple, although the documentation says en-GB → en.
> Is the documentation outdated, or am I doing something wrong?
>
> Anyway, the ones I'm interested in knowing are:
>
> 3. en-Latn-GB
> 4. en-US-u-nu-usd
> 5. zh-TW
>
> Could someone please show me the chain for each of these locales (and
> explain the steps)?
>
> Thanks!
>
> --
> +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
> http://rafael.xavier.blog.br
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>
>
>
>
>
>
>
>
>