Bundle Lookup

Mark Davis ☕️ mark at macchiato.com
Fri Dec 12 10:50:44 CST 2014


I also want to be clear that there are two closely-related but very
different tasks.

1. *Inherited item lookup.* Given that you have a CLDR resource bundle,
with inheritance, where do you go to get inherited items?

That is specified by CLDR by means of the parentLocale + truncation
algorithm, plus the alias element. (There are a few cases where we have
"Lateral Inheritance" where the specification is in the text of LDML, such
as when looking for an alt variant.)

So back to Rafael's original question:

   1. en-Latn-GB and zh-TW are not CLDR bundles, so this doesn't apply to
   them.
   2. en-US-u-nu-usd: the u-nu-usd doesn't select within a bundle, but
   rather customizes a service that uses information in the bundle. The item
   lookup (used by the currency formatting service) would be en-US => en
   => root.
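
As a rough illustration of the item-lookup chain described above, here is a
minimal Python sketch. It is only a sketch under stated assumptions: IDs are
well-formed and hyphen-separated, the PARENT_LOCALES table is a tiny
hand-copied excerpt of the parentLocale relationships mentioned in this
thread (not the full CLDR data), and the function names are made up for this
example. It does not handle the alias element or lateral inheritance.

```python
# Hypothetical excerpt of CLDR parentLocale data; the real table lives in
# the supplemental data and is much longer.
PARENT_LOCALES = {
    "en-GB": "en-001",
    "zh-Hant": "root",
    "sr-Latn": "root",
}


def strip_unicode_extension(locale_id):
    """Drop a -u- extension; it customizes a service, not the bundle."""
    return locale_id.split("-u-", 1)[0]


def inheritance_chain(locale_id):
    """Return the bundles consulted for an item, most specific first."""
    chain = []
    current = strip_unicode_extension(locale_id)
    while current != "root":
        chain.append(current)
        if current in PARENT_LOCALES:
            # An explicit parentLocale entry overrides truncation.
            current = PARENT_LOCALES[current]
        elif "-" in current:
            # Truncation: remove the last subtag.
            current = current.rsplit("-", 1)[0]
        else:
            current = "root"
    chain.append("root")
    return chain


if __name__ == "__main__":
    for locale in ("en-US-u-nu-usd", "en-GB", "zh-Hant-TW"):
        print(locale, "=>", " => ".join(inheritance_chain(locale)))
```

With this table, en-US-u-nu-usd gives en-US => en => root, en-GB goes through
en-001 before en, and zh-Hant-TW stops at zh-Hant before root without ever
visiting zh.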


2. *Bundle lookup.* Given a locale ID, where do I get the best matching
CLDR bundle?

My application has a set of supported locales, and the user comes in with a
set of desired locales. What is the best bundle for that user?

Here we are not as clear as we should be. The recommended process is in
http://www.unicode.org/reports/tr35/#LanguageMatching

So back to Rafael's original question:

   1. en-Latn-GB, and zh-TW. When these are looked up with Language
   Matching, assuming that all the CLDR locales are available, they would
   return, respectively, en-GB and zh-Hant-TW.
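
To make that result concrete, here is a heavily simplified Python sketch. It
is not the Language Matching algorithm from LDML, which also uses distance
data to pick close-but-not-identical locales; it only covers the case where
the desired and supported IDs become the same after likely subtags are
added. The LIKELY_SUBTAGS table is a small hand-copied excerpt of
likelySubtags.xml, and all function names are assumptions made for this
example.

```python
# Hypothetical excerpt of likelySubtags.xml (hyphen-separated here).
LIKELY_SUBTAGS = {
    "en": "en-Latn-US",
    "zh": "zh-Hans-CN",
    "zh-TW": "zh-Hant-TW",
    "zh-Hant": "zh-Hant-TW",
}


def parse(locale_id):
    """Split an ID into (language, script, region); stop at extensions."""
    parts = locale_id.split("-")
    language, script, region = parts[0], "", ""
    for part in parts[1:]:
        if len(part) == 1:
            break  # singleton such as "u": the rest is an extension
        if len(part) == 4 and part.isalpha():
            script = part
        elif (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        ):
            region = part
    return language, script, region


def maximize(locale_id):
    """Fill in missing script and region from the likely-subtags excerpt."""
    language, script, region = parse(locale_id)
    keys = [
        "-".join(p for p in (language, script, region) if p),
        "-".join(p for p in (language, region) if p),
        "-".join(p for p in (language, script) if p),
        language,
    ]
    for key in keys:
        if key in LIKELY_SUBTAGS:
            _, likely_script, likely_region = parse(LIKELY_SUBTAGS[key])
            return "-".join(
                (language, script or likely_script, region or likely_region)
            )
    return locale_id  # no data for this language in the excerpt


def best_bundle(desired, supported):
    """Return the supported ID whose maximized form equals the desired one."""
    target = maximize(desired)
    for candidate in supported:
        if maximize(candidate) == target:
            return candidate
    return None


if __name__ == "__main__":
    bundles = ["en", "en-GB", "zh", "zh-Hant-TW"]
    print(best_bundle("en-Latn-GB", bundles))
    print(best_bundle("zh-TW", bundles))
```

Given the supported list in the example, en-Latn-GB resolves to en-GB and
zh-TW resolves to zh-Hant-TW. A request with no exact maximized match
returns None here, whereas real language matching would fall back to the
closest supported locale.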

That being said, people often don't understand language matching, and so we
are in the process of adding more information so that there is a direct
mapping between locale IDs that are always considered to be "identical" on
a deep level, like en-GB and en-Latn-GB.



Mark <https://google.com/+MarkDavis>

*— Il meglio è l’inimico del bene —*

On Fri, Dec 12, 2014 at 5:04 PM, John Emmons <emmo at us.ibm.com> wrote:

> Yes, Edward, there is a very good reason we don't want zh-Hant to inherit
> from zh.  Simply put, in situations where you have locale resources that
> aren't 100% populated, allowing zh-Hant to inherit from zh produces a
> mixture of simplified and traditional Chinese, which is acceptable to no
> one.  This is what we call "cross script inheritance" in CLDR.  While it
> might be acceptable to some in the case of Chinese, it is certainly a
> bigger problem in languages like Serbian, where you have both Latin and
> Cyrillic scripts in use, and you certainly don't ever want a mixture of
> Latin and Cyrillic scripts.
>
> These relationships are documented in CLDR's supplemental data, where you
> have specified:
>
> <parentLocale parent="root" locales="az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt
> ha_Arab mn_Mong ms_Arab pa_Arab shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn
> zh_Hant"/>
>
>
> Regards,
>
> John C. Emmons
> Globalization Architect & Unicode CLDR TC Chairman
> IBM Software Group
> Internet: emmo at us.ibm.com
>
>
> From: Edwin Hoogerbeets <ehoogerbeets at gmail.com>
> To: John Emmons/Austin/IBM at IBMUS, Rafael Xavier <rxaviers at gmail.com>
> Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>,
> "cldr-users at unicode.org" <cldr-users at unicode.org>
> Date: 12/11/2014 07:41 PM
> Subject: Re: Bundle Lookup
> ------------------------------
>
>
>
> Rafael, also take a look at common/supplemental/likelySubtags.xml. If the
> caller has passed you an incompletely specified locale, you can use those
> mappings to see if you can get to a locale for which you do have a string
> bundle. I think that is the source for the "language aliases" to which John
> was referring.
>
> John, for the last part of your example zh-TW inheritance chain, wouldn't
> you just truncate "zh-Hant" again to "zh" like in the en-GB example before
> inheriting from the root? If not, what is the reasoning there? Is there
> already a document that specifies the inheritance rules in CLDR?
>
> For efficiency, I can imagine you would put the common translations in
> "zh" where there is no difference between traditional and simplified, and
> other translations in "zh-Hant" or "zh-Hans" where there is. That would
> save some disk space and you could leverage linguistic bug fixes at the
> "zh" level. For other locales like "sr-Latn" and "sr-Cyrl" there would be
> nothing in common so the string bundle at the "sr" level would be
> essentially empty, but it should still appear in the inheritance chain just
> in case.
>
> Edwin
>
>
> On 12/11/2014 02:53 PM, John Emmons wrote:
>
>
>    #3 is currently a problem, which we are working on.  Basically, "Latn"
>    needs to be stripped out because it isn't necessary.  Then follow the
>    normal inheritance:
>
>    en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
>
>    #4 - Any unicode locale extensions are meant to identify particular
>    behaviors that are desired in the context of a given locale.  Think of them
>    like "options".  They are not meant to be used in the context of bundle
>    lookups.
>
>    #5 - zh_TW - Now that proper language aliases are in place (see
>    http://unicode.org/cldr/trac/ticket/5949):
>
>    zh-TW: zh-TW → (languageAlias) zh-Hant-TW → (truncation) zh-Hant →
>    (parentLocale) root
>
>    Regards,
>
>    John C. Emmons
>    Globalization Architect & Unicode CLDR TC Chairman
>    IBM Software Group
>    Internet: emmo at us.ibm.com
>
>
>    From: Rafael Xavier <rxaviers at gmail.com>
>    To: "cldr-users at unicode.org" <cldr-users at unicode.org>
>    Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>
>    Date: 12/11/2014 01:02 PM
>    Subject: Bundle Lookup
>    Sent by: "CLDR-Users" <cldr-users-bounces at unicode.org>
>
>    ------------------------------
>
>
>
>    Friends,
>
>    This is a very basic question; see below. There is a lot of
>    documentation about locale inheritance and matching, but it fails me in
>    some cases.
>
>    *Given a locale, what's the procedure to find the **bundle** lookup
>    chain?*
>
>    1. en-US: en-US → (truncation) en → root
>
>    This one is dead simple. No problem.
>
>    2. en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
>
>    This one is also dead simple. Although, documentation says en-GB → en.
>    Is it outdated or am I doing something wrong?
>
>    Anyway, the ones I'm interested in knowing are:
>
>    3. en-Latn-GB
>    4. en-US-u-nu-usd
>    5. zh-TW
>
>    Please, could someone show me what's the chain of these locales (and
>    obviously explain the steps)?
>
>    Thanks!
>
>    --
>    +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
>    http://rafael.xavier.blog.br
>    _______________________________________________
>    CLDR-Users mailing list
>    CLDR-Users at unicode.org
>    http://unicode.org/mailman/listinfo/cldr-users