Bundle Lookup

Rafael Xavier rxaviers at gmail.com
Fri Dec 12 12:48:08 CST 2014


Mark,

Given an arbitrary locale ID, the recommended (and only reliable) process
to deduce its respective bundle is Language Matching.

Is that true?

Considering all bundles are always present, isn't there a less expensive
algorithm that could be recommended?

Thank you.


PS: My use case is a little different. I have *n* distributions of my
application, each embedded with a different locale. So I don't need the
full power of Language Matching, which compares an arbitrary list of
desired locales against an arbitrary list of available locales. Still, I
do want my application to look up the right bundle given a locale (e.g.,
`zh-Hant-TW` when given `zh-TW`).
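
A minimal sketch of such a single-locale lookup, in Python, under loud
assumptions: `LIKELY_SUBTAGS` is a tiny hand-copied excerpt standing in for
common/supplemental/likelySubtags.xml, and `parse`, `join`, `maximize`, and
`find_bundle` are hypothetical helper names. This is not the Language
Matching algorithm from LDML, only a cheaper approximation that maximizes
the requested ID and then tries progressively shorter forms of it.

```python
# Hypothetical excerpt of likelySubtags, hand-copied for illustration only.
LIKELY_SUBTAGS = {
    "en": "en-Latn-US",
    "zh": "zh-Hans-CN",
    "zh-Hant": "zh-Hant-TW",
    "zh-TW": "zh-Hant-TW",
    "zh-HK": "zh-Hant-HK",
}

def parse(locale_id):
    """Split a simple language[-script][-region] ID; extensions are ignored."""
    parts = locale_id.replace("_", "-").split("-")
    language, script, region = parts[0].lower(), None, None
    for part in parts[1:]:
        if len(part) == 1:  # singleton such as "u": an extension starts here
            break
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()
    return language, script, region

def join(language, script, region):
    return "-".join(p for p in (language, script, region) if p)

def maximize(locale_id):
    """Fill in missing script/region from the likely-subtags excerpt."""
    language, script, region = parse(locale_id)
    keys = [join(language, script, region), join(language, None, region),
            join(language, script, None), language]
    for key in keys:  # most specific key first
        if key in LIKELY_SUBTAGS:
            _, likely_script, likely_region = parse(LIKELY_SUBTAGS[key])
            return join(language, script or likely_script, region or likely_region)
    return join(language, script, region)

def find_bundle(locale_id, available):
    """Return the first available bundle ID that agrees in language and script."""
    language, script, region = parse(maximize(locale_id))
    candidates = [join(language, script, region), join(language, None, region),
                  join(language, script, None), language]
    for candidate in candidates:
        if candidate in available:
            cand_language, cand_script, _ = parse(maximize(candidate))
            # Guard against cross-script matches (e.g. zh for a zh-Hant request).
            if (cand_language, cand_script) == (language, script):
                return candidate
    return "root"

available = {"en", "en-GB", "zh", "zh-Hant", "zh-Hant-TW"}
print(find_bundle("zh-TW", available))
print(find_bundle("en-Latn-GB", available))
```

The script check in `find_bundle` is a design choice of this sketch: without
it, a request that maximizes to a Traditional Chinese ID could fall through
to the `zh` bundle, which holds Simplified Chinese data.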

On Fri, Dec 12, 2014 at 2:50 PM, Mark Davis ☕️ <mark at macchiato.com> wrote:
>
> I also want to be clear that there are two closely-related but very
> different tasks.
>
> 1. *Inherited item lookup.* Given that you have a CLDR resource bundle,
> with inheritance, where do I go to get inherited items?
>
> That is specified by CLDR by means of the parentLocale + truncation
> algorithm, plus the alias element. (There are a few cases where we have
> "Lateral Inheritance" where the specification is in the text of LDML,
> such as when looking for an alt variant.)
>
> So back to Rafael's original question:
>
>    1. en-Latn-GB, and zh-TW are not CLDR bundles, so this doesn't apply
>    to them.
>    2. en-US-u-nu-usd: the u-nu-usd doesn't select within a bundle, but
>    rather customizes a service that uses information in the bundle. The item
>    lookup (used by the currency formatting service) would be en-US => en
>    => root.
>
>
> 2. *Bundle lookup.* Given a locale ID, where do I get the best matching
> CLDR bundle?
>
> My application has a set of supported locales, and the user comes in with
> a set of desired locales. What is the best bundle for that user?
>
> Here we are not as clear as we should be. The recommended process is in
> http://www.unicode.org/reports/tr35/#LanguageMatching
>
> So back to Rafael's original question:
>
>    1. en-Latn-GB, and zh-TW. When these are looked up with Language
>    Matching, assuming that all the CLDR locales are available, they would
>    return, respectively, en-GB and zh-Hant-TW.
>
> That being said, people often don't understand language matching, and so
> we are in the process of adding more information so that there is a direct
> mapping between locale IDs that are always considered to be
> "identical" on a deep level, like en-GB and en-Latn-GB.
>
>
>
> Mark <https://google.com/+MarkDavis>
>
> *— Il meglio è l’inimico del bene —*
>
> On Fri, Dec 12, 2014 at 5:04 PM, John Emmons <emmo at us.ibm.com> wrote:
>
>> Yes, Edwin, there is a very good reason we don't want zh-Hant to inherit
>> from zh.  Simply put, in situations where you have locale resources that
>> aren't 100% populated, allowing zh-Hant to inherit from zh produces a
>> mixture of simplified and traditional Chinese, which is acceptable to no
>> one.  This is what we call "cross script inheritance" in CLDR.  While it
>> might be acceptable to some in the case of Chinese, it is certainly a
>> bigger problem in languages like Serbian, where you have both Latin and
>> Cyrillic scripts in use, and you certainly don't ever want a mixture of
>> Latin and Cyrillic scripts.
>>
>> These relationships are documented in CLDR's supplemental data, where you
>> have specified:
>>
>> <parentLocale parent="root" locales="az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt
>> ha_Arab mn_Mong ms_Arab pa_Arab shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn
>> zh_Hant"/>
>>
>>
>> Regards,
>>
>> John C. Emmons
>> Globalization Architect & Unicode CLDR TC Chairman
>> IBM Software Group
>> Internet: emmo at us.ibm.com
>>
>>
>>
>> From: Edwin Hoogerbeets <ehoogerbeets at gmail.com>
>> To: John Emmons/Austin/IBM at IBMUS, Rafael Xavier <rxaviers at gmail.com>
>> Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>, "cldr-users at unicode.org"
>> <cldr-users at unicode.org>
>> Date: 12/11/2014 07:41 PM
>> Subject: Re: Bundle Lookup
>> ------------------------------
>>
>>
>>
>> Rafael, also take a look at common/supplemental/likelySubtags.xml. If the
>> caller has passed you an incompletely specified locale, you can use those
>> mappings to see if you can get to a locale for which you do have a string
>> bundle. I think that is the source for the "language aliases" to which John
>> was referring.
>>
>> John, for the last part of your example zh-TW inheritance chain, wouldn't
>> you just truncate "zh-Hant" again to "zh" like in the en-GB example before
>> inheriting from the root? If not, what is the reasoning there? Is there
>> already a document that specifies the inheritance rules in CLDR?
>>
>> For efficiency, I can imagine you would put the common translations in
>> "zh" where there is no difference between traditional and simplified, and
>> other translations in "zh-Hant" or "zh-Hans" where there is. That would
>> save some disk space and you could leverage linguistic bug fixes at the
>> "zh" level. For other locales like "sr-Latn" and "sr-Cyrl" there would be
>> nothing in common so the string bundle at the "sr" level would be
>> essentially empty, but it should still appear in the inheritance chain just
>> in case.
>>
>> Edwin
>>
>>
>> On 12/11/2014 02:53 PM, John Emmons wrote:
>>
>>
>>    #3 is currently a problem, which we are working on.  Basically,
>>    "Latn" needs to be stripped out because it isn't necessary.  Then follow
>>    the normal inheritance:
>>
>>    en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
>>
>>    #4 - Any unicode locale extensions are meant to identify particular
>>    behaviors that are desired in the context of a given locale.  Think of them
>>    like "options".  They are not meant to be used in the context of bundle
>>    lookups.
>>
>>    #5 - zh_TW - Now that proper language aliases are in place (see
>>    http://unicode.org/cldr/trac/ticket/5949):
>>
>>    zh-TW: zh-TW → (languageAlias) zh-Hant-TW → (truncation) zh-Hant →
>>    (parentLocale) root
>>
>>    Regards,
>>
>>    John C. Emmons
>>    Globalization Architect & Unicode CLDR TC Chairman
>>    IBM Software Group
>>    Internet: emmo at us.ibm.com
>>
>>
>>
>>    From: Rafael Xavier <rxaviers at gmail.com>
>>    To: "cldr-users at unicode.org" <cldr-users at unicode.org>
>>    Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>
>>    Date: 12/11/2014 01:02 PM
>>    Subject: Bundle Lookup
>>    Sent by: "CLDR-Users" <cldr-users-bounces at unicode.org>
>>
>>    ------------------------------
>>
>>
>>
>>    Friends,
>>
>>    This is a very basic question. See below. There is a lot of
>>    documentation about locale inheritance and matching, but it fails me
>>    in some cases.
>>
>>    *Given a locale, what's the procedure to find the bundle lookup
>>    chain?*
>>
>>    1. en-US: en-US → (truncation) en → root
>>
>>    This one is dead simple. No problem.
>>
>>    2. en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
>>
>>    This one is also dead simple. Although, documentation says en-GB →
>>    en. Is it outdated or am I doing something wrong?
>>
>>    Anyway, the ones I'm interested in knowing are:
>>
>>    3. en-Latn-GB
>>    4. en-US-u-nu-usd
>>    5. zh-TW
>>
>>    Please, could someone show me what's the chain of these locales (and
>>    obviously explain the steps)?
>>
>>    Thanks!
>>
>>    --
>>    +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
>>    http://rafael.xavier.blog.br
>>    _______________________________________________
>>    CLDR-Users mailing list
>>    CLDR-Users at unicode.org
>>    http://unicode.org/mailman/listinfo/cldr-users
>>
>
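
The inheritance chains quoted above (parentLocale first, truncation
otherwise) can be sketched in Python as follows. This is a minimal
illustration, not a reference implementation: `PARENT_LOCALES` is a
hypothetical hand-copied excerpt of the `<parentLocale>` supplemental data,
written with hyphens instead of underscores, and `inheritance_chain` is a
made-up helper name. It assumes the input is already a CLDR bundle ID, so
alias resolution (e.g. zh-TW to zh-Hant-TW) has to happen beforehand.

```python
# Hypothetical excerpt of <parentLocale> data, for illustration only.
PARENT_LOCALES = {
    "en-GB": "en-001",
    "zh-Hant": "root",
    "sr-Latn": "root",
}

def inheritance_chain(bundle_id):
    """List the bundles consulted for inherited items, ending at root."""
    chain = [bundle_id]
    current = bundle_id
    while current != "root":
        if current in PARENT_LOCALES:
            # An explicit parent overrides plain truncation.
            current = PARENT_LOCALES[current]
        elif "-" in current:
            # Truncation: drop the last subtag.
            current = current.rsplit("-", 1)[0]
        else:
            # A bare language code inherits directly from root.
            current = "root"
        chain.append(current)
    return chain

for bundle in ("en-US", "en-GB", "zh-Hant-TW"):
    print(" → ".join(inheritance_chain(bundle)))
```

Because zh-Hant has an explicit parent of root in the excerpt, the chain
for zh-Hant-TW never visits zh, which is the cross-script case John
describes.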

-- 
+55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
http://rafael.xavier.blog.br