Bundle Lookup

Rafael Xavier rxaviers at gmail.com
Fri Dec 12 15:31:50 CST 2014


Looking forward to hearing how that will work.

Thank you very much so far.

On Fri, Dec 12, 2014 at 6:27 PM, Mark Davis ☕️ <mark at macchiato.com> wrote:
>
>
>
>
> Mark <https://google.com/+MarkDavis>
>
> *— Il meglio è l’inimico del bene —*
>
> On Fri, Dec 12, 2014 at 7:48 PM, Rafael Xavier <rxaviers at gmail.com> wrote:
>
>> Mark,
>>
>> Given an arbitrary locale ID, the recommended (and only reliable) way to
>> deduce its bundle is Language Matching.
>>
>> Is that true?
>>
>
> ​As I said: "
> That being said, often people don't understand language matching, and so
> we are in the process of adding more information so that there is a direct
> mapping between locale IDs that are always considered to be
> "identical" on a deep level, like en-GB and en-Latn-GB.
> "
>>
>
>>
>> Considering all bundles are always present, isn't there any less
>> expensive algorithm that could be recommended?
>>
>> Thank you.
>>
>>
>> PS: My use case is a little different. I have *n* distributions of my
>> application, each with a different locale embedded in it. So I don't need
>> the full power of Language Matching, which compares an arbitrary list of
>> desired locales against an arbitrary list of available locales. I do,
>> however, want my application to look up the right bundle given a locale
>> (e.g., `zh-Hant-TW` when given `zh-TW`).
>>
>> On Fri, Dec 12, 2014 at 2:50 PM, Mark Davis ☕️ <mark at macchiato.com>
>> wrote:
>>>
>>> I also want to be clear that there are two closely-related but very
>>> different tasks.
>>>
>>> 1. *Inherited item lookup.* Given a CLDR resource bundle with
>>> inheritance, where do I go to get inherited items?
>>>
>>> That is specified by CLDR by means of the parentLocale + truncation
>>> algorithm, plus the alias element. (There are a few cases where we have
>>> "Lateral Inheritance" where the specification is in the text of LDML,
>>> such as when looking for an alt variant.)
>>>
>>> So back to Rafael's original question:
>>>
>>>    1. en-Latn-GB, and zh-TW are not CLDR bundles, so this doesn't apply
>>>    to them.
>>>    2. en-US-u-nu-usd: the u-nu-usd doesn't select within a bundle, but
>>>    rather customizes a service that uses information in the bundle. The item
>>>    lookup (used by the currency formatting service) would be en-US =>
>>>    en => root.
>>>
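A minimal sketch of the parentLocale + truncation walk described above, in Python. The `PARENT_LOCALES` table and the `inheritance_chain` name are assumptions made for illustration: the table holds only the two entries mentioned in this thread, not the full CLDR supplemental data, and the alias element and lateral inheritance are not handled.

```python
# Hypothetical subset of CLDR's parentLocale data (only the entries
# mentioned in this thread); the real table is much larger.
PARENT_LOCALES = {
    "en-GB": "en-001",
    "zh-Hant": "root",
}


def inheritance_chain(bundle_id):
    """Return the item-lookup chain for an existing bundle ID, ending at root."""
    chain = [bundle_id]
    current = bundle_id
    while current != "root":
        if current in PARENT_LOCALES:
            # An explicit parentLocale entry overrides truncation.
            current = PARENT_LOCALES[current]
        elif "-" in current:
            # Truncation: drop the last subtag.
            current = current.rsplit("-", 1)[0]
        else:
            # A bare language code inherits directly from root.
            current = "root"
        chain.append(current)
    return chain


for bundle in ("en-US", "en-GB", "zh-Hant-TW"):
    print(" => ".join(inheritance_chain(bundle)))
```

With this table, en-US goes through en to root, en-GB goes through en-001 and en to root, and zh-Hant-TW goes through zh-Hant straight to root without ever visiting zh.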
>>>
>>> 2. *Bundle lookup.* Given a locale ID, where do I get the best matching
>>> CLDR bundle?
>>>
>>> My application has a set of supported locales, and the user comes in
>>> with a set of desired locales. What is the best bundle for that user?
>>>
>>> Here we are not as clear as we should be. The recommended process is in
>>> http://www.unicode.org/reports/tr35/#LanguageMatching
>>>
>>> So back to Rafael's original question:
>>>
>>>    1. en-Latn-GB, and zh-TW. When these are looked up with Language
>>>    Matching, assuming that all the CLDR locales are available, they would
>>>    return, respectively, en-GB and zh-Hant-TW.
>>>
>>> That being said, often people don't understand language matching, and so
>>> we are in the process of adding more information so that there is a direct
>>> mapping between locale IDs that are always considered to be
>>> "identical" on a deep level, like en-GB and en-Latn-GB.
>>>
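One way to picture such a direct mapping is to compare locale IDs after filling in their likely script. The sketch below is a deliberate simplification and not the Language Matching algorithm from LDML: `LIKELY_SCRIPT`, `maximize`, `find_bundle`, and `AVAILABLE` are hypothetical names, and the table contains only the two cases discussed in this thread.

```python
# Hypothetical subset of likely-subtags data, keyed by (language, region).
LIKELY_SCRIPT = {
    ("en", "GB"): "Latn",
    ("zh", "TW"): "Hant",
}


def maximize(locale_id):
    """Insert the likely script into a language-region ID, if it is known."""
    parts = locale_id.split("-")
    if len(parts) == 2 and (parts[0], parts[1]) in LIKELY_SCRIPT:
        language, region = parts
        return "-".join([language, LIKELY_SCRIPT[(language, region)], region])
    return locale_id


def find_bundle(locale_id, available):
    """Return the available bundle whose maximized form equals the request's."""
    wanted = maximize(locale_id)
    for bundle in available:
        if maximize(bundle) == wanted:
            return bundle
    return None


# Assumed set of available bundles, for illustration only.
AVAILABLE = ["en-GB", "zh-Hant-TW"]

print(find_bundle("en-Latn-GB", AVAILABLE))
print(find_bundle("zh-TW", AVAILABLE))
```

Under these assumptions, en-Latn-GB resolves to the en-GB bundle and zh-TW resolves to zh-Hant-TW, which are the results quoted above for Language Matching. A request with no equivalent bundle returns `None` here, whereas real Language Matching would go on to look for the closest fallback.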
>>>
>>>
>>> Mark <https://google.com/+MarkDavis>
>>>
>>> *— Il meglio è l’inimico del bene —*
>>>
>>> On Fri, Dec 12, 2014 at 5:04 PM, John Emmons <emmo at us.ibm.com> wrote:
>>>
>>>> Yes, Edward, there is a very good reason we don't want zh-Hant to
>>>> inherit from zh.  Simply put, in situations where you have locale resources
>>>> that aren't 100% populated, allowing zh-Hant to inherit from zh produces a
>>>> mixture of simplified and traditional Chinese, which is acceptable to no
>>>> one.  This is what we call "cross script inheritance" in CLDR.  While it
>>>> might be acceptable to some in the case of Chinese, it is certainly a
>>>> bigger problem in languages like Serbian, where you have both Latin and
>>>> Cyrillic scripts in use, and you certainly don't ever want a mixture of
>>>> Latin and Cyrillic scripts.
>>>>
>>>> These relationships are documented in CLDR's supplemental data, where
>>>> you have specified:
>>>>
>>>> <parentLocale parent="root" locales="az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt
>>>> ha_Arab mn_Mong ms_Arab pa_Arab shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn
>>>> zh_Hant"/>
>>>>
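As a rough illustration, the element quoted above can be turned into a child-to-parent lookup table with Python's standard XML parser. The names `SNIPPET`, `parse_parent_locales`, and `PARENTS` are made up for this sketch; a real implementation would read the whole supplemental data file rather than a single inline element.

```python
import xml.etree.ElementTree as ET

# The parentLocale element exactly as quoted in the message above.
SNIPPET = (
    '<parentLocale parent="root" locales="az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt '
    'ha_Arab mn_Mong ms_Arab pa_Arab shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn '
    'zh_Hant"/>'
)


def parse_parent_locales(xml_text):
    """Map each locale listed in a parentLocale element to its parent."""
    element = ET.fromstring(xml_text)
    parent = element.get("parent")
    return {locale: parent for locale in element.get("locales").split()}


PARENTS = parse_parent_locales(SNIPPET)
print(PARENTS["zh_Hant"])
print(PARENTS["sr_Latn"])
```

Both lookups yield root, so zh_Hant and sr_Latn skip zh and sr entirely, which is how the cross-script mixture is avoided.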
>>>>
>>>> Regards,
>>>>
>>>> John C. Emmons
>>>> Globalization Architect & Unicode CLDR TC Chairman
>>>> IBM Software Group
>>>> Internet: emmo at us.ibm.com
>>>>
>>>>
>>>>
>>>> From: Edwin Hoogerbeets <ehoogerbeets at gmail.com>
>>>> To: John Emmons/Austin/IBM at IBMUS, Rafael Xavier <rxaviers at gmail.com>
>>>> Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>, "cldr-users at unicode.org"
>>>> <cldr-users at unicode.org>
>>>> Date: 12/11/2014 07:41 PM
>>>> Subject: Re: Bundle Lookup
>>>> ------------------------------
>>>>
>>>>
>>>>
>>>> Rafael, also take a look at common/supplemental/likelySubtags.xml. If
>>>> the caller has passed you an incompletely specified locale, you can use
>>>> those mappings to see if you can get to a locale for which you do have a
>>>> string bundle. I think that is the source for the "language aliases" to
>>>> which John was referring.
>>>>
>>>> John, for the last part of your example zh-TW inheritance chain,
>>>> wouldn't you just truncate "zh-Hant" again to "zh" like in the en-GB
>>>> example before inheriting from the root? If not, what is the reasoning
>>>> there? Is there already a document that specifies the inheritance rules in
>>>> CLDR?
>>>>
>>>> For efficiency, I can imagine you would put the common translations in
>>>> "zh" where there is no difference between traditional and simplified, and
>>>> other translations in "zh-Hant" or "zh-Hans" where there is. That would
>>>> save some disk space and you could leverage linguistic bug fixes at the
>>>> "zh" level. For other locales like "sr-Latn" and "sr-Cyrl" there would be
>>>> nothing in common so the string bundle at the "sr" level would be
>>>> essentially empty, but it should still appear in the inheritance chain just
>>>> in case.
>>>>
>>>> Edwin
>>>>
>>>>
>>>> On 12/11/2014 02:53 PM, John Emmons wrote:
>>>>
>>>>
>>>>    #3 is currently a problem, which we are working on.  Basically,
>>>>    "Latn" needs to be stripped out because it isn't necessary.  Then follow
>>>>    the normal inheritance:
>>>>
>>>>    en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
>>>>
>>>>    #4 - Any unicode locale extensions are meant to identify particular
>>>>    behaviors that are desired in the context of a given locale.  Think of them
>>>>    like "options".  They are not meant to be used in the context of bundle
>>>>    lookups.
>>>>
>>>>    #5 - zh_TW - Now that proper language aliases are in place (see
>>>>    http://unicode.org/cldr/trac/ticket/5949):
>>>>
>>>>    zh-TW: zh-TW → (languageAlias) zh-Hant-TW → (truncation) zh-Hant →
>>>>    (parentLocale) root
>>>>
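Since Unicode locale extensions act as options rather than as bundle selectors (point #4 above), one plausible first step before any bundle lookup is to drop them. The helper below is a hypothetical sketch, not part of CLDR or any library. It assumes a hyphen-separated ID and cuts everything from the `u` singleton onward, so any other extension that follows it is discarded as well.

```python
def strip_unicode_extension(locale_id):
    """Drop a '-u-...' Unicode extension, keeping language, script and region."""
    subtags = locale_id.split("-")
    lowered = [subtag.lower() for subtag in subtags]
    if "u" in lowered:
        # Keep only the subtags that come before the 'u' singleton.
        subtags = subtags[: lowered.index("u")]
    return "-".join(subtags)


print(strip_unicode_extension("en-US-u-nu-usd"))
print(strip_unicode_extension("en-GB"))
```

The first call leaves en-US, which then follows the usual en-US, en, root chain. The second call returns en-GB unchanged.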
>>>>    Regards,
>>>>
>>>>    John C. Emmons
>>>>    Globalization Architect & Unicode CLDR TC Chairman
>>>>    IBM Software Group
>>>>    Internet: emmo at us.ibm.com
>>>>
>>>>
>>>>
>>>>    From: Rafael Xavier <rxaviers at gmail.com>
>>>>    To: "cldr-users at unicode.org" <cldr-users at unicode.org>
>>>>    Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>
>>>>    Date: 12/11/2014 01:02 PM
>>>>    Subject: Bundle Lookup
>>>>    Sent by: "CLDR-Users" <cldr-users-bounces at unicode.org>
>>>>
>>>>    ------------------------------
>>>>
>>>>
>>>>
>>>>    Friends,
>>>>
>>>>    This is a very basic question. See below. There is a lot of
>>>>    documentation about locale inheritance and matching, but it fails me
>>>>    in some cases.
>>>>
>>>> * Given a locale, what's the procedure to find the **bundle** lookup
>>>>    chain?*
>>>>
>>>>    1. en-US: en-US → (truncation) en → root
>>>>
>>>>    This one is dead simple. No problem.
>>>>
>>>>    2. en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
>>>>
>>>>    This one is also dead simple, although the documentation says en-GB →
>>>>    en. Is it outdated or am I doing something wrong?
>>>>
>>>>    Anyway, the ones I'm interested in knowing are:
>>>>
>>>>    3. en-Latn-GB
>>>>    4. en-US-u-nu-usd
>>>>    5. zh-TW
>>>>
>>>>    Could someone please show me the chain for each of these locales
>>>>    (and explain the steps)?
>>>>
>>>>    Thanks!
>>>>
>>>>    --
>>>>    +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
>>>>    http://rafael.xavier.blog.br
>>>>    _______________________________________________
>>>>    CLDR-Users mailing list
>>>>    CLDR-Users at unicode.org
>>>>    http://unicode.org/mailman/listinfo/cldr-users
>>>>
>>>>
>>>>
>>>
>>
>> --
>> +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
>> http://rafael.xavier.blog.br
>>
>
>

-- 
+55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
http://rafael.xavier.blog.br