Bundle Lookup

John Emmons emmo at us.ibm.com
Fri Dec 12 10:04:51 CST 2014


Yes, Edwin, there is a very good reason we don't want zh-Hant to inherit
from zh.  Simply put, when locale resources aren't 100% populated,
allowing zh-Hant to inherit from zh produces a mixture of simplified and
traditional Chinese, which is generally unacceptable.  This is what we
call "cross script inheritance" in CLDR.  While some might tolerate it in
the case of Chinese, it is certainly a bigger problem in languages like
Serbian, where both Latin and Cyrillic scripts are in use, and you never
want a mixture of the two.

These relationships are documented in CLDR's supplemental data, which
specifies:

<parentLocale parent="root" locales="az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt
ha_Arab mn_Mong ms_Arab pa_Arab shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn
zh_Hant"/>
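
To make the rule concrete, here is a minimal sketch in Python of how a
lookup chain could be computed from this data.  It is only an illustration,
not the CLDR or ICU implementation: the names PARENT_LOCALES, parent_of and
lookup_chain are hypothetical, the table is hard-coded from the element
above plus the en-GB example in this thread, and a real implementation
would load the complete set of parentLocale entries from the supplemental
data and resolve language aliases first.

```python
# Locales whose explicit parent is root, copied from the <parentLocale>
# element quoted above.
ROOT_CHILDREN = (
    "az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt ha_Arab mn_Mong ms_Arab pa_Arab "
    "shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn zh_Hant"
).split()

# Hypothetical in-memory table: locale -> explicit parent.
PARENT_LOCALES = {locale: "root" for locale in ROOT_CHILDREN}
PARENT_LOCALES["en_GB"] = "en_001"  # from the en-GB example in this thread


def parent_of(locale):
    """Return the parent of a locale, or None for root."""
    if locale == "root":
        return None
    # An explicit parentLocale entry wins over truncation.
    if locale in PARENT_LOCALES:
        return PARENT_LOCALES[locale]
    # Otherwise truncate the last subtag; a bare language falls back to root.
    head, sep, _ = locale.rpartition("_")
    return head if sep else "root"


def lookup_chain(locale):
    """Return the list of bundles to consult, most specific first."""
    locale = locale.replace("-", "_")
    chain = []
    while locale is not None:
        chain.append(locale)
        locale = parent_of(locale)
    return chain


print(lookup_chain("zh-Hant-TW"))
print(lookup_chain("en-GB"))
```

Because the explicit table is consulted before truncation, zh_Hant and
sr_Latn go straight to root and never pick up strings from zh or sr.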


Regards,

John C. Emmons
Globalization Architect & Unicode CLDR TC Chairman
IBM Software Group
Internet: emmo at us.ibm.com




From:	Edwin Hoogerbeets <ehoogerbeets at gmail.com>
To:	John Emmons/Austin/IBM at IBMUS, Rafael Xavier
            <rxaviers at gmail.com>
Cc:	Jörn Zaefferer <joern.zaefferer at gmail.com>,
            "cldr-users at unicode.org" <cldr-users at unicode.org>
Date:	12/11/2014 07:41 PM
Subject:	Re: Bundle Lookup



Rafael, also take a look at common/supplemental/likelySubtags.xml. If the
caller has passed you an incompletely specified locale, you can use those
mappings to see if you can get to a locale for which you do have a string
bundle. I think that is the source for the "language aliases" to which John
was referring.
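
As a rough sketch of that idea in Python (the names LIKELY_SUBTAGS and
find_bundle are made up for illustration, and the table is a tiny
hand-copied excerpt rather than something parsed from likelySubtags.xml):

```python
# Hypothetical excerpt of likely-subtag mappings: incomplete locale ->
# fully specified locale. A real implementation would load these from
# common/supplemental/likelySubtags.xml.
LIKELY_SUBTAGS = {
    "zh_TW": "zh_Hant_TW",
    "zh": "zh_Hans_CN",
    "sr": "sr_Cyrl_RS",
}


def find_bundle(locale, available):
    """Return the first of (requested, likely-subtags form) that has a
    bundle in `available`, or None if neither does."""
    requested = locale.replace("-", "_")
    for candidate in (requested, LIKELY_SUBTAGS.get(requested)):
        if candidate in available:
            return candidate
    return None


bundles = {"zh_Hant_TW", "zh_Hans_CN", "sr_Cyrl_RS"}
print(find_bundle("zh-TW", bundles))
```

This only covers the mapping step; walking up the inheritance chain from
the locale it returns is a separate concern.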

John, for the last part of your example zh-TW inheritance chain, wouldn't
you just truncate "zh-Hant" again to "zh" like in the en-GB example before
inheriting from the root? If not, what is the reasoning there? Is there
already a document that specifies the inheritance rules in CLDR?

For efficiency, I can imagine you would put the common translations in "zh"
where there is no difference between traditional and simplified, and other
translations in "zh-Hant" or "zh-Hans" where there is. That would save some
disk space and you could leverage linguistic bug fixes at the "zh" level.
For other locales like "sr-Latn" and "sr-Cyrl" there would be nothing in
common so the string bundle at the "sr" level would be essentially empty,
but it should still appear in the inheritance chain just in case.

Edwin


On 12/11/2014 02:53 PM, John Emmons wrote:


      #3 is currently a problem, which we are working on.  Basically,
      "Latn" needs to be stripped out because it isn't necessary.  Then
      follow the normal inheritance:

      en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root

      #4 - Any unicode locale extensions are meant to identify particular
      behaviors that are desired in the context of a given locale.  Think
      of them like "options".  They are not meant to be used in the context
      of bundle lookups.

      #5 - zh_TW - Now that proper language aliases are in place (see
      http://unicode.org/cldr/trac/ticket/5949), the chain is:

      zh-TW: zh-TW → (languageAlias) zh-Hant-TW → (truncation) zh-Hant →
      (parentLocale) root

      Regards,

      John C. Emmons
      Globalization Architect & Unicode CLDR TC Chairman
      IBM Software Group
      Internet: emmo at us.ibm.com



      From: Rafael Xavier <rxaviers at gmail.com>
      To: "cldr-users at unicode.org" <cldr-users at unicode.org>
      Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>
      Date: 12/11/2014 01:02 PM
      Subject: Bundle Lookup
      Sent by: "CLDR-Users" <cldr-users-bounces at unicode.org>





      Friends,

      This is a very basic question. See below. There is a lot of
      documentation about locale inheritance and matching, but it fails
      me in some cases.

      Given a locale, what's the procedure to find the bundle lookup
      chain?

      1. en-US: en-US → (truncation) en → root

      This one is dead simple. No problem.

      2. en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root

      This one is also dead simple, although the documentation says
      en-GB → en. Is it outdated or am I doing something wrong?

      Anyway, the ones I'm interested in knowing are:

      3. en-Latn-GB
      4. en-US-u-nu-usd
      5. zh-TW

      Could someone please show me the chain for each of these locales
      (and explain the steps)?

      Thanks!

      --
      +55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
      http://rafael.xavier.blog.br
      _______________________________________________
      CLDR-Users mailing list
      CLDR-Users at unicode.org
      http://unicode.org/mailman/listinfo/cldr-users



