Bundle Lookup
John Emmons
emmo at us.ibm.com
Fri Dec 12 10:04:51 CST 2014
Yes, Edwin, there is a very good reason we don't want zh-Hant to inherit
from zh. Simply put, in situations where you have locale resources that
aren't 100% populated, allowing zh-Hant to inherit from zh produces a
mixture of simplified and traditional Chinese, which is acceptable to no
one. This is what we call "cross script inheritance" in CLDR. While it
might be acceptable to some in the case of Chinese, it is certainly a
bigger problem in languages like Serbian, where you have both Latin and
Cyrillic scripts in use, and you certainly don't ever want a mixture of
Latin and Cyrillic scripts.
These relationships are documented in CLDR's supplemental data, where you
have specified:
<parentLocale parent="root" locales="az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt
ha_Arab mn_Mong ms_Arab pa_Arab shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn
zh_Hant"/>
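A minimal sketch of how this data could drive parent lookup, written in Python purely for illustration. The names `ROOT_CHILDREN`, `EXPLICIT_PARENT`, `parent` and `chain` are hypothetical, and the only data used is the locale list from the element quoted above; a real implementation would read every parentLocale element from the supplemental data.

```python
# Locales whose parent is "root", copied from the <parentLocale> element above.
ROOT_CHILDREN = """az_Cyrl bm_Nkoo bs_Cyrl en_Dsrt ha_Arab mn_Mong ms_Arab pa_Arab
shi_Latn sr_Latn uz_Arab uz_Cyrl vai_Latn zh_Hant""".split()

# Hypothetical lookup table: explicit parent overrides take priority over truncation.
EXPLICIT_PARENT = {locale: "root" for locale in ROOT_CHILDREN}


def parent(locale):
    """Return the parent of an underscore-separated locale ID, or None for root."""
    if locale == "root":
        return None
    if locale in EXPLICIT_PARENT:
        return EXPLICIT_PARENT[locale]
    # Default rule: drop the last subtag; a bare language falls back to root.
    head, sep, _ = locale.rpartition("_")
    return head if sep else "root"


def chain(locale):
    """List the locale followed by all of its ancestors, ending at root."""
    result = []
    while locale is not None:
        result.append(locale)
        locale = parent(locale)
    return result


print(chain("zh_Hant_TW"))  # ['zh_Hant_TW', 'zh_Hant', 'root']
print(chain("sr_Cyrl_RS"))  # ['sr_Cyrl_RS', 'sr_Cyrl', 'sr', 'root']
```

With this table, zh_Hant never reaches zh, so a partially populated zh_Hant bundle cannot pick up simplified Chinese strings.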
Regards,
John C. Emmons
Globalization Architect & Unicode CLDR TC Chairman
IBM Software Group
Internet: emmo at us.ibm.com
From: Edwin Hoogerbeets <ehoogerbeets at gmail.com>
To: John Emmons/Austin/IBM at IBMUS, Rafael Xavier
<rxaviers at gmail.com>
Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>,
"cldr-users at unicode.org" <cldr-users at unicode.org>
Date: 12/11/2014 07:41 PM
Subject: Re: Bundle Lookup
Rafael, also take a look at common/supplemental/likelySubtags.xml. If the
caller has passed you an incompletely specified locale, you can use those
mappings to see if you can get to a locale for which you do have a string
bundle. I think that is the source for the "language aliases" to which John
was referring.
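The likelySubtags suggestion above might look roughly like the following Python sketch. `LIKELY_SUBTAGS` is a tiny hand-copied excerpt rather than the full file, and `find_bundle` is a hypothetical helper, not an API of any library. As a design assumption, the sketch stops at the language-plus-script level so that it never falls back across scripts.

```python
# Hand-copied excerpt of likely-subtag mappings (assumption: the real
# likelySubtags.xml holds many more entries and uses underscores).
LIKELY_SUBTAGS = {
    "zh": "zh-Hans-CN",
    "zh-TW": "zh-Hant-TW",
    "sr": "sr-Cyrl-RS",
}


def find_bundle(tag, available):
    """Return the name of an available bundle for tag, or None if nothing fits."""
    if tag in available:
        return tag
    expanded = LIKELY_SUBTAGS.get(tag)
    if expanded is None:
        return None
    language, script, _region = expanded.split("-")
    # Try the fully expanded tag first, then language plus script only.
    for candidate in (expanded, f"{language}-{script}"):
        if candidate in available:
            return candidate
    return None


bundles = {"zh-Hans", "zh-Hant", "sr-Cyrl", "sr-Latn"}
print(find_bundle("zh-TW", bundles))  # zh-Hant
print(find_bundle("sr", bundles))     # sr-Cyrl
```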
John, for the last part of your example zh-TW inheritance chain, wouldn't
you just truncate "zh-Hant" again to "zh" like in the en-GB example before
inheriting from the root? If not, what is the reasoning there? Is there
already a document that specifies the inheritance rules in CLDR?
For efficiency, I can imagine you would put the common translations in "zh"
where there is no difference between traditional and simplified, and other
translations in "zh-Hant" or "zh-Hans" where there is. That would save some
disk space and you could leverage linguistic bug fixes at the "zh" level.
For other locales like "sr-Latn" and "sr-Cyrl" there would be nothing in
common so the string bundle at the "sr" level would be essentially empty,
but it should still appear in the inheritance chain just in case.
Edwin
On 12/11/2014 02:53 PM, John Emmons wrote:
#3 is currently a problem, which we are working on. Basically,
"Latn" needs to be stripped out because it isn't necessary. Then
follow the normal inheritance:
en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
#4 - Any Unicode locale extensions are meant to identify particular
behaviors that are desired in the context of a given locale. Think
of them like "options". They are not meant to be used in the context
of bundle lookups.
#5 - zh_TW - Now that proper language aliases are in place ( See
http://unicode.org/cldr/trac/ticket/5949 )
zh-TW: zh-TW → (languageAlias) zh-Hant-TW → (truncation) zh-Hant →
(parentLocale) root
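The steps described for #3, #4 and #5 can be sketched in Python as below. This is only an illustration under stated assumptions: the three tables are tiny hand-written excerpts based on the examples in this thread, and every function name is hypothetical. Real data would come from CLDR's supplemental files.

```python
# Hand-written excerpts, limited to the examples in this thread.
DEFAULT_SCRIPT = {"en": "Latn"}            # assumption: script that is redundant for "en"
LANGUAGE_ALIAS = {"zh-TW": "zh-Hant-TW"}   # the zh-TW alias from example #5
PARENT_LOCALE = {"en-GB": "en-001", "zh-Hant": "root"}  # explicit parents


def strip_extensions(tag):
    """Drop a Unicode extension ("-u-...") and everything after it (#4)."""
    subtags = tag.split("-")
    if "u" in subtags:
        subtags = subtags[:subtags.index("u")]
    return "-".join(subtags)


def strip_default_script(tag):
    """Remove a script subtag that is unnecessary for the language (#3)."""
    subtags = tag.split("-")
    if len(subtags) > 1 and DEFAULT_SCRIPT.get(subtags[0]) == subtags[1]:
        del subtags[1]
    return "-".join(subtags)


def parent(tag):
    """Explicit parentLocale first, otherwise truncate the last subtag."""
    if tag in PARENT_LOCALE:
        return PARENT_LOCALE[tag]
    head, sep, _ = tag.rpartition("-")
    return head if sep else "root"


def lookup_chain(tag):
    """Return the bundle lookup chain for tag, ending at root."""
    tag = strip_default_script(strip_extensions(tag))
    result = [tag]
    if tag in LANGUAGE_ALIAS:              # alias step (#5)
        tag = LANGUAGE_ALIAS[tag]
        result.append(tag)
    while tag != "root":
        tag = parent(tag)
        result.append(tag)
    return result


for example in ("en-US", "en-GB", "en-Latn-GB", "en-US-u-nu-usd", "zh-TW"):
    print(example, "->", " → ".join(lookup_chain(example)))
```

Under these assumptions, en-Latn-GB yields the same chain as en-GB, and en-US-u-nu-usd yields the same chain as en-US.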
Regards,
John C. Emmons
Globalization Architect & Unicode CLDR TC Chairman
IBM Software Group
Internet: emmo at us.ibm.com
From: Rafael Xavier <rxaviers at gmail.com>
To: "cldr-users at unicode.org" <cldr-users at unicode.org>
Cc: Jörn Zaefferer <joern.zaefferer at gmail.com>
Date: 12/11/2014 01:02 PM
Subject: Bundle Lookup
Sent by: "CLDR-Users" <cldr-users-bounces at unicode.org>
Friends,
This is a very basic question. See below. There is a lot of
documentation about locale inheritance and matching, but in some
cases it does not work for me.
Given a locale, what is the procedure for finding the bundle lookup
chain?
1. en-US: en-US → (truncation) en → root
This one is dead simple. No problem.
2. en-GB: en-GB → (parentLocale) en-001 → (truncation) en → root
This one is also dead simple. However, the documentation says en-GB →
en. Is it outdated, or am I doing something wrong?
Anyway, the ones I'm interested in knowing are:
3. en-Latn-GB
4. en-US-u-nu-usd
5. zh-TW
Could someone please show me the chain for each of these locales (and
explain the steps)?
Thanks!
--
+55 (16) 98138-1582, +1 (415) 568-5854, skype: rxaviers
http://rafael.xavier.blog.br
_______________________________________________
CLDR-Users mailing list
CLDR-Users at unicode.org
http://unicode.org/mailman/listinfo/cldr-users