CLDR proposal: Unicode algorithms should fall back to root, not to unrelated default locale
markus.icu at gmail.com
Thu Apr 3 12:01:02 CDT 2014
Dear CLDR team & users,
We have consensus in the ICU team for a modified fallback policy for when
data is requested for a service based on a Unicode algorithm.
Assuming that such a policy is appropriate for the LDML spec (I have not
checked whether the spec currently mentions fallbacks in the absence of
data), I propose that we add the following:
When a specific locale is requested for collation, break iteration, or case
mapping, and we do not have any data for even the locale's base language,
we should fall back to the root locale rather than the default locale.
Note: This will not change behavior for languages for which we do have
specific data for the service, even if it is an empty data file.
Each of these services tailors a Unicode algorithm which is explicitly
designed to provide reasonable default behavior when no language-specific
behavior is known or available.
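The proposed lookup order can be sketched roughly as follows. This is only an illustration of the policy, not ICU or CLDR code: the function names, the service labels, and the plain truncation of locale IDs (eu_ES to eu) are assumptions made for the example, and real locale inheritance has more rules than this.

```python
# Hypothetical sketch of the proposed fallback policy.
# None of these names are ICU or CLDR APIs.

# Services that tailor a Unicode algorithm with sensible root behavior.
ALGORITHMIC_SERVICES = {"collation", "break_iteration", "case_mapping"}


def parent_chain(locale):
    """Yield the locale and its truncation parents, e.g. eu_ES, then eu.

    Simplification: real CLDR inheritance is not pure truncation.
    """
    parts = locale.split("_")
    for i in range(len(parts), 0, -1):
        yield "_".join(parts[:i])


def resolve_locale(requested, service, available, default_locale):
    """Return the locale whose data should serve the request."""
    # 1. Prefer data for the requested locale or its base language.
    for candidate in parent_chain(requested):
        if candidate in available:
            return candidate
    # 2. No data even for the base language: algorithmic services
    #    go straight to root and never consult the default locale.
    if service in ALGORITHMIC_SERVICES:
        return "root"
    # 3. Other services (UI strings, display names, formatting)
    #    try the default locale before ending at root.
    for candidate in parent_chain(default_locale):
        if candidate in available:
            return candidate
    return "root"
```

With data available only for az and fr and a default locale of az, a collation request for eu_ES resolves to root in this sketch, while a display-names request for eu_ES still resolves to az.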
For example, in 2013/ICU 52m1, we had an “environment test” failure (
ticket:10277 <http://bugs.icu-project.org/trac/ticket/10277>) that was
caused by requesting Basque (eu) collation and AlphabeticIndex when the
default locale was Azerbaijani (az), Lithuanian, or Estonian (et) (and
maybe more languages); in Azerbaijani, x sorts between h and i; this is
undesirable when the request was for Basque. In the absence of specific
Basque data, we should assume that the all-Unicode root sort order is
appropriate.
Similarly, it is undesirable to fall back from French to Turkish case
mappings, or from Italian to Finnish line breaking.
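To make the collation case concrete, here is a toy comparison of the two orders. It is a sketch under strong assumptions: the two alphabets below cover only lowercase ASCII letters, the "az" order models nothing except the placement of x between h and i mentioned above, and neither string is real CLDR collation data.

```python
# Toy alphabets for illustration only; not real CLDR collation data.
ROOT_ORDER = "abcdefghijklmnopqrstuvwxyz"
# Assumed tailoring: x moved between h and i, nothing else changed.
AZ_ORDER = "abcdefghxijklmnopqrstuvwyz"


def sort_key(order):
    """Build a key function that ranks letters by their position in `order`."""
    rank = {ch: i for i, ch in enumerate(order)}
    return lambda word: [rank[ch] for ch in word]


# Three sample words starting with h, i, and x.
words = ["izar", "xake", "hiri"]

root_sorted = sorted(words, key=sort_key(ROOT_ORDER))
az_sorted = sorted(words, key=sort_key(AZ_ORDER))

print(root_sorted)  # ['hiri', 'izar', 'xake']
print(az_sorted)    # ['hiri', 'xake', 'izar']
```

A user who asked for Basque and received the second ordering would see x-words in an unexpected place, which is the behavior the proposal avoids by going to root.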
By contrast, for UI languages, display names, and formatting, the root
locale is not useful: No UI messages, ISO codes instead of display names,
minimal patterns. By falling back to a default locale, the user gets
strings in what is hopefully a language they understand, even if not the
language they requested.
Google Internationalization Engineering