CLDR proposal: Unicode algorithms should fall back to root, not to unrelated default locale

Markus Scherer at
Thu Apr 3 22:01:40 CDT 2014

On Thu, Apr 3, 2014 at 1:21 PM, Richard Wordingham <
richard.wordingham at> wrote:

> Would language matching data take preference over either?

Language matching should happen earlier. You would match the desired language
against the list of known available languages. When you then open a service
object with the resulting language, you don't get into this situation.
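
As a rough illustration of "match first, then open the service", here is a
minimal Python sketch. It assumes a much simpler matcher than real CLDR
language matching: it only truncates subtags of the desired language until it
hits an entry in the available list. The names match_language and AVAILABLE
are hypothetical and not part of any CLDR or ICU API.

```python
def match_language(desired, available):
    """Return the closest available language tag for `desired`, or None.

    Simplified assumption: matching is done by dropping trailing subtags
    (de-CH -> de). Real language matching uses CLDR matching data.
    """
    parts = desired.replace("_", "-").split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in available:
            return candidate
        parts.pop()  # drop the last subtag and try again
    return None


# Hypothetical list of languages a service has data for.
AVAILABLE = {"en", "de", "ja", "sr-Latn"}

# The service object would be opened with `matched`, not with the raw request.
matched = match_language("de-CH", AVAILABLE)
```

With this ordering, the service is only ever opened with a language it is
known to support, or the caller sees None and decides what to do.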

> How are break iteration rules meant to interact with dictionary-based
> word and line-breakers?

In CLDR and ICU, the rules specify the set of characters that need
dictionary support. (It's triggered by script, not by language.)
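
To make "triggered by script, not by language" concrete, here is a small
Python sketch. It assumes an illustrative character set built from four
Unicode blocks (Thai, Lao, Myanmar, Khmer); this is not the actual set used by
the CLDR or ICU rules, and the names DICTIONARY_RANGES, needs_dictionary and
dictionary_runs are hypothetical.

```python
# Illustrative only: Unicode block ranges standing in for the set of
# characters that the rules would mark as needing dictionary support.
DICTIONARY_RANGES = [
    (0x0E00, 0x0E7F),  # Thai block
    (0x0E80, 0x0EFF),  # Lao block
    (0x1000, 0x109F),  # Myanmar block
    (0x1780, 0x17FF),  # Khmer block
]


def needs_dictionary(ch):
    """True if `ch` falls in the dictionary set; no locale is consulted."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DICTIONARY_RANGES)


def dictionary_runs(text):
    """Yield (start, end) spans of consecutive dictionary characters."""
    start = None
    for i, ch in enumerate(text):
        if needs_dictionary(ch):
            if start is None:
                start = i
        elif start is not None:
            yield (start, i)
            start = None
    if start is not None:
        yield (start, len(text))
```

Note that neither function takes a locale argument: a run of Thai text gets
handed to the dictionary regardless of which language the iterator was opened
for.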

I expect that data for language-specific exceptions, overrides and such will
generally exist for more languages than language-specific character-level
segmentation rules will. Those low-level rules should always fall back to root
when there is no language-specific data. I think the higher-level
exceptions should probably also avoid going through some default language.
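
The proposed lookup order could be sketched in Python roughly as follows. This
assumes plain subtag truncation for the parent chain (real CLDR inheritance
has additional explicit parent-locale data), and the names fallback_chain,
load_rules and SEGMENTATION_DATA are hypothetical.

```python
def fallback_chain(locale):
    """Return the lookup order for `locale`, ending in root.

    Simplified assumption: parents are found by dropping trailing subtags.
    The process default locale is deliberately never part of the chain.
    """
    parts = locale.split("-")
    chain = []
    while parts:
        chain.append("-".join(parts))
        parts.pop()
    chain.append("root")
    return chain


def load_rules(locale, data):
    """Return (source_locale, rules) from the first chain entry with data."""
    for candidate in fallback_chain(locale):
        if candidate in data:
            return candidate, data[candidate]
    raise KeyError("no root data available")


# Hypothetical data: only root and Japanese have segmentation rules.
SEGMENTATION_DATA = {"root": "root rules", "ja": "Japanese rules"}
```

Even on a system whose default locale is Japanese, a request for fr-CA ends up
with the root rules here, not the Japanese ones.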


More information about the CLDR-Users mailing list