CLDR proposal: Unicode algorithms should fall back to root, not to unrelated default locale

Markus Scherer markus.icu at gmail.com
Thu Apr 3 22:01:40 CDT 2014


On Thu, Apr 3, 2014 at 1:21 PM, Richard Wordingham <
richard.wordingham at ntlworld.com> wrote:

> Would language matching data take preference over either?
>

Language matching should happen earlier. You match the desired language
against the list of available languages, and then open the service object
with the resulting language. That way you don't get into this situation.
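
As a rough illustration of "match first, then open the service", here is a
minimal Python sketch. It is not the CLDR language matching algorithm
(which uses distance data); it only truncates subtags, and the names
match_language and the "root" fallback value are assumptions made for this
example.

```python
def match_language(desired, available, fallback="root"):
    """Return the best available tag for `desired`, or `fallback`.

    Simplified matching: try the full tag, then drop trailing subtags
    one at a time. Real language matching is more involved.
    """
    subtags = desired.replace("_", "-").split("-")
    while subtags:
        candidate = "-".join(subtags)
        for tag in available:
            if tag.lower() == candidate.lower():
                return tag
        subtags.pop()
    # Nothing matched: hand back the fallback instead of some
    # unrelated default language.
    return fallback


# Hypothetical list of languages the service has data for.
available = ["en", "de", "th", "zh-Hant"]
matched = match_language("de-CH", available)
```

The service object would then be opened with `matched`, so the question of
which fallback applies inside the service does not come up for languages
that matched.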

> How are break iteration rules meant to interact with dictionary-based
> word and line-breakers?
>

In CLDR and ICU, the rules specify the set of characters that need
dictionary support. (It's triggered by script, not by language.)
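
To make "triggered by script, not by language" concrete, here is a small
Python sketch. The code point ranges are the Thai, Lao and Khmer Unicode
blocks, picked only as an illustrative stand-in for the set that the real
rules define; the names DICTIONARY_RANGES, needs_dictionary and
dictionary_runs are hypothetical.

```python
# Illustrative stand-in for the rule-defined set of characters that
# need dictionary support. These are Unicode block ranges, not the
# actual set from the CLDR/ICU rules.
DICTIONARY_RANGES = [
    (0x0E00, 0x0E7F),  # Thai block
    (0x0E80, 0x0EFF),  # Lao block
    (0x1780, 0x17FF),  # Khmer block
]


def needs_dictionary(ch):
    """True if the character falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DICTIONARY_RANGES)


def dictionary_runs(text):
    """Return (start, end) index pairs of runs that need a dictionary.

    No locale is consulted: the decision depends only on the characters.
    """
    runs = []
    start = None
    for i, ch in enumerate(text):
        if needs_dictionary(ch):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(text)))
    return runs
```

Note that the functions take no locale argument at all, which is the point:
Thai text embedded in an English document would still be handed to the
dictionary.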

I expect that more languages will have data for language-specific
exceptions, overrides and such than will have their own character-level
segmentation rules. Those low-level rules should always fall back to root
when there is no language-specific data. I think the higher-level
exceptions should probably also avoid going through some default language.
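
A minimal Python sketch of the fallback I have in mind, assuming a simple
truncation-based parent chain (real CLDR inheritance also has explicit
parent locale data, which this ignores). The names parent_chain and
load_rules and the contents of the bundles dict are placeholders.

```python
def parent_chain(locale):
    """Yield locale, its truncated parents, and finally "root"."""
    parts = locale.split("_")
    while parts:
        yield "_".join(parts)
        parts.pop()
    yield "root"


def load_rules(locale, bundles):
    """Return (locale_found, rules) for the first bundle in the chain.

    The system default locale is deliberately never consulted.
    """
    for candidate in parent_chain(locale):
        if candidate in bundles:
            return candidate, bundles[candidate]
    raise KeyError("no root bundle available")


# Placeholder data: which locales have tailored rules is made up here.
bundles = {
    "root": "root rules",
    "fi": "tailored rules for fi",
    "ja": "tailored rules for ja",
}

found_fi, _ = load_rules("fi_FI", bundles)
found_sv, _ = load_rules("sv_SE", bundles)
```

With these placeholder bundles, a request for sv_SE ends up at root even on
a machine whose default locale is ja, rather than picking up the Japanese
tailoring.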

markus