CLDR proposal: Unicode algorithms should fall back to root, not to unrelated default locale
Markus Scherer
markus.icu at gmail.com
Thu Apr 3 22:01:40 CDT 2014
On Thu, Apr 3, 2014 at 1:21 PM, Richard Wordingham <
richard.wordingham at ntlworld.com> wrote:
> Would language matching data take preference over either?
>
Language matching should happen earlier. You would match a desired language
against the list of known available languages. Then, when you open a service
object with the resulting language, you don't get into this situation.
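
As a rough illustration of "match first, then open the service", here is a minimal Python sketch. The names (AVAILABLE, match_language, open_service) and the "en" fallback are hypothetical, and the subtag truncation is only a simplified stand-in for real language matching, not the CLDR or ICU algorithm.

```python
# Hypothetical set of languages the application has data for.
AVAILABLE = {"en", "de", "ja", "th"}


def match_language(desired, available, fallback="en"):
    """Return the first available language that matches a desired tag.

    Simplified stand-in for real language matching: each desired tag is
    tried exactly, then with trailing subtags removed ("de-CH" -> "de").
    """
    for tag in desired:
        parts = tag.split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in available:
                return candidate
            parts.pop()
    # Nothing matched: the application chooses its own fallback here,
    # before any service object is opened.
    return fallback


def open_service(language, available):
    """Hypothetical service factory that expects an already-matched language."""
    if language not in available:
        raise ValueError(f"unmatched language: {language}")
    return {"language": language}


matched = match_language(["de-CH", "fr"], AVAILABLE)
service = open_service(matched, AVAILABLE)
```

Because the service is only ever opened with a language from the available list, the question of what to fall back to does not arise at that level.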

> How are break iteration rules meant to interact with dictionary-based
> word and line-breakers?
>
In CLDR and ICU, the rules specify the set of characters that need
dictionary support. (It's triggered by script, not by language.)

I expect that language-specific exceptions, overrides and such will
generally have data for more languages than the character-level
segmentation rules do. Those low-level rules should always fall back to root
when there is no language-specific data. I think the higher-level
exceptions should probably also avoid going through some default language.
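
To make the proposed fallback concrete, here is a minimal Python sketch. The data table, the names (SEGMENTATION_DATA, parent_chain, load_rules, DEFAULT_LOCALE) and the simple subtag truncation are assumptions for illustration only; this is not how CLDR or ICU actually store or load their data.

```python
# Hypothetical segmentation data keyed by locale. "root" stands for the
# language-neutral default rules; the other entries are made-up tailorings.
SEGMENTATION_DATA = {
    "root": "default rules",
    "ja": "tailoring for ja",
    "el": "tailoring for el",
}

# Hypothetical system default locale. load_rules never consults it.
DEFAULT_LOCALE = "ja"


def parent_chain(locale):
    """Yield the locale, its truncated parents, and finally "root"."""
    parts = locale.split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()
    yield "root"


def load_rules(locale, data=SEGMENTATION_DATA):
    """Return (source_locale, rules), falling back straight to root."""
    for candidate in parent_chain(locale):
        if candidate in data:
            return candidate, data[candidate]
    raise LookupError("root data is missing")


print(load_rules("ja-JP"))  # found under the parent "ja"
print(load_rules("fr-CA"))  # no fr data, so root is used, not DEFAULT_LOCALE
```

With this lookup order, a request for a language without its own data gets the root rules rather than a tailoring written for an unrelated default language.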
markus