Peter Edberg via CLDR-Users
cldr-users at unicode.org
Thu May 25 18:55:29 CDT 2017
> On May 25, 2017, at 4:30 PM, Richard Wordingham via CLDR-Users <cldr-users at unicode.org> wrote:
> On Thu, 25 May 2017 14:39:58 -0700
> Peter Edberg via CLDR-Users <cldr-users at unicode.org> wrote:
>>> -u-ld-thai0-pali0 (using 0 to pad the subtags to 5 alphanum)
> I'm not sure why there should be line-breaking 'dictionary' for Pali in
> Thai script,
>>> Perhaps the -nodict should also be by script, e.g.
>>> still allows dictionary use for CJK, just none for Thai script.
> Most dictionaries should be identified by language, not script. The
> problem being addressed is the use of a Siamese dictionary for breaking
> text in other languages.
The issue is that libraries that implement this spec, such as ICU, would typically choose a dictionary based on script range. So one needs to be able to specify, e.g.:
- For Thai script, use xxx dictionary.
- For Khmer script, use yyy dictionary.
The xxx and yyy would specify language, but you still need to associate them with a script.
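
To illustrate that association, here is a minimal sketch in Python. It is not ICU's API. The function names, the default table, and the handling of the padded "thai0"/"pali0" subtags are all assumptions made for illustration, and the subtag syntax itself is only the proposal quoted above.

```python
# Hypothetical sketch only: none of these names come from ICU or CLDR.
# Assumed defaults: script code -> language of the dictionary normally used.
DEFAULT_DICTIONARIES = {"Thai": "th", "Khmr": "km"}


def parse_ld_subtags(subtags):
    """Read (script, dictionary) pairs such as ["thai0", "pali0"].

    Trailing "0" padding, as suggested in the proposal, is stripped.
    """
    overrides = {}
    for script, dictionary in zip(subtags[0::2], subtags[1::2]):
        overrides[script.rstrip("0").title()] = dictionary.rstrip("0")
    return overrides


def choose_dictionary(script, overrides):
    """Pick a dictionary for a run of text in the given script.

    The lookup is keyed by script, but the value names a language.
    Returns None when no dictionary is associated with the script.
    """
    return overrides.get(script, DEFAULT_DICTIONARIES.get(script))


overrides = parse_ld_subtags(["thai0", "pali0"])
print(choose_dictionary("Thai", overrides))
print(choose_dictionary("Khmr", overrides))
```

The lookup is by script because that is what the break iterator sees in the text, while the override only changes which language's dictionary is attached to that script.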
- Peter E
> There is something practical that we haven't touched on. Should we be
> defining the language to be assumed for embedded foreign text?