propose th-u-lb-nodict

Peter Edberg via CLDR-Users cldr-users at unicode.org
Thu May 25 18:55:29 CDT 2017


> On May 25, 2017, at 4:30 PM, Richard Wordingham via CLDR-Users <cldr-users at unicode.org> wrote:
> 
> On Thu, 25 May 2017 14:39:58 -0700
> Peter Edberg via CLDR-Users <cldr-users at unicode.org> wrote:
> 
>>> -u-ld-thai0-pali0 (using 0 to pad the subtags to 5 alphanum)
>>> -u-ld-thai0-sanskrit
> 
> I'm not sure why there should be line-breaking 'dictionary' for Pali in
> Thai script, 
> 
>>> Perhaps the -nodict should also be by script, e.g.
>>> -u-ld-thai0-nodict
>>> still allows dictionary use for CJK, just none for Thai script.
> 
> Most dictionaries should be identified by language, not script.  The
> problem being addressed is the use of a Siamese dictionary for breaking
> text in other languages. 

The issue is that libraries that implement this spec, such as ICU, would typically choose which dictionary to use based on script range. So one needs to be able to specify, e.g.
- For Thai script, use xxx dictionary.
- For Khmer script, use yyy dictionary.

The xxx and yyy would specify language, but you still need to associate them with a script.
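
To illustrate, here is a rough sketch of that association in Python. It is not ICU code: the names SCRIPT_RANGES, DEFAULT_DICTIONARY, script_of and dictionary_for are made up for this example, the ranges cover only the main Thai and Khmer Unicode blocks, and "xxx"/"yyy" are the same placeholders as above. A per-script override of None stands in for a "no dictionary" request on that script only.

```python
# Hypothetical sketch only; none of these names come from ICU or CLDR.
# Ranges cover just the main Unicode block for each script.
SCRIPT_RANGES = {
    "Thai": (0x0E00, 0x0E7F),
    "Khmer": (0x1780, 0x17FF),
}

# Default dictionary per script ("xxx"/"yyy" are placeholders for a language).
DEFAULT_DICTIONARY = {"Thai": "xxx", "Khmer": "yyy"}


def script_of(ch):
    """Return the script name for a character, or None if it is not covered."""
    cp = ord(ch)
    for script, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return script
    return None


def dictionary_for(ch, overrides=None):
    """Pick the dictionary for the script that a character belongs to.

    overrides maps a script name to a dictionary language, or to None
    to request no dictionary for that script.
    """
    script = script_of(ch)
    if script is None:
        return None
    if overrides and script in overrides:
        return overrides[script]
    return DEFAULT_DICTIONARY.get(script)
```

With this shape, dictionary_for("\u0e01", {"Thai": None}) disables the dictionary for Thai text while Khmer text still gets its default, which is the per-script behaviour discussed above.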

- Peter E

> 
> There is something practical that we haven't touched on.  Should we be
> defining the language to be assumed for embedded foreign text?
> 
> Richard.


