Re: Chén, Shěn and 沈 pinyin confusion

Markus Scherer markus.icu at gmail.com
Tue Sep 13 16:47:02 CDT 2016


The Names variant of the Han-Latin transform (e.g., via ICU Transliterator)
should do this -- as a preprocessing step.

The CLDR/ICU Collator does not currently offer a tailoring that would do
this automatically while sorting. Adding such a variant would add at
least a couple hundred kB to the data size.

For Chinese and Japanese, I suggest you add a pronunciation field (pinyin
for zh-CN, Hiragana for ja); prefill it via the Transliterator, make it
visible to the user, let them fix it; sort by that.
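
A rough sketch of that workflow in Python, not a definitive implementation.
Assumptions: the Contact record, the "reading" field name, and all helper
names are made up for illustration; the ICU helper relies on the third-party
PyICU binding being installed; and "Han-Latin/Names" is my guess at the ID
string for the Names variant of the transform. The final sort uses plain
string comparison on the reading; a real application would probably run the
reading through a Collator instead.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Contact:
    name: str          # the name as written, e.g. in Han characters
    reading: str = ""  # pronunciation field: pinyin for zh-CN, Hiragana for ja


def icu_names_reading() -> Callable[[str], str]:
    """Return a function that applies the Han-Latin Names transform.

    Assumes PyICU is installed; the import is deferred so that the rest
    of this sketch works without it.
    """
    from icu import Transliterator  # third-party: PyICU
    return Transliterator.createInstance("Han-Latin/Names").transliterate


def prefill(contact: Contact, to_reading: Callable[[str], str]) -> Contact:
    """Fill in the reading only if it is empty, so user fixes are kept."""
    if not contact.reading:
        contact.reading = to_reading(contact.name)
    return contact


def sort_by_reading(contacts: Iterable[Contact]) -> List[Contact]:
    """Sort by the (user-visible, user-editable) reading, then by name."""
    return sorted(contacts, key=lambda c: (c.reading.casefold(), c.name))
```

Typical use would be prefill(contact, icu_names_reading()) when a record is
created, then showing contact.reading in the edit form. Because prefill
leaves a non-empty reading alone, whatever the user corrects is what gets
sorted on.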

markus

More information about the CLDR-Users mailing list