Re: Chén, Shěn and 沈 pinyin confusion

Wed Sep 14 01:55:57 CDT 2016

For info, I tried the transformation demo: I selected Names and Names (Variant), pasted 沈 into the input, and got Chén as the output.

Does this mean 沈 will never transform to Shěn, or is there some manual addition I need to make to the contents of the 'Compound 1' text box?

> On 14 Sep 2016, at 05:27, Work <Mr at eibbor.co.uk> wrote:
> 
> From what Markus and Peter said earlier, does anyone know whether 沈 transliterates to Shěn when the Names variant of the Han-Latin transform is invoked?
> 
> I think Peter's reply was saying it would, but I was not sure.
> 
> I will talk to the dev team about invoking the Names variant, and have a chat with them about the pronunciation field as a catch-all fallback.
> 
> At the minute, the subject field mapping, when viewed as a sorted list, seems to be the big groan coming back at me, so maybe invoking the Names variant of the Han-Latin transform is a quick win while we look into the pronunciation suggestion.
> 
> Thanks again.
> 
>> On 13 Sep 2016, at 22:47, Markus Scherer <markus.icu at gmail.com> wrote:
>> 
>> The Names variant of the Han-Latin transform (e.g., via ICU Transliterator) should do this -- as a preprocessing step.
>> 
>> The CLDR/ICU Collator does not currently offer a tailoring that would do this automatically just while sorting. Adding such a variant would add at least a couple of hundred kB to the data size.
>> 
>> For Chinese and Japanese, I suggest you add a pronunciation field (pinyin for zh-CN, Hiragana for ja); prefill it via the Transliterator, make it visible to the user, let them fix it; sort by that.
>> 
>> markus
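
The pronunciation-field suggestion above (prefill, let the user fix it, sort by it) could look roughly like the sketch below. It is only an illustration under stated assumptions: `Contact`, `prefill_pinyin`, `sort_key`, and the small lookup table are hypothetical names invented for this example, and the table is a stand-in for a real transliterator call (for instance the ICU Transliterator with the Names variant), which is not invoked here.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical stand-in for a real transliterator. A real system would
# call a library here; this tiny table exists only for illustration.
_STUB_READINGS = {"沈": "Shěn", "陈": "Chén", "王": "Wáng"}


def prefill_pinyin(name: str) -> str:
    """Suggest a reading per character; unknown characters pass through."""
    return " ".join(_STUB_READINGS.get(ch, ch) for ch in name)


@dataclass
class Contact:
    name: str
    pinyin: str = ""  # visible, user-editable pronunciation field

    def __post_init__(self) -> None:
        # Prefill only when no pronunciation was supplied.
        if not self.pinyin:
            self.pinyin = prefill_pinyin(self.name)


def sort_key(contact: Contact) -> tuple:
    """Sort by the pronunciation field, ignoring tone marks and case first."""
    decomposed = unicodedata.normalize("NFD", contact.pinyin)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The toned form is kept as a tie-breaker between equal base spellings.
    return (base.casefold(), contact.pinyin)


# Simulate a record whose stored suggestion was wrong (Chén for 沈) ...
shen = Contact("沈", pinyin="Chén")
# ... and which the user then corrects by editing the field.
shen.pinyin = "Shěn"

contacts = [Contact("王"), shen, Contact("陈")]
sorted_names = [c.name for c in sorted(contacts, key=sort_key)]
```

The design point is that the stored, user-corrected field is the sort key, so a wrong automatic suggestion affects only the initial value and not the final ordering.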