Re: Chén , Shěn and 沈 pinyin confusion

Work Mr at eibbor.co.uk
Tue Sep 13 15:41:26 CDT 2016


K,

Thanks for the response.

With respect to the strict surname convention in China: is there an authoritative online reference for it that I can use to independently check our basic CLDR and ICU mapping implementation, and therefore our sorting of names? I would like to identify where our implementation will attract criticism and so needs tailoring, rather than wait for my colleagues in China to drip-feed me complaints as they stumble over them, which I fear may end up happening.
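
To make the kind of check I have in mind concrete, here is a rough Python sketch. It is not our implementation and it does not use ICU or CLDR data. The tables and the function names (DEFAULT_READING, SURNAME_READING, COMPOUND_SURNAME_READING, surname_reading, mismatches) are made up for illustration. The surname readings for 沈, 华 and 单 are the ones you gave; the "default" readings and the compound surname 欧阳 are my own assumptions, and pinyin is written with tone numbers to keep the sort key plain ASCII.

```python
# Hypothetical general-use readings a default mapping might pick.
DEFAULT_READING = {
    "沈": "chen2", "华": "hua2", "单": "dan1",
    "陈": "chen2", "李": "li3", "欧": "ou1",
}

# Surname-specific readings mentioned in the thread.
SURNAME_READING = {"沈": "shen3", "华": "hua4", "单": "shan4"}

# Compound (two-character) surnames; 欧阳 is an assumed example.
COMPOUND_SURNAME_READING = {"欧阳": "ou1 yang2"}


def default_reading(full_name: str) -> str:
    """Reading of the first character with no surname tailoring."""
    first = full_name[:1]
    return DEFAULT_READING.get(first, first)


def surname_reading(full_name: str) -> str:
    """Prefer a compound surname, then a surname override, then the default."""
    if full_name[:2] in COMPOUND_SURNAME_READING:
        return COMPOUND_SURNAME_READING[full_name[:2]]
    if full_name[:1] in SURNAME_READING:
        return SURNAME_READING[full_name[:1]]
    return default_reading(full_name)


def sort_key(full_name: str):
    """Sort by surname reading first, then by the raw name as a tie-breaker."""
    return (surname_reading(full_name), full_name)


def mismatches(names):
    """Names whose surname reading differs from the untailored default."""
    return [n for n in names if default_reading(n) != surname_reading(n)]


names = ["沈括", "陈平", "单雄信", "李白", "欧阳修"]
tailored_order = sorted(names, key=sort_key)
needs_tailoring = mismatches(names)
```

Running mismatches over the real phonebook against an authoritative surname table (rather than my toy one) would list exactly the entries where the default sort is likely to draw complaints. The phonebook would presumably still need a per-entry override for people whose names use an unusual reading.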




> On 13 Sep 2016, at 21:06, kz <kazede at google.com> wrote:
> 
> Hi Mark,
> 
> Commenting as a Chinese speaker (and not a dev).
> 
> Quite a few characters in Chinese have more than one pronunciation. In contexts such as people's names, it often comes down to which pronunciation their parents preferred when naming them. CLDR might have data on all the possible pronunciations of a character, but a phonebook application should allow users to override the inferred pronunciation of a name.
> 
> There's one caveat for collating though. Collation in Chinese is usually done on surnames. Surnames in China (and other Chinese-speaking regions) follow a strict convention, so in the context of a surname, 沈 is 99% likely to be shěn rather than chén. Similar examples off the top of my head: 华 (usually huá, as a surname huà) and 单 (usually dān, as a surname shàn). One should also take care of compound surnames (rare, but not that rare).
> 
> I'm not certain how much support CLDR provides for this use case.
> 
> 
> Thanks
> k
> 
>> On Tue, Sep 13, 2016 at 12:33 PM, Mark Robbie <mr at eibbor.co.uk> wrote:
>> Hi,
>> 
>>  
>> 
>> We are using ICU and CLDR with SQLite. I am not a software developer but a user of the output.
>> 
>>  
>> 
>> We have had some comments from Chinese colleagues on name sorting, and I am unsure whether what we have is correct or whether our development team is expected to use the tools in a different way. We are currently sorting the phonebook by pinyin, and an example of a comment we have had concerns “沈”, which ends up being sorted as Chen, but our China team are saying it should be Shen.
>> 
>>  
>> 
>> I am trying to figure out whether the utilities should come up with the generally accepted match out of the box, whether “沈” really does map to 2 pinyin equivalents, or whether our dev team is supposed to override the default rule to make Chen a Shen. I did notice in CLDR 24 for zh.xml that there is an additional section called compounds which then says “Here 沈 collates as shěn/7stk/rad85, between 弞 7/stk/rad57, 审 8stk/rad40”. I have not a clue how to interpret this, but am wondering if this means to override the mapping to chén earlier in the table, and if this was something learned in CLDR for v24 onwards?
>> 
>>  
>> 
>> Not being able to read Chinese, I am unsure whether there will be loads of these examples or only a few. I believe our dev team have a similar problem and are relying on the default collations.
>> 
>>  
>> 
>> Any advice is very much appreciated.
>> 
>>  
>> 
>> PS: I did visit some other sites like Chinese tools, and on searching for “沈” was offered Chén, Shěn and Tán as pinyin equivalents, so I guess there is more than one. I am just wondering if for names (which in our case it is, being a phonebook) there is some common knowledge that it can only be Shěn.
>> 
>>  
>> 
>> I also managed to pin down a passing Chinese work colleague, but all he could say was that 沈 is only Shěn and that Chén is a ‘suggestion’ rather than an actual match (and then exited stage left in haste). Is that correct?
>> 
>>  
>> 
>> Kind regards,
>> 
>>  
>> 
>> Mark Robbie,
>> 
>>  
>> 
>> 
>> _______________________________________________
>> CLDR-Users mailing list
>> CLDR-Users at unicode.org
>> http://unicode.org/mailman/listinfo/cldr-users