Re: Chén , Shěn and 沈 pinyin confusion
kazede at
Tue Sep 13 16:09:39 CDT 2016
Hi Mark,
I don't know of an authoritative list of Chinese surname pronunciations, and
a cursory Google search didn't reveal anything interesting.
From what Peter's saying though, it sounds like CLDR has decent data on
this, so we might not need a second list.
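For what it's worth, here is a rough sketch of the transform-then-sort idea
Peter describes below, just to make the approach concrete. It does not call
ICU or CLDR at all: the two reading tables are tiny, hand-written, and purely
hypothetical stand-ins for what the Han-Latin/Names transform would supply,
and the function names are made up for illustration. It also assumes the
surname is the first character only, so compound surnames are not handled.

```python
import unicodedata

# Hypothetical, hand-written tables for illustration only.
# In a real application these readings would come from CLDR/ICU
# (e.g. the Han-Latin/Names transform), not from a hard-coded dict.
DEFAULT_READING = {"沈": "chén", "华": "huá", "单": "dān", "李": "lǐ"}
SURNAME_READING = {"沈": "shěn", "华": "huà", "单": "shàn"}


def strip_tones(pinyin):
    """Remove tone marks so that 'shěn' compares as 'shen'."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def surname_key(name):
    """Build a sort key from the surname reading of the first character.

    Assumes a single-character surname; compound surnames would need
    their own lookup before falling back to this.
    """
    first = name[0]
    reading = SURNAME_READING.get(first, DEFAULT_READING.get(first, first))
    # Sort on the toneless form first, then the toned form, then the
    # original text as a final tie-breaker.
    return (strip_tones(reading), reading, name)


def sort_phonebook(names):
    """Sort by surname pinyin while keeping the Chinese text for display."""
    return sorted(names, key=surname_key)


if __name__ == "__main__":
    for entry in sort_phonebook(["沈明", "李伟", "单红", "华强"]):
        print(entry, surname_key(entry)[1])
```

With these made-up tables, 沈 sorts under "shen" rather than "chen", which
is the behaviour Mark's China team is asking for. A per-contact override
could be layered on top by checking a user-supplied reading before either
table.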
On Tue, Sep 13, 2016 at 1:52 PM, Work <Mr at> wrote:
> Peter,
> Thank you for your response.
> That in essence is what our application is trying to do - transform to
> pinyin then sort as pinyin (but display the Chinese text), but somehow we
> may be using the utilities at our disposal incorrectly?
> If it makes any difference, I think we are using CLDR 22, which is a little
> old, so I am not sure of its limitations or whether the Names variant was
> around then too.
> Is it possible to tell from the zh.xml file alone how names would resolve
> to the most likely result, or is it trickier than that?
> Also, can you advise how to invoke the Han-Latin Names variant?
> On 13 Sep 2016, at 21:28, Peter Edberg <pedberg at> wrote:
> CLDR transforms do make this distinction. CLDR has a Names variant of the
> Han-Latin transform that is specifically intended for surnames; this does in
> fact transform 沈, 华, and 单 using the name readings given below, as well as
> doing the same for a number of other characters.
> We do not currently have a collation variant that sorts by surname
> readings. However, one could emulate that by first transforming to pinyin
> using the Han-Latin/Names transform, and then sorting using the pinyin
> result.
> - Peter E
> On Sep 13, 2016, at 1:06 PM, kz <kazede at> wrote:
> Hi Mark,
> Commenting as a Chinese speaker (and not a dev).
> Quite a few characters in Chinese have more than one pronunciation. In
> contexts such as people's names, it often comes down to which pronunciation
> their parents preferred while naming them. CLDR might have data on all the
> possible pronunciations of a character, but a phonebook application should
> allow users to override inferred pronunciation of a name.
> There's just a caveat for collating though. Collations are usually done on
> surnames in Chinese. Surnames in China (and other Chinese-speaking regions)
> follow a strict convention, so in the context of a surname, 沈 is 99% likely
> to be shěn rather than chén. Similar examples off the top of my head: 华
> (usually huá, as a surname huà) and 单 (usually dān, as a surname shàn). One
> should also take care of compound surnames (rare but not that rare).
> I'm not certain how much support CLDR provides for this use case.
> Thanks
> k
> On Tue, Sep 13, 2016 at 12:33 PM, Mark Robbie <mr at> wrote:
>> Hi,
>> We are using ICU and CLDR with SQLite. I am not a software developer but
>> a user of the output.
>> We have had some comments from Chinese colleagues on name sorting, and I
>> am unsure whether what we have is correct or whether our development team
>> is expected to use the tools in a different way. We are currently
>> sorting the phonebook by pinyin, and one comment we have had
>> concerns “沈”, which ends up being sorted as Chen, but our China team
>> says it should be Shen.
>> I am trying to figure out if the utilities should come up with the
>> generally accepted match out of the box or if “沈” really does map to 2
>> pinyin equivalents or if our dev team is supposed to override the default
>> rule to make Chen a Shen. I did notice in CLDR 24 for zh.xml that there is
>> an additional section called compounds, which says “Here 沈 collates as
>> shěn/7stk/rad85, between 弞 7/stk/rad57, 审 8stk/rad40”. I have no
>> clue how to interpret this, but I am wondering whether it overrides the
>> mapping to chén earlier in the table, and whether this was something
>> learned in CLDR from v24 onwards?
>> Not being able to read Chinese, I am unsure if there will be loads of
>> these examples or only a few, and I believe our dev team have a similar
>> problem too and are relying on the default collations.
>> Any advice is very much appreciated.
>> PS: I did visit some other sites like Chinese tools, and on searching for
>> “沈” was offered Chén, Shěn and Tán as pinyin equivalents, so I guess
>> there is more than one. I am just wondering if for names (which in our case
>> it is a phonebook) there is some common knowledge that it can only be Shěn.
>> I also managed to pin down a passing Chinese work colleague but all he
>> could say was is only and Chén is a ‘suggestions’ rather than actual match
>> (and then exited stage left in haste) – is that correct ?
>> Kind regards,
>> Mark Robbie,
>> _______________________________________________
>> CLDR-Users mailing list
>> CLDR-Users at