Re: Chén , Shěn and 沈 pinyin confusion

kz kazede at google.com
Tue Sep 13 16:09:39 CDT 2016


Hi Mark,

I don't know of an authoritative list of Chinese surname pronunciations, and
a cursory Google search didn't reveal anything interesting.

From what Peter's saying though, it sounds like CLDR has decent data on
this, so we might not need a second list.
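
Peter's suggestion (quoted below) of first transforming to pinyin with
surname readings and then sorting on the pinyin result can be sketched
roughly as follows. This is only an illustration under stated assumptions:
the two small tables are hand-made stand-ins built from the examples in this
thread, the function names are hypothetical, and the surname is assumed to be
the first character (compound surnames are not handled). A real application
would take the readings from the Han-Latin/Names transform rather than from a
table like this.

```python
import unicodedata

# Hypothetical stand-in for the surname readings mentioned in this thread.
SURNAME_READINGS = {"沈": "shěn", "华": "huà", "单": "shàn"}

# Hypothetical stand-in for general (non-surname) readings.
GENERAL_READINGS = {
    "沈": "chén", "华": "huá", "单": "dān",
    "陈": "chén", "明": "míng", "伟": "wěi",
}


def to_pinyin(name, use_surname_readings=True):
    """Transform a name to pinyin, one syllable per character.

    Assumption: the first character is the surname.
    """
    syllables = []
    for i, ch in enumerate(name):
        if i == 0 and use_surname_readings and ch in SURNAME_READINGS:
            syllables.append(SURNAME_READINGS[ch])
        else:
            # Unknown characters are passed through unchanged.
            syllables.append(GENERAL_READINGS.get(ch, ch))
    return " ".join(syllables)


def strip_tones(pinyin):
    """Drop tone marks by decomposing and removing combining marks."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def sort_key(name, use_surname_readings=True):
    """Sort on toneless pinyin first; toned pinyin is a deterministic tie-breaker."""
    pinyin = to_pinyin(name, use_surname_readings)
    return (strip_tones(pinyin), pinyin)


def sort_phonebook(names, use_surname_readings=True):
    """Sort by the pinyin key while keeping the Chinese text for display."""
    return sorted(names, key=lambda n: sort_key(n, use_surname_readings))


names = ["沈伟", "单明", "陈明"]
by_surname_reading = sort_phonebook(names)
by_general_reading = sort_phonebook(names, use_surname_readings=False)
```

With the general readings, 沈伟 sorts among the Chen entries, which is the
behaviour Mark's colleagues objected to; with the surname readings it sorts
under Shen. A per-entry override supplied by the user could be layered on top
of to_pinyin for names whose owners prefer a different reading.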


Thanks
k

On Tue, Sep 13, 2016 at 1:52 PM, Work <Mr at eibbor.co.uk> wrote:

> Peter,
>
> Thank you for your response.
>
> That in essence is what our application is trying to do - transform to
> pinyin then sort as pinyin (but display the Chinese text), but somehow we
> may be using the utilities at our disposal incorrectly?
>
> If it makes any difference, I think we are using CLDR 22, which is a little
> old, so I am not sure of its limitations or whether the Names variant was
> around then too.
>
> Is it possible to tell from the zh.xml file alone how names would resolve
> to the most likely result, or is it trickier than that?
>
> Also, can you advise how to invoke the Han-Latin Names variant?
>
>
> On 13 Sep 2016, at 21:28, Peter Edberg <pedberg at apple.com> wrote:
>
> CLDR transforms do make this distinction. CLDR has a Names variant of the
> Han-Latin transform that is specifically intended for surnames; this does in
> fact transform 沈, 华, and 单 using the name readings given below, as well as
> doing the same for a number of other characters.
>
> We do not currently have a collation variant that sorts by surname
> readings. However one could emulate that by first transforming to pinyin
> using the Han-Latin/Names transform, and then sorting using the pinyin
> result.
>
> - Peter E
>
>
> On Sep 13, 2016, at 1:06 PM, kz <kazede at google.com> wrote:
>
> Hi Mark,
>
> Commenting as a Chinese speaker (and not a dev).
>
> Quite a few characters in Chinese have more than one pronunciation. In
> contexts such as people's names, it often comes down to which pronunciation
> their parents preferred while naming them. CLDR might have data on all the
> possible pronunciations of a character, but a phonebook application should
> allow users to override inferred pronunciation of a name.
>
> There's just a caveat for collating though. Collations are usually done on
> surnames in Chinese. Surnames in China (and other Chinese-speaking regions)
> follow a strict convention, so in the context of a surname, 沈 is 99% likely
> to be shěn rather than chén. Similar examples off the top of my head: 华
> (usually huá, as a surname huà) and 单 (usually dān, as a surname shàn). One
> should also take care of compound surnames
> <https://en.wikipedia.org/wiki/Chinese_compound_surname> (rare but not
> that rare).
>
> I'm not certain how much support CLDR provides for this use case.
>
>
> Thanks
> k
>
> On Tue, Sep 13, 2016 at 12:33 PM, Mark Robbie <mr at eibbor.co.uk> wrote:
>
>> Hi,
>>
>>
>>
>> We are using ICU and CLDR with SQLite. I am not a software developer but
>> a user of the output.
>>
>>
>>
>> We have had some comments from Chinese colleagues on name sorting, and I
>> am unsure whether what we have is correct or whether our development team
>> is expected to use the tools in a different way.  We are currently
>> sorting the phonebook by pinyin, and one example of a comment we have had
>> concerns “沈”, which ends up being sorted as Chen, but our China team are
>> saying it should be Shen.
>>
>>
>>
>> I am trying to figure out whether the utilities should come up with the
>> generally accepted match out of the box, whether “沈” really does map to two
>> pinyin equivalents, or whether our dev team is supposed to override the
>> default rule to make Chen a Shen. I did notice that in CLDR 24, zh.xml has
>> an additional section called compounds that says “Here 沈 collates as
>> shěn/7stk/rad85, between 弞 7/stk/rad57, 审 8stk/rad40”.  I have no
>> clue how to interpret this, but I wonder whether it is meant to override the
>> mapping to chén earlier in the table, and whether this was something added
>> in CLDR from v24 onwards?
>>
>>
>>
>> Not being able to read Chinese, I am unsure whether there will be loads of
>> these examples or only a few. I believe our dev team have a similar
>> problem too and are relying on the default collations.
>>
>>
>>
>> Any advice is very much appreciated.
>>
>>
>>
>> PS: I did visit some other sites like Chinese tools, and on searching for
>> “沈” was offered Chén, Shěn and Tán as pinyin equivalents, so I guess
>> there is more than one. I am just wondering if for names (which in our case
>> is a phonebook) there is some common knowledge that it can only be Shěn.
>>
>>
>>
>> I also managed to pin down a passing Chinese work colleague but all he
>> could say was is only and Chén is a ‘suggestions’ rather than actual match
>> (and then exited stage left in haste) – is that correct ?
>>
>>
>>
>> Kind regards,
>>
>>
>>
>> Mark Robbie,
>>
>>
>>
>> _______________________________________________
>> CLDR-Users mailing list
>> CLDR-Users at unicode.org
>> http://unicode.org/mailman/listinfo/cldr-users
>>
>>