Traditional and Simplified Han in UTS 39

Philippe Verdy via Unicode unicode at unicode.org
Wed Dec 27 16:39:21 CST 2017


I bet the distinction is meant in terms of scripts, not in terms of
languages. So it says to use "Hani" instead of "Hans" or "Hant" if the
character forms cannot be determined, and this applies equally whether the
language is Chinese/Mandarin, Cantonese/Yue, Taiwanese, Wu, or even
Japanese.

For the Japanese language there's an additional mixed script code "Jpan"
for text that uses a mix of sinograms, Katakana and Hiragana.
For the Chinese languages there should be a script code for
sinograms+Bopomofo (Bopomofo is rarely used alone, but most often with
Traditional sinograms; it occurs sometimes with Simplified sinograms as
well).
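
To make the script-level reading concrete, here is a rough Python sketch of
how a script code might be picked from the character forms alone. The two
character sets are tiny hand-picked samples and the function names are
hypothetical; a real implementation would presumably derive the sets from the
Unihan variant data and would cover more than the main CJK block. Treating
any Han+kana mix as "Jpan" is also a simplification.

```python
# Hypothetical sample sets for illustration only; real data would come from
# the Unihan database (e.g. its simplified/traditional variant fields).
SIMPLIFIED_ONLY = frozenset("汉语们国")
TRADITIONAL_ONLY = frozenset("漢語們國")


def is_han(ch):
    """True for the main CJK Unified Ideographs block (extensions ignored)."""
    return "\u4e00" <= ch <= "\u9fff"


def is_kana(ch):
    """True for the Hiragana and Katakana blocks (U+3040..U+30FF)."""
    return "\u3040" <= ch <= "\u30ff"


def guess_script_code(text):
    """Return an ISO 15924 style code for text containing sinograms.

    Returns None when the text contains no sinograms at all.
    """
    if not any(is_han(ch) for ch in text):
        return None
    # Sinograms mixed with kana: treat as the Japanese mixed script.
    if any(is_kana(ch) for ch in text):
        return "Jpan"
    has_simplified = any(ch in SIMPLIFIED_ONLY for ch in text)
    has_traditional = any(ch in TRADITIONAL_ONLY for ch in text)
    if has_simplified and not has_traditional:
        return "Hans"
    if has_traditional and not has_simplified:
        return "Hant"
    # Forms are mixed or cannot be determined: fall back to plain "Hani".
    return "Hani"
```

Note that nothing here looks at the language of the text: the same function
gives the same answer whether the sinograms are read as Mandarin, Cantonese
or anything else.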

2017-12-27 22:31 GMT+01:00 Karl Williamson via Unicode <unicode at unicode.org>:

> In UTS 39, it says, that optionally,
>
> "Mark Chinese strings as “mixed script” if they contain both simplified
> (S) and traditional (T) Chinese characters, using the Unihan data in the
> Unicode Character Database [UCD].
>
> "The criterion can only be applied if the language of the string is known
> to be Chinese."
>
> What does it mean for the language to "be known to be Chinese"?  Is this
> something algorithmically determinable, or does it come from information
> about the input text that comes from outside the UCD?
>
> The example given shows some Hiragana in the text.  That clearly indicates
> the language isn't Chinese.  So in this example we can algorithmically rule
> out that it's Chinese.
>
> And what does Chinese really mean here?
>
>