Traditional and Simplified Han in UTS 39
Karl Williamson via Unicode
unicode at unicode.org
Wed Dec 27 15:31:19 CST 2017
UTS 39 says that one may optionally:
"Mark Chinese strings as “mixed script” if they contain both simplified
(S) and traditional (T) Chinese characters, using the Unihan data in the
Unicode Character Database [UCD].
"The criterion can only be applied if the language of the string is
known to be Chinese."
What does it mean for the language to "be known to be Chinese"? Is this
something algorithmically determinable, or must it come from information
about the input text that lies outside the UCD?
The example given shows some Hiragana in the text. That clearly
indicates the language isn't Chinese, so in this example we can
algorithmically rule out that it's Chinese.
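To make "algorithmically rule out" concrete, here is a rough Python sketch of
that one exclusion check. It is not anything UTS 39 prescribes. The function
name contains_hiragana is made up for illustration, and because Python's
standard unicodedata module does not expose the Script property, the sketch
approximates "is Hiragana" by testing whether the character name starts with
"HIRAGANA". The sample strings are my own, not the example from UTS 39.

```python
import unicodedata


def contains_hiragana(text: str) -> bool:
    """Return True if any character's Unicode name starts with "HIRAGANA".

    This is a heuristic stand-in for a real Script=Hiragana lookup, which
    the standard library does not provide.
    """
    for ch in text:
        # unicodedata.name() raises ValueError for unnamed code points
        # unless a default is supplied, so pass "" as the fallback.
        if unicodedata.name(ch, "").startswith("HIRAGANA"):
            return True
    return False


# Illustrative inputs (assumed, not taken from UTS 39):
print(contains_hiragana("写真だけ"))  # Han characters followed by Hiragana
print(contains_hiragana("中文"))      # Han characters only
```

Note that this only works in one direction: finding Hiragana lets us say
"not Chinese", but finding none tells us nothing, since a string made up
only of Han characters could still be Japanese.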
And what does Chinese really mean here?