Traditional and Simplified Han in UTS 39

Karl Williamson via Unicode unicode at unicode.org
Wed Dec 27 15:31:19 CST 2017


UTS 39 says that, optionally, one can

"Mark Chinese strings as “mixed script” if they contain both simplified 
(S) and traditional (T) Chinese characters, using the Unihan data in the 
Unicode Character Database [UCD].

"The criterion can only be applied if the language of the string is 
known to be Chinese."
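
To make the quoted criterion concrete, here is a rough sketch of how I read 
it. The two sets are a tiny hand-picked excerpt and are my own assumption; 
a real check would presumably fill them from the kTraditionalVariant and 
kSimplifiedVariant fields in Unihan_Variants.txt. The function name is 
hypothetical, and UTS 39 does not spell out this exact procedure.

```python
# Hypothetical excerpt only; a real check would load these from the
# Unihan variant data rather than hard-coding four characters.
SIMPLIFIED_ONLY = {"国", "结"}    # characters that have a traditional variant
TRADITIONAL_ONLY = {"國", "結"}   # characters that have a simplified variant


def mixes_simplified_and_traditional(text: str) -> bool:
    """Return True if text has at least one S-only and one T-only character."""
    has_simplified = any(ch in SIMPLIFIED_ONLY for ch in text)
    has_traditional = any(ch in TRADITIONAL_ONLY for ch in text)
    return has_simplified and has_traditional
```

Note that nothing in this sketch knows what language the string is in; it 
only looks at the characters themselves.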

What does it mean for the language to "be known to be Chinese"?  Is this 
something algorithmically determinable, or does it come from information 
about the input text that comes from outside the UCD?

The example given shows some Hiragana in the text.  That clearly 
indicates the language isn't Chinese.  So in this example we can 
algorithmically rule out that it's Chinese.
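
As a sketch of what "algorithmically rule out" could look like, the 
following checks whether a string contains any Hiragana. It leans on 
character names from Python's unicodedata module as a stand-in for the 
Script property, which that module does not expose; that shortcut and the 
function name are my assumptions, not anything UTS 39 prescribes.

```python
import unicodedata


def contains_hiragana(text: str) -> bool:
    """Return True if any character's Unicode name starts with HIRAGANA."""
    for ch in text:
        # name() returns "" for characters that have no name in the UCD.
        if unicodedata.name(ch, "").startswith("HIRAGANA"):
            return True
    return False
```

A string such as "写真だけ" would be ruled out by this test, while a string 
of Han characters alone would not be, so the test can only exclude Chinese, 
never confirm it.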

And what does Chinese really mean here?


