<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">If applied to text fields of sufficient
length, good heuristics should be able to tease apart language use
even for a unified encoding. I once played with a toy system for
European languages and, with extremely simple techniques, got fairly
decent discrimination. And I'm not a computational linguist. That
effort convinced me that the problem is fairly tractable and that
you should be able to reduce misidentification to some acceptable
level in many situations.</div>
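<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">For illustration only, here is a minimal
sketch of one such simple technique: counting matches against short
lists of common function words. It is not the toy system mentioned
above. The word lists, the four languages chosen, and the name
guess_language are all assumptions made for this example.</div>
<pre>
```python
import re
from collections import Counter

# Hypothetical, hand-picked function words (an assumption for this sketch).
# A real system would derive word lists or character n-gram profiles
# from a corpus instead.
STOPWORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "that", "it", "was", "for"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu", "mit", "den"},
    "french": {"le", "la", "les", "et", "est", "des", "un", "une", "que", "dans"},
    "spanish": {"el", "la", "los", "y", "es", "de", "que", "en", "un", "por"},
}


def guess_language(text):
    """Return (language, score) for the best stopword match, or (None, 0)."""
    # Split into runs of letters only (Unicode-aware, no digits or underscores).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Score each language by how many tokens hit its function-word list.
    scores = {
        lang: sum(counts[w] for w in vocab)
        for lang, vocab in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, 0  # no evidence at all, e.g. digits only
    return best, scores[best]


if __name__ == "__main__":
    print(guess_language("Der Hund ist nicht mit der Katze in den Garten gegangen"))
```
</pre>
<div class="moz-cite-prefix">On very short fields the scores tend to
be zero or tied (ties here simply go to the first language listed),
which is why the length of the field matters.</div>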
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">A./<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 9/7/2021 3:56 PM, Ken Whistler via
Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:aa9ccb28-ad02-d9ce-3277-f0a0842b24eb@sonic.net">
<p>William</p>
<p>It wasn't done *in* the CNS and JIS encodings. In other words,
if you are looking for some fancy mechanism that was used inside
those old legacy encodings to do "signaling", you aren't going
to find it.</p>
<p>The point Doug was making was that in the old days if you knew
(or could detect heuristically) that your data was in the CNS
11643 encoding, well, by gum, it was pretty darn likely that it
was data in the Chinese language, and people would prefer to
look at it with a Chinese-style font. Contrariwise, if you knew
(or could detect heuristically) that your data was in the JIS X
0208 encoding, well, it was pretty darn likely that it was data
in the Japanese language, and people would prefer to look at it
with a Japanese-style font.</p>
<p>This is really no different from knowing (or detecting
heuristically) that your data was in the ASMO 449 standard: in that
case it was pretty darn likely that it contained data in the Arabic
language, and you'd better have a corresponding Arabic font
ready to display it.</p>
<p>--Ken<br>
</p>
<div class="moz-cite-prefix">On 9/7/2021 3:23 PM, William_J_G
Overington via Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:3cfa9ef5.35cf1.17bc25b7bc6.Webtop.110@btinternet.com">
<p><span style="white-space: pre-wrap; display: inline !important;">Could someone possibly write about how "</span><span style="white-space: pre-wrap; display: inline !important;">character-set signaling — in-band or out-of-band — as a hint to display text in a Chinese-type or Japanese-type font" was/is done in the CNS and JIS encodings please?</span></p>
</blockquote>
</blockquote>
<p><br>
</p>
</body>
</html>