<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">If applied to text fields of sufficient

      length, good heuristics should be able to tease apart language use

      even within a unified encoding. I once played with a toy system for

      European languages and, with extremely simple techniques, got fairly

      decent discrimination. And I'm not a computational linguist. That

      effort convinced me that the problem is fairly tractable and that

      you should be able to reduce misidentification to an acceptable

      level in many situations.</div>
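
    <p>To illustrate how simple such a heuristic can be, here is a minimal
      sketch that counts common function words per language. It is not the
      toy system mentioned above, whose method is not described here; the
      word lists and the names <code>STOPWORDS</code>,
      <code>guess_language</code> and <code>min_hits</code> are assumptions
      made up for this example.</p>

<pre>
```python
import re
from collections import Counter

# Tiny illustrative stopword lists (an assumption for this sketch,
# not a vetted linguistic resource).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it", "with", "for"},
    "fr": {"le", "la", "les", "et", "de", "des", "est", "un", "une", "que"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit", "zu"},
    "es": {"el", "los", "las", "y", "es", "un", "una", "que", "por", "con"},
}


def guess_language(text, min_hits=2):
    """Return the language code whose stopwords occur most often, or None."""
    # Split into runs of letters; digits, punctuation and underscores are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Score each language by the total occurrences of its stopwords.
    scores = {
        lang: sum(counts[word] for word in vocab)
        for lang, vocab in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Decline to guess when the field is too short to give enough evidence.
    return best if scores[best] >= min_hits else None
```
</pre>

    <p>For example,
      <code>guess_language("Der Hund ist nicht im Garten und die Katze ist mit der Maus")</code>
      picks German, while a single word such as "Unicode" yields
      <code>None</code>. The <code>min_hits</code> threshold reflects the
      caveat about field length: short fields carry too little evidence to
      classify reliably.</p>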

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">A./<br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">On 9/7/2021 3:56 PM, Ken Whistler via

      Unicode wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:aa9ccb28-ad02-d9ce-3277-f0a0842b24eb@sonic.net">

      <p>William</p>

      <p>It wasn't done *in* the CNS and JIS encodings. In other words,

        if you are looking for some fancy mechanism that was used inside

        those old legacy encodings to do "signaling", you aren't going

        to find it.</p>

      <p>The point Doug was making was that in the old days if you knew

        (or could detect heuristically) that your data was in the CNS

        11643 encoding, well, by gum, it was pretty darn likely that it

        was data in the Chinese language, and people would prefer to

        look at it with a Chinese-style font. Contrariwise, if you knew

        (or could detect heuristically) that your data was in the JIS X

        0208 encoding, well, it was pretty darn likely that it was data

        in the Japanese language, and people would prefer to look at it

        with a Japanese-style font.</p>

      <p>This is really no different from knowing (or detecting

        heuristically) that your data was in the ASMO 449 standard: in

        that case it was pretty darn likely that it contained data in the

        Arabic language, and you'd better have a corresponding Arabic font

        ready to display it.</p>

      <p>--Ken<br>

      </p>

      <div class="moz-cite-prefix">On 9/7/2021 3:23 PM, William_J_G

        Overington via Unicode wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:3cfa9ef5.35cf1.17bc25b7bc6.Webtop.110@btinternet.com">

        <p><span style="white-space: pre-wrap; display: inline !important;">Could someone possibly write about how "</span><span style="white-space: pre-wrap; display: inline !important;">character-set signaling — in-band or out-of-band — as a hint to display text in a Chinese-type or Japanese-type font" was/is done in the CNS and JIS encodings please?</span></p>

      </blockquote>

    </blockquote>

  </body>

</html>