<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>On 2024-08-19 03:33, Henri Sivonen wrote:</p>

    <blockquote type="cite"

cite="mid:CAJHk+8QHsNv_dkJ5iOvUZ+KJrxSnsS7AaxmR4z_NFjS8uVhhpg@mail.gmail.com">


      <div dir="ltr">

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">On Fri, Aug 16, 2024 at

            10:32 PM Jim DeLaHunt &lt;<a

              href="mailto:list%2Bunicode@jdlh.com" target="_blank"

              moz-do-not-send="true">list+unicode@jdlh.com</a>&gt;

            wrote:<br>

          </div>

          <blockquote class="gmail_quote"

style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On

            2024-08-15 02:08, Henri Sivonen via Unicode wrote:<br>

            <br>

            > UTS #39 is commonly used as the baseline for detecting

            IDN spoofs, and <br>

            > UTS #39 explicitly allows combining Han and Bopomofo.

            Considering that <br>

            > ㄚ looks confusable with 丫 and ㄠ looks confusable with

            幺, I’m wondering <br>

            > if it’s appropriate to explicitly allow this

            combination in the spoof <br>

            > detection context.…<br>

            <br>

            Are you asking about whether UTS #39 should allow this

            combination vs <br>

            being changed to forbid this combination? Or are you asking

            about <br>

            whether the rules of the Domain Name System should allow

            this combination?<br>

          </blockquote>

          <div><br>

          </div>

          <div>Foremost I'm asking if it's appropriate that browsers

            that in general refuse to render mixed-script domain labels

            in the Unicode form in the user interface (in the URL bar in

            particular) make an exception… for the combination of Han

            and Bopomofo.…</div>

        </div>

      </div>

    </blockquote>

    <p>Ah. I did not interpret "allow this combination" as referring to

      browser location bar behaviour, nor as meaning "display in

      Unicode (U-Label) form instead of encoded ASCII (A-Label) form". <br>

    </p>

    <p>So you are asking whether browsers should indicate to users that a

      domain name which combines Han and Bopomofo is untrustworthy?</p>

    <p>…</p>

    <p>Also,<br>

    </p>

    <blockquote type="cite"

cite="mid:CAJHk+8QHsNv_dkJ5iOvUZ+KJrxSnsS7AaxmR4z_NFjS8uVhhpg@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <blockquote class="gmail_quote"

style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            …There are a set of Label Generation Rules for the root

            zone[2] of the <br>

            DNS. They include rules for Chinese script labels[3] in the

            root zone. <br>

            In my simple-minded reading of those rules, Bopomofo

            characters are not <br>

            included in the repertoire. I suspect that means that the

            rules prevent <br>

            anyone from registering a .ㄅㄆㄇㄈ top-level domain, or a

            Chinese domain <br>

            with Bopomofo inclusions.<br>

            …<br>

            [2] <a class="moz-txt-link-rfc2396E"

href="https://icannwiki.org/Root_Zone_Label_Generation_Rules">&lt;https://icannwiki.org/Root_Zone_Label_Generation_Rules&gt;</a><br>

              [3] <a class="moz-txt-link-rfc2396E"

href="https://www.icann.org/sites/default/files/lgr/rz-lgr-5-chinese-script-26may22-en.html">&lt;https://www.icann.org/sites/default/files/lgr/rz-lgr-5-chinese-script-26may22-en.html&gt;</a>

            <br>

          </blockquote>

          <div><br>

          </div>

          <div>It indeed looks like the root LGRs currently don't allow

            Bopomofo, but it appears that they also don't allow Cyrillic

            TLDs, which do exist, so it seems that root LGRs are enough

            in a work-in-progress state not to draw definite conclusions

            from.<br>

          </div>

        </div>

      </div>

    </blockquote>

    <p>I overlooked something important in [2]: the ICANNWiki content is

      not ICANN content; ICANNWiki is a separate org documenting ICANN. And it

      turns out that their Root Zone Label Generation Rules page at [2]

      has stale content. ICANN's own page on Root Zone Label Generation

      Rules [6] describes version 5 of the root zone LGRs, which include

      entries for Cyrillic, Japanese, and Korean scripts in addition to

      Chinese.</p>

    <p>(I am making a note to update the ICANNWiki Root Zone LGRs page,

      [2], if that is how their wiki works.)<br>

    </p>

    <p>[6]

<a class="moz-txt-link-rfc2396E" href="https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en">&lt;https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en&gt;</a>

      (Content dates from 2022. No, I don't know why they have a 2015

      date in their URL.)<br>

    </p>

    <blockquote type="cite"

cite="mid:CAJHk+8QHsNv_dkJ5iOvUZ+KJrxSnsS7AaxmR4z_NFjS8uVhhpg@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <div> </div>

          <blockquote class="gmail_quote"

style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            I understand that each top-level registry sets the rules for

            <br>

            second-level labels they will accept, though there is

            pressure from <br>

            ICANN communities to adopt standard LGRs. There are a set of

            suggested <br>

            Label Generation Rules for second-level labels[4]. As I read

            those <br>

            rules, at a superficial level they also seem to rule out

            Bopomofo <br>

            characters within Chinese language labels or Bopomofo-only

            labels.<br>

          </blockquote>

          <div><br>

          </div>

          <div>That particular rule set also excludes Hiragana and

            Katakana, so it's not clear that LGRs for Hani existing

            means the exclusion of Hanb, <span>Jpan, and Kore</span>.…

          </div>

        </div>

      </div>

    </blockquote>

    <p>Have a look at the version 5 LGRs [6]. There may also be

      second-level LGRs for other scripts like Japanese, Korean, and

      Cyrillic. I have not checked. Does that clarify?</p>

    <p><br>

    </p>

    <blockquote type="cite"

cite="mid:CAJHk+8QHsNv_dkJ5iOvUZ+KJrxSnsS7AaxmR4z_NFjS8uVhhpg@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <div>…(I didn't ask about Jpan in my initial post, despite Han

            口 and Katakana ロ existing, because of the different role of

            Hiragana and Katakana compared to the role of Bopomofo. I

            didn't ask about Kore, because I'm not aware of a

            confusability issue even if I have doubts about demand for

            Han + Hangul domain labels. I am curious, though, how users

            and domain holders deal with the 口 vs. ロ issue. Is the glyph

            size distinction consistent and obvious enough?)<br>

          </div>

        </div>

      </div>

    </blockquote>

    <p>You are not the first person to ask this question. There are answers at

      Japanese Stack Exchange[7], Reddit[8], and WaniKani[9]. Summary:

      readers differentiate the two based on context, and sometimes when the

      context is ambiguous people interpret the written kanji to be the

      kana. The best summary: "Context is always the key in Japanese."

      Those replies also point out other visually similar kana and kanji

      pairs.</p>

    <p>[7]

<a class="moz-txt-link-rfc2396E" href="https://japanese.stackexchange.com/questions/13678/%E5%8F%A3%E3%83%AD-those-are-supposed-to-be-different-characters-how-can-you-tell/3025">&lt;https://japanese.stackexchange.com/questions/13678/%E5%8F%A3%E3%83%AD-those-are-supposed-to-be-different-characters-how-can-you-tell/3025&gt;</a><br>

      [8]

<a class="moz-txt-link-rfc2396E" href="https://www.reddit.com/r/LearnJapanese/comments/ck3w4w/the_one_time_its_okay_to_confuse_%E3%83%AD_and_%E5%8F%A3_%E3%83%AD%E3%83%91%E3%82%AF/">&lt;https://www.reddit.com/r/LearnJapanese/comments/ck3w4w/the_one_time_its_okay_to_confuse_%E3%83%AD_and_%E5%8F%A3_%E3%83%AD%E3%83%91%E3%82%AF/&gt;</a><br>

      [9]

<a class="moz-txt-link-rfc2396E" href="https://community.wanikani.com/t/katakana-ro-vs-mouth-kanji/26641">&lt;https://community.wanikani.com/t/katakana-ro-vs-mouth-kanji/26641&gt;</a></p>

    <p>I hope this is helpful. Cheers!<br>

           —Jim DeLaHunt<br>

    </p>

    <pre class="moz-signature" cols="72">-- 

.   --Jim DeLaHunt, <a class="moz-txt-link-abbreviated" href="mailto:jdlh@jdlh.com">jdlh@jdlh.com</a>     <a class="moz-txt-link-freetext" href="http://blog.jdlh.com/">http://blog.jdlh.com/</a> (<a class="moz-txt-link-freetext" href="http://jdlh.com/">http://jdlh.com/</a>)

      multilingual websites consultant, Vancouver, B.C., Canada</pre>

  </body>

</html>