<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>On 2024-08-19 03:33, Henri Sivonen wrote:</p>
    <blockquote type="cite"
cite="mid:CAJHk+8QHsNv_dkJ5iOvUZ+KJrxSnsS7AaxmR4z_NFjS8uVhhpg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Fri, Aug 16, 2024 at
            10:32 PM Jim DeLaHunt &lt;<a
              href="mailto:list%2Bunicode@jdlh.com" target="_blank"
              moz-do-not-send="true">list+unicode@jdlh.com</a>&gt;
            wrote:<br>
          </div>
          <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On
            2024-08-15 02:08, Henri Sivonen via Unicode wrote:<br>
            <br>
            > UTS #39 is commonly used as the baseline for detecting
            IDN spoofs, and <br>
            > UTS #39 explicitly allows combining Han and Bopomofo.
            Considering that <br>
            > ㄚ looks confusable with 丫 and ㄠ looks confusable with
            幺, I’m wondering <br>
            > if it’s appropriate to explicitly allow this
            combination in the spoof <br>
            > detection context.…<br>
            <br>
            Are you asking about whether UTS #39 should allow this
            combination vs <br>
            being changed to forbid this combination? Or are you asking
            about <br>
            whether the rules of the Domain Name System should allow
            this combination?<br>
          </blockquote>
          <div><br>
          </div>
          <div>Foremost I'm asking if it's appropriate that browsers
            that in general refuse to render mixed-script domain labels
            in the Unicode form in the user interface (in the URL bar in
            particular) make an exception… for the combination of Han
            and Bopomofo.…</div>
        </div>
      </div>
    </blockquote>
    <p>Ah. I did not interpret "allow this combination" as referring to
      browser location bar behaviour, nor to it meaning "display in
      Unicode (U-Label) form instead of encoded ASCII (A-Label) form". <br>
    </p>
    <p>So are you asking whether browsers should indicate to users that a
      domain name which combines Han and Bopomofo is untrustworthy?</p>
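    <p>To make the U-Label versus A-Label distinction concrete, here is a
      minimal Python sketch. It relies on Python's built-in "idna" codec,
      which follows the older IDNA 2003 rules, so it only illustrates the two
      forms and does not reproduce what any browser or registry actually
      does. The label "丫ㄚ.example" and the helper names are made up for
      illustration.</p>

```python
# Illustration only: Python's built-in "idna" codec implements IDNA 2003,
# which is not necessarily the rule set a browser or registry applies.

def to_a_label(domain):
    """Return the ASCII (A-Label) form of a domain name."""
    return domain.encode("idna").decode("ascii")

def to_u_label(domain):
    """Return the Unicode (U-Label) form of a domain name."""
    return domain.encode("ascii").decode("idna")

# Hypothetical domain whose first label mixes Han and Bopomofo.
mixed = "丫ㄚ.example"
ascii_form = to_a_label(mixed)

print(ascii_form)               # the encoded form a browser could show instead
print(to_u_label(ascii_form))   # converted back to the Unicode form
```

    <p>A browser that distrusts a label would show something like the first
      printed form in the location bar rather than the second.</p>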
    <p>…</p>
    <p>Also,<br>
    </p>
    <blockquote type="cite"
cite="mid:CAJHk+8QHsNv_dkJ5iOvUZ+KJrxSnsS7AaxmR4z_NFjS8uVhhpg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            …There are a set of Label Generation Rules for the root
            zone[2] of the <br>
            DNS. They include rules for Chinese script labels[3] in the
            root zone. <br>
            In my simple-minded reading of those rules, Bopomofo
            characters are not <br>
            included in the repertoire. I suspect that means that the
            rules prevent <br>
            anyone from registering a .ㄅㄆㄇㄈ top-level domain, or a
            Chinese domain <br>
            with Bopomofo inclusions.<br>
            …<br>
            [2] <a class="moz-txt-link-rfc2396E"
href="https://icannwiki.org/Root_Zone_Label_Generation_Rules">&lt;https://icannwiki.org/Root_Zone_Label_Generation_Rules&gt;</a><br>
            [3] <a class="moz-txt-link-rfc2396E"
href="https://www.icann.org/sites/default/files/lgr/rz-lgr-5-chinese-script-26may22-en.html">&lt;https://www.icann.org/sites/default/files/lgr/rz-lgr-5-chinese-script-26may22-en.html&gt;</a>
            <br>
          </blockquote>
          <div><br>
          </div>
          <div>It indeed looks like the root LGRs currently don't allow
            Bopomofo, but it appears that they also don't allow Cyrillic
            TLDs, which do exist, so it seems that root LGRs are enough
            in a work-in-progress state not to draw definite conclusions
            from.<br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>I overlooked something important in [2]: the ICANNWiki content is
      not ICANN content. ICANNWiki is a separate organization that
      documents ICANN, and it turns out that their Root Zone Label
      Generation Rules page at [2] has stale content. ICANN's own page on
      Root Zone Label Generation Rules [6] describes version 5 of the root
      zone LGRs, which includes entries for Cyrillic, Japanese, and Korean
      scripts in addition to Chinese.</p>
    <p>(I am making a note to update the ICANNWiki Root Zone LGRs page,
      [2], if that is how their wiki works.)<br>
    </p>
    <p>[6]
<a class="moz-txt-link-rfc2396E" href="https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en">&lt;https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en&gt;</a>
      (Content dates from 2022. No, I don't know why they have a 2015
      date in their URL.)<br>
    </p>
    <blockquote type="cite"
cite="mid:CAJHk+8QHsNv_dkJ5iOvUZ+KJrxSnsS7AaxmR4z_NFjS8uVhhpg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div> </div>
          <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            I understand that each top-level registry sets the rules for
            <br>
            second-level labels they will accept, though there is
            pressure from <br>
            ICANN communities to adopt standard LGRs. There are a set of
            suggested <br>
            Label Generation Rules for second-level labels[4]. As I read
            those <br>
            rules, at a superficial level they also seem to rule out
            Bopomofo <br>
            characters within Chinese language labels or Bopomofo-only
            labels.<br>
          </blockquote>
          <div><br>
          </div>
          <div>That particular rule set also excludes Hiragana and
            Katakana, so it's not clear that LGRs for Hani existing
            means the exclusion of Hanb, <span>Jpan, and Kore</span>.…
          </div>
        </div>
      </div>
    </blockquote>
    <p>Have a look at the version 5 LGRs [6]. There may also be
      second-level LGRs for other scripts like Japanese, Korean, and
      Cyrillic. I have not checked. Does that clarify?</p>
    <p><br>
    </p>
    <blockquote type="cite"
cite="mid:CAJHk+8QHsNv_dkJ5iOvUZ+KJrxSnsS7AaxmR4z_NFjS8uVhhpg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div>…(I didn't ask about Jpan in my initial post, despite Han
            口 and Katakana ロ existing, because of the different role of
            Hiragana and Katakana compared to the role of Bopomofo. I
            didn't ask about Kore, because I'm not aware of a
            confusability issue even if I have doubts about demand for
            Han + Hangul domain labels. I am curious, though, how users
            and domain holders deal with the 口 vs. ロ issue. Is the glyph
            size distinction consistent and obvious enough?)<br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>You are not the first person to ask this question. There are
      answers at Japanese Stack Exchange[7], Reddit[8], and WaniKani[9].
      Summary: readers differentiate them based on context, and sometimes,
      when the context is ambiguous, people interpret the written kanji
      to be the kana. The best summary: "Context is always the key in
      Japanese." Those replies also point out other visually similar kana
      and kanji pairs.</p>
    <p>[7]
<a class="moz-txt-link-rfc2396E" href="https://japanese.stackexchange.com/questions/13678/%E5%8F%A3%E3%83%AD-those-are-supposed-to-be-different-characters-how-can-you-tell/3025">&lt;https://japanese.stackexchange.com/questions/13678/%E5%8F%A3%E3%83%AD-those-are-supposed-to-be-different-characters-how-can-you-tell/3025&gt;</a><br>
      [8]
<a class="moz-txt-link-rfc2396E" href="https://www.reddit.com/r/LearnJapanese/comments/ck3w4w/the_one_time_its_okay_to_confuse_%E3%83%AD_and_%E5%8F%A3_%E3%83%AD%E3%83%91%E3%82%AF/">&lt;https://www.reddit.com/r/LearnJapanese/comments/ck3w4w/the_one_time_its_okay_to_confuse_%E3%83%AD_and_%E5%8F%A3_%E3%83%AD%E3%83%91%E3%82%AF/&gt;</a><br>
      [9]
<a class="moz-txt-link-rfc2396E" href="https://community.wanikani.com/t/katakana-ro-vs-mouth-kanji/26641">&lt;https://community.wanikani.com/t/katakana-ro-vs-mouth-kanji/26641&gt;</a></p>
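    <p>For anyone who wants to poke at the look-alike pairs mentioned in
      this thread, here is a rough Python sketch. Python's standard library
      does not expose the Unicode Script property, so the sketch guesses a
      script from the start of each character name. That shortcut, the
      table names, and the helper functions are my own assumptions for
      illustration, not how UTS #39 or any browser does it. The Hanb, Jpan,
      and Kore sets follow the ISO 15924 definitions of those codes.</p>

```python
import unicodedata

# Assumption for illustration: guess the script from the character name,
# because the standard library has no Script property lookup.
NAME_PREFIX_TO_SCRIPT = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "BOPOMOFO": "Bopomofo",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "HANGUL": "Hangul",
}

# ISO 15924 combined script codes mentioned in the thread.
COMBINED = {
    "Hanb": {"Han", "Bopomofo"},
    "Jpan": {"Han", "Hiragana", "Katakana"},
    "Kore": {"Han", "Hangul"},
}

def guess_script(ch):
    """Return a rough script guess for one character, or "Other"."""
    name = unicodedata.name(ch, "")
    for prefix, script in NAME_PREFIX_TO_SCRIPT.items():
        if name.startswith(prefix):
            return script
    return "Other"

def fitting_combinations(label):
    """List the combined codes whose scripts cover every character in label."""
    scripts = {guess_script(ch) for ch in label}
    return sorted(code for code, allowed in COMBINED.items()
                  if scripts.issubset(allowed))

# Look-alike pairs from the thread: (non-Han character, Han character).
PAIRS = [("ㄚ", "丫"), ("ㄠ", "幺"), ("ロ", "口")]

for other, han in PAIRS:
    print(f"U+{ord(other):04X} {guess_script(other)} / "
          f"U+{ord(han):04X} {guess_script(han)} / "
          f"together: {fitting_combinations(other + han)}")
```

    <p>Each pair is two distinct code points in different scripts, so a
      script-mixing check can tell the members apart even when a reader
      cannot.</p>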
    <p>I hope this is helpful. Cheers!<br>
           —Jim DeLaHunt<br>
    </p>
    <pre class="moz-signature" cols="72">-- 
.   --Jim DeLaHunt, <a class="moz-txt-link-abbreviated" href="mailto:jdlh@jdlh.com">jdlh@jdlh.com</a>     <a class="moz-txt-link-freetext" href="http://blog.jdlh.com/">http://blog.jdlh.com/</a> (<a class="moz-txt-link-freetext" href="http://jdlh.com/">http://jdlh.com/</a>)
      multilingual websites consultant, Vancouver, B.C., Canada</pre>
  </body>
</html>