Hanb in domain labels
Jim DeLaHunt
list+unicode at jdlh.com
Mon Aug 19 14:19:35 CDT 2024
On 2024-08-19 03:33, Henri Sivonen wrote:
> On Fri, Aug 16, 2024 at 10:32 PM Jim DeLaHunt <list+unicode at jdlh.com> wrote:
>
> On 2024-08-15 02:08, Henri Sivonen via Unicode wrote:
>
> > UTS #39 is commonly used as the baseline for detecting IDN spoofs, and
> > UTS #39 explicitly allows combining Han and Bopomofo. Considering that
> > ㄚ looks confusable with 丫 and ㄠ looks confusable with 幺, I’m wondering
> > if it’s appropriate to explicitly allow this combination in the spoof
> > detection context.…
>
> Are you asking about whether UTS #39 should allow this combination vs
> being changed to forbid this combination? Or are you asking about
> whether the rules of the Domain Name System should allow this
> combination?
>
>
> Foremost I'm asking if it's appropriate that browsers that in general
> refuse to render mixed-script domain labels in the Unicode form in the
> user interface (in the URL bar in particular) make an exception… for
> the combination of Han and Bopomofo.…
Ah. I did not interpret "allow this combination" as referring to browser
location bar behaviour, nor as meaning "display in Unicode (U-Label)
form instead of encoded ASCII (A-Label) form".
So you are asking whether browsers should indicate to users that a domain
name which combines Han and Bopomofo is untrustworthy?
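For readers who want the mechanics: Han + Bopomofo passes because UTS #39's mixed-script detection augments each character's script set before intersecting them. Han gains Hanb, Jpan and Kore; Hiragana and Katakana gain Jpan; Hangul gains Kore; Bopomofo gains Hanb. A Han + Bopomofo label therefore resolves to {Hanb} and counts as single-script. The Python below is a minimal sketch of that resolved-script-set idea only. It assumes a tiny hard-coded range table (SIMPLIFIED_RANGES, a made-up name) in place of the real Script_Extensions data, and treats anything outside those ranges like Common, so it is illustrative, not what any browser ships.

```python
# Sketch of UTS #39 resolved script sets with HYPOTHETICAL simplified data.
# SIMPLIFIED_RANGES is an assumption for illustration: a few block ranges
# standing in for the real Script_Extensions property.
from typing import Optional, Set

SIMPLIFIED_RANGES = [
    (0x0041, 0x005A, "Latn"),  # A-Z
    (0x0061, 0x007A, "Latn"),  # a-z
    (0x0400, 0x04FF, "Cyrl"),  # Cyrillic block
    (0x3040, 0x309F, "Hira"),  # Hiragana block
    (0x30A0, 0x30FF, "Kana"),  # Katakana block
    (0x3100, 0x312F, "Bopo"),  # Bopomofo block
    (0x31A0, 0x31BF, "Bopo"),  # Bopomofo Extended block
    (0x4E00, 0x9FFF, "Hani"),  # CJK Unified Ideographs (main block only)
    (0xAC00, 0xD7A3, "Hang"),  # Hangul syllables
]

# Augmentation per UTS #39: Han joins Hanb/Jpan/Kore, kana join Jpan,
# Hangul joins Kore, Bopomofo joins Hanb.
AUGMENT = {
    "Hani": {"Hanb", "Jpan", "Kore"},
    "Hira": {"Jpan"},
    "Kana": {"Jpan"},
    "Hang": {"Kore"},
    "Bopo": {"Hanb"},
}


def script_of(ch: str) -> Optional[str]:
    """Return the simplified script, or None (treated like Common/Inherited)."""
    cp = ord(ch)
    for lo, hi, script in SIMPLIFIED_RANGES:
        if lo <= cp <= hi:
            return script
    return None


def resolved_script_set(label: str) -> Optional[Set[str]]:
    """Intersect augmented script sets; None stands for ALL (no constraint yet)."""
    result: Optional[Set[str]] = None
    for ch in label:
        script = script_of(ch)
        if script is None:
            continue  # Common/Inherited-like characters constrain nothing
        augmented = {script} | AUGMENT.get(script, set())
        result = augmented if result is None else result & augmented
    return result


def is_single_script(label: str) -> bool:
    """UTS #39 treats a string as single-script if its resolved set is non-empty."""
    resolved = resolved_script_set(label)
    return resolved is None or len(resolved) > 0


if __name__ == "__main__":
    for label in ["丫ㄠ", "中文ロ", "ㄅロ", "abcд"]:
        resolved = resolved_script_set(label)
        shown = sorted(resolved) if resolved is not None else "ALL"
        print(label, shown, is_single_script(label))
```

Run as a script, this prints `丫ㄠ ['Hanb'] True`, `中文ロ ['Jpan'] True`, `ㄅロ [] False` and `abcд [] False`: the confusable-looking 丫ㄠ sails through as "single-script" Hanb, which is exactly the case under discussion.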
…
Also,
> …There are a set of Label Generation Rules for the root zone[2] of the
> DNS. They include rules for Chinese script labels[3] in the root zone.
> In my simple-minded reading of those rules, Bopomofo characters are not
> included in the repertoire. I suspect that means that the rules prevent
> anyone from registering a .ㄅㄆㄇㄈ top-level domain, or a Chinese domain
> with Bopomofo inclusions.
> …
> [2] <https://icannwiki.org/Root_Zone_Label_Generation_Rules>
> [3] <https://www.icann.org/sites/default/files/lgr/rz-lgr-5-chinese-script-26may22-en.html>
>
>
>
> It indeed looks like the root LGRs currently don't allow Bopomofo, but
> it appears that they also don't allow Cyrillic TLDs, which do exist,
> so it seems that root LGRs are enough in a work-in-progress state not
> to draw definite conclusions from.
I overlooked something important in [2]: the ICANNWiki content is not
ICANN content; ICANNWiki is a separate organization that documents ICANN.
And it turns out that their Root Zone Label Generation Rules page at [2]
has stale content. ICANN's own page on Root Zone Label Generation Rules [6]
describes version 5 of the root zone LGRs, which include entries for
Cyrillic, Japanese, and Korean scripts in addition to Chinese.
(I am making a note to update the ICANNWiki Root Zone LGRs page, [2], if
that is how their wiki works.)
[6] <https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en>
(Content dates from 2022. No, I don't know why they have a 2015 date in
their URL.)
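To make the repertoire point concrete: an LGR's repertoire works as an allow-list, so a single code point outside it makes the whole label invalid, which is why a Bopomofo-free repertoire would rule out both .ㄅㄆㄇㄈ and Han labels with Bopomofo inclusions. The Python below is a minimal sketch under a loud assumption: HYPOTHETICAL_REPERTOIRE (just the main CJK Unified Ideographs block) is made up for illustration. The real Chinese-script LGR enumerates its code points individually and adds variant and whole-label rules that are not modeled here.

```python
# Sketch of an LGR-style repertoire check. HYPOTHETICAL_REPERTOIRE is an
# assumption: the real Chinese-script root zone LGR enumerates individual
# code points and adds variant and whole-label rules not modeled here.
from typing import List

HYPOTHETICAL_REPERTOIRE = [
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs, main block only
]


def in_repertoire(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HYPOTHETICAL_REPERTOIRE)


def repertoire_violations(label: str) -> List[str]:
    """Return code points (as 'U+XXXX') that fall outside the repertoire."""
    return [f"U+{ord(ch):04X}" for ch in label if not in_repertoire(ch)]


def label_allowed(label: str) -> bool:
    # Valid only if the label is non-empty and every code point is covered.
    return bool(label) and not repertoire_violations(label)


if __name__ == "__main__":
    print(label_allowed("中文"))              # True
    print(repertoire_violations("ㄅㄆㄇㄈ"))  # ['U+3105', 'U+3106', 'U+3107', 'U+3108']
```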
> I understand that each top-level registry sets the rules for
> second-level labels they will accept, though there is pressure from
> ICANN communities to adopt standard LGRs. There are a set of suggested
> Label Generation Rules for second-level labels[4]. As I read those
> rules, at a superficial level they also seem to rule out Bopomofo
> characters within Chinese language labels or Bopomofo-only labels.
>
>
> That particular rule set also excludes Hiragana and Katakana, so it's
> not clear that LGRs for Hani existing means the exclusion of Hanb,
> Jpan, and Kore.…
Have a look at the version 5 LGRs [6]. There may also be second-level
LGRs for other scripts like Japanese, Korean, and Cyrillic. I have not
checked. Does that clarify?
> …(I didn't ask about Jpan in my initial post, despite Han 口 and
> Katakana ロ existing, because of the different role of Hiragana and
> Katakana compared to the role of Bopomofo. I didn't ask about Kore,
> because I'm not aware of a confusability issue even if I have doubts
> about demand for Han + Hangul domain labels. I am curious, though, how
> users and domain holders deal with the 口 vs. ロ issue. Is the glyph
> size distinction consistent and obvious enough?)
You are not the first person to ask this question. There are answers at
Japanese Stack Exchange[7], Reddit[8], and WaniKani[9]. Summary: readers
differentiate them based on context, and sometimes, when the context is
ambiguous, people interpret the written kanji as the kana. The best
summary: "Context is always the key in Japanese." Those replies also
point out other visually similar kana and kanji pairs.
[7] <https://japanese.stackexchange.com/questions/13678/%E5%8F%A3%E3%83%AD-those-are-supposed-to-be-different-characters-how-can-you-tell/3025>
[8] <https://www.reddit.com/r/LearnJapanese/comments/ck3w4w/the_one_time_its_okay_to_confuse_%E3%83%AD_and_%E5%8F%A3_%E3%83%AD%E3%83%91%E3%82%AF/>
[9] <https://community.wanikani.com/t/katakana-ro-vs-mouth-kanji/26641>
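On the confusable side, UTS #39 also defines a skeleton test: normalize to NFD, map each character to a prototype, normalize to NFD again, and treat two strings as confusable when their skeletons match. The Python below is a minimal sketch of that procedure. Its prototype table (ILLUSTRATIVE_PROTOTYPES, a made-up name) holds only the pairs mentioned in this thread, ㄚ/丫, ㄠ/幺 and ロ/口; it is an assumption for the example, not the contents of confusables.txt.

```python
import unicodedata

# ILLUSTRATIVE prototype table: only the pairs mentioned in this thread.
# It is an assumption for the sketch, NOT the UTS #39 confusables.txt data.
ILLUSTRATIVE_PROTOTYPES = {
    "\u311A": "\u4E2B",  # BOPOMOFO LETTER A (ㄚ) -> 丫
    "\u3120": "\u5E7A",  # BOPOMOFO LETTER AU (ㄠ) -> 幺
    "\u30ED": "\u53E3",  # KATAKANA LETTER RO (ロ) -> 口
}


def skeleton(text: str) -> str:
    """NFD, map each character to its prototype, then NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(ILLUSTRATIVE_PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a: str, b: str) -> bool:
    # Distinct strings whose skeletons are equal are flagged as confusable.
    return a != b and skeleton(a) == skeleton(b)


if __name__ == "__main__":
    print(confusable("中\u30ED", "中\u53E3"))  # True: ロ vs 口
    print(confusable("\u311A", "\u4E2B"))      # True: ㄚ vs 丫
    print(confusable("中", "\u53E3"))          # False
```

Requiring `a != b` is a choice made for this sketch so that a string is not reported as confusable with itself; the skeleton comparison alone is what UTS #39 specifies.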
I hope this is helpful. Cheers!
—Jim DeLaHunt
--
. --Jim DeLaHunt,jdlh at jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
multilingual websites consultant, Vancouver, B.C., Canada