Hanb in domain labels
Jim DeLaHunt
list+unicode at jdlh.com
Fri Aug 16 14:32:11 CDT 2024
On 2024-08-15 02:08, Henri Sivonen via Unicode wrote:
> UTS #39 is commonly used as the baseline for detecting IDN spoofs, and
> UTS #39 explicitly allows combining Han and Bopomofo. Considering that
> ㄚ looks confusable with 丫 and ㄠ looks confusable with 幺, I’m wondering
> if it’s appropriate to explicitly allow this combination in the spoof
> detection context.…
Are you asking whether UTS #39 should continue to allow this combination,
or be changed to forbid it? Or are you asking whether the rules of the
Domain Name System should allow this combination?
I am involved with Universal Acceptance advocacy[1]. That means I have
one foot in the DNS world, and the ICANN rules which govern it. I am not
an expert, but I am aware of some principles there. My understanding is
that the DNS world writes its own rules for detecting and preventing IDN
spoofs. I have not heard that UTS #39 is a fundamental document for them.
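For anyone wondering why UTS #39 accepts this mix at all: as I understand
it, UTS #39 "augments" each character's script set (Han gains Hanb, Jpan
and Kore; Bopomofo gains Hanb), and a string counts as single-script if
the intersection of those augmented sets is non-empty. Han plus Bopomofo
therefore resolves to {Hanb}, the ISO 15924 code for "Han with Bopomofo"
in the subject line. Here is a minimal Python sketch of that computation.
It assumes a tiny hard-coded table of code point ranges, and one script
per character, in place of the real Script_Extensions data, so it is an
illustration, not a conforming implementation.

```python
# Simplified sketch of UTS #39 augmented and resolved script sets.
# Assumption: a tiny hard-coded range table stands in for the Unicode
# Script_Extensions property that real code would consult.

COMMON = "Zyyy"  # UTS #39 treats Common/Inherited characters as ALL scripts

SCRIPT_RANGES = [
    (0x002D, 0x002D, COMMON),   # hyphen-minus
    (0x0030, 0x0039, COMMON),   # ASCII digits
    (0x0041, 0x005A, "Latn"),
    (0x0061, 0x007A, "Latn"),
    (0x3100, 0x312F, "Bopo"),   # Bopomofo
    (0x31A0, 0x31BF, "Bopo"),   # Bopomofo Extended
    (0x3400, 0x4DBF, "Hani"),   # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF, "Hani"),   # CJK Unified Ideographs
]


def script_of(ch):
    cp = ord(ch)
    for lo, hi, script in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return script
    return "Zzzz"  # not covered by this toy table: treated as its own script


def augmented(script):
    """Augmented script set for one character's script, per UTS #39."""
    s = {script}
    if script == "Hani":
        s |= {"Hanb", "Jpan", "Kore"}
    elif script in ("Hira", "Kana"):
        s.add("Jpan")
    elif script == "Hang":
        s.add("Kore")
    elif script == "Bopo":
        s.add("Hanb")
    return s


def resolved_script_set(label):
    """Intersection of augmented sets; None means ALL (only Common chars)."""
    result = None
    for ch in label:
        script = script_of(ch)
        if script == COMMON:
            continue  # intersecting with ALL changes nothing
        aug = augmented(script)
        result = aug if result is None else result & aug
    return result


def is_single_script(label):
    resolved = resolved_script_set(label)
    return resolved is None or len(resolved) > 0
```

With this sketch, resolved_script_set("丫ㄚ") comes out as {"Hanb"}, so the
label counts as single-script, while "ㄚa" (Bopomofo plus Latin) resolves
to the empty set and is flagged as mixed-script.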
> Is combining Han and Bopomofo in one domain label something that
> occurs commonly enough in domains…?
This sounds like a question about the DNS: what names are already
registered, and what the rules are for registering further names. The
former is backward-looking, the latter is forward-looking. Thus the
answer has two parts.
For the forward-looking question, I have some awareness of the rules
ICANN has put into place. Again, I am not an expert, but I have heard
experts talk about some of the terminology and concepts.
The ICANN communities have put a lot of effort in recent years into
"Label Generation Rules". ("Label" means the identifiers separated by
periods in a domain name. In "example.com", "example" and "com" are
Labels.) The LGRs are script-specific, so there are LGRs for scripts
like Chinese, Bangla, Arabic, etc. The LGRs specifically try to prevent
spoofs and confusion between labels. The LGRs define a repertoire of
characters which may be used in a label. They define characters or
strings which are variants of each other, which a human reader might
consider to have the same meaning. There are rules under which the
registration of one variant label requires that the other variant labels
either be registered to the same entity, or be protected from registration.
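To make those concepts concrete, here is a toy Python model of a
repertoire, a variant mapping, and the rule that a label's variants go
either to the same registrant or to no one. The four-character repertoire
and the 国/國 variant pair are hypothetical illustrations I chose, not data
from any real ICANN LGR, and real LGRs add variant dispositions (such as
allocatable vs. blocked) and whole-label rules that this sketch ignores.

```python
from itertools import product

# Hypothetical toy data, NOT taken from any real ICANN LGR.
REPERTOIRE = {"中", "文", "国", "國"}
VARIANTS = {"国": {"國"}, "國": {"国"}}  # symmetric variant mapping


def in_repertoire(label):
    return all(ch in REPERTOIRE for ch in label)


def variant_labels(label):
    """Every label formed by replacing each character with itself or a variant."""
    choices = [sorted({ch} | VARIANTS.get(ch, set())) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


class ToyRegistry:
    """Hypothetical registry enforcing 'same registrant or nobody' per variant set."""

    def __init__(self):
        self.holder = {}  # label -> registrant, for every label in a claimed variant set

    def register(self, label, registrant):
        if not in_repertoire(label):
            raise ValueError(f"{label!r} uses characters outside the repertoire")
        variants = variant_labels(label)  # includes the label itself
        for v in variants:
            owner = self.holder.get(v)
            if owner is not None and owner != registrant:
                raise ValueError(f"variant {v!r} already belongs to {owner}")
        for v in variants:
            self.holder[v] = registrant  # reserve the whole variant set
```

In this model, once "A" registers 中国, an attempt by "B" to register 中國
raises ValueError, while "A" may still take it; a label such as 中ㄚ is
refused outright because ㄚ is not in the repertoire.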
There is a set of Label Generation Rules for the root zone[2] of the
DNS. They include rules for Chinese script labels[3] in the root zone.
In my simple-minded reading of those rules, Bopomofo characters are not
included in the repertoire. I suspect that means that the rules prevent
anyone from registering a .ㄅㄆㄇㄈ top-level domain, or a Chinese-script
top-level domain that includes Bopomofo characters.
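If it helps to see the repertoire point mechanically, here is a rough
Python sketch that checks a label against a Han-only repertoire and
reports what falls outside it. The two CJK ranges are my simplifying
assumption; the real Chinese RZ-LGR repertoire is an explicit, narrower
list of code points, so treat this as an illustration of the idea rather
than of the actual rules.

```python
import unicodedata

# Assumed stand-in for a Han-only repertoire: two CJK Unified Ideograph
# blocks. The real RZ-LGR repertoire is an explicit code point list.
HAN_RANGES = [
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def outside_repertoire(label):
    """Return (code point, character name) for each character not in HAN_RANGES."""
    bad = []
    for ch in label:
        cp = ord(ch)
        if not any(lo <= cp <= hi for lo, hi in HAN_RANGES):
            bad.append((f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")))
    return bad
```

Under these assumptions, outside_repertoire("丫ㄚ") flags only U+311A (ㄚ),
and every character of "ㄅㄆㄇㄈ" is flagged, which matches my reading that
neither a mixed Han and Bopomofo label nor a Bopomofo-only label would pass.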
I understand that each top-level registry sets the rules for
second-level labels they will accept, though there is pressure from
ICANN communities to adopt standard LGRs. There is a set of suggested
Label Generation Rules for second-level labels[4]. As I read those
rules, at a superficial level they also seem to rule out Bopomofo
characters within Chinese-script labels, as well as Bopomofo-only labels.
If you really want to understand what rules govern domain names, don't
rely on my simple-minded understanding. Get in touch with ICANN
communities[5] who specialise in those rules. The Generic Names
Supporting Organisation might be a good place to start.
For the backward-looking question, about what names are already
registered in various top-level domains, I don't have specific
information. I have the impression that a lot of domain names were
registered before the current LGRs were developed. I wouldn't be surprised
to hear that some of them don't comply with the LGRs. For instance, the
.com and .org registries might have accepted some labels with Bopomofo
characters in them. Again, the ICANN communities[5] would be a place
to ask.
All of that seems to say that (if my understanding is correct),
"combining Han and Bopomofo in one domain label" is not "something that
occurs commonly… in domains" registered under the LGRs, but that might
have occurred with legacy labels registered in the past.
Does this help answer your questions?
—Jim DeLaHunt
[1] <https://uasg.tech/>
[2] <https://icannwiki.org/Root_Zone_Label_Generation_Rules>
[3] <https://www.icann.org/sites/default/files/lgr/rz-lgr-5-chinese-script-26may22-en.html>
[4] <https://www.icann.org/sites/default/files/packages/lgr/lgr-second-level-chinese-full-variant-script-24jan24-en.html>
[5] <https://www.icann.org/community>
--
. --Jim DeLaHunt, jdlh at jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
multilingual websites consultant, Vancouver, B.C., Canada