Security consideration: math symbols in an exotic IP address format in a phishing mail
Martin J. Dürst
duerst at it.aoyama.ac.jp
Wed May 20 19:59:13 CDT 2020
Hello Markus, others,
On 20/05/2020 04:33, Markus Scherer via Unicode wrote:
> On Tue, May 19, 2020 at 12:24 PM Phake Nick via Unicode <unicode at unicode.org>
> wrote:
>
>> Somewhat relevant, I have previously observed that, if you type/produce a
>> link of http://www.abc.def/ghi?jk=lm , and then replace symbol characters
>> in the link with some other confusable symbols, like full width punctuation
>> and such, that link will still take you to the intended address. Different
>> browsers accept different characters. Sometimes when such a link format is
>> being posted onto internet communities that restrict link sharing, such
>> alternative unicode characters formed links can bypass link restrictions in
>> those communities and potentially take unsuspecting netizens to harmful
>> websites.
>> I don't understand why browsers would normalize links being clicked/typed
>> in such way which would expose users to such risk.
>>
>
> IDNA implementations process domain names using a "mapping" step which is
> like a variant of NFKC_Casefold.
That in itself isn't a problem, but it depends on the details.
> That's why you can use uppercase
Good.
> as well
> as other canonical
Good.
> and compatibility equivalents,
Good up to a point. As discussed already, mapping full-width characters
to their half-width equivalents can make a lot of sense for users in
China, Japan,... But mapping other compatibility equivalents doesn't
make sense at all. Definitely not for Math Bold like in the example at
hand, and definitely not for circled characters and the like.
> and out-of-order
> combining marks.
That's just part of canonical equivalence, isn't it?
The other very important point is of course that IP addresses are not
domain names, and therefore are not covered by IDNA, and shouldn't be
mapped in any way. But what happens inside browsers is probably the
following:
(1) Check if the authority part is an IP address or a domain name.
(2) It doesn't look like an (ASCII) IP address, so it's handled
as a domain name.
(3) Apply IDNA mapping (see above). This produces ASCII digits.
(4) Apply IDNA toASCII conversion (a no-op in the case at hand).
(5) Feed this to a generic resolver, which includes conversion of
octal IP addresses to regular (dotted-decimal) ones.
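To make the steps above concrete, here is a rough Python sketch of that pipeline. It is only my guess at the general shape, not how any particular browser is implemented. NFKC plus case folding stands in for the real IDNA mapping (which is not identical to it), the function names are made up for illustration, and the address is a made-up private one, not the one from the phishing mail.

```python
import unicodedata


def looks_like_ascii_ipv4(host: str) -> bool:
    """Step (1): only four dot-separated runs of ASCII digits count as an IP address."""
    parts = host.split(".")
    return len(parts) == 4 and all(p.isascii() and p.isdigit() for p in parts)


def approx_idna_map(host: str) -> str:
    """Step (3): rough stand-in for the IDNA mapping (assumption: NFKC + case folding)."""
    return unicodedata.normalize("NFKC", host).casefold()


def legacy_ipv4(host: str) -> str:
    """Step (5): resolver-style parsing where a leading 0 marks an octal part."""
    octets = []
    for part in host.split("."):
        base = 8 if len(part) > 1 and part.startswith("0") else 10
        octets.append(int(part, base))
    return ".".join(str(o) for o in octets)


def to_math_bold(text: str) -> str:
    """Replace ASCII digits with MATHEMATICAL BOLD DIGIT ZERO..NINE (U+1D7CE..U+1D7D7)."""
    return "".join(chr(0x1D7CE + int(c)) if c.isdigit() else c for c in text)


def resolve_like_sketch(host: str) -> str:
    """Walk through steps (1) to (5) for a host that is not a plain ASCII IP address."""
    if looks_like_ascii_ipv4(host):          # step (1)
        return legacy_ipv4(host)
    mapped = approx_idna_map(host)           # steps (2) and (3)
    mapped.encode("ascii")                   # step (4): nothing left to convert here
    return legacy_ipv4(mapped)               # step (5)


bold_host = to_math_bold("0300.0250.0000.0001")
print(looks_like_ascii_ipv4(bold_host))      # False: so it is treated as a domain name
print(resolve_like_sketch(bold_host))        # 192.168.0.1
```

The point is that step (1) sees no IP address at all, while step (5) happily sees one.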
Narrowing the IDNA mapping as discussed above would fix this case,
because the toASCII operation would then reject Math Bold characters as
invalid. For security checks, rejecting Math Bold (and the like)
would also work. But that would have to be restricted to the authority
part of a Web address, because such digits can of course legitimately
occur in other parts.
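A minimal sketch of such a check, again in Python and again only an illustration: it looks at the host part only and flags characters that have a compatibility decomposition other than the full-width/half-width kind. The function name and the choice of allowed tags (<wide> and <narrow>) are my assumptions, not something taken from IDNA or any other specification.

```python
import unicodedata
from urllib.parse import urlsplit

# Assumption: only full-width/half-width compatibility forms are tolerated.
ALLOWED_COMPAT_TAGS = {"<wide>", "<narrow>"}


def suspicious_host_chars(url: str) -> list[str]:
    """Return host characters whose compatibility mapping is not of an allowed kind."""
    host = urlsplit(url).hostname or ""      # authority part only; path and query are ignored
    flagged = []
    for ch in host:
        decomp = unicodedata.decomposition(ch)   # e.g. "<font> 0030" for Math Bold zero
        if decomp.startswith("<") and decomp.split()[0] not in ALLOWED_COMPAT_TAGS:
            flagged.append(ch)
    return flagged


bold_one = chr(0x1D7CF)                      # MATHEMATICAL BOLD DIGIT ONE
print(suspicious_host_chars("http://" + bold_one + ".example/"))          # flagged
print(suspicious_host_chars("http://example.org/page?n=" + bold_one))     # not flagged
```

Characters with a canonical decomposition only (no tag in angle brackets) pass through, so accented letters in ordinary IDNs are not affected by this check.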
Regards, Martin.
> markus
>