Security consideration: math symbols in an exotic IP address format in a phishing mail

Martin J. Dürst duerst at it.aoyama.ac.jp
Wed May 20 19:59:13 CDT 2020


Hello Markus, others,

On 20/05/2020 04:33, Markus Scherer via Unicode wrote:
> On Tue, May 19, 2020 at 12:24 PM Phake Nick via Unicode <unicode at unicode.org>
> wrote:
> 
>> Somewhat relevant, I have previously observed that, if you type/produce a
>> link of http://www.abc.def/ghi?jk=lm , and then replace symbol characters
>> in the link with some other confusable symbols, like full width punctuation
>> and such, that link will still take you to the intended address. Different
>> browsers accept different characters. Sometimes when such a link format is
>> being posted onto internet communities that restrict link sharing, such
>> alternative unicode characters formed links can bypass link restrictions in
>> those communities and potentially take unsuspecting netizens to harmful
>> websites.
>> I don't understand why browsers would normalize links being clicked/typed
>> in such way which would expose users to such risk.
>>
> 
> IDNA implementations process domain names using a "mapping" step which is
> like a variant of NFKC_Casefold.

That in itself isn't a problem, but it depends on the details.

> That's why you can use uppercase

Good.

> as well
> as other canonical

Good.

> and compatibility equivalents,

Good up to a point. As discussed already, mapping full-width characters
to their half-width equivalents can make a lot of sense for users in
China, Japan, and so on. But mapping other compatibility equivalents
doesn't make sense at all: definitely not for Math Bold, as in the
example at hand, and definitely not for circled characters and the like.
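
To make the difference concrete, here is a rough Python sketch. It assumes that NFKC followed by case folding is a close enough stand-in for the IDNA mapping step; the real NFKC_Casefold differs in some details (for example, it also removes default-ignorable code points). The function name approx_idna_map is made up for this illustration.

```python
import unicodedata

def approx_idna_map(text):
    # Assumption: NFKC + casefold approximates the IDNA "mapping" step.
    # This is not the exact NFKC_Casefold operation.
    return unicodedata.normalize("NFKC", text).casefold()

full_width = "\uff21\uff22\uff23"               # FULLWIDTH LATIN CAPITAL A, B, C
math_bold = "\U0001d7cf\U0001d7d0\U0001d7d1"    # MATHEMATICAL BOLD DIGIT ONE, TWO, THREE
circled = "\u24d0"                              # CIRCLED LATIN SMALL LETTER A

print(approx_idna_map(full_width))  # abc
print(approx_idna_map(math_bold))   # 123
print(approx_idna_map(circled))     # a
```

All three kinds of characters end up as plain ASCII, although only the first mapping is something users actually need.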

> and out-of-order
> combining marks.

That's just part of canonical equivalence, isn't it?

The other very important point is of course that IP addresses are not 
domain names, and therefore are not covered by IDNA, and shouldn't be 
mapped in any way. But what happens inside browsers is probably the 
following:

(1) Check whether the authority part is an IP address or a domain name.
(2) It doesn't look like an (ASCII) IP address, so it's handled
     as a domain name.
(3) Apply IDNA mapping (see above). This produces ASCII digits.
(4) Apply IDNA toASCII conversion (a no-op in the case at hand).
(5) Feed the result to a generic resolver, which includes conversion
     of octal IP addresses to regular ones.
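
The guessed sequence above can be sketched in Python as follows. This is only an illustration of the guess, not a description of any actual browser. All function names are hypothetical, NFKC plus case folding is assumed as an approximation of the IDNA mapping, the legacy IP parser only handles the four-part form, and the address used in the test is a made-up private one, not the one from the phishing mail.

```python
import re
import unicodedata

PART = re.compile(r"0x[0-9a-f]+|[0-9]+")  # ASCII-only hex or digit run

def approx_idna_map(text):
    # Assumption: NFKC + casefold stands in for the IDNA mapping step.
    return unicodedata.normalize("NFKC", text).casefold()

def parse_legacy_ipv4(host):
    """Parse a four-part IPv4 address with decimal, octal (leading 0)
    or hex (0x) parts. Returns dotted decimal, or None if not an address."""
    parts = host.lower().split(".")
    if len(parts) != 4:
        return None
    octets = []
    for part in parts:
        if not PART.fullmatch(part):
            return None
        try:
            if part.startswith("0x"):
                value = int(part[2:], 16)
            elif len(part) > 1 and part.startswith("0"):
                value = int(part, 8)   # leading zero means octal
            else:
                value = int(part, 10)
        except ValueError:             # e.g. "08" is not valid octal
            return None
        if value > 255:
            return None
        octets.append(str(value))
    return ".".join(octets)

def naive_host_pipeline(host):
    # (1)/(2): only an ASCII string is recognised as an IP address here.
    if host.isascii():
        ip = parse_legacy_ipv4(host)
        if ip is not None:
            return ("ip", ip)
    mapped = approx_idna_map(host)     # (3) mapping
    if not mapped.isascii():
        return ("domain", mapped)      # (4) Punycode step not modelled
    ip = parse_legacy_ipv4(mapped)     # (5) generic resolver
    if ip is not None:
        return ("ip-after-mapping", ip)
    return ("domain", mapped)

def to_math_bold(text):
    # Helper for the demo: replace ASCII digits by Math Bold digits.
    return "".join(chr(0x1D7CE + int(c)) if c in "0123456789" else c
                   for c in text)

print(naive_host_pipeline(to_math_bold("0300.0250.0000.0001")))
```

With these assumptions, the Math Bold string is first treated as a domain name and only becomes an IP address after the mapping, which is exactly the path that should not exist.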

Narrowing the IDNA mapping as discussed above would fix this case,
because the toASCII operation would then reject the Math Bold
characters as invalid. For security checks, rejecting Math Bold (and
the like) would also work. But that would have to be restricted to the
authority part of a Web address, because such numbers can of course
occur in other parts.
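
A security check along these lines might look like the following Python sketch. It is only one possible reading of "Math Bold (and the like)": it assumes that a character counts as suspicious if it has a compatibility decomposition other than <wide> or <narrow>, so that full-width and half-width forms stay allowed. The function name and the example URLs are hypothetical.

```python
import unicodedata
from urllib.parse import urlsplit

# Assumption: full-width/half-width variants are acceptable, every other
# compatibility decomposition (<font>, <circle>, ...) is not.
ALLOWED_COMPAT_TAGS = {"<wide>", "<narrow>"}

def suspicious_host_chars(url):
    """Return the characters in the host part of url that have a
    disallowed compatibility decomposition. Path, query and fragment
    are deliberately not inspected."""
    host = urlsplit(url).hostname or ""
    flagged = []
    for ch in host:
        decomp = unicodedata.decomposition(ch)   # e.g. "<font> 0031"
        if decomp.startswith("<"):
            tag = decomp.split(" ", 1)[0]
            if tag not in ALLOWED_COMPAT_TAGS:
                flagged.append(ch)
    return flagged

bold_one = "\U0001d7cf"  # MATHEMATICAL BOLD DIGIT ONE
print(suspicious_host_chars("http://" + bold_one + ".example/"))
print(suspicious_host_chars("http://example.com/?q=" + bold_one))
```

The first call flags the Math Bold digit in the host; the second returns an empty list, because the same character in the query part is left alone.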

Regards,   Martin.

> markus
> 


