Confusables.txt might be too sensitive

Doug Ewell doug at ewellic.org
Mon Jun 7 13:00:28 CDT 2021


Sławomir Osipiuk wrote:

>> No other small Latin letter is flagged as a confusable. (Not even the
>> letter "o").
>
> All the other latin letters ARE listed as confusable.

But not in confusables.txt. It's entirely likely, as Mark Dawson surmised, that the MetaMask people simply grabbed that one file and used it as their entire security strategy.

It would hardly be the first time that someone took a small component of the Unicode (or other) standard and used it as their implementation, instead of actually reading and understanding the standard. Look what happens when someone browses the Unicode code charts and declares that language X isn't fully supported because the contextual forms aren't there. (The same happens in BCP 47 when people look only at the Language Subtag Registry and don't read the document.)

> I'm curious how the implementation decides which ones to flag. The
> only thing unique about "m", versus the rest of the latin alphabet,
> seems to be that it's confusable with a two-character sequence. But
> surely the implementation doesn't restrict itself to only such cases,
> so what is happening here?

Actually, that is probably exactly what is happening: the implementation is taking confusables.txt out of context and using it as a sledgehammer.
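
To make the difference concrete, here is a minimal Python sketch. It is not MetaMask's code, which I have not seen. It contrasts a "flag anything listed in the file" check with the skeleton comparison that UTS #39 describes. The table is a small hand-copied excerpt of source-to-prototype mappings as I recall them, so check it against the real confusables.txt. The names CONFUSABLES_EXCERPT, naive_flag, skeleton and confusable are made up for illustration, and the skeleton step is simplified.

```python
import unicodedata

# Hand-copied excerpt (source -> prototype). The real confusables.txt has
# thousands of entries; these few are illustrative and should be verified.
CONFUSABLES_EXCERPT = {
    "m": "rn",       # LATIN SMALL LETTER M -> "rn"
    "1": "l",        # DIGIT ONE -> LATIN SMALL LETTER L
    "I": "l",        # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER L
    "0": "O",        # DIGIT ZERO -> LATIN CAPITAL LETTER O
    "\u0430": "a",   # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
    "\u043e": "o",   # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
}

def naive_flag(name: str) -> bool:
    """Sledgehammer: warn if any character appears as a source in the table."""
    return any(ch in CONFUSABLES_EXCERPT for ch in name)

def skeleton(s: str) -> str:
    """Simplified skeleton: NFD, map each character to its prototype, NFD."""
    s = unicodedata.normalize("NFD", s)
    s = "".join(CONFUSABLES_EXCERPT.get(ch, ch) for ch in s)
    return unicodedata.normalize("NFD", s)

def confusable(a: str, b: str) -> bool:
    """Two distinct strings are confusable if their skeletons are equal."""
    return a != b and skeleton(a) == skeleton(b)

# The naive check fires on a perfectly ordinary name containing 'm'.
print(naive_flag("yourname.eth"))                   # True
# The skeleton check only fires when compared against a lookalike.
print(confusable("yourname.eth", "yournarne.eth"))  # True
print(confusable("yourname.eth", "othername.eth"))  # False
```

The point of the comparison form is that a name is only suspicious relative to some other name it could be mistaken for. The mere presence of 'm' says nothing by itself.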

> Why is "m" causing a problem, but "o" is not, when both are confusable
> with other characters? Does it have to do with the input being
> restricted to ASCII (or some other limited set) and so other
> characters are removed as possibilities, leaving the latin set as non-
> confusable (aside from "m")?

I think an interesting experiment would be to try other types of confusable scenarios, such as an ENS name wholly or partially in another script such as Greek or Cyrillic, to see if MetaMask allows those while flagging 'm'.

In any case, if MetaMask flags all ENS names that contain an 'm' (or '1' or 'I'), then a whole lot of users besides Mark are sure to run into the same problem. Gosh, even the example name at ens.domains ("Yourname.eth") would generate the warning.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org




