<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 6/7/2021 10:05 AM, Sławomir Osipiuk
via Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAM+ijLg7J64h4MQ3aNKiLYqNEw84105DzOR9VwFyOu4fB0N_7Q@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">On Mon, Jun 7, 2021 at 1:16 AM Mark Dawson via
            Unicode &lt;<a href="mailto:unicode@corp.unicode.org"
              moz-do-not-send="true">unicode@corp.unicode.org</a>&gt;
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div><br>
</div>
<div>No other small Latin letter is flagged as a
confusable. (Not even the letter "o"). </div>
</div>
</blockquote>
<div><br>
</div>
        <div>All the other Latin letters ARE listed as confusable. I'm
          curious how the implementation decides which ones to flag.
          The only thing unique about "m", versus the rest of the
          Latin alphabet, seems to be that it's confusable with a
          two-character sequence. But surely the implementation
          doesn't restrict itself to only such cases, so what is
          happening here? Why is "m" causing a problem, but "o" is
          not, when both are confusable with other characters? Does it
          have to do with the input being restricted to ASCII (or some
          other limited set) and so other characters are removed as
          possibilities, leaving the Latin set as non-confusable
          (aside from "m")?</div>
<div><br>
</div>
<div>Sławomir Osipiuk</div>
</div>
</div>
</blockquote>
<p><font face="Candara">Confusable relations are symmetric (but not
always transitive).</font></p>
    <p><font face="Candara">However, the data lists each pair in one
        direction only, generally from the more specialized character
        towards the generic/ASCII one. The exception is m --&gt; rn,
        presumably because rn is a sequence.</font></p>
    <p><font face="Candara">Unless you make the data collection
        symmetric, it's really not possible to determine all characters
        that "have confusables", which is what that software was
        apparently trying to do.</font></p>
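    <p><font face="Candara">To make that concrete, here is a minimal
        Python sketch, not a reference implementation. The table
        ONE_WAY is a small hand-written sample in the one-directional
        style described above, not the real data file, and the function
        names are made up for illustration. It contrasts a naive lookup
        with one that first records every listed pair in both
        directions. No transitive closure is taken, since the relation
        is not always transitive.</font></p>
<pre>
```python
from collections import defaultdict

# Hand-written sample in the one-directional style (source maps to target).
# Illustrative assumption only; this is not the real data file.
ONE_WAY = {
    "\u043e": "o",   # CYRILLIC SMALL LETTER O maps to LATIN SMALL LETTER O
    "\u03bf": "o",   # GREEK SMALL LETTER OMICRON maps to LATIN SMALL LETTER O
    "\u0430": "a",   # CYRILLIC SMALL LETTER A maps to LATIN SMALL LETTER A
    "m": "rn",       # the one ASCII letter listed as a source
}


def flagged_one_way(text, one_way=ONE_WAY):
    """Naive check: flag only characters that appear as a source."""
    return sorted({ch for ch in text if ch in one_way})


def symmetric_index(one_way=ONE_WAY):
    """Record every listed pair in both directions (direct pairs only)."""
    index = defaultdict(set)
    for source, target in one_way.items():
        index[source].add(target)
        index[target].add(source)
    return index


def flagged_symmetric(text, index):
    """Flag every key of the symmetric index that occurs in text.

    A substring test is used so that sequences such as "rn" are found too.
    """
    return sorted(key for key in index if key in text)
```
</pre>
    <p><font face="Candara">With this sample, flagged_one_way("moan")
        reports only "m", while flagged_symmetric("moan",
        symmetric_index()) reports "a", "m" and "o". Neither result says
        which character is the unexpected one in context.</font></p>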
    <p><font face="Candara">As you can see from the case of m &lt;--&gt;
        rn, it is not always possible to claim that one of the characters
        is "unexpected" in the context. That analysis requires
        additional research and can't simply be done with a "data
        dump".</font></p>
<p><font face="Candara">A./</font><br>
</p>
</body>
</html>