Directionality controls for malicious code

Sławomir Osipiuk sosipiuk at gmail.com
Thu Dec 2 11:10:40 CST 2021


Replying to several messages here:

On Thu, Dec 2, 2021 at 1:35 AM Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
> There are many other tools involved, in particular editors. There are
> probably way less serious editors than programming languages. Editors
> can clearly show problematic characters, so that users can decide
> whether they are dangerous or necessary (or both).

It's better to do this at the language/compiler level because the
effects of BiDi "trickery" will vary with language, not with the
editor. The editor cannot be relied on to help in this instance,
because any contributor may decide that the one-line change he wants
to add to a giant project can be done with Notepad. The compiler
should know when code, not string contents or comments, is being
manipulated with RTL controls.
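
As a rough illustration of what a language-level check could look
like, here is a minimal sketch in Python. It assumes a toy language
with `//` line comments and single-line, double-quoted strings with
backslash escapes; the function name `find_bidi_in_code` and the
choice of control characters are assumptions for the example, not
taken from any real compiler.

```python
import unicodedata

# Explicit embedding, override, and isolate controls (and their terminators).
BIDI_CONTROLS = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def find_bidi_in_code(source):
    """Return (line, column, name) for bidi controls outside strings and comments."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        in_string = False  # assumption: strings never span lines
        i = 0
        while i < len(line):
            ch = line[i]
            if in_string:
                if ch == "\\":
                    i += 2  # skip the escaped character
                    continue
                if ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif line.startswith("//", i):
                break  # rest of the line is a comment
            elif ch in BIDI_CONTROLS:
                hits.append((line_no, i + 1, unicodedata.name(ch)))
            i += 1
    return hits
```

Only the control that sits in code position is reported; the same
character inside a string literal or after `//` is left alone, which
is a distinction an editor cannot make without knowing the language.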

On Thu, Dec 2, 2021 at 2:27 AM Doug Ewell via Unicode
<unicode at corp.unicode.org> wrote:
> Going into a panic and writing this into programming language specifications is what doesn't need to happen.

No one is advising panic.

On Thu, Dec 2, 2021 at 3:27 AM Eli Zaretskii via Unicode
<unicode at corp.unicode.org> wrote:
> Blindly showing these controls wherever they are should not happen,
> either, because most of their uses are not malicious.

Yes, it should. This is not general prose intended to look nice. It's
a programming language demanding precision, where a one-character typo
can significantly change functionality. The "users" in this case are
assumed to be a (relatively) specialist technical audience. Clarity of
"what's happening" outweighs other considerations.

> There are many projects that require to compile without any warnings, or treat warnings as errors, and those won't compile with such "draconian" compilers.

Which is why I mentioned that whitelisting, or some method of
suppressing the warnings, i.e. an "I know what I'm doing" option,
should also be added. But suppression should not be the default
behavior. This is a classic security vs. usability tradeoff, but I
think you're overestimating the number of projects this would
actually cause problems for.
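
A minimal sketch of the "strict by default, opt-out on request" idea,
in Python. The function `bidi_diagnostics` and its `allowed` parameter
are hypothetical names invented for this example; a real compiler
would expose this through its own flags or pragmas.

```python
# Explicit embedding, override, and isolate controls (and their terminators).
BIDI_CONTROLS = frozenset("\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069")

def bidi_diagnostics(source, allowed=frozenset()):
    """Report every bidi control unless the caller explicitly allows it."""
    flagged = BIDI_CONTROLS - set(allowed)
    return [
        f"line {n}: bidi control U+{ord(ch):04X}"
        for n, line in enumerate(source.splitlines(), start=1)
        for ch in line
        if ch in flagged
    ]
```

With no arguments everything is reported; a project that genuinely
needs a given control passes it in `allowed` and takes responsibility
for it.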

On Thu, Dec 2, 2021 at 3:38 AM Eli Zaretskii via Unicode
<unicode at corp.unicode.org> wrote:
> It's even against UAX#9, which says those
> controls should be invisible.

That rule should be ignored when it is counterproductive in a
specialist context.

On Thu, Dec 2, 2021 at 10:22 AM Daniel Bünzli via Unicode
<unicode at corp.unicode.org> wrote:
> We need a formal criterion that allows to check that a given span of characters in logical order does not visually overflow those characters that preceed or succeed them.

Yes, this is ideal. The problem is that Unicode doesn't "understand"
that string-terminating or comment-introducing characters in any given
programming language should reset the directionality. That's why the
solution must be at the same level that gives meaning to strings and
comments (and variables, etc.), i.e. the programming language itself.
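
One simple criterion a language could apply, sketched in Python: once
the lexer has isolated a string literal or comment, require that every
embedding, override, or isolate opened inside it is also closed inside
it. This is a deliberately strict nesting check, stricter than what
UAX#9 itself tolerates, and the function name is an assumption for
the example.

```python
EMBEDDING_OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202C"  # closes an embedding or override
PDI = "\u2069"  # closes an isolate

def bidi_controls_balanced(span):
    """True if every bidi opener in `span` is closed, in order, within `span`."""
    stack = []  # holds the terminator each open control is waiting for
    for ch in span:
        if ch in EMBEDDING_OPENERS:
            stack.append(PDF)
        elif ch in ISOLATE_OPENERS:
            stack.append(PDI)
        elif ch in (PDF, PDI):
            # A terminator with nothing open, or of the wrong kind, is rejected.
            if not stack or stack.pop() != ch:
                return False
    return not stack
```

A literal that fails this check is one whose directionality could leak
past its closing delimiter, so the compiler can reject it or warn.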

> (if you happen to need that imbalance for your program in your string literals just use your Unicode character escape notation).

Yes. It makes perfect sense for control characters to be permitted
only as escape sequences. This is already common, if not required, in
many cases. I've seen plenty of "\r\n" in strings, and no one
complains that it doesn't look good; it's just how it's done.
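
For illustration, a small Python sketch of a tool that rewrites raw
bidi controls into escape notation. The `\uXXXX` form is assumed here
because many languages accept it; the function name is hypothetical.

```python
# Explicit embedding, override, and isolate controls (and their terminators).
BIDI_CONTROLS = frozenset("\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069")

def escape_bidi_controls(text):
    """Replace each raw bidi control with a visible \\uXXXX escape."""
    return "".join(
        f"\\u{ord(ch):04X}" if ch in BIDI_CONTROLS else ch
        for ch in text
    )
```

The string keeps the same meaning once the language decodes the
escape, but the source file no longer contains anything that reorders
its display.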


