Directionality controls for malicious code

Thu Dec 2 09:19:12 CST 2021

On 2 December 2021 at 09:24:23, Eli Zaretskii via Unicode (unicode at corp.unicode.org) wrote:

> Blindly showing these controls wherever they are should not happen,
> either, because most of their uses are not malicious. The tests must
> be smarter than just looking at the codepoint, they should also look
> at the surrounding text and examine the effect of those directional
> controls on that text.

I agree with Eli, and I think programming language specifications should say something about this.

We need a formal criterion that lets us check that a given span of characters in logical order does not visually overflow into the characters that precede or follow it.

This check can then be applied on the content of the various syntactic constructs of your language (e.g. string literals, comments, etc.) and you report a syntax error if there's a visual overflow. This makes sure no text is allowed to visually escape the boundaries it's supposed to be confined to.
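To make the idea concrete, here is a rough sketch using Python as the example language. It walks the comments and string literals of a source file with the standard `tokenize` module and reports those that fail a check. The name `find_violations` and the `overflows` callback are hypothetical: the callback stands in for the formal criterion, which I don't have, and a real compiler would report a syntax error rather than yield tuples.

```python
import io
import tokenize
from typing import Callable, Iterator, Tuple


def find_violations(
    source: str, overflows: Callable[[str], bool]
) -> Iterator[Tuple[str, int]]:
    """Yield (kind, line_number) for each comment or string literal
    whose text fails the caller-supplied `overflows` check.

    `overflows` is a placeholder for the formal criterion; it receives
    the token text exactly as written in the source, delimiters included.
    """
    kinds = {tokenize.COMMENT: "comment", tokenize.STRING: "string literal"}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        kind = kinds.get(tok.type)
        if kind is not None and overflows(tok.string):
            yield kind, tok.start[0]
```

The check is passed in as a parameter on purpose: the traversal of syntactic constructs is the easy part, and the criterion can be swapped out once someone who knows the bidi algorithm pins it down.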

I'm not familiar enough with the bidi algorithm, but for example it seems that an unterminated RLO or RLI in a span should be forbidden: each should be balanced by its matching terminator (PDF for RLO, PDI for RLI). If you happen to need that imbalance in a string literal, just use your language's Unicode character escape notation. But I'm sure the problem is much more complex than that, and I'd be curious whether people who know the algorithm have an idea of how to go about it.
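As a minimal sketch of the balance check only (not of the full bidi algorithm), the following treats the explicit formatting characters of UAX #9 like brackets: embeddings and overrides (LRE, RLE, LRO, RLO) must be closed by PDF, and isolates (LRI, RLI, FSI) by PDI, in properly nested order. The function name `is_bidi_balanced` is hypothetical, and the rule is stricter than the algorithm itself, which tolerates some unmatched terminators. Passing this check is probably not sufficient to rule out visual overflow.

```python
# Explicit directional formatting characters defined by UAX #9.
EMBEDDING_OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202C"  # terminates an embedding or override
PDI = "\u2069"  # terminates an isolate


def is_bidi_balanced(span: str) -> bool:
    """Return True if every opener in `span` is closed by its matching
    terminator, in nested order, and no terminator appears unmatched.

    Assumption: this is a strict bracket-style simplification, not the
    matching rules of the Unicode Bidirectional Algorithm.
    """
    expected = []  # stack of terminators we are still waiting for
    for ch in span:
        if ch in EMBEDDING_OPENERS:
            expected.append(PDF)
        elif ch in ISOLATE_OPENERS:
            expected.append(PDI)
        elif ch == PDF or ch == PDI:
            # A terminator must match the most recent unclosed opener.
            if not expected or expected.pop() != ch:
                return False
    return not expected
```

With this rule, `"\u202Eabc\u202C"` (RLO closed by PDF) passes, while `"\u202Eabc"` and `"\u2067abc\u202C"` (RLI closed by the wrong terminator) are rejected.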

There are also likely quite a few other security contexts where such a check could be useful (e.g. untrusted user input).

Best, 

Daniel