Directionality controls for malicious code

Eli Zaretskii eliz at
Fri Dec 3 01:22:00 CST 2021

> Date: Fri, 3 Dec 2021 00:43:13 +0100
> Cc: unicode at
> From: Daniel Bünzli via Unicode <unicode at>
> > Yes, this is ideal. The problem is that Unicode doesn't "understand" 
> > that string-terminating or comment-introducing characters 
> > in any given programming language should reset the directionality. 
> Indeed directionality reset is precisely what I would like to be able to detect or enforce for arbitrary spans of Unicode text.

I don't see how it would help.  For example, if you examine the
examples provided in that paper, you will see that the directional
format controls were inserted inside comments, but in a way that made
parts of the comments look like part of the code.
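
To make that concrete, here is a minimal Python sketch of what a tool
would have to do instead: scan the text of a comment for the explicit
directional formatting characters defined by the UBA and report any
initiators still open at the end of the span. The function names
(find_bidi_controls, unclosed_controls) and the sample string are
hypothetical, written for illustration and not taken from the paper. The
stack model is a simplification of the UBA's matching rules (it ignores
depth limits and paragraph boundaries), so treat it as an approximation
under those assumptions.

```python
# Explicit directional formatting characters (UAX #9), by abbreviation.
BIDI_CONTROLS = {
    "\u202A": "LRE", "\u202B": "RLE", "\u202D": "LRO", "\u202E": "RLO",
    "\u202C": "PDF",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}
EMBEDDINGS = {"LRE", "RLE", "LRO", "RLO"}
ISOLATES = {"LRI", "RLI", "FSI"}


def find_bidi_controls(text):
    """Return (offset, abbreviation) for each bidi control in text."""
    return [(i, BIDI_CONTROLS[ch])
            for i, ch in enumerate(text) if ch in BIDI_CONTROLS]


def unclosed_controls(text):
    """Return the initiators still open at the end of text.

    Simplified model (an assumption of this sketch): PDF closes the
    innermost embedding/override unless an isolate sits above it; PDI
    closes the innermost isolate and everything opened inside it.
    """
    stack = []
    for ch in text:
        name = BIDI_CONTROLS.get(ch)
        if name in EMBEDDINGS or name in ISOLATES:
            stack.append(name)
        elif name == "PDF":
            if stack and stack[-1] in EMBEDDINGS:
                stack.pop()
        elif name == "PDI":
            if any(n in ISOLATES for n in stack):
                # Pop up to and including the innermost isolate.
                while stack.pop() not in ISOLATES:
                    pass
    return stack


# Hypothetical comment text with controls hidden inside it.
sample = "/*\u202E } \u2066if (isAdmin)\u2069 \u2066 begin admins only */"
controls = find_bidi_controls(sample)
still_open = unclosed_controls(sample)
```

Note that every control in the sample sits strictly inside the comment
delimiters, so resetting directionality at the comment boundary would
not change how the comment's own content is displayed.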

> Basically I think it would be nice to have: 
> 1) An algorithm that given text and a span therein determines if the span visually overflows its own content.

What do you mean by "visually overflows"?

> 2) An algorithm that given text and a span therein returns a new span of text with the same textual content but with additional bidi control characters that make sure the span is visually contained to its content in the given text.

This is not clear, either.

> Formulated differently: how can we make sure arbitrary spans of Unicode text behave, as far as the UBA is concerned, as a self-contained paragraph?

"Self-contained" in what sense?
