Directionality controls for malicious code

Eli Zaretskii eliz at gnu.org
Fri Dec 3 05:43:47 CST 2021


> Date: Fri, 3 Dec 2021 10:00:21 +0100
> From: Daniel Bünzli <daniel.buenzli at erratique.ch>
> Cc: unicode at corp.unicode.org
> 
> Yes. The idea is for the grammar of your language to disallow visual reorderings across certain textual boundaries specific to that language.

Text editors usually understand very little of the language's grammar,
if anything at all.

> If you take C multi-line comments /* … */ the idea is that: 
> 
> 1. No text logically between the /* and */ should visually be able to get on the left of /* 
> 2. No text logically between the /* and */ should visually be able to get on the right of */
> 3. No text logically before the /* should visually be able to get on the right of /*
> 4. No text logically after the */ should visually be able to get on the left of */ 
> 
> A short way of saying this is that the text logically between /* and */ should be made to behave as a UBA paragraph – since no reordering occurs across paragraphs. Violations of that property should result in a syntax error or a warning.

That's a tall order.  It requires editors to perform processing that
is much more complicated than what they do now.  Think about nested
comments, comment-like text inside strings embedded within the code,
etc.
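
To make the proposed check concrete, here is a minimal Python sketch
(an illustration only, not something proposed in this thread).  It
flags C comments whose bidi control characters are not balanced within
the comment itself.  The function names are made up for the example,
and the regex scanner knows nothing about string literals or nested
comments, which is precisely the hard part mentioned above.  Balanced
controls are also not claimed to be sufficient for properties 1-4.

```python
import re

ISOLATE_OPEN = {"\u2066", "\u2067", "\u2068"}          # LRI, RLI, FSI
EMBED_OPEN = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# Naive comment finder (assumption): ignores strings and nested comments.
COMMENT_RE = re.compile(r"/\*(.*?)\*/", re.DOTALL)


def comment_is_self_contained(body):
    """Return True if every bidi control opened in body is closed in body,
    and nothing in body could close a control opened outside it."""
    stack = []
    for ch in body:
        if ch in ISOLATE_OPEN:
            stack.append("I")
        elif ch in EMBED_OPEN:
            stack.append("E")
        elif ch == PDI:
            if "I" not in stack:
                return False  # could pair with an initiator outside the comment
            # A PDI also ends any embeddings opened inside the isolate.
            while stack.pop() != "I":
                pass
        elif ch == PDF:
            if not stack:
                return False  # could close an embedding opened outside
            if stack[-1] == "E":
                stack.pop()
            # A PDF directly inside an isolate has nothing to close; skip it.
    return not stack


def find_bidi_violations(source):
    """Return the start offsets of comments that fail the check."""
    return [m.start() for m in COMMENT_RE.finditer(source)
            if not comment_is_self_contained(m.group(1))]
```

A compiler or linter could turn each returned offset into a warning;
an editor would still have to locate the comments reliably first.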

> For example, to enforce the properties above, would it be sufficient to insert an LRI (or RLI, or FSI) after /* and a PDI before */? Would that ensure that properties 1-4 are satisfied for all contexts and contents of comments?

If you have ever programmed an editor, you know that actually
inserting something into the text of a file that wasn't there to begin
with is a no-no: you are likely to leak those insertions to the
outside world.  Users expect that if they open a file, go through it
without making any changes, and then save it, the file ends up
identical to what it was before.  If you start inserting characters
into the text, you will have a hard time keeping that promise, because
it is hard to distinguish between the text you insert and the text the
user inserts.
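
A small Python sketch of that ambiguity, under assumptions made up for
the example: a hypothetical editor adds LRI/PDI inside every comment on
load (add_isolates) and tries to undo that on save (strip_isolates).
Neither function comes from a real editor.

```python
import re

LRI, PDI = "\u2066", "\u2069"

# Naive comment finder (assumption): ignores strings and nested comments.
COMMENT_RE = re.compile(r"/\*(.*?)\*/", re.DOTALL)


def add_isolates(text):
    """Hypothetical load step: wrap each comment body in LRI ... PDI."""
    return COMMENT_RE.sub(
        lambda m: "/*" + LRI + m.group(1) + PDI + "*/", text)


def strip_isolates(text):
    """Hypothetical save step: drop an LRI right after "/*" and a PDI
    right before "*/".  It cannot tell who put them there."""
    return text.replace("/*" + LRI, "/*").replace(PDI + "*/", "*/")


# A file that was loaded and saved untouched survives the round trip.
plain = "/* note */"
round_trip_ok = strip_isolates(add_isolates(plain)) == plain

# Text the user pasted during the session, with isolates of their own,
# never went through add_isolates; the save step removes them anyway.
pasted = "/*" + LRI + "x" + PDI + "*/"
user_isolates_lost = strip_isolates(pasted) != pasted
```

Here round_trip_ok and user_isolates_lost are both true: the save step
is right for text the editor decorated and wrong for text it did not,
and the characters alone do not say which is which.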

