Directionality controls for malicious code

Eli Zaretskii eliz at gnu.org
Tue Nov 30 12:59:13 CST 2021


> Date: Tue, 30 Nov 2021 11:38:48 -0700
> From: Karl Williamson via Unicode <unicode at corp.unicode.org>
> 
> Is there any legitimate use of BiDi controls in text that doesn't have a 
> mixture of LtoR and RtoL strings?

Yes, although it's rare.  For example, there could be text that
explains the effect of these format controls on LTR characters.
Another legitimate use is a string of LTR characters enclosed in
these formatting controls so that it can later be placed in RTL
context without the risk of getting jumbled due to characters with
weak directionality.
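A minimal sketch of the second use, assuming the LRI/PDI isolate pair from UAX #9 (embeddings such as LRE/PDF would be another option; the choice here is an assumption).  The file name and the Hebrew placeholder letters are hypothetical examples, not from any real application:

```python
# Bidi isolate controls defined in UAX #9.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_ltr(text: str) -> str:
    """Wrap an LTR string in LRI...PDI so its trailing neutral/weak
    characters (punctuation, digits) keep LTR ordering when the string
    is later inserted into an RTL paragraph."""
    return f"{LRI}{text}{PDI}"


# Hypothetical LTR string ending in weak and neutral characters.
name = "report-v2.txt (draft)"

# Placeholder RTL text (Hebrew letters aleph, bet, gimel) around it.
rtl_sentence = "\u05d0\u05d1\u05d2 " + isolate_ltr(name) + " \u05d2\u05d1\u05d0"
print(repr(rtl_sentence))
```

Note that the wrapped string itself contains only LTR and neutral characters, yet the controls are there for a legitimate reason.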

Moreover, in real-life applications it could be quite hard to even
know whether a given chunk of text contains mixed LTR and RTL
characters, because the region could be very large and the application
doesn't necessarily consider all of it.

> If not, and since there are relatively few scripts of RtoL characters, 
> is there any legitimate use of BiDi controls outside of script runs of 
> those scripts.

Of course.  A typical use is for LTR characters embedded inside
otherwise RTL text.  There are examples of that in UAX #9, I think.

> Or could a new property be created that allowed for machine detection of 
> malicious use?

"Malicious use" is hard to define precisely in this case, IME.  We
humans know it when we see it, but the malicious intent is often
extremely context-dependent and semantically loaded, so it's hard to
detect algorithmically, because most algorithms don't understand
the semantics of the text.
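To illustrate, here is a sketch of the kind of naive rule the question suggests: flag any bidi control in text that has no strong RTL character.  The function name `naive_flag` and both sample strings are hypothetical; the control set is the 12 characters with the Bidi_Control property.  The rule misfires both ways: it flags the legitimate pre-isolated LTR string, and it passes control-laden text once a single RTL letter is added.

```python
import unicodedata

# The 12 characters with the Unicode Bidi_Control property:
# ALM, LRM, RLM, LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def has_strong_rtl(text: str) -> bool:
    """True if any character has strong RTL bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def naive_flag(text: str) -> bool:
    """Hypothetical heuristic: flag bidi controls in text with no RTL letters."""
    return any(ch in BIDI_CONTROLS for ch in text) and not has_strong_rtl(text)


# Legitimate: an LTR string pre-isolated for later use in RTL context.
legit = "\u2066report-v2.txt (draft)\u2069"
# Suspicious: RLO override, evading the rule with one Hebrew letter.
evasive = "x = 1 \u202e// done\u202c \u05d0"

print(naive_flag(legit))    # flagged even though it is legitimate
print(naive_flag(evasive))  # not flagged despite the override
```

Any purely character-based property runs into the same problem: it sees which characters are present, not what the text means.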

