Directionality controls for malicious code

Mark Davis ☕️ mark at macchiato.com
Thu Dec 2 18:31:41 CST 2021


I think those are good suggestions. Note that the Higher-Level Protocols
section doesn't necessarily mean that a special UBA implementation is used;
the same results could be accomplished by modifying the line before
displaying it. It sounds like the text isn't clear about that.


Some things, I think, are fairly easy to do irrespective of the compiler.
For example, it would be safe to forbid all unescaped stateful bidi
controls in source code. That eliminates a significant class of potential
issues, but not all of them. As to your #1 and #2:

#1. An algorithm to guarantee that tokens are self-contained wouldn't be
too hard. It would take something like a line plus token boundaries and
return which tokens (if any) are broken in display. (For performance
reasons you probably wouldn't want to do each token span separately.)

#2. By using bidi isolates, it is pretty easy to mark up the text so that
you get a consistent order of tokens when applying the UBA. Editing the
marked-up result could get pretty surprising for users, however.
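
A minimal Python sketch of the three ideas above, under stated assumptions.
All names are hypothetical. "Stateful" is taken here to mean the explicit
embedding, override, and isolate controls and their terminators. The check
for #1 does not run the UBA itself: it assumes the caller already has the
visual order of the line from some UBA implementation. The markup for #2
uses FSI ... PDI around each token, which is one possible choice (LRI would
be another), and assumes the line is displayed with a left-to-right base
direction.

```python
# Sketch only: names and the choice of "stateful" controls are assumptions.

# Explicit embedding, override, and isolate controls, plus their terminators.
STATEFUL_BIDI_CONTROLS = frozenset(
    "\u202A\u202B\u202C\u202D\u202E"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)
FSI, PDI = "\u2068", "\u2069"


def find_stateful_controls(line):
    """Return (index, code point) for each stateful bidi control in line."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(line)
            if c in STATEFUL_BIDI_CONTROLS]


def broken_tokens(visual_order, tokens):
    """Return the tokens whose characters are not contiguous on screen.

    visual_order[k] is the logical index of the character shown at visual
    position k, as computed by a UBA implementation (not provided here).
    tokens is a list of (start, end) logical spans, end exclusive.
    """
    # Invert the permutation once for the whole line, not once per token.
    visual_pos = [0] * len(visual_order)
    for pos, logical in enumerate(visual_order):
        visual_pos[logical] = pos
    broken = []
    for start, end in tokens:
        positions = visual_pos[start:end]
        # Distinct positions are contiguous iff their range equals their count.
        if positions and max(positions) - min(positions) != len(positions) - 1:
            broken.append((start, end))
    return broken


def isolate_tokens(line, tokens):
    """Wrap each token span in FSI ... PDI, leaving the text between as is."""
    if find_stateful_controls(line):
        # A stray PDI or override inside a token would defeat the wrapping.
        raise ValueError("line already contains stateful bidi controls")
    out, cursor = [], 0
    for start, end in sorted(tokens):
        out.append(line[cursor:start])
        out.append(FSI + line[start:end] + PDI)
        cursor = end
    out.append(line[cursor:])
    return "".join(out)
```

For example, find_stateful_controls("a\u202Eb") reports the RLO at index 1,
while a plain LRM (U+200E) is not flagged. isolate_tokens refuses input that
already contains such controls, which is why forbidding them first matters.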

Mark


On Thu, Dec 2, 2021 at 3:43 PM Daniel Bünzli <daniel.buenzli at erratique.ch>
wrote:

> On 2 December 2021 at 19:51:19, Mark Davis ☕️ via Unicode (
> unicode at corp.unicode.org) wrote:
>
> > The UBA explicitly carves out room for specialized text handling in
> > https://unicode.org/reports/tr9/#Higher-Level_Protocols. The goal of that
> > is to allow editors to handle bidi ordering in a sensible (and not
> > misleading) fashion in environments such as programming language editing,
> > specifically so that tokens are 'self-contained' and the ordering among
> > tokens is clear.
>
> I would prefer if that was a property we could check/enforce on spans of
> the Unicode text itself. In my opinion using a viewer that uses a special
> UBA is not really a good solution, if not a solution at all (e.g. if you
> want to check these properties when you embed user generated content to be
> rendered via a browser).
>
> On 2 December 2021 at 18:10:40, Sławomir Osipiuk via Unicode (
> unicode at corp.unicode.org) wrote:
>
> > Yes, this is ideal. The problem is that Unicode doesn't "understand"
> > that string-terminating or comment-introducing characters
> > in any given programming language should reset the directionality.
>
> Indeed directionality reset is precisely what I would like to be able to
> detect or enforce for arbitrary spans of Unicode text. Basically I think it
> would be nice to have:
>
> 1) An algorithm that given text and a span therein determines if the span
> visually overflows its own content.
>
> 2) An algorithm that given text and a span therein returns a new span of
> text with the same textual content but with additional bidi control
> characters that make sure the span is visually contained to its content in
> the given text.
>
> Formulated differently: how can we make sure arbitrary spans of Unicode
> text behave, as far as UBA is concerned, as a self-contained paragraph.
>
> Best,
>
> Daniel
>