Directionality controls for malicious code

Daniel Bünzli daniel.buenzli at erratique.ch
Thu Dec 2 17:43:13 CST 2021


On 2 December 2021 at 19:51:19, Mark Davis ☕️ via Unicode (unicode at corp.unicode.org) wrote:

> The UBA explicitly carves out room for specialized text handling in
> https://unicode.org/reports/tr9/#Higher-Level_Protocols. The goal of that
> is to allow editors to handle bidi ordering in a sensible (and not
> misleading) fashion in environments such as programming language editing,
> specifically so that tokens are 'self-contained' and the ordering among
> tokens is clear.

I would prefer that to be a property we could check or enforce on spans of the Unicode text itself. In my opinion, relying on a viewer that implements a specialized UBA is not really a good solution, if it is a solution at all (e.g. if you want to check these properties when you embed user-generated content to be rendered by a browser).

On 2 December 2021 at 18:10:40, Sławomir Osipiuk via Unicode (unicode at corp.unicode.org) wrote:

> Yes, this is ideal. The problem is that Unicode doesn't "understand" 
> that string-terminating or comment-introducing characters 
> in any given programming language should reset the directionality. 

Indeed, a directionality reset is precisely what I would like to be able to detect or enforce for arbitrary spans of Unicode text. Basically, I think it would be nice to have:

1) An algorithm that, given a text and a span therein, determines whether the span visually overflows its own content.

2) An algorithm that, given a text and a span therein, returns a new span with the same textual content but with additional bidi control characters that ensure the span is visually confined to its own content within the given text.

Formulated differently: how can we make sure that arbitrary spans of Unicode text behave, as far as the UBA is concerned, like self-contained paragraphs?
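
As a rough illustration only, here is a Python sketch of a much weaker version of the two algorithms above. It does not implement the UBA. It only tracks the explicit bidi formatting characters (embeddings, overrides, isolates and their terminators) inside a span. The names scan, may_leak and contain are hypothetical, and the sketch assumes the span contains no paragraph separators and ignores the UBA nesting depth limit. Note that may_leak returning False does not prove the span is visually contained: implicit resolution of neutral and weak characters at the span edges can still depend on the surrounding text. Wrapping the span in an isolate, as contain does, is meant to address that case as well.

```python
from typing import List, Tuple

# Explicit bidi formatting characters (UAX #9).
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"

EMBEDDINGS = {LRE, RLE, LRO, RLO}  # closed by PDF
ISOLATES = {LRI, RLI, FSI}         # closed by PDI


def scan(span: str) -> Tuple[List[str], int, int]:
    """Return (initiators still open at the end, stray PDFs, stray PDIs)."""
    stack: List[str] = []  # "E" = embedding/override, "I" = isolate
    stray_pdf = stray_pdi = 0
    for ch in span:
        if ch in EMBEDDINGS:
            stack.append("E")
        elif ch in ISOLATES:
            stack.append("I")
        elif ch == PDF:
            if stack and stack[-1] == "E":
                stack.pop()
            elif not stack:
                # Could close an embedding opened before the span.
                stray_pdf += 1
            # Directly inside an isolate opened in the span, PDF matches nothing.
        elif ch == PDI:
            if "I" in stack:
                # PDI also ends any embeddings opened inside the isolate.
                while stack.pop() != "I":
                    pass
            else:
                # Could close an isolate opened before the span, which would
                # also end the embeddings opened so far in the span.
                stray_pdi += 1
                stack.clear()
    return stack, stray_pdf, stray_pdi


def may_leak(span: str) -> bool:
    """True if explicit controls in the span can affect text outside it."""
    stack, stray_pdf, stray_pdi = scan(span)
    return bool(stack) or stray_pdf > 0 or stray_pdi > 0


def contain(span: str) -> str:
    """Add controls around the span (content unchanged) so it is balanced."""
    stack, _, stray_pdi = scan(span)
    # One extra FSI absorbs each stray PDI; one PDI closes each open isolate.
    # Open embeddings and stray PDFs are neutralised by the outer isolate.
    return FSI * (stray_pdi + 1) + span + PDI * (stack.count("I") + 1)


if __name__ == "__main__":
    suspicious = "access = " + RLO + "user"
    print(may_leak(suspicious))           # explicit override left open
    print(may_leak(contain(suspicious)))  # balanced after wrapping
```

The choice of FSI for the wrapper is itself an assumption: it lets the span pick its own base direction from its first strong character, which seems closest to "behaves as a self-contained paragraph", but LRI or RLI could be used when the direction is known.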

Best, 

Daniel


