<div dir="ltr"><div class="gmail_default" style="font-family:times new roman,serif">I think those are good suggestions. Note that the Higher-Level Protocols section doesn't necessarily mean that a special UBA algorithm is used; the same results could be achieved by modifying the line before displaying it. It sounds like the text isn't clear about that.</div><div class="gmail_default" style="font-family:times new roman,serif"><br></div><div class="gmail_default" style="font-family:times new roman,serif"><br></div><div class="gmail_default" style="font-family:times new roman,serif">Some things are, I think, fairly easy to do irrespective of the compiler; for example, it would be safe to forbid all unescaped stateful bidi controls in source code. That eliminates a significant class of potential issues, but not all of them. As to your #1 and #2:</div><div class="gmail_default" style="font-family:times new roman,serif"><br></div><div class="gmail_default" style="font-family:times new roman,serif">#1. An algorithm to verify that tokens are self-contained wouldn't be too hard. It would take something like a line plus token boundaries and return which tokens (if any) are broken in display. (For performance reasons you probably wouldn't want to process each token span separately.)</div><div class="gmail_default" style="font-family:times new roman,serif"><br></div><div class="gmail_default" style="font-family:times new roman,serif">#2. By using bidi isolates, it is pretty easy to mark up the text so that you get a consistent order of tokens when applying the UBA. 
Any editing of the result could get pretty surprising for users, however.</div><div class="gmail_default" style="font-family:times new roman,serif"><br></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><font face="'times new roman', serif"><div style="background-color:transparent;margin-top:0px;margin-left:0px;margin-bottom:0px;margin-right:0px"><div></div></div><div style="background-color:transparent;margin-top:0px;margin-left:0px;margin-bottom:0px;margin-right:0px">Mark</div></font><div><div><font face="'times new roman', serif"><i><span style="font-style:normal"><i></i></span><i></i></i></font></div></div></div></div></div></div></div></div></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 2, 2021 at 3:43 PM Daniel Bünzli <<a href="mailto:daniel.buenzli@erratique.ch">daniel.buenzli@erratique.ch</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2 December 2021 at 19:51:19, Mark Davis ☕️ via Unicode (<a href="mailto:unicode@corp.unicode.org" target="_blank">unicode@corp.unicode.org</a>) wrote:<br>

<br>

> The UBA explicitly carves out room for specialized text handling in<br>

> <a href="https://unicode.org/reports/tr9/#Higher-Level_Protocols" rel="noreferrer" target="_blank">https://unicode.org/reports/tr9/#Higher-Level_Protocols</a>. The goal of that<br>

> is to allow editors to handle bidi ordering in a sensible (and not<br>

> misleading) fashion in environments such as programming language editing,<br>

> specifically so that tokens are 'self-contained' and the ordering among<br>

> tokens is clear.<br>

<br>

I would prefer it if that were a property we could check/enforce on spans of the Unicode text itself. In my opinion, using a viewer that applies a special UBA is not really a good solution, if a solution at all (e.g. if you want to check these properties when you embed user-generated content to be rendered via a browser).<br>

<br>

On 2 December 2021 at 18:10:40, Sławomir Osipiuk via Unicode (<a href="mailto:unicode@corp.unicode.org" target="_blank">unicode@corp.unicode.org</a>) wrote:<br>

<br>

> Yes, this is ideal. The problem is that Unicode doesn't "understand" <br>

> that string-terminating or comment-introducing characters <br>

> in any given programming language should reset the directionality. <br>

<br>

Indeed, a directionality reset is precisely what I would like to be able to detect or enforce for arbitrary spans of Unicode text. Basically, I think it would be nice to have: <br>

<br>

1) An algorithm that, given a text and a span therein, determines whether the span visually overflows its own content.<br>

<br>

2) An algorithm that, given a text and a span therein, returns a new span with the same textual content but with additional bidi control characters that ensure the span is visually confined to its own content in the given text.<br>

<br>

Formulated differently: how can we make sure that arbitrary spans of Unicode text behave, as far as the UBA is concerned, as self-contained paragraphs? <br>
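<br>
One possible sketch of (2), under stated assumptions rather than a vetted implementation: the span is wrapped in FSI ... PDI, and extra isolate controls are added when the span itself contains unbalanced ones, so that the outermost pair always match each other. The function name isolate_span is hypothetical, the span is assumed to contain no paragraph separator (bidi class B), and the result has not been checked against a UBA implementation.<br>
<pre>
```python
# Sketch only: wrap a span so that the UBA treats it as an isolated run.
# Assumes the span contains no paragraph separator (bidi class B).

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE
ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI


def isolate_span(span):
    """Return span wrapped in FSI ... PDI, padding for unbalanced isolates."""
    depth = 0  # isolate initiators currently open inside the span
    stray = 0  # PDIs in the span that have no initiator inside the span
    for ch in span:
        if ch in ISOLATE_INITIATORS:
            depth += 1
        elif ch == PDI:
            if depth == 0:
                stray += 1
            else:
                depth -= 1
    # One extra FSI per stray PDI and one extra PDI per unclosed initiator,
    # so that the outermost FSI and PDI match each other.
    return FSI * (1 + stray) + span + PDI * (1 + depth)
```
</pre>
Unbalanced embeddings and overrides inside the span are not counted here, on the understanding that the UBA confines their effect to the enclosing isolate.<br>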

<br>

Best, <br>

<br>

Daniel<br>

</blockquote></div>
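<div><br></div>
<div>The idea of forbidding unescaped stateful bidi controls in source code could be sketched roughly as follows. This is only an illustration: the function name find_stateful_bidi_controls is hypothetical, and treating exactly the explicit embedding, override, and isolate controls (with their terminators) as "stateful" is an assumption about what is meant.</div>
<pre>
```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
# Treating exactly these as the "stateful" controls is an assumption.
STATEFUL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF",
    "LRI", "RLI", "FSI", "PDI",
}


def find_stateful_bidi_controls(source):
    """Return (line, column, code point) for each raw stateful bidi control."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.bidirectional(ch) in STATEFUL_CLASSES:
                hits.append((line_no, col, "U+%04X" % ord(ch)))
    return hits
```
</pre>
<div>Only raw characters are examined, so an escaped spelling of a control inside a string literal is left alone; a compiler or linter could reject any source for which the returned list is non-empty.</div>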