UAX #9: applicability of higher-level protocols to bidi plaintext

Shai Berger via Unicode unicode at unicode.org
Mon Jul 9 17:33:28 CDT 2018


Hello all,

About two and a half years ago, I suggested adding a FAQ about the
applicability of higher-level protocols for bidirectional plaintext, as
specified by http://www.unicode.org/reports/tr9/ -- my suggestion was
to clarify that higher-level protocols can only be applied upon
agreement between all producers and consumers, and that such agreements
effectively mean that the text is "special text" -- no longer plain.

In the time since then, I have mostly been away from this issue, but
I came back to it recently and found that my suggested text had been
rejected and that, instead, two FAQs had been added to
http://www.unicode.org/faq/bidi.html: The first, which is marked by the
HTML anchor bidi7, goes with my understanding and defines a
higher-level protocol as an agreement; but the second, marked as bidi8,
goes the other way, and explains that actually, agreement is not
necessary -- a program is at liberty to "implicitly define an overall
directional context for display, and that implicit definition of
direction is itself an example of application of a higher-level
protocol for the purposes of the UBA".

One result of this is the following scenario: I open my
standards-compliant text editor and write a line of text (to make
things accessible to a wider audience, I use capitals for right-to-left
English and small letters for normal, left-to-right English; note that
this sentence starts from the right):

	SESU REHTO DNA email ROF plaintext REFERP I

I save this line in a text file. Then I display it using my
standards-compliant text viewer, but now it looks like this:

	REFERP I plaintext ROF email SESU REHTO DNA

And this is because my standards-compliant text viewer chooses to apply
its higher-level protocol and treat the line as an LTR paragraph.
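
To make the two renderings concrete, here is a minimal Python sketch of
the reordering under the capitals convention used above. It is a toy
model and not the full UBA: it knows only three character classes
(capitals as strong RTL, small letters as strong LTR, everything else
as neutral) and ignores numbers, explicit formatting characters and
mirroring. The function names first_strong_base and display are
hypothetical, chosen for this illustration only.

```python
def first_strong_base(logical: str) -> str:
    """Paragraph direction from the first strong character (toy version
    of the plain-text default): capitals are RTL, small letters are LTR."""
    for ch in logical:
        if ch.isupper():
            return "R"
        if ch.islower():
            return "L"
    return "L"  # no strong character: fall back to LTR


def display(logical: str, base: str) -> str:
    """Return the visual (left-to-right on screen) order of `logical`
    for a paragraph whose base direction is "L" or "R"."""
    types = ["R" if c.isupper() else "L" if c.islower() else "N"
             for c in logical]

    # Resolve neutrals: a run of neutrals takes the direction of its
    # strong neighbours if they agree, otherwise the base direction.
    resolved = types[:]
    n = len(types)
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else base
        after = types[j] if j < n else base
        fill = before if before == after else base
        for k in range(i, j):
            resolved[k] = fill
        i = j

    # Group consecutive characters of the same direction into runs.
    runs = []
    for ch, t in zip(logical, resolved):
        if runs and runs[-1][0] == t:
            runs[-1][1].append(ch)
        else:
            runs.append((t, [ch]))

    # RTL runs are reversed internally; in an RTL paragraph the runs
    # themselves are also laid out from right to left.
    pieces = ["".join(reversed(chars)) if t == "R" else "".join(chars)
              for t, chars in runs]
    if base == "R":
        pieces.reverse()
    return "".join(pieces)


# The sentence as typed, in logical (reading) order.
logical = "I PREFER plaintext FOR email AND OTHER USES"

print(display(logical, first_strong_base(logical)))  # first strong char is RTL
print(display(logical, "L"))                         # viewer forces LTR
```

In this model, the first call lays the line out as an RTL paragraph,
as it appeared in the editor, and the second call lays the same stored
text out as an LTR paragraph, as the viewer does.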

Since bidi8 is a little abstract on this point, and focuses on terminal
windows rather than editors and viewers, I would like to ask:
Does this concrete result represent the intent of the UTC?

Thanks for your attention,

	Shai.

