UAX #9: applicability of higher-level protocols to bidi plaintext
Shai Berger via Unicode
unicode at unicode.org
Mon Jul 9 17:33:28 CDT 2018
Hello all,
About two and a half years ago, I suggested adding a FAQ about the
applicability of higher-level protocols for bidirectional plaintext, as
specified by http://www.unicode.org/reports/tr9/ -- my suggestion was
to clarify that higher-level protocols can be applied only by
agreement among all producers and consumers, and that such agreements
effectively mean that the text is "special text" -- no longer plain.
In the time since then, I have mostly been away from this issue, but
I came back to it recently, to find that my suggested text was
rejected, and instead, two FAQs were added to
http://www.unicode.org/faq/bidi.html: The first, which is marked by the
HTML anchor bidi7, agrees with my understanding and defines a
higher-level protocol as an agreement; but the second, marked as bidi8,
goes the other way, and explains that actually, agreement is not
necessary -- a program is at liberty to "implicitly define an overall
directional context for display, and that implicit definition of
direction is itself an example of application of a higher-level
protocol for the purposes of the UBA".
One result of this is the following scenario: I open my
standards-compliant text editor, and write a line of text (to make
things accessible to a wider audience, I use capitals for right-to-left
English and small letters for normal, left-to-right English; note this
sentence starts from the right):
SESU REHTO DNA email ROF plaintext REFERP I
I save this line in a text file. Then I display it using my
standards-compliant text viewer, but now it looks like this:
REFERP I plaintext ROF email SESU REHTO DNA
This is because my standards-compliant text viewer chooses to apply
its higher-level protocol and treat the line as an LTR paragraph.
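For readers who want to reproduce the two layouts, here is a minimal
Python sketch of the scenario. It is a toy model, not an implementation
of the UBA: it assumes the capitals-as-RTL convention used above, treats
everything that is not a letter as neutral, and covers only a simplified
form of neutral resolution (N1-N2), implicit levels and reordering (L2).
The function names are my own. Note that it reverses each capitalized
word letter by letter, so OTHER comes out as REHTO.

```python
def char_class(ch):
    """Toy classes: capitals are strong RTL, small letters strong LTR, rest neutral."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def first_strong(text):
    """Default paragraph direction from the first strong character (as in P2-P3)."""
    for ch in text:
        cls = char_class(ch)
        if cls != "N":
            return cls
    return "L"


def resolve_neutrals(classes, base):
    """Neutrals between two equal strong types take that type, else the base."""
    resolved = list(classes)
    n = len(resolved)
    i = 0
    while i < n:
        if resolved[i] != "N":
            i += 1
            continue
        j = i
        while j < n and resolved[j] == "N":
            j += 1
        before = resolved[i - 1] if i > 0 else base
        after = resolved[j] if j < n else base
        fill = before if before == after else base
        for k in range(i, j):
            resolved[k] = fill
        i = j
    return resolved


def display(text, base):
    """Return the visual order of `text` in a paragraph whose direction is `base`."""
    if not text:
        return ""
    resolved = resolve_neutrals([char_class(c) for c in text], base)
    if base == "L":
        levels = [0 if c == "L" else 1 for c in resolved]
    else:
        levels = [1 if c == "R" else 2 for c in resolved]
    chars = list(text)
    highest = max(levels)
    lowest_odd = min((lv for lv in levels if lv % 2), default=highest + 1)
    # Reverse every run at `level` or above, from the highest level down
    # to the lowest odd level.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            i = j
    return "".join(chars)


# The line as typed, in logical (memory) order.
logical = "I PREFER plaintext FOR email AND OTHER USES"
print(display(logical, first_strong(logical)))  # what the editor showed
print(display(logical, "L"))                    # what the viewer showed
```

The first call uses the direction of the first strong character, which
here is RTL, and gives the layout I typed. The second call forces an LTR
paragraph, as the viewer's higher-level protocol did, and gives the
rearranged line.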
Since bidi8 is a little abstract on this point, and focuses on terminal
windows rather than editors and viewers, I would like to ask:
Does this concrete result represent the intent of the UTC?
Thanks for your attention,
Shai.