UAX #9: applicability of higher-level protocols to bidi plaintext

Philippe Verdy via Unicode unicode at unicode.org
Tue Jul 10 06:37:56 CDT 2018


Your "standard-compliant" plain text editor simply forces an LTR default for
the whole document, and does not allow individual paragraphs to
start with an undetermined direction (which should then be determined by
the first character on the line that defines a direction).
In my opinion, even if your text editor does not change the default
margin side used for aligning the text, it should still treat each
paragraph in isolation and determine the direction to use for it (each
paragraph break should cancel any inherited direction).
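
As a rough illustration of that per-paragraph determination, here is a
minimal Python sketch in the spirit of UAX #9 rules P2 and P3. The function
names are made up for this example, splitting on "\n" is a simplification of
the real paragraph separator class, and returning None for "undetermined" is
my own choice so that the caller decides the fallback.

```python
import unicodedata

# Strong bidi classes and the direction each one implies.
STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def paragraph_direction(paragraph):
    """Return 'ltr', 'rtl', or None when no strong character is found."""
    depth = 0  # nesting depth of isolates; their content is skipped
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            depth += 1
        elif bidi_class == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0 and bidi_class in STRONG:
            return STRONG[bidi_class]
    return None


def directions(text, fallback="ltr"):
    """Resolve each paragraph independently; nothing is inherited."""
    # Assumption: paragraphs are separated by "\n" only.
    return [paragraph_direction(p) or fallback for p in text.split("\n")]
```

Each paragraph is resolved on its own, so a Hebrew line followed by an
English line yields "rtl" then "ltr", whatever the fallback is.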

A plain text editor should not have a strong LTR default; it should
have a weak, undetermined direction, independently of the fact that it will
align the paragraph to the left or right margin according to the resolved
direction of the first character. That's what web browsers do, for
example, in input fields (where the automatic side of the start margin does
not change when you start typing some text in the input field and there's no
"text-align:left" or "text-align:right" to force it, just
"text-align:justify" or "text-align:normal"). Note that CSS
"text-align:justify" positions the start margin according to the CSS
direction of the container element; this makes a difference for the last
line of the paragraph. But with automatic determination of an unspecified
direction, a justified paragraph may look ugly if this does not also
properly set the start margin of the paragraph according to the resolved
direction of the first character of the paragraph or block element.

Note also that images or other inline objects embedded in paragraphs/blocks
do not have a defined strong direction of their own either: they act like
Unicode "isolates", but you may want to style them to set their outer
direction, independently of the inner direction of the isolate. I'm not
sure, however, whether images (e.g. in SVG) may inherit their direction from
the outer context of the isolate; I doubt they can, but if they do, then they
act more like the old-fashioned Unicode "embeddings" than like "isolates",
except that what comes after the image should not depend on the last
direction used inside the SVG. Images should be completely isolated from
their context of use and completely define their expected rendering. SVG
images also contain their own upper-layer protocol, as they can embed
multiple texts, but only in the context of the SVG document. Now that SVG
elements can appear directly in the HTML5 DOM as plain elements, the
situation may have changed, because they can inherit many things from the
HTML5 document, including shared stylesheets.


2018-07-10 0:33 GMT+02:00 Shai Berger via Unicode <unicode at unicode.org>:

> Hello all,
>
> About two and a half years ago, I suggested adding a FAQ about the
> applicability of higher-level protocols for bidirectional plaintext, as
> specified by http://www.unicode.org/reports/tr9/ -- my suggestion was
> to clarify that higher-level protocols can only be applied upon
> agreement between all producers and consumers, and that such agreements
> effectively mean that the text is "special text" -- no longer plain.
>
> In the time since then, I have been mostly removed from this issue, but
> I came back to it recently, to find that my suggested text was
> rejected, and instead, two FAQs were added to
> http://www.unicode.org/faq/bidi.html: The first, which is marked by the
> HTML anchor bidi7, goes with my understanding and defines a
> higher-level protocol as an agreement; but the second, marked as bidi8,
> goes the other way, and explains that actually, agreement is not
> necessary -- a program is at liberty to "implicitly define an overall
> directional context for display, and that implicit definition of
> direction is itself an example of application of a higher-level
> protocol for the purposes of the UBA".
>
> One result of this is the following scenario: I open my
> standard-compliant text editor, and write a line of text (to make
> things accessible to a wider audience, I use capitals for right-to-left
> English and small letters for normal, left-to-right English; note this
> sentence starts from the right):
>
>         SESU REHTO DNA email ROF plaintext REFERP I
>
> I save this line in a text file. Then I display it using my
> standards-compliant text viewer, but now it looks like this:
>
>         REFERP I plaintext ROF email SESU REHTO DNA
>
> And this is because my standard-compliant text-viewer chooses to apply
> its higher-level protocol and treat the line as a LTR paragraph.
>
> Since bidi8 is a little abstract on this point, and focuses on terminal
> windows rather than editors and viewers, I would like to ask:
> Does this concrete result represent the intents of the UTC?
>
> Thanks for your attention,
>
>         Shai.
>