UAX #9: applicability of higher-level protocols to bidi plaintext

Eli Zaretskii via Unicode unicode at unicode.org
Fri Jul 13 03:22:51 CDT 2018


> Date: Fri, 13 Jul 2018 08:57:25 +0100
> From: Richard Wordingham via Unicode <unicode at unicode.org>
> 
> Even just for horizontal text, one problem is the shape of the canvas.
> If it has a left and a right-hand margin, then having an undetermined
> direction by default can work, given enough memory.  The rendering
> system then has to have enough memory to store the entire paragraph -
> the strongly directional character may be the last one in the
> paragraph.  I'm not sure that a protocol is allowed to be based on
> analysing the first 100 characters of a paragraph.

Indeed.  We discovered this problem in Emacs when the UBA was
implemented: some buffers, like those visiting log files, have very
long stretches of weak characters (digits and punctuation), which
force the automatic paragraph-direction search to scan very far
ahead, potentially slowing down the display engine.
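
To make the cost concrete, here is a minimal sketch of a first-strong
search in Python.  It is not the Emacs code; the names
first_strong_direction, max_scan and fallback are hypothetical, and
the cap on scanned characters is exactly the kind of heuristic whose
conformance is questioned above.  The sketch skips characters inside
directional isolates and handles nothing else.

```python
import unicodedata

# Bidi classes that open a directional isolate; text inside an isolate
# is skipped when looking for the first strong character.
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong_direction(text, max_scan=None, fallback="ltr"):
    """Guess a paragraph direction from the first strong character.

    max_scan (an assumption of this sketch) caps how many characters are
    examined; when no strong character is found within the cap, or in the
    whole text, the fallback direction is returned.
    """
    depth = 0  # nesting depth of currently open isolates
    for index, ch in enumerate(text):
        if max_scan is not None and index >= max_scan:
            break
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            depth += 1
        elif bidi_class == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0:
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):
                return "rtl"
    return fallback


# A log-like paragraph: a long run of digits and punctuation followed by
# a single Hebrew letter at the very end.
log_paragraph = "2018-07-13 03:22:51 [12345] " * 1000 + "\u05d0"

unbounded = first_strong_direction(log_paragraph)
bounded = first_strong_direction(log_paragraph, max_scan=100)
```

Without a cap the whole paragraph is scanned before the Hebrew letter
decides the direction; with the cap the search stops early and falls
back to the default, which is cheaper but can give a different answer.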

> However, it is common for displays to provide a window into a canvas
> that is unbounded both downwards and either rightwards or leftwards.
> If it is unbounded rightwards, one needs an LTR paragraph direction: if
> it is unbounded leftwards, one needs an RTL paragraph direction.

Yes.  In Emacs, there are commands that display text derived from
standardized templates.  In these cases, we cannot rely on the default
determination of the paragraph direction, because the first strong
directional character might be unpredictable.  We must force a certain
paragraph direction in those cases.
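
A small Python sketch of the template problem, under stated
assumptions: TEMPLATE, paragraph_direction and its forced parameter
are hypothetical names for illustration, not an Emacs interface.  The
same template yields a different automatic direction depending on the
text substituted into it, while a forced direction stays stable.

```python
import unicodedata


def first_strong(text):
    """Return 'ltr' or 'rtl' from the first L, R or AL character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def paragraph_direction(text, forced=None):
    """Use the forced direction when given, else first-strong, else LTR."""
    if forced is not None:
        return forced
    return first_strong(text) or "ltr"


# Hypothetical template whose first field comes from user data.
TEMPLATE = "{name}: {size} bytes"
latin_line = TEMPLATE.format(name="notes.txt", size=120)
hebrew_line = TEMPLATE.format(name="\u05e9\u05dc\u05d5\u05dd.txt", size=120)

auto_directions = (paragraph_direction(latin_line),
                   paragraph_direction(hebrew_line))
forced_directions = (paragraph_direction(latin_line, forced="ltr"),
                     paragraph_direction(hebrew_line, forced="ltr"))
```

Here the automatic result flips between the two lines because the
substituted name happens to come first, whereas the forced result is
the same for both.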

> I believe that having a mix of paragraphs unbounded on the left and
> paragraphs unbounded on the right would feel distinctly odd; it
> could also be a challenge to manage panning the window.  It also
> raises the question of where the LTR and RTL paragraphs would
> overlap.

Different applications will have different needs here, so there's
definitely a need to provide applications and users with some control
of paragraph direction, and the way to do this is to define high-level
protocols controlled by some optional variables.  A well-known example
of that is the paragraph-direction buttons in Word and similar
processors (although they don't produce plain text, so the analogy is
limited).

