Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

Eli Zaretskii via Unicode unicode at unicode.org
Sun Feb 3 10:35:40 CST 2019


> Date: Sun, 03 Feb 2019 18:10:15 +0200
> Cc: richard.wordingham at ntlworld.com, unicode at unicode.org
> From: Eli Zaretskii via Unicode <unicode at unicode.org>
> 
> I think there are hard problems even for such "simple" utilities, and
> I will start a separate thread about this.

I think we spent enough time discussing issues of complex script
shaping in terminal emulators, something that IMO took us too far
afield.  The basic problems with bidi reordering of text-mode output
start much sooner, and are much more fundamental.  I think they should
be considered first.

The document cited at the beginning of the parent thread states that
"simple" text-mode utilities, such as 'echo', 'cat', 'ls' etc. should
use the "implicit" mode of bidi reordering, with automatic guessing of
the base paragraph direction.  I think this already presents
non-trivial problems.

The fundamental problem here is that most "simple" utilities use hard
newlines to present text in some visually plausible format.  Even when
these utilities just emit text read from files (as opposed to
generating the text from the program), you will normally see each line
end with a hard newline, because the vast majority of text files
have a hard newline at the end of each line.

When bidirectional text is reordered by the terminal emulator, these
hard newlines will make each line a separate paragraph.  And this
is a problem, because the base direction will then vary from line to
line, depending on the first strong directional character in each
line, and the result will be visually very unpleasant.  Just take the
output produced by any utility when invoked with, say, the --help
option, and try imagining how this will look when translated into a
language that uses RTL script.
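
To make the effect concrete, here is a minimal sketch in Python of
what per-line direction guessing does to such output.  It only
approximates the UBA first-strong rule (P2/P3): it ignores the rule
about skipping characters inside isolates, and it assumes LTR when a
line has no strong character.  The function names and the sample
"help" text are made up for illustration; they are not taken from any
real utility or terminal emulator.

```python
import unicodedata


def first_strong_direction(line, default="LTR"):
    """Guess the base direction from the first strong character.

    Simplified version of UBA rules P2/P3: isolates are not skipped,
    and `default` is an assumption used when no strong character exists.
    """
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return default


def per_line_directions(text):
    """Treat every hard newline as a paragraph separator."""
    return [first_strong_direction(line) for line in text.split("\n")]


# Hypothetical translated --help output; Hebrew words are written as
# escapes so the source displays the same way in any editor.
sample = "\n".join([
    "\u05e9\u05d9\u05de\u05d5\u05e9: ls [OPTION]...",   # starts with Hebrew
    "  -a, --all  \u05d4\u05db\u05dc",                  # starts with an option
    "\u05e2\u05d6\u05e8\u05d4 --help",                  # starts with Hebrew
])

for line, direction in zip(sample.split("\n"), per_line_directions(sample)):
    print(direction, repr(line))
```

The three lines come out as RTL, LTR, RTL: the line that happens to
begin with an option name flips to LTR, even though it belongs to the
same RTL help text as its neighbours.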

So I think determination of the paragraph direction even in this
simplest case cannot be left to the UBA defaults, and there's a need
to use "higher-level" protocols for paragraph direction.  IOW, the
implicit mode described in the above-mentioned document needs to be
augmented by a smarter method of determining the base paragraph
direction.  (I might have a suggestion for that, if people agree with
the above reasoning.)

