Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

Egmont Koblinger via Unicode unicode at unicode.org
Sun Feb 3 10:54:25 CST 2019


Hi Eli,

> The document cited at the beginning of the parent thread states that
> "simple" text-mode utilities, such as 'echo', 'cat', 'ls' etc. should
> use the "implicit" mode of bidi reordering, with automatic guessing of
> the base paragraph direction.

Not exactly. I take the SCP escape sequence from ECMA TR/53 (slightly
reinterpreting it) so that it specifies the paragraph direction, and
introduce a new sequence that specifies whether autodetection is
enabled. I'm arguing, although my reasons are not rock solid, that the
default should be the strict direction as set by SCP, without
autodetection.

> The fundamental problem here is that most "simple" utilities use hard
> newlines to present text in some visually plausible format.

Could you please list examples?

What I have in mind are "echo", "cat", "grep" and the like; they don't
care about the terminal width.

If an app cares about the terminal width, how does it care about it?
What does it use this information for? To truncate overlong strings,
for example? In that case I'd argue that such applications need to do
BiDi on their own, and thus set the terminal to explicit mode. If an
app does any kind of string truncation, it can no longer delegate the
task of BiDi to the terminal emulator.

I'm also pointing out that you cannot truncate a BiDi string both
logically and visually at once. Either you truncate the logical
string, which may result in visual nonsense, or you truncate the
visual string, risking that what ends up displayed is not an initial
fragment of the data. Along these lines I'm arguing that basic
utilities like "cut" shouldn't care about BiDi; the logical behavior
there is more important than the visual one. There could, of course,
eventually be sophisticated "bidi-cut" and similar utilities that cut
the visual string, but they should use the terminal's explicit mode.
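To make the truncation dilemma concrete, here is a toy Python sketch.
`toy_visual_order` is a hypothetical, drastically simplified stand-in
for the Unicode Bidirectional Algorithm: it only handles an LTR
paragraph whose RTL parts are runs of strong R/AL letters, and simply
reverses each such run. The 5-column width is arbitrary.

```python
import unicodedata


def toy_visual_order(logical: str) -> str:
    """Toy reordering for an LTR paragraph: reverse each maximal run of
    strong right-to-left letters (bidi class R or AL). Neutrals, digits
    and embeddings are NOT handled as the real UBA would handle them."""
    out: list = []
    run: list = []
    for ch in logical:
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            run.append(ch)
        else:
            out.extend(reversed(run))
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


# "abc" followed by Hebrew alef, bet, gimel, dalet, in logical order.
line = "abc\u05d0\u05d1\u05d2\u05d3"
width = 5

full_visual = toy_visual_order(line)           # a b c dalet gimel bet alef
logical_cut = toy_visual_order(line[:width])   # cut the data, then reorder
visual_cut = full_visual[:width]               # reorder, then cut the cells

# Logical truncation shows "a b c bet alef": not the left part of the full line.
# Visual truncation shows "a b c dalet gimel": not a prefix of the data.
```

Neither result satisfies both requirements, which is why a visually
truncating tool has to do its own reordering and use explicit mode.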

> Even when
> these utilities just emit text read from files (as opposed to
> generating the text from the program), you will normally see each line
> end with a hard newline, because the absolute majority of text files
> have a hard newline at the end of each line.

What does a BiDi text file look like, to begin with? Can a heavily
BiDi text file be formatted to 72 (or whatever) columns using explicit
newlines while keeping the BiDi both semantically and visually
correct? I truly doubt that. Can you show me such files?

> When bidirectional text is reordered by the terminal emulator, these
> hard newlines will make each line be a separate paragraph.  And this
> is a problem, because the result will be completely random, depending
> on the first strong directional character in each line, and will be
> visually very unpleasant.  Just take the output produced by any
> utility when invoked with, say, the --help option, and try imagining
> how this will look when translated into a language that uses RTL
> script.

First, having no autodetection by default but rather an explicit
control for the overall direction hopefully mitigates this problem.
Second, I outline a possible future extension with a different
definition of a "paragraph", maybe something between empty lines, or
other kinds of explicit markers.
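A minimal Python model of the two options, applied to a translated
--help output where every hard newline starts a new paragraph. The
function names, the "ltr"/"rtl" constants and the rule that
autodetection falls back to the SCP direction when a line has no
strong character are assumptions for illustration. The first-strong
search is a simplified version of UBA rules P2/P3 (it does not skip
isolates).

```python
import unicodedata
from typing import Optional

LTR, RTL = "ltr", "rtl"


def first_strong_direction(line: str) -> Optional[str]:
    """Simplified UBA P2/P3: direction of the first strong character
    (bidi class L, R or AL). Isolate sequences are not skipped here."""
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return LTR
        if bidi_class in ("R", "AL"):
            return RTL
    return None


def paragraph_direction(line: str, scp_direction: str, autodetect: bool) -> str:
    """Paragraph direction of one line (one hard-newline paragraph).
    Without autodetection the SCP direction always wins; with it, the
    first strong character decides, falling back to SCP (an assumption)."""
    if autodetect:
        detected = first_strong_direction(line)
        if detected is not None:
            return detected
    return scp_direction


# Hypothetical Hebrew --help lines: "--help  <help>", "<version>  --version", blank.
help_lines = [
    "--help  \u05e2\u05d6\u05e8\u05d4",
    "\u05d2\u05e8\u05e1\u05d4  --version",
    "  ",
]
```

With SCP set to RTL, autodetection yields ltr, rtl, rtl for these
three lines, because the first line happens to start with a Latin
option name. Without autodetection every line gets rtl, so the output
is laid out consistently.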

> So I think determination of the paragraph direction even in this
> simplest case cannot be left to the UBA defaults, and there's a need
> to use "higher-level" protocols for paragraph direction.

That higher-level protocol is part of my recommendation: it's the SCP
sequence from ECMA TR/53.

Does this make sense?


cheers,
egmont

