Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

Eli Zaretskii via Unicode unicode at unicode.org
Sun Feb 3 11:50:50 CST 2019


> From: Egmont Koblinger <egmont at gmail.com>
> Date: Sun, 3 Feb 2019 17:54:25 +0100
> Cc: unicode at unicode.org
> 
> I'm arguing, although my reasons are not rock solid, that IMHO the
> default should be the strict direction as set by SCP, without
> autodetection.

I think it's unreasonable and impractical to expect 'echo', 'cat', and
their ilk to emit bidi controls (or any other controls) to force
paragraph direction.  For starters, they won't know what direction to
force, because they don't understand the text they are processing.
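
To make the difficulty concrete, here is a minimal sketch of what such a
utility would have to do.  It assumes the ECMA-48 encoding of SCP as
CSI Ps1 SP k, with Ps1 = 1 for left-to-right and 2 for right-to-left; the
helper names scp_sequence and emit_line are hypothetical, made up for this
illustration.

```python
ESC = "\x1b"

def scp_sequence(direction):
    """Build an SCP sequence, assumed to be CSI Ps1 SP k (1 = LTR, 2 = RTL)."""
    ps1 = {"L": 1, "R": 2}[direction]
    return f"{ESC}[{ps1} k"

def emit_line(text, direction=None):
    """Hypothetical helper: prefix text with SCP only when a direction is known."""
    prefix = scp_sequence(direction) if direction else ""
    return prefix + text

# A utility that merely copies bytes has no basis for choosing a direction,
# so it ends up on the direction=None path and emits no control at all.
plain = emit_line("some text copied from a file")
forced = emit_line("some text copied from a file", "R")
```

The sequence itself is trivial to produce; choosing the argument is the part
that requires understanding the text.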

No, this simple case must work reasonably well with the application
_completely_ oblivious to the bidi aspects.  If this can't work
reasonably well, I submit that the entire concept of having a
bidi-aware terminal emulator doesn't "hold water".

> > The fundamental problem here is that most "simple" utilities use hard
> > newlines to present text in some visually plausible format.
> 
> Could you please list examples?

Just redirect any of them to a file, and look at the file with a hex
editor.  You will see a hard newline character, 0x0A, at the end of
each line.
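
If no hex editor is at hand, the same check can be sketched in a few lines.
The sample bytes below are a stand-in for redirected output, not a capture
from any real program, and hard_newline_offsets is a name invented for this
sketch.

```python
def hard_newline_offsets(data: bytes):
    """Return the byte offsets of every hard newline (0x0A) in data."""
    return [i for i, b in enumerate(data) if b == 0x0A]

# Stand-in for the contents of a file produced by redirecting a utility.
sample = b"Zip 3.0 (July 5th 2008). Usage:\nzip [-options] [zipfile list]\n"

offsets = hard_newline_offsets(sample)
hex_dump = sample.hex(" ")   # space-separated hex, as a hex editor would show
print(hex_dump)
print(offsets)
```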

> What I have in mind are "echo", "cat", "grep" and alike, they don't
> care about the terminal width.

Terminal width is not always relevant here, and I didn't mention it.
However, as long as you allude to that, I think your garden-variety
text utility does assume the width of a terminal window is 80 columns,
and the messages displayed by these programs are formatted
accordingly.

> If an app cares about the terminal width, how does it care about it?
> What does it use this information for? To truncate overlong strings,
> for example?

To break long lines at appropriate places, and to emit text that fits
on a line in the first place.

Just try invoking any such utility with the --help option, and you
will see what I mean.  I give one example below.

> At this very moment I'd argue that such applications need
> to do BiDi on their own, and thus set the terminal to explicit mode.
> If an app does any kind of string truncation, it can no longer
> delegate the task of BiDi to the terminal emulator.

I'm afraid this won't fly, because most "simple" utilities do it that
way.  If you insist on them doing their own bidi, you've just lost
your cause.  No upstream developer will be interested in adapting
their utilities to a terminal emulator that requires them to do their
own bidi.

> I'm also mentioning that you cannot both logically and visually
> truncate a BiDi string at once.

I don't understand why you talk about truncation; I didn't.

Here, look at this random example:

  Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
  Zip 3.0 (July 5th 2008). Usage:
  zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
    The default action is to add or replace zipfile entries from list, which
    can include the special name - to compress standard input.
    If zipfile and list are omitted, zip compresses stdin to stdout.
    -f   freshen: only changed files  -u   update: only changed or new files
    -d   delete entries in zipfile    -m   move into zipfile (delete OS files)
    -r   recurse into directories     -j   junk (don't record) directory names
    -0   store only                   -l   convert LF to CR LF (-ll CR LF to LF)
    -1   compress faster              -9   compress better
    -q   quiet operation              -v   verbose operation/print version info
    -c   add one-line comments        -z   add zipfile comment
    -@   read names from stdin        -o   make zipfile as old as latest entry
    -x   exclude the following names  -i   include only the following names
    -F   fix zipfile (-FF try harder) -D   do not add directory entries
    -A   adjust self-extracting exe   -J   junk zipfile prefix (unzipsfx)
    -T   test zipfile integrity       -X   eXclude eXtra file attributes
    -!   use privileges (if granted) to obtain all aspects of WinNT security
    -$   include volume label         -S   include system and hidden files
    -e   encrypt                      -n   don't compress these suffixes
    -h2  show more help

Do you see how this is carefully formatted to avoid overflowing an
80-column line of a typical terminal?  Now suppose this is translated
into an RTL language, which causes the Copyright line to start with a
strong R letter (because "Copyright" is translated).  You will see the
first line flushed to the right margin, then the next line flushed to
the left margin (because it's a separate paragraph, and starts with a
strong L letter).  Then the line which says "The default action..."
will again start at the right.  And so on and so forth -- the result
is extremely ugly.
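
The flip-flopping described above can be reproduced with a simplified
version of the UBA first-strong rule (P2-P3), applied to each line as its
own paragraph.  This is only a sketch: it ignores isolates and embeddings,
and the Hebrew strings are arbitrary letters standing in for translated
text, not an actual translation of the zip messages.

```python
import unicodedata

def first_strong_direction(text, default="L"):
    """Return "L" or "R" according to the first strong character in text."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return default  # no strong character found

lines = [
    "\u05d0\u05d1\u05d2 (c) 1990-2008 Info-ZIP",   # starts with Hebrew letters
    "Zip 3.0 (July 5th 2008). Usage:",             # starts with a Latin letter
    "\u05d3\u05d4\u05d5 \u05d6\u05d7\u05d8",       # Hebrew placeholder again
]

# Each hard newline starts a new paragraph, so each line is judged alone.
directions = [first_strong_direction(line) for line in lines]
print(directions)
```

With every line treated as a separate paragraph, the base direction
alternates between right and left from one line to the next.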

> > Even when
> > these utilities just emit text read from files (as opposed to
> > generating the text from the program), you will normally see each line
> > end with a hard newline, because the absolute majority of text files
> > have a hard newline at the end of each line.
> 
> What does a BiDi text file look like, to begin with?

Exactly like any other text file, just with some of the characters
belonging to RTL scripts.

> Can a heavily BiDi text file be formatted to 72 (or whatever)
> columns using explicit newlines, keeping BiDi both semantically and
> visually correct?

Of course.

> I truly doubt that.

Why is that?

> Can you show me such files?

See, for example, the Hebrew tutorial in Emacs, TUTORIAL.he.

Please note that Emacs bumps into this problem all the time, because
almost always text buffers in Emacs use hard newlines, whether their
text came from files or was just typed by the user.  E.g., most
plain-text email messages use hard newlines, and the Emacs built-in
MUAs produce such plain-text messages; using "flowed" text is much
rarer.  Emacs has an "Auto-Fill mode" which automatically inserts
a hard newline and starts a new line when the current line exceeds a
given column number, and Emacs users typing text usually enable this
mode (as did I when typing this message).

So how to determine base paragraph direction in a sane way was about
the first problem I needed to solve when I made Emacs support bidi.

> First, having no autodetection by default but rather an explicit
> control for the overall direction hopefully mitigates this problem.

It doesn't, IMO, because it requires the applications to understand
enough to emit the correct control.  Most simple text-processing
utilities are not that smart.

> Second, I outline a possible future extension with a different
> definition of a "paragraph", maybe something between empty lines, or
> other kinds of explicit markers.

I think this kind of extension cannot be deferred to some "future"; it
must be there in the very first version you produce.  Otherwise, the
result will be so unpleasant that people will be put off.
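
As a rough illustration of the "between empty lines" definition, the sketch
below picks one base direction per block of non-empty lines and applies it
to every line of that block.  It is an assumption-laden example, not what
any particular emulator or editor implements; the function names are
invented, and the Hebrew strings are arbitrary placeholder letters.

```python
import unicodedata

def first_strong_direction(text, default="L"):
    """Return "L" or "R" according to the first strong character in text."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return default

def paragraph_directions(lines, default="L"):
    """Give each line the direction of its blank-line-delimited block."""
    result, block = [], []

    def flush():
        if block:
            direction = first_strong_direction("\n".join(block), default)
            result.extend([direction] * len(block))
            block.clear()

    for line in lines:
        if line.strip() == "":
            flush()
            result.append(default)  # an empty line has nothing to align
        else:
            block.append(line)
    flush()
    return result

lines = [
    "\u05d0\u05d1\u05d2 (c) 1990-2008 Info-ZIP",
    "Zip 3.0 (July 5th 2008). Usage:",
    "\u05d3\u05d4\u05d5 \u05d6\u05d7\u05d8",
    "",
    "A separate block that starts with Latin text.",
]
directions = paragraph_directions(lines)
print(directions)
```

Here the first three lines share one direction because the block as a whole
starts with a strong R letter, so the output no longer alternates margins
line by line.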

> > So I think determination of the paragraph direction even in this
> > simplest case cannot be left to the UBA defaults, and there's a need
> > to use "higher-level" protocols for paragraph direction.
> 
> That higher level protocol is part of my recommendation, part of ECMA
> TR/53, as the SCP sequence.

It must be the default, a necessary part of any compliant emulator.
That's my opinion based on my experience, anyway.

