Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

Egmont Koblinger via Unicode unicode at unicode.org
Mon Feb 4 18:32:34 CST 2019


Hi Eli,

> I think it's unreasonable and impractical to expect 'echo', 'cat', and
> its ilk to emit bidi controls (or any other controls) to force
> paragraph direction.  For starters, they won't know what direction to
> force, because they don't understand the text they are processing.

I agree, it is unreasonable for 'echo', 'cat' etc. to emit BiDi controls.

There could be some higher level helper utilities though, let's say a
"bidi-cat" that examines the file, makes a guess, emits the
corresponding escape sequences and cats the file. It's not necessarily
a good approach, but a possible one (at least temporarily until
terminals implement a better one).
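
A minimal sketch of what such a "bidi-cat" might look like, assuming the
guess is simply "direction of the first strong character" (a simplified
take on how the Unicode BiDi algorithm picks a paragraph level). The
escape sequences are assumptions modeled on ECMA-48's SCP control
function; this mail does not spell them out, so treat them, and the
names guess_direction and bidi_cat, as placeholders.

```python
import sys
import unicodedata

# Assumed escape sequences (modeled on ECMA-48 SCP); placeholders only.
SET_LTR = "\x1b[1 k"
SET_RTL = "\x1b[2 k"
RESET_DIRECTION = "\x1b[0 k"


def guess_direction(text):
    """Return "rtl" or "ltr" based on the first strong character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew, Arabic and similar letters
            return "rtl"
        if bidi_class == "L":  # Latin, Greek, Cyrillic and similar letters
            return "ltr"
    return None  # only digits, punctuation, whitespace: no guess


def bidi_cat(path, out=sys.stdout):
    """Guess one direction for the whole file, set it, then print the file."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    direction = guess_direction(text)
    if direction == "rtl":
        out.write(SET_RTL)
    elif direction == "ltr":
        out.write(SET_LTR)
    out.write(text)
    if direction is not None:
        out.write(RESET_DIRECTION)  # hand the terminal back in its default state
```

Calling bidi_cat("notes.txt") would print the file with a single guessed
direction applied to the entire output. The guess can of course be wrong,
which is exactly the weakness of this approach.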

On the other hand, it's not unreasonable for higher level stuff (e.g.
shell scripts, or tools like "zip") to use such control characters.

> No, this simple case must work reasonably well with the application
> _completely_ oblivious to the bidi aspects.  If this can't work
> reasonably well, I submit that the entire concept of having a
> bidi-aware terminal emulator doesn't "hold water".

There isn't a magic wand. I can't magically fix every BiDi issue by
changing the terminal emulator's source code. Not because I'm clumsy,
but because it just can't be done. If it were possible, I wouldn't have
written a long specification, I would have just done it. (Actually, if
it were possible, others would surely have done it long before I joined
terminal emulator development.)

There need to be multiple modes, some of them due to technical
particularities of terminal emulation that aren't seen elsewhere (e.g.
explicit vs. implicit), and some of them because they are present
everywhere BiDi is involved (e.g. paragraph direction). And if the
mode is not set correctly, things might break; there's nothing new in
that.

What my specification essentially changes is that you will at least
have a chance to get the mode right.

Currently there are perhaps four different behaviors implemented
across terminal emulators when it comes to BiDi. An application cannot
control and cannot query the behavior. In order to get Emacs to behave
properly, you have to ask your users to adjust a setting (and I cannot
repeat enough times that I find this an unacceptable user experience).
If the settings of the terminal aren't what Emacs expects, the result
could be broken (RTL words might even show up reversed, in LTR order).

The same goes for the random example of "zip -h", assuming that they
add a Hebrew translation. Given the current set of popular terminal
emulators, there's no way zip could emit some Hebrew text in a
reliably readable way. Whatever it does, there will be terminal
emulators (and settings thereof) where the result is totally broken
(reversed), or at least unpleasant (wrong paragraph direction used).
Moreover, if "zip" emits the Hebrew text in the semantically correct
logical order (e.g. they use whatever existing framework, like gettext
and a popular .po editor), as opposed to the visual LTR order seen in
some legacy systems, it will need different terminal emulator settings
than Emacs, so if someone uses both zip and Emacs regularly, they'll
have to continuously toggle their terminal's settings back and forth –
have I mentioned how unacceptable I find this as a user? :)

One of the key points of my specification is that applications will be
able to automatically set the mode. Emacs will be able to switch to
the mode it requires, and so will zip. They will have the
opportunity.

If they don't take this opportunity, it's not my problem, and
there's nothing I could do about it. Let's say hypothetically that zip
adds Hebrew translations, but refuses to emit the escape sequence that
switches to RTL paragraph direction, and thus its result doesn't look
perfect. Can terminal emulators, can my specification, can I be
blamed in this case? I don't think so. If zip knows exactly what it
wants to print (as with the help page it knows for sure), and is given
all the technical infrastructure to reliably achieve that, they alone
would be to blame if they refused to properly use it. It's
absolutely out of the scope of my work to try to fix this case.

"cat" is substantially different. In case of "zip", the creators of
that software know exactly what the output should look like, and
according to my specification (assuming a conforming terminal
emulator, of course) nothing stops them from achieving it. "cat"
doesn't know, and cannot know, the desired look, since the file itself
lacks this information.

Paragraph direction is a concept that sucks big time. (I have no idea
how Unicode could have got it better, though.) It's a piece of
information that needs to be carried externally along with the text,
in order to make sure it'll be displayed correctly. It's a pain in the
butt, just as much as carrying the encoding was in the pre-Unicode
days, which hardly anyone cared about, resulting in incorrect accented
letters way too often. Practically everyone's lazy and doesn't carry
the paragraph direction, or there isn't even a place for carrying it.
Should there be a meta bit on the filesystem for plain text files, or
what? In practice, often you just guess.

I understand your worries that for the "cat file" use case, it would
be great to have a mode of the terminal emulator where the entire
file's direction is guessed at once, and then applied to each of its
paragraphs (where "paragraph" can still be reasonably defined in at
least two ways). I second that it would be great to have such a mode,
but as I've detailed in a previous mail, we don't have the necessary
technical information (boundaries of a command's output) for this.
That is why I put this on hold for now.

"zip -h" is in a much better situation, it knows what it wants to
print, knows what mode (e.g. what paragraph direction) is required for
that, and with my proposal, will be able to switch the terminal to
that mode.
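
To illustrate, here is a rough sketch of how an application that knows
its output language could wrap a translated help text. Everything in it
is hypothetical: the function names, the small set of RTL language
codes, and the escape sequences (assumed, modeled on ECMA-48's SCP
control function) are placeholders rather than anything zip or a
particular terminal provides.

```python
import sys

# Assumed escape sequences (modeled on ECMA-48 SCP); placeholders only.
PARAGRAPH_DIRECTION = {"ltr": "\x1b[1 k", "rtl": "\x1b[2 k"}
RESET_DIRECTION = "\x1b[0 k"

# A few language codes whose scripts are written right-to-left.
# Illustrative, not exhaustive.
RTL_LANGUAGES = {"ar", "fa", "he", "ur", "yi"}


def direction_for_language(locale_name):
    """Map a locale name such as "he_IL.UTF-8" to "rtl" or "ltr"."""
    language = locale_name.split(".")[0].split("_")[0].split("-")[0].lower()
    return "rtl" if language in RTL_LANGUAGES else "ltr"


def wrap_help(translated_help, locale_name):
    """Prefix the text with the matching direction and reset it afterwards."""
    direction = direction_for_language(locale_name)
    return PARAGRAPH_DIRECTION[direction] + translated_help + RESET_DIRECTION


def print_help(translated_help, locale_name, out=sys.stdout):
    out.write(wrap_help(translated_help, locale_name))
```

The point is only that the application, unlike "cat", already knows
which direction its text needs, so no guessing is involved.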

>     -A   adjust self-extracting exe   -J   junk zipfile prefix (unzipsfx)
>     -T   test zipfile integrity       -X   eXclude eXtra file attributes
>     -!   use privileges (if granted) to obtain all aspects of WinNT security
>
> Do you see how this is carefully formatted to avoid overflowing an
> 80-column line of a typical terminal

On a completely unrelated side note:

If you're about to internationalize your software, this layout is a
pretty bad choice.

It's handcrafted, and requires each translator to handcraft the
translation, manually fiddling with the number of spaces. This in turn
requires that translators be familiar enough with the software to
compile and test it with the work-in-progress translations, which
often isn't the case.

It's only translatable as one giant unit of text, rather than each
message separately. (Or, a somewhat complex formatting engine needs to
be implemented in the app, e.g. to decide which string spans across
both columns.) With one giant unit, translators have a hard time
spotting changes, and every change results in fuzziness, that is, when
a new option is introduced, even the old ones will revert to English
until the translator catches up.

This kind of formatting also ignores that English is a pretty dense
language; in other languages the strings tend to become longer.

Anyway, with my BiDi proposal, zip will have the chance to produce
whatever beautiful handcrafted two-column layout it wants to have for
its Hebrew help page. (How they handcraft and later maintain it is
not my concern.) In addition to printing the translated text, they'll
also have to switch the terminal into whichever BiDi mode corresponds
to the text. I just cannot reliably guess it in the terminal for them.


cheers,
egmont


