Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

Egmont Koblinger via Unicode unicode at unicode.org
Thu Feb 7 12:37:46 CST 2019


Hi Philippe,

On Thu, Feb 7, 2019 at 3:21 PM Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> "Rules" are not formally written, they are just a sense of best practices.

When it comes to BiDi in terminals, I haven't seen anything that I
consider reasonably okay, let alone "best practice". It's a mess.
That's why I decided to come up with something.

> Bidi plays very badly on terminals

Agreed. There are essentially two ways from here: just leave it as bad
as it is (or even see various terminal emulators coming up with
poorly thought-out hacks that make it even worse), or try to
improve it. I picked the latter.

> [...] refreshing a typical 80x25 screen takes about one half second, which is much longer than typical user input, so full screen refresh does not work for data input and editing, and terminals implement themselves the echo of user input, ignoring how and when the receiving application will handle the input, and also ignoring if the application is already sending output to the terminal.

I'm really unsure where you're going with this.

For one, adding BiDi doesn't introduce the need for significantly
larger updates. Whenever a partial repaint of the screen was
sufficient, even with BiDi in the game it will remain sufficient.

Another thing: I'm not sure that 9.6kbps is a bottleneck to worry
about. It's present if you connect to a device via a serial port, but
will you really do this in combination with BiDi? The use case I have
in mind much more is running a terminal emulator locally, or ssh'ing
to a remote machine, to get various kinds of productive work
done (e.g. writing a text file in someone's native RTL script in a
text editor). These are orders of magnitude faster.

> It's hard or impossible to synchronize this and local echoes on the terminal cause havoc.

If input mixes with output (e.g. you press some keys while you're
waiting for make/gcc to compile your app, and these letters appear
onscreen), the visual result is broken even without BiDi. I cannot
eliminate this kind of breakage by introducing BiDi, nor can I build
something from scratch that somewhat resembles the current terminal
emulator world but fixes all of its oddities.

> But the concept of "line" or "paragraph" in terminal protocols is extremely fuzzy. It's then very difficult to take into account the additional Bidi constraints as it's impossible to conciliate BOTH the logical ordering (what is encoded in the transmitted data or kept in history buffers) and the visual ordering.

I don't try to reconcile logical and visual ordering within the same
paragraph; I agree that's impossible, it's semantic nonsense. But I
do try to reconcile them in the sense that sometimes the visual order
is the desired one and sometimes the logical order, so let's make it
possible to use one for one paragraph, and the other for another
paragraph.

> That's why there are terminal protocols that absolutely don't want to play with the logical ordering and require all their data to be transmitted in visual order (in which case, there's no bidi handling at all).

This is one of the modes in my recommendation. If your application
requires this mode (as e.g. Emacs does), use this mode and you're
good.

> In fact most terminal protocols are very defective and were never designed to handle Bidi input

Maybe it's high time someone fixed this defect, then? :)

> And here your unit (logical lines) is not even defined in the terminal protocol and not known from the emitting application, which has no input about the final output terminal properties. So the terminal must perform guesses. As it can insert additional linebreaks itself, and scroll out some portion of it, there's no way to delimit the effect of "bidi controls". The basic requirement for correctly handling bidi controls is to make sure that paragraph delimitations are known and stable. If additional breaks can occur anywhere on what you think is a "logical line" but which is different from the emitting application (or static text document which is output "as is" without any change to reformat it), these bidi controls just make things worse and it becomes impossible to make reasonable guesses about paragraph delimitations in the terminal. The result becomes unpredictable and most often will not even make any sense as the terminal uses visual ordering always but loses track of the logical ordering (and things get worse when there are complex clusters or characters that cannot even fit in a monospaced grid).

If an exact definition of hard vs. soft wrapped lines is what you find
missing from the specification, okay, I'll add it to a future version.

I don't know where you got the idea that terminals perform guesses;
they certainly don't (at least not about hard vs. soft newlines).

> The basic requirement for correctly handling bidi controls is to make sure that paragraph delimitations are known and stable.

Since we're talking about bidi controls being emitted, we must be
talking about the implicit mode of the terminal (as per ECMA's and my
specification). Even without BiDi, you can have something on the
screen, move the prompt upwards, and then "cat" a file. The result
will partially overwrite the existing contents, and partially leave
them there. The result will be unreadable, broken. So will it be with
BiDi.

Now, with the regular use case of printing to an unused (empty) area,
the handling of soft vs. hard newlines is consistent across all terminal
emulators I could test. The terminals remember exactly when a newline
was printed vs. where the contents wrapped to the next line, and
nothing prevents them from doing BiDi accordingly – which my
specification says they need to do. Surprisingly all of PuTTY's,
Konsole's, Mlterm's and Terminal.app's developers got it wrong and
they do BiDi on the physical lines. This is just one example of how
broken the current state of BiDi is, and why it should be fixed.
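
As an illustration of the hard vs. soft newline point above, here is a
minimal sketch of how a terminal could rebuild paragraphs from its rows
before running BiDi on them. The Row type and its soft_wrapped flag are
hypothetical names made up for this example; they are not taken from
the specification or from any particular emulator.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One physical row of the terminal grid (hypothetical model)."""
    text: str
    # True if the content ran into the right margin and continued on the
    # next row; False if an explicit newline ended the row.
    soft_wrapped: bool


def paragraphs(rows):
    """Join physical rows into paragraphs using the soft-wrap flag."""
    result, current = [], []
    for row in rows:
        current.append(row.text)
        if not row.soft_wrapped:
            # A hard newline ends the paragraph.
            result.append("".join(current))
            current = []
    if current:
        # Trailing rows whose paragraph has not been terminated yet.
        result.append("".join(current))
    return result


rows = [
    Row("first paragraph, too long f", soft_wrapped=True),
    Row("or one row", soft_wrapped=False),
    Row("second paragraph", soft_wrapped=False),
]
units = paragraphs(rows)
```

The BiDi algorithm would then be run once per entry of units, not once
per physical row, and the result re-wrapped to the terminal width.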

> Terminal emulators only perform guesses, most of these guesses are valid only with "simple" scripts with one character per cell, assuming a minimum resolution of each cell (the minimum is a 8x8 pixel square, too small for Asian scripts, but typical for rendering on old analog TVs; the typical one is a half-width rectangle, not really much larger, but about 50% taller, and many Asian scripts still do not fit well). These protocols were just made for Latin, and similar simpler scripts (Cyrillic, Greek, and simple Japanese scripts, or Hangul jamos ignoring clusters and presented only with halfwidth characters, ignoring all complex clusters). For everything else, there's no defined behavior, no support, no reference documentation, everything is untested, you get extremely variable results, the output could be completely garbled and unreadable.

I'm really lost: what kind of guesses are you talking about, and how
are font sizes or anything else you're talking about relevant?

If there's one thing terminal emulators really don't do, it's
guessing. Every terminal emulator is pretty much a deterministic
state machine.

> If you want to play well with most terminals you have to limit a lot what you can do with "terminal protocols" and strictly limit your use of controls. In fact the only "stable" thing which works more or less is the basic MIME plain text profile which just uses a single encoding for ALL kinds of newlines (and completely ignores the distinction between the 3 main kinds of line breaks). That's where you need to insert hints: basically the encoded text have to assume a minimum display width, and any "line" longer than about 70 character cells is assumed to be followed on the next line, unless that next line is empty, and Bidi controls are not used at all but guessed from characters properties at "reasonable" paragraph boundaries determined heuristically by the terminal emulator but not encodable in the data stream itself.

If you wish to create a terminal-based application that can display
Hebrew text, assuming nothing more from the underlying terminal
emulator than that it can present these glyphs, you're already lost.

Most of the terminal emulators can only lay out the glyphs from left
to right. That is, you need to emit visual order, that is, the reverse
of the logical order for Hebrew.

Some terminal emulators, with their default settings, will again run
the BiDi algorithm and reverse these back to the incorrect order.
Bummer!
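
To make the two cases above concrete, here is a deliberately simplistic
sketch. It assumes a single Hebrew word with no digits, combining marks
or mirrored characters, so that "visual order" is plain string
reversal; real text needs the full Unicode BiDi algorithm. The function
name is made up for this example.

```python
def reverse_rtl_run(run: str) -> str:
    """Reverse a run consisting purely of RTL letters (toy model only)."""
    return run[::-1]


# "shalom" in logical (reading) order: shin, lamed, vav, final mem.
logical = "\u05e9\u05dc\u05d5\u05dd"

# What an application emits for a terminal that only lays out cells
# from left to right: the first letter must end up in the rightmost cell.
visual = reverse_rtl_run(logical)

# A BiDi-aware terminal that runs the algorithm again on this output
# reverses the run once more, so the cells hold the logical order laid
# out left to right, which reads backwards.
redisplayed = reverse_rtl_run(visual)
```

Here redisplayed equals logical, i.e. the word ends up with its first
letter in the leftmost cell, which is exactly the wrong result.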

VTE is about to join this latter team, but also introduces an escape
sequence to turn this behavior off. You can start emitting this escape
sequence from your application. VTE will understand it. Other
emulators won't and will still display the word in reversed order. (A
point of my specification is to get it standardized and get all these
other BiDi-aware emulators to catch up and recognize this escape
sequence.)
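
A hedged sketch of what emitting such a sequence could look like from
an application. The message above does not spell the sequence out; this
example assumes it is ECMA-48's BDSM (Bi-Directional Support Mode,
mode 8), reset with CSI 8 l for explicit mode and set with CSI 8 h for
implicit mode. The helper name is made up, and the output goes to a
buffer here rather than to a real terminal.

```python
import io
import sys

CSI = "\x1b["
# Assumption: ECMA-48 BDSM is mode 8; "l" resets it (explicit mode, the
# terminal does not reorder), "h" sets it (implicit mode).
BDSM_EXPLICIT = CSI + "8l"
BDSM_IMPLICIT = CSI + "8h"


def write_visual_order(text: str, out=sys.stdout) -> None:
    """Ask the terminal not to run BiDi, then write pre-reordered text."""
    out.write(BDSM_EXPLICIT)
    out.write(text)


# Demo against an in-memory buffer instead of a live terminal.
buf = io.StringIO()
write_visual_order("\u05dd\u05d5\u05dc\u05e9", buf)
```

A full-screen application would more likely emit the mode switch once at
startup than before every write; whether and when to switch back is
left out of this sketch.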

Some other terminal emulators misinterpret this escape sequence
(although it's part of ECMA) and do something different. They'll also
have to be fixed.

The current state of terminal emulators literally doesn't give you a
common minimum on top of which you can do any Hebrew or Arabic at
all.

Of course my specification cannot fully fix this: If you still pick a
terminal emulator not conforming to this spec, you'll still be out of
luck.

What it can do, among many things, is bring all the BiDi-aware
terminal emulators onto a common base, one that also aligns with the
non-BiDi-aware ones (subject to them not misinterpreting the
BiDi-related sequence).

>> > - use a single newline on continuation
>>
>> Continuation of what exactly?
>
> Continuation of paragraphs on the next visual line. I think this did not require any precision, it was sufficient on the existing context where you extracted this word, or did not read anything.

When I ask for clarification, I ask because I didn't understand, and
not to invite assumptions that I may not have read anything.

As you can see from previous discussions, there's a whole lot of
confusion about the terminology. E.g. even "paragraph" has multiple
incompatible definitions, which caused a lot of misunderstanding
between Eli and me until we realized we were actually talking about
the same thing. Thus, when you clarify it as "continuation of
paragraphs", I still cannot be fully sure that your message came across
as you intended, because which "paragraph" among the multiple
definitions? Plus there are a whole lot more things you can continue,
e.g. the list of command line options with the next entry (to refer
back to the previous example with "zip"). Never mind, let's forget it.

Philippe, with all due respect, I have the feeling that you have some
fundamental problems with my work (and I'm tempted to ask back: have
you read it at all?), but what exactly your problem is just
doesn't come across to me. Could you please avoid all those irrelevant
stories about baud rate and font size and Asian scripts and whatnot,
and clearly get to your point?


cheers,
egmont


