Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Philippe Verdy via Unicode
unicode at unicode.org
Thu Feb 7 08:20:56 CST 2019
On Thu, 7 Feb 2019 at 13:29, Egmont Koblinger <egmont at gmail.com> wrote:
> Hi Philippe,
> > There are some rules for correct display, including with Bidi:
> In what sense are these "rules"? Where are these written, in what kind
> of specification or existing practice?
"Rules" are not formally written; they are just a set of best practices.
Bidi plays very badly with terminals, even enhanced terminals like VT-* or
ANSI ones that expose capabilities: most of the time these capabilities are
not even accessible when needed, because it is already too late. Later
changes to the terminal properties (notably its display size) can never be
taken into account, since the output has already been generated and all the
terminal can do is play with what is in its history buffers. Even
on dual-channel protocols (input and output), terminal protocols do not
synchronize the input and the output, and these asynchronous channels
ignore the transmission time between the terminal and the application, so
the terminal protocol must include a function that flushes and redraws the
screen completely (but this requires long delays). With a common 9.6 kbps
serial link, refreshing a typical 80x25 screen (2000 cells at roughly 960
bytes per second) takes about two seconds, which is much longer than the
interval between a user's keystrokes, so full-screen refresh does not work
for data input and editing. Terminals therefore implement the echo of user
input themselves, ignoring how and when the receiving application will
handle that input, and also ignoring whether the application is already
sending output to the terminal.
It is hard or impossible to synchronize this, and local echoes on the
terminal cause havoc.
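For a sense of scale, here is a rough Python estimate of the time needed just to transmit one byte per cell of a full screen. It assumes 8N1 serial framing (1 start bit + 8 data bits + 1 stop bit = 10 bits on the wire per byte) and ignores escape sequences and cursor positioning, so real redraws are slower; the function name `redraw_seconds` is hypothetical.

```python
def redraw_seconds(cols: int, rows: int, bits_per_second: int,
                   bits_per_byte: int = 10) -> float:
    """Time in seconds to send cols*rows bytes over a serial link.

    Assumes 8N1 framing (10 bits per byte) and one byte per cell,
    with no escape-sequence overhead.
    """
    bytes_per_second = bits_per_second / bits_per_byte
    return cols * rows / bytes_per_second


if __name__ == "__main__":
    for bps in (9600, 38400):
        print(f"80x25 at {bps} bps: {redraw_seconds(80, 25, bps):.2f} s")
```

Under these assumptions an 80x25 redraw takes about 2.08 s at 9600 bps; even at 38400 bps it still takes about half a second, far longer than the gap between keystrokes.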
I have not seen any way for a terminal to handle all these constraints. So
the only option is to support only basic plain-text documents, reasonably
formatted, with layout "hints" inserted in the output so that the terminal
can make reasonable guesses and adapt.
But the concept of "line" or "paragraph" in terminal protocols is
extremely fuzzy. It is then very difficult to take the additional Bidi
constraints into account, as it is impossible to reconcile BOTH the logical
ordering (what is encoded in the transmitted data or kept in history
buffers) and the visual ordering. That's why there are terminal protocols
that absolutely don't want to play with the logical ordering and require
all their data to be transmitted in visual order (in which case there is no
Bidi handling at all). Terminals then attempt to reconcile the visual
line delimitations (in the transmitted data) with the local-only
capabilities of the rendered frame. Many terminals also do not allow
changing the display width or the display cell size, force constraints on
cell sizes and fonts, and are then unable to correctly display many Asian
scripts.
In fact most terminal protocols are very defective and were never designed
to handle Bidi input, or Asian scripts with complex clusters and the
variable-width fonts they need (even CJK text, which mixes "half-width"
and "full-width" characters).
> - Separate paragraphs that need a different default Bidi by double
> newlines (to force a hard break)
> There is currently no terminal emulator I'm aware of that uses empty
> lines as boundaries of BiDi treatment.
These are hints in the absence of anything else, and they matter when the
terminal display width is unpredictable for the application producing the
output, which has no return input channel.
Take the example of terminal emulators in resizable windows: the display
width is undefined, there is no document level and no buffering, scrolling
text flushes the output partially, and history is limited. A terminal
emulator then needs hints about where paragraphs are delimited, and most
often it has no other information available, even in its limited history,
that would allow distinguishing the 3 main kinds of line breaks.
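The plain-text hint discussed here (a double newline forces a hard paragraph break, a single newline is a continuation) can be sketched in Python as below. This is a hypothetical heuristic under those two assumptions only: it ignores any width threshold, strips indentation, and joins continued lines with a space, which is wrong for scripts written without spaces.

```python
def split_paragraphs(text: str) -> list[str]:
    """Group lines into paragraphs using only plain-text hints.

    Hypothetical heuristic:
    - an empty (or whitespace-only) line is a hard paragraph break;
    - a single newline continues the current paragraph, so its lines
      are joined with one space.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # Blank line: close the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = "first line\ncontinued here\n\nsecond paragraph\n"
print(split_paragraphs(sample))
```

With this heuristic, runs of several blank lines count as a single break, and the terminal never needs any control character to find paragraph boundaries.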
> While my recommendation uses a one smaller unit (logical lines), and I
And here your unit (logical lines) is not even defined in the terminal
protocol, and it is not known by the emitting application, which has no
input about the final output terminal's properties. So the terminal must
guess. As it can insert additional line breaks itself, and scroll out
some portion of the text, there is no way to delimit the effect of "Bidi
controls". The basic requirement for correctly handling Bidi controls is
to make sure that paragraph delimitations are known and stable. If
additional breaks can occur anywhere in what you think is a "logical line"
but which differs from what the emitting application (or a static text
document output "as is" without any reformatting) considered one, these
Bidi controls just make things worse, and it becomes impossible to make
reasonable guesses about paragraph delimitations in the terminal. The
result becomes unpredictable and most often makes no sense at all, as the
terminal always uses visual ordering but loses track of the logical
ordering (and things get worse when there are complex clusters or
characters that cannot even fit in a single cell).
> The current behavior of terminal emulators is very far from what you
Terminal emulators only make guesses, and most of these guesses are valid
only for "simple" scripts with one character per cell, assuming a minimum
resolution for each cell (the minimum is an 8x8 pixel square, too small
for Asian scripts but typical for rendering on old analog TVs; the typical
cell is a half-width rectangle, not really much wider but about 50% taller,
in which many Asian scripts still do not fit well). These protocols were
made just for Latin and similar simpler scripts (Cyrillic, Greek, simple
Japanese scripts, or Hangul jamos presented only with half-width
characters, ignoring all complex clusters). For everything else there is no
defined behavior, no support, no reference documentation; everything is
untested, you get extremely variable results, and the output can be
completely garbled and unreadable.
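The one-character-per-cell model can be made concrete with a rough cell-width estimate. This Python sketch uses the standard `unicodedata` module with a simplified, wcwidth-like rule that is an assumption here, not a standard: marks and format characters take 0 cells, East Asian Wide or Fullwidth characters take 2, everything else takes 1. The name `cell_width` is hypothetical.

```python
import unicodedata


def cell_width(text: str) -> int:
    """Estimate the number of terminal cells `text` occupies.

    Simplified rule (an assumption, not a standard):
    - nonspacing/enclosing marks (Mn, Me) and format chars (Cf): 0 cells
    - East Asian Wide (W) or Fullwidth (F): 2 cells
    - everything else, including Ambiguous (A): 1 cell
    Grapheme clusters are NOT handled: each code point is counted alone.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


for sample in ["abc", "日本", "ｱｲ", "e\u0301", "\u1100\u1161", "\uac00"]:
    print(repr(sample), cell_width(sample))
```

The decomposed syllable `"\u1100\u1161"` is counted as 3 cells although it is displayed as the single syllable 가, which takes 2 cells when precomposed: exactly the kind of cluster a per-code-point cell model gets wrong.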
The situation is then worse for interactive applications (notably full
screen text editors, including vi(m) and emacs) using these terminal
protocols over slow unsynchronized dual links.
If you want to play well with most terminals, you have to strongly limit
what you can do with "terminal protocols" and strictly limit your use of
controls. In fact the only "stable" thing that works more or less is the
basic MIME plain-text profile, which uses a single encoding for ALL kinds
of newlines (and completely ignores the distinction between the 3 main
kinds of line breaks). That's where you need to insert hints: basically,
the encoded text has to assume a minimum display width, any "line" longer
than about 70 character cells is assumed to continue on the next line
unless that next line is empty, and Bidi controls are not used at all:
directions are guessed from character properties at "reasonable"
paragraph boundaries, which are determined heuristically by the terminal
emulator and are not encodable in the data stream itself.
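Guessing a direction from character properties, rather than from Bidi controls, can be sketched with the "first strong character" rule of the Unicode Bidirectional Algorithm (UAX #9, rules P2/P3). This Python version is simplified: unlike the real rule, it does not skip text inside isolates (LRI/RLI/FSI ... PDI). The name `guess_direction` is hypothetical.

```python
import unicodedata


def guess_direction(paragraph: str, default: str = "ltr") -> str:
    """Guess a paragraph's base direction from its first strong character.

    Simplified UAX #9 P2/P3: isolates are not skipped.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character at all: fall back to the default


whole = "Say שלום world"
row = "שלום world"  # the same text cut at a terminal-chosen wrap point
print(guess_direction(whole), guess_direction(row))
```

The whole paragraph resolves to LTR, but if the terminal wraps it and then treats the second row as its own paragraph, that row resolves to RTL. This is why the guess is only meaningful when paragraph boundaries are stable.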
> > - use a single newline on continuation
> Continuation of what exactly?
Continuation of paragraphs on the next visual line. I don't think this
needed any clarification; it was clear from the context from which you
extracted this word, unless you did not read that context.