Proposal for BiDi in terminal emulators

Egmont Koblinger via Unicode unicode at unicode.org
Fri Feb 1 08:15:53 CST 2019


Hi Richard,

On Fri, Feb 1, 2019 at 12:19 AM Richard Wordingham via Unicode
<unicode at unicode.org> wrote:

> Cropped why?  If the problem is the truncation of lines, one can simple
> store the next character.

Yup, trancation of line for example.

I agree that one could "store the next character". We could extend the
terminal emulation protocol where by some means you can specify that
column 80 contains a letter X, and even though there's no column 81,
an app can still tell the terminal emulator that it should imagine
that column 81 contans the letter Y, and perform shaping accordingly.

This will need to be done not just at the end of the terminal, but at
any position, and for both directions. Think of e.g. a vertically
split tmux. You should be able to tell that column 40 contains X which
should be shaped as if column 41 contained Y, and column 41 contains Z
which should be shaped as if column 40 contained A.

What I canont see at all is how this could be "simply". Could you
please elaborate on that? I don't find this simple at all!

>> > It's not able to
> > separate different UI elements that happen to be adjacent in the
> > terminal, separated by different background color or such.
>
> ZWJ and ZWNJ can handle that.

Wouldn't it be a semantical misuse of these characters, though?

They are supposed to be present in the logical order, and in logical
order (that is: the terminal's implicit mode) they can work as
desired.

Are they okay to be present in visual order (the terminal's explicit
mode, what we're discussing now) too?

Anyway, ZWJ/ZWNJ aren't sufficient to handle the cases I outlined above.


> If a general text manipulating application, e.g. cat, grep or awk, is
> writing to a file, it should not convert normal Arabic characters to
> presentation forms.  You are now asking a general application to
> determine whether it is writing to a terminal or not, and alter its
> output if it is writing to a terminal.

No, this absolutely not what I'm talking about!

There are two vastly different modes of the terminal. For "cat",
"grep" etc. the terminal will be in implicit mode. Absolutely no BiDi
handling is expected from these apps, the terminal will do BiDi and
shaping (perhaps using Harfbuzz; perhaps using presentation form
characters as a temporarily low hanging fruit until a better one is
implemented – the choice is obviously up to the implementation and not
to the specification).

For "emacs" and friends, an explicit mode is required where visual
order is passed to the terminal. What we're discussing is how to
handle shaping in this mode.


> But it as an issue that needs to be addressed.  As a terminal can be
> addressed by cell, an application may need to keep track of what text
> went into each cell. Misery results when the application gets it wrong.

My recommendation doesn't change this principle at all. In the lower
(emulation) layer every character still goes into the cell it used to
go to, and is addressable using cursor motion escapes and so on
exactly as without BiDi.


> How many cells do CJK ideographs occupy?  We've had a strong hint
> that a medial BEH should occupy one cell, while an isolated BEH should
> occupy two.

CJK occupy two, but they do regardless of what's around them. That is,
they already occupy two cells in the logical buffers, in the emulation
layer.

There is absolutely no sane way we can make in terminal emulation a
character's logical width (as in number of cells it occupies) depend
on its neighboring characters. (And even if we could by some terrible
hacks, it would break the principle you just said as "misery
results...", and the principle Eli said that things should remain
reasonably simple, otherwise hardly anyone will bother implementing
them.) This is a compromise Arabic folks will have to accept.

When displayed, it's up for terminal emulators to perhaps
enwiden/shrink cells as it wants to (they might even totally give up
on monospace fonts), but then they'll risk vertical lines not aligning
up perfectly vertically, content overflowing on the right etc. Konsole
does such things.


cheers,
egmont



More information about the Unicode mailing list