Proposal for BiDi in terminal emulators
Richard Wordingham via Unicode
unicode at unicode.org
Fri Feb 1 22:01:42 CST 2019
On Fri, 1 Feb 2019 15:15:53 +0100
Egmont Koblinger via Unicode <unicode at unicode.org> wrote:
> Hi Richard,
>
> On Fri, Feb 1, 2019 at 12:19 AM Richard Wordingham via Unicode
> <unicode at unicode.org> wrote:
>
> > Cropped why? If the problem is the truncation of lines, one can
> > simply store the next character.
>
> Yup, truncation of lines, for example.
>
> I agree that one could "store the next character". We could extend the
> terminal emulation protocol so that by some means you can specify that
> column 80 contains a letter X, and even though there's no column 81,
> an app can still tell the terminal emulator that it should imagine
> that column 81 contains the letter Y, and perform shaping accordingly.
>
> This will need to be done not just at the end of the terminal, but at
> any position, and for both directions. Think of e.g. a vertically
> split tmux. You should be able to tell that column 40 contains X which
> should be shaped as if column 41 contained Y, and column 41 contains Z
> which should be shaped as if column 40 contained A.
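To make the proposal concrete, here is a toy model of such a per-cell hint in Python. Everything in it is hypothetical: the Cell type, the imagine_before/imagine_after fields and the four-letter joining table are illustrations, not part of any terminal protocol, and the row is kept in logical order to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy joining-type table (assumption: a tiny subset of ArabicShaping.txt).
# "D" = dual-joining, "R" = right-joining; anything missing is non-joining.
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062F": "R",  # ARABIC LETTER DAL
}


@dataclass
class Cell:
    char: str
    # Hypothetical hints: the character the application asks the terminal to
    # imagine just before / just after this cell (logical order) when shaping.
    imagine_before: Optional[str] = None
    imagine_after: Optional[str] = None


def joining_form(prev: Optional[str], cur: str, nxt: Optional[str]) -> str:
    """Pick isolated/initial/medial/final for cur from its logical neighbours."""
    jt = JOINING_TYPE.get(cur)
    if jt is None:
        return "isolated"
    joins_prev = JOINING_TYPE.get(prev) == "D"
    joins_next = jt == "D" and JOINING_TYPE.get(nxt) in ("D", "R")
    if joins_prev and joins_next:
        return "medial"
    if joins_prev:
        return "final"
    if joins_next:
        return "initial"
    return "isolated"


def shape_row(row: List[Cell]) -> List[str]:
    """Shape each cell, letting a hint override the real neighbouring cell."""
    forms = []
    for i, cell in enumerate(row):
        prev = row[i - 1].char if i > 0 else None
        nxt = row[i + 1].char if i + 1 < len(row) else None
        if cell.imagine_before is not None:
            prev = cell.imagine_before
        if cell.imagine_after is not None:
            nxt = cell.imagine_after
        forms.append(joining_form(prev, cell.char, nxt))
    return forms
```

Even in this toy form the hint has to be supplied per cell and per side, which is the bookkeeping under discussion.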
>
> What I cannot see at all is how this could be done "simply". Could you
> please elaborate on that? I don't find this simple at all!
>
> > > It's not able to
> > > separate different UI elements that happen to be adjacent in the
> > > terminal, separated by different background color or such.
> >
> > ZWJ and ZWNJ can handle that.
>
> Wouldn't it be a semantic misuse of these characters, though?
No. ZWNJ is used before the inanimate plural suffix of Persian, and in
at least one language, <HEH, ZWJ> is used to distinguish one usage from
the digit ٥ (or is it the digit ۵?).
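For what it's worth, this is roughly how a shaper treats the two controls: ZWJ counts as join-causing and ZWNJ as non-joining. The sketch below assumes logical order and a hand-picked joining table (not the full ArabicShaping.txt data); letter_forms is a hypothetical helper.

```python
from typing import List

ZWJ, ZWNJ = "\u200D", "\u200C"

# Toy joining types (assumption: small subset of ArabicShaping.txt).
# ZWJ is join-causing ("C"); ZWNJ is absent from the table, i.e. non-joining.
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u0627": "R",  # ARABIC LETTER ALEF
    ZWJ: "C",
}


def letter_forms(text: str) -> List[str]:
    """Return the contextual form of each joining letter in text (logical order)."""
    forms = []
    for i, ch in enumerate(text):
        jt = JOINING_TYPE.get(ch)
        if jt not in ("D", "R"):
            continue  # controls and non-joining characters get no form here
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        joins_prev = JOINING_TYPE.get(prev) in ("D", "C")
        joins_next = jt == "D" and JOINING_TYPE.get(nxt) in ("D", "R", "C")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms
```

So <HEH, ZWJ> yields an initial HEH with nothing after it, and an application that puts a ZWNJ at the boundary between two adjacent UI elements keeps a letter on one side from joining to a letter on the other.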
> They are supposed to be present in the logical order, and in logical
> order (that is: the terminal's implicit mode) they can work as
> desired.
>
> Are they okay to be present in visual order (the terminal's explicit
> mode, what we're discussing now) too?
Where do you define the order for explicit mode?
There may be complications in ensuring that
<joiner control><letter><non-spacing marks><joiner control> gets stored
as the content of a single cell.
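One possible storage policy, sketched in Python: a joiner standing between two letters is copied into both cells, so that each cell still knows its joining context once the cells are placed in visual order. The function name and the policy are illustrations only, and the input is assumed well formed (marks follow a base, not a joiner).

```python
import unicodedata
from typing import List

ZWJ, ZWNJ = "\u200D", "\u200C"
JOINERS = (ZWJ, ZWNJ)


def split_cells(text: str) -> List[str]:
    """Group text into cells of <joiner?><base><nonspacing marks*><joiner?>.

    Hypothetical policy: a joiner between two bases is copied into both
    cells, as the trailing joiner of one and the leading joiner of the next.
    """
    cells: List[str] = []
    pending = ""  # joiner waiting to lead the next cell
    for ch in text:
        if ch in JOINERS:
            if cells:
                cells[-1] += ch  # trailing joiner of the previous cell
            pending = ch
        elif unicodedata.category(ch) == "Mn" and cells:
            cells[-1] += ch  # nonspacing mark stays with its base
        else:
            cells.append(pending + ch)  # new base, led by any pending joiner
            pending = ""
    return cells
```

The duplication is the price of letting each cell be shaped on its own; a terminal that stored the joiner only once would have to look across cell boundaries again.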
>
> Anyway, ZWJ/ZWNJ aren't sufficient to handle the cases I outlined
> above.
Example, please.
>
> > If a general text manipulating application, e.g. cat, grep or awk,
> > is writing to a file, it should not convert normal Arabic
> > characters to presentation forms. You are now asking a general
> > application to determine whether it is writing to a terminal or
> > not, and alter its output if it is writing to a terminal.
>
> No, this is absolutely not what I'm talking about!
>
> There are two vastly different modes of the terminal. For "cat",
> "grep" etc. the terminal will be in implicit mode. Absolutely no BiDi
> handling is expected from these apps, the terminal will do BiDi and
> shaping (perhaps using Harfbuzz; perhaps using presentation form
> characters as temporary low-hanging fruit until a better approach is
> implemented – the choice is obviously up to the implementation and not
> to the specification).
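To illustrate the distinction being drawn: presentation forms are compatibility characters, so a terminal can substitute them at paint time while the text that cat or grep writes stays plain U+0628. The table covers only BEH, and display_char is a hypothetical helper.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH: what cat, grep or awk read and write

# Arabic Presentation Forms-B code points for BEH (compatibility characters).
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def display_char(ch: str, form: str) -> str:
    """Terminal-side substitution at paint time; the cell still stores ch."""
    if ch == BEH:
        return BEH_FORMS[form]
    return ch


# NFKC folds every presentation form back to plain BEH.
folded = {form: unicodedata.normalize("NFKC", cp) for form, cp in BEH_FORMS.items()}
```

Under NFKC each of the four code points folds back to U+0628, which is a quick way to confirm they are display variants rather than distinct letters.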
>
> For "emacs" and friends, an explicit mode is required where visual
> order is passed to the terminal. What we're discussing is how to
> handle shaping in this mode.
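A toy model of the two modes, in Python. toy_reorder is a deliberately crude stand-in for the Unicode Bidirectional Algorithm (it only reverses runs of strong RTL letters in an LTR line), and the Mode names are illustrative, not taken from any specification.

```python
import unicodedata
from enum import Enum
from typing import List


class Mode(Enum):
    IMPLICIT = "implicit"  # app writes logical order; terminal does BiDi
    EXPLICIT = "explicit"  # app (emacs etc.) writes visual order; terminal just paints


def is_strong_rtl(ch: str) -> bool:
    return unicodedata.bidirectional(ch) in ("R", "AL")


def toy_reorder(line: str) -> str:
    """Reverse each run of strong RTL characters in an LTR line.

    Not the UBA: no neutrals, numbers, embeddings or RTL paragraphs.
    """
    out: List[str] = []
    run: List[str] = []
    for ch in line:
        if is_strong_rtl(ch):
            run.append(ch)
        else:
            out.extend(reversed(run))
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


def cells_for(line: str, mode: Mode) -> List[str]:
    """What ends up in the row's cells, left to right."""
    return list(toy_reorder(line) if mode is Mode.IMPLICIT else line)
```

In EXPLICIT mode the cells arrive already in visual order, which is why shaping has to be settled separately there: the terminal no longer sees the logical neighbours.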
(Partitioning grapheme clusters and Indic syllables)
> > But it is an issue that needs to be addressed. As a terminal can be
> > addressed by cell, an application may need to keep track of what
> > text went into each cell. Misery results when the application gets
> > it wrong.
>
> My recommendation doesn't change this principle at all. In the lower
> (emulation) layer every character still goes into the cell it used to
> go to, and is addressable using cursor motion escapes and so on
> exactly as without BiDi.
At present, VTE positions LTR Indic preceding spacing combining marks
after the consonant. I thought your draft scheme corrected this very
local bidi issue, which is so local that the bidi algorithm ignores it.
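For concreteness: कि is stored as <KA, VOWEL SIGN I>, but the vowel sign is drawn to the left of the consonant. The local reordering in question amounts to something like the sketch below, which assumes a single preceding consonant and a one-entry table; conjuncts and other scripts are ignored, and display_order is a hypothetical name.

```python
from typing import List

# Assumption: a one-entry table; real scripts have many pre-base vowel signs,
# and a conjunct would need the sign moved before the whole cluster.
PREBASE_VOWEL_SIGNS = {"\u093F"}  # DEVANAGARI VOWEL SIGN I


def display_order(text: str) -> List[str]:
    """Swap each pre-base vowel sign in front of the single consonant before it."""
    out = list(text)
    i = 1
    while i < len(out):
        if out[i] in PREBASE_VOWEL_SIGNS:
            out[i - 1], out[i] = out[i], out[i - 1]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return out
```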
>
>
> > How many cells do CJK ideographs occupy? We've had a strong hint
> > that a medial BEH should occupy one cell, while an isolated BEH
> > should occupy two.
>
> CJK ideographs occupy two, but they do so regardless of what's around them. That is,
> they already occupy two cells in the logical buffers, in the emulation
> layer.
>
> There is absolutely no sane way, in terminal emulation, to make a
> character's logical width (the number of cells it occupies) depend
> on its neighboring characters. (And even if we could by some terrible
> hacks, it would break the principle you just said as "misery
> results...", and the principle Eli said that things should remain
> reasonably simple, otherwise hardly anyone will bother implementing
> them.) This is a compromise Arabic folks will have to accept.
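The principle can be stated as a function signature: width is computed from the character alone. This is a simplified wcwidth-style rule using Python's unicodedata, not the exact table any particular terminal uses.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Simplified wcwidth-style rule: the result depends on ch alone."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # nonspacing/enclosing marks and format controls such as ZWJ
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters, e.g. CJK ideographs
    return 1
```

Since medial and isolated BEH are the same code point, any rule of this shape gives them the same width; making them differ would require passing the neighbours in.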
So ព្រះ <U+1796 KHMER LETTER PO, U+17D2 KHMER SIGN COENG, U+179A KHMER
LETTER RO, U+17C8 KHMER SIGN YUUKALEAPINTU> _preah_ 'prefix denoting
respect for gods, kings, etc.' will be three cells <្រ,ព,ៈ> = <(COENG,
RA), PO, YUUKALEAPINTU> and cause no confusion? Or will the cells be
<RA, (PO, COENG), YUUKALEAPINTU>?
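To spell the question out, the snippet below lists the four code points and the two candidate layouts (the layout lists transcribe the two alternatives above; the helper names are illustrative). Both layouts hold exactly the same characters, and neither, read left to right, reproduces the stored order.

```python
import unicodedata
from typing import List

# ព្រះ in logical (storage) order.
PREAH = "\u1796\u17D2\u179A\u17C8"

# The two candidate three-cell layouts, left to right.
LAYOUT_A = ["\u17D2\u179A", "\u1796", "\u17C8"]  # (COENG, RO), PO, YUUKALEAPINTU
LAYOUT_B = ["\u179A", "\u1796\u17D2", "\u17C8"]  # RO, (PO, COENG), YUUKALEAPINTU


def describe(text: str) -> List[str]:
    """List each code point with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


def same_code_points(cells: List[str], text: str) -> bool:
    """True if the cells hold exactly the characters of text, in any order."""
    return sorted("".join(cells)) == sorted(text)
```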
Richard.