Proposal for BiDi in terminal emulators

Egmont Koblinger via Unicode unicode at unicode.org
Sat Feb 2 06:18:03 CST 2019


Hi Richard,

On Sat, Feb 2, 2019 at 12:43 PM Richard Wordingham via Unicode
<unicode at unicode.org> wrote:

> I'm not conversant with the details of terminal controls and I haven't
> used fields.  However, where I spoke of lines above, I believe you can
> simply translate it to fields.  I don't know how one best handles
> fields - are they a list, possibly of rows within fields, or are they
> stored as cell attributes?

The essential point is that the terminal emulator stores "cells".
Almost all the data (with very few exceptions) resides in cells.

A cell contains a base letter, possibly followed by a few non-spacing
marks. A cell also has properties such as foreground color, background
color, bold, underlined, italic, etc.
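
To make the cell model concrete, here is a minimal sketch in Python. The
class name, the field names and the default colors are made up for
illustration, and real emulators pack this data far more tightly.

```python
from dataclasses import dataclass
import unicodedata


@dataclass
class Cell:
    """One character cell (hypothetical layout, for illustration only)."""
    text: str = " "          # base letter, then zero or more non-spacing marks
    fg: int = 7              # foreground color index (assumed palette index)
    bg: int = 0              # background color index (assumed palette index)
    bold: bool = False
    underlined: bool = False
    italic: bool = False

    def __post_init__(self):
        if not self.text:
            raise ValueError("a cell needs at least a base character")
        # Everything after the base letter must be a non-spacing (Mn)
        # or enclosing (Me) mark.
        for ch in self.text[1:]:
            if unicodedata.category(ch) not in ("Mn", "Me"):
                raise ValueError(f"U+{ord(ch):04X} is not a combining mark")


# "e" followed by U+0301 COMBINING ACUTE ACCENT fits in a single cell.
cell = Cell("e\u0301", bold=True)
```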

How these cells are linked up, in an array or whatever, is mostly
irrelevant since it's likely to be different in every implementation.

Of course it is possible to extend the per-cell storage to also contain
a "previous" and a "next" character, to be used for shaping purposes
only. Some questions: Is this enough (e.g. aren't there cases where
more than the immediate neighbors are relevant)? Is the next base
character enough, or do we also need to know the combining accents
that belong to it? And couldn't we store significantly less information
than the actual letter (let's say, 1 out of 13 [randomly made up
number] possible ways of shaping)?
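
As a rough illustration of the last question: a hint with 13 possible
values fits in 4 bits, far less than a full neighboring character. The
Python sketch below assumes a made-up 32-bit attribute word (bits 0-7
foreground, 8-15 background, 16-18 style flags) and tucks the hint into
bits 19-22. The layout and all names are hypothetical, and 13 is the
same made-up number as above.

```python
N_SHAPING_FORMS = 13  # made-up number, as in the text


def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can distinguish n_values values."""
    if n_values < 1:
        raise ValueError("need at least one value")
    return (n_values - 1).bit_length()


HINT_BITS = bits_needed(N_SHAPING_FORMS)
HINT_SHIFT = 19                      # hypothetical position in the word
HINT_MASK = (1 << HINT_BITS) - 1


def set_hint(attrs: int, hint: int) -> int:
    """Return attrs with the shaping hint field replaced by hint."""
    if not 0 <= hint < N_SHAPING_FORMS:
        raise ValueError("hint out of range")
    return (attrs & ~(HINT_MASK << HINT_SHIFT)) | (hint << HINT_SHIFT)


def get_hint(attrs: int) -> int:
    """Extract the shaping hint field from attrs."""
    return (attrs >> HINT_SHIFT) & HINT_MASK
```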

Terminal emulators potentially store a lot of data (some even support
infinite scrollback), and try to handle it in some efficient way.
That is, they do all sorts of bitpacking and crazy stuff. E.g. some
might reject adding new attributes if the per-cell size of the
attributes would then exceed 4 or 8 bytes, both for memory and
performance reasons. Another example: VTE has one global pool of all
the base character + combining accents combos that it has encountered,
and starts assigning single codepoints to them from U+10000000 or so,
so that for each cell the base letter + combining accents still don't
require more storage than 4 bytes.
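
The pooling idea can be sketched roughly as follows in Python. This is
not VTE's actual code: the class and method names are invented, and the
starting value simply reuses the approximate figure mentioned above.

```python
POOL_START = 0x10000000  # approximate figure from the text, not VTE's constant


class ClusterPool:
    """Maps base letter + combining marks to one synthetic code value."""

    def __init__(self, start: int = POOL_START):
        self._start = start
        self._codes = {}      # cluster string -> synthetic code
        self._clusters = []   # (code - start) -> cluster string

    def encode(self, cluster: str) -> int:
        """Return a single integer that stands for the whole cluster."""
        if not cluster:
            raise ValueError("empty cluster")
        if len(cluster) == 1:
            return ord(cluster)          # plain characters need no pool entry
        code = self._codes.get(cluster)
        if code is None:                 # first time this combo is seen
            code = self._start + len(self._clusters)
            self._codes[cluster] = code
            self._clusters.append(cluster)
        return code

    def decode(self, code: int) -> str:
        """Turn a stored code back into the text of the cell."""
        if code >= self._start:
            return self._clusters[code - self._start]
        return chr(code)
```

Every real code point is at most U+10FFFF, so values at or above the
pool start cannot collide with ordinary characters, and each cell still
stores a single 4-byte value.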

The takeaway is: the less data we need to remember per cell, the
better, and every bit matters.

But to recap, we're now just peeking into a possible future extension
of the specs to see if it's viable (I guess it is). I believe emulators
might reasonably decide not to implement it if they think performance
is more important than proper shaping in all the special cases.


cheers,
egmont

