Bidi paragraph direction in terminal emulators

Sat Feb 9 11:42:52 CST 2019

Hi Richard,

On Sat, Feb 9, 2019 at 3:08 PM Richard Wordingham via Unicode
<unicode at unicode.org> wrote:

> It would be good to be able to access a maintained statement of the
> VTE rules for allocating characters to a cell, or group of cells, as
> appropriate.

What VTE did, up to a couple of days ago:

It opens the font, measures the characters in roughly the ASCII 33-126
range, and takes their average size (in a monospace font they should
all have the same size anyway); this determines the cell size.
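
As a rough sketch of that measuring step (not VTE's actual code): the
callback advance_of is a hypothetical stand-in for whatever the font
backend reports, and the pixel values are made up for illustration.

```python
def cell_width(advance_of, first=33, last=126):
    """Average advance width over the code points first..last inclusive.

    advance_of is a hypothetical callback returning the advance width of a
    single character; a real terminal would ask its font backend instead.
    """
    widths = [advance_of(chr(cp)) for cp in range(first, last + 1)]
    return sum(widths) / len(widths)


# Monospace font: every glyph advances by 8, so the average is 8 as well.
monospace = lambda ch: 8

# Proportional font (made-up numbers): 'i' is narrower than the rest,
# which pulls the average slightly below 8.
proportional = lambda ch: 4 if ch == "i" else 8

print(cell_width(monospace))
print(cell_width(proportional))
```

With a true monospace font the averaging is a no-op; it only matters
when the font is not strictly fixed-width.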

Then every character cell is rendered individually, using Pango or
Cairo or I'm not sure what exactly – there are like three paths in the
source, the details are unclear to me. A cell might contain a base
character + nonspacing combining accents, these are passed together to
Pango and friends, so they render it as one unit. The glyph is aligned
to the left of its designated cell area, overflowing on the right (and
thus potentially overlapping with the next glyph) if it's wider than
its designated area.

As a special case, two adjacent cells might contain a double-wide
(typically CJK) character, but it's not that special after all: it's
also displayed aligned to the left edge of its first cell.
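
The left-aligned placement and the resulting overflow can be sketched
roughly like this. The function name, the pixel units and the numbers
are assumptions for illustration only, not taken from the VTE source.

```python
def place_glyph(column, n_cells, glyph_width, cell_width):
    """Return (x_start, x_end, overflow) for a glyph drawn left-aligned.

    column      -- index of the first cell the glyph is assigned to
    n_cells     -- 1 for a regular character, 2 for a double-wide one
    glyph_width -- width the rendered glyph actually needs
    cell_width  -- width of a single cell
    """
    x_start = column * cell_width
    designated = n_cells * cell_width
    # Anything beyond the designated area spills over to the right and may
    # overlap whatever is drawn in the following cell.
    overflow = max(0, glyph_width - designated)
    return x_start, x_start + glyph_width, overflow


# A 13-unit-wide glyph in a single 8-unit cell at column 3 spills 5 units.
print(place_glyph(3, 1, 13, 8))
# A double-wide glyph that fits exactly in its two cells does not spill.
print(place_glyph(0, 2, 16, 8))
```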

What I improved a couple of days ago (to be released in vte-0.56), for
Devanagari and friends, although I know there's more than this to
address these scripts properly:

If a cell contains a regular letter, and the next cell contains a
spacing combining mark, then these two are passed to Pango in a single
step, that is, the spacing combining mark is applied around its base
letter by Pango as expected. (Previously the spacing combining mark
was rendered on its own, around a dotted circle, which was obviously
pretty bad.)
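
A minimal sketch of this pairing rule, assuming that "regular letter"
means a Unicode general category starting with L and "spacing
combining mark" means category Mc. The helper group_cells is
hypothetical, and the real implementation may well differ in details.

```python
import unicodedata


def group_cells(cells):
    """Group per-cell strings into the runs that are rendered in one step.

    cells is a list of strings, one per cell (a base character possibly
    followed by nonspacing marks). A cell starting with a letter that is
    followed by a cell starting with a spacing combining mark (Mc) is
    merged with that next cell; everything else stays on its own.
    """
    runs = []
    i = 0
    while i < len(cells):
        cur = cells[i]
        nxt = cells[i + 1] if i + 1 < len(cells) else ""
        if (cur and nxt
                and unicodedata.category(cur[0]).startswith("L")
                and unicodedata.category(nxt[0]) == "Mc"):
            runs.append(cur + nxt)
            i += 2
        else:
            runs.append(cur)
            i += 1
    return runs


# U+0915 DEVANAGARI LETTER KA followed by U+093E VOWEL SIGN AA (an Mc mark)
# ends up as a single run; the trailing "a" stays on its own.
print(group_cells(["\u0915", "\u093e", "a"]))
```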

What I'm working on currently, as you all know by now, is
BiDi-shuffling the cells before rendering them (hopefully for
vte-0.58).

This is how VTE works now, but it's by no means a specification, and
tailoring a font to this behavior is probably not the right approach.
Instead, VTE's behavior should be improved. We have a pending feature
request (which I've already linked) to use HarfBuzz for rendering the
glyphs, which would then render grapheme clusters beautifully. The
problem that I don't know how to address is: What if HarfBuzz tells us
that the overall width for rendering a particular grapheme cluster is
significantly different from its designated area (the number of
character cells [wcswidth()] multiplied by the width of each)?
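
To make the mismatch concrete, here is a sketch that only detects it;
what to do about it remains the open question. The cell count is
assumed to come from wcswidth() or an equivalent, and the 25%
tolerance is an arbitrary placeholder for "significantly".

```python
def width_mismatch(shaped_advance, n_cells, cell_width, tolerance=0.25):
    """Tell whether a shaped cluster deviates too much from its cells.

    shaped_advance -- total advance the shaper reports for the cluster
    n_cells        -- number of cells assigned to it (e.g. via wcswidth())
    cell_width     -- width of a single cell
    tolerance      -- assumed fraction of the designated width that is
                      still considered acceptable
    """
    designated = n_cells * cell_width
    return abs(shaped_advance - designated) > tolerance * designated


# A cluster that needs 13 units but was given one 8-unit cell: mismatch.
print(width_mismatch(13, 1, 8))
# A double-wide cluster that needs exactly 16 units: no mismatch.
print(width_mismatch(16, 2, 8))
```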

cheers,
egmont

>
> > > (b) With a terminal that expects a fixed width font, surely the
> > > terminal decides how many cells it allocates to a group of
> > > characters, and the font designer has to come up with a suitable
> > > value based on that.
> >
> > Yes.  A terminal emulator that works with a shaper should probably
> > post-process the width information returned by the shaper for these
> > purposes.
>
> Perhaps it should base the number of cells on the width of the
> clusters.  However, continuing with my example, U+1789 KHMER LETTER NYO
> as a base character is too wide to fit in a cell, and the next
> character will overwrite its right-hand part. From this I deduce that it
> is allocated just one cell.  Gnome terminal is not alone in doing this,
> but it does better than some, in my opinion, in that the overflow of the
> foreground of one cell is not obliterated by the background of the
> next cell.  U+1789 has an East Asian width property of 'Neutral', which
> is distinctly unhelpful.
>
> What I would like is a specification of what a font must do to avoid
> such problems.
>
> > > >  I don't see how you can expect wcwidth, or any other
> > > > interface that was designed to work with _characters_, to be
> > > > useful when you need to display grapheme clusters.
>
> It, or something similar but worse, gets used, especially when moving
> the cursor for editing.
>
> > > Well I can envisage a decision being made that a grapheme cluster
> > > str (as decreed by the terminal) shall occupy wcswidth(str) cells -
> > > "The wcswidth() function returns the number of column positions for
> > > the wide-character string s, truncated to at most length n".
> >
> > AFAIU, the shaping engine returns its output in terms of font glyph
> > numbers, not character codepoints, so you cannot in general call
> > wcswidth on them.  The shaper also returns the advance information,
> > which serves instead of wcwidth and related APIs for determining the
> > actual width on display.
>
> Unfortunately, when the rectangular grid is being preserved,
> typographical advance width is generally ignored when determining the
> placement of characters.  Now, this is not always true; one can have
> the situation where the positioning of characters respects the
> advance widths, but the positioning of the cursor assumes a fixed-width
> rectangular grid.  I have found working with that to be extremely
> confusing.
>
> Richard.
>