Bidi paragraph direction in terminal emulators

Richard Wordingham via Unicode unicode at unicode.org
Sat Feb 9 19:25:14 CST 2019


On Sat, 9 Feb 2019 22:29:31 +0100
Adam Borowski via Unicode <unicode at unicode.org> wrote:

> On Sat, Feb 09, 2019 at 10:01:21PM +0200, Eli Zaretskii via Unicode
> wrote:

> > I don't know.  Maybe it keeps a database of character combinations
> > that need shaping, each one with the maximum width on display the
> > result can occupy.  Or maybe it does something else.  If it cannot,
> > and the terminal cannot either, then what you say is that some
> > scripts can never be supported by text terminals.  
> 
> That's doable even within the current rules, where every codepoint
> bears a wcwidth of 0, 1 or 2.  A cluster made of codepoints a ' b c d
> " ^ (where a b c d have widths 1 while ' " ^ widths 0) needs to be
> rendered in exactly 4 cells.  This may force stretching or condensing
> the shaped cluster compared to what usual typography would demand but
> that's in no way different from stretching Latin "i" or condensing
> "W".

It would be helpful if overlong shapings were condensed automatically.
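
For reference, the additive rule in the quoted text can be sketched as
below.  This is only a minimal illustration: approx_wcwidth() is a
hypothetical stand-in built on Python's unicodedata module, not the C
library's wcwidth(), and the combining marks chosen for ' " ^ are my
own assumption.

```python
import unicodedata

def approx_wcwidth(ch: str) -> int:
    """Hypothetical stand-in for wcwidth(): 0, 1 or 2 cells per codepoint."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters take no cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two cells
    return 1

def additive_width(cluster: str) -> int:
    """Width of a cluster as the plain sum of per-codepoint widths."""
    return sum(approx_wcwidth(ch) for ch in cluster)

# a ' b c d " ^ with U+0301, U+0308 and U+0302 standing in for ' " ^
cluster = "a\u0301bcd\u0308\u0302"
print(additive_width(cluster))  # 4
```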

The general principle that functions work better on whole strings than
on individual characters applies here.  There are two obvious situations
where the additive formulae break down.

(a) Emoji should surely occupy at least 2 cells.  There are a few
problem sequences, such as the keycap sequence <U+0031, U+FE0F, U+20E3>,
whose widths add up to only 1 (or is wcwidth(0x20E3) equal to 1?).
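
A minimal sketch of case (a), under stated assumptions: approx_wcwidth()
is again a hypothetical stand-in for wcwidth() in which U+FE0F and
U+20E3 both get width 0, and the rule "a cluster containing U+FE0F
occupies at least 2 cells" is my assumption about what a terminal might
want, not something any standard mandates.  string_width() expects a
single cluster as input.

```python
import unicodedata

VS16 = "\uFE0F"  # VARIATION SELECTOR-16, requests emoji presentation

def approx_wcwidth(ch: str) -> int:
    """Hypothetical stand-in for wcwidth(): 0, 1 or 2 cells per codepoint."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def additive_width(cluster: str) -> int:
    """Plain sum of per-codepoint widths."""
    return sum(approx_wcwidth(ch) for ch in cluster)

def string_width(cluster: str) -> int:
    """Width decided on the whole cluster rather than codepoint by codepoint."""
    width = additive_width(cluster)
    if VS16 in cluster:
        # Assumed rule: emoji presentation needs at least two cells.
        width = max(width, 2)
    return width

keycap = "1\uFE0F\u20E3"  # <U+0031, U+FE0F, U+20E3>
print(additive_width(keycap), string_width(keycap))
```

The point is only that the second function can see the whole sequence,
which a per-codepoint sum cannot.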

(b) Brahmi-like Indic scripts.  In many of these, the combination of a
virama or invisible stacker and a base consonant acts like a combining
mark, either causing no advance or behaving as a mark with a very
slight width.  Examples include Grantha, Myanmar, Tai Tham and Khmer.

Stretching a stack of 3 or 4 consonants to occupy 3 or 4 cells instead
of 1 would be worse than stretching 'i'.  If you do it, you want fonts
that adjust the glyphs accordingly, just as for 'i'.
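
To illustrate case (b), here is a rough sketch assuming the "no
advance" behaviour for every consonant that follows a stacker.  The
STACKERS set is illustrative, not exhaustive, approx_wcwidth() is a
hypothetical stand-in for wcwidth(), and the "very slight width" case
is not modelled at all.  What a real font does with such a stack may
well differ.

```python
import unicodedata

# Illustrative stackers: Myanmar virama, Khmer coeng, Tai Tham sakot.
STACKERS = {"\u1039", "\u17D2", "\u1A60"}

def approx_wcwidth(ch: str) -> int:
    """Hypothetical stand-in for wcwidth(): 0, 1 or 2 cells per codepoint."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def additive_width(cluster: str) -> int:
    """Plain sum of per-codepoint widths."""
    return sum(approx_wcwidth(ch) for ch in cluster)

def stack_aware_width(cluster: str) -> int:
    """Treat stacker + consonant as a combining unit with no advance."""
    width = 0
    subjoined = False
    for ch in cluster:
        if ch in STACKERS:
            subjoined = True   # the next character is stacked on the base
            continue
        if subjoined:
            subjoined = False  # stacked consonant: assumed zero advance
            continue
        width += approx_wcwidth(ch)
    return width

# A three-consonant Khmer stack: <U+1784, U+17D2, U+1782, U+17D2, U+179A>
stack = "\u1784\u17D2\u1782\u17D2\u179A"
print(additive_width(stack), stack_aware_width(stack))
```

The additive sum gives 3 cells for this stack, while the string-level
function gives 1, which is the gap between the two approaches.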

Richard.

