Bidi paragraph direction in terminal emulators
Adam Borowski via Unicode
unicode at unicode.org
Sat Feb 9 15:29:31 CST 2019
On Sat, Feb 09, 2019 at 10:01:21PM +0200, Eli Zaretskii via Unicode wrote:
> > From: Egmont Koblinger <egmont at gmail.com>
> > Date: Sat, 9 Feb 2019 20:36:50 +0100
> > Cc: Richard Wordingham <richard.wordingham at ntlworld.com>,
> > unicode Unicode Discussion <unicode at unicode.org>
> > On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii <eliz at gnu.org> wrote:
> > > That's the application's problem, not the terminal's. An application
> > > that wants its column to line up _and_ wants to support complex text
> > > scripts will need to move cursor to certain coordinates, not to assume
> > > that 7 codepoints always take 7 columns on display.
It must know that those particular 7 codepoints take, say, 5 columns when
written together in a sequence. And it can't possibly ask the terminal,
either -- it might be on a link that doesn't allow metadata to pass, it
might be broadcast, or its output might have been recorded many years before being
displayed. A good part of the time the program is even run on a different
Obviously, a program running with system libraries might suffer misalignment
and thus visual corruption if those libraries know nothing beyond, say,
Unicode 13 while the terminal expects Unicode 17 -- but that's no different
from any other property changing incompatibly. Property changes for
established characters are pretty rare, so no significant loss of
interoperability can be expected over time.
> > In order to do that, an application needs to know how wide a text will
> > appear, which depends on the font. How will it know it?
> I don't know. Maybe it keeps a database of character combinations
> that need shaping, each one with the maximum width on display the
> result can occupy. Or maybe it does something else. If it cannot,
> and the terminal cannot either, then what you say is that some scripts
> can never be supported by text terminals.
That's doable even within the current rules, where every codepoint bears a
wcwidth of 0, 1 or 2. A cluster made of codepoints a ' b c d " ^ (where a b
c d have width 1 while ' " ^ have width 0) needs to be rendered in exactly 4
cells. This may force stretching or condensing the shaped cluster compared
to what usual typography would demand, but that's in no way different from
stretching Latin "i" or condensing "W".
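
To make the rule above concrete, here is a minimal sketch in Python. It is
not wcwidth() itself: codepoint_width() is a hypothetical helper that
approximates it from the unicodedata tables (combining marks and format
characters count as 0, East Asian Wide/Fullwidth as 2, everything else as 1),
and a real implementation has more special cases. The point is only that the
cell count of a cluster is the sum of per-codepoint widths, computable
without asking the terminal.

```python
import unicodedata


def codepoint_width(ch: str) -> int:
    """Rough stand-in for wcwidth(): returns 0, 1 or 2 for one codepoint.

    Assumption: zero width for nonspacing/enclosing marks and format
    characters, double width for East Asian Wide/Fullwidth, else 1.
    """
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def cluster_width(cluster: str) -> int:
    """Number of terminal cells a cluster must occupy: the plain sum."""
    return sum(codepoint_width(ch) for ch in cluster)


# a + combining acute, b, c, d + combining diaeresis + combining circumflex:
# the "a ' b c d \" ^" pattern, four spacing letters and three marks.
example = "a\u0301bcd\u0308\u0302"
cells = cluster_width(example)
```

Here cells comes out as 4 for a 7-codepoint string, so an application
padding a column to width N would emit N - cluster_width(s) spaces rather
than N - len(s).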
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.