Bidi paragraph direction in terminal emulators

Adam Borowski via Unicode unicode at unicode.org
Sat Feb 9 15:29:31 CST 2019


On Sat, Feb 09, 2019 at 10:01:21PM +0200, Eli Zaretskii via Unicode wrote:
> > From: Egmont Koblinger <egmont at gmail.com>
> > Date: Sat, 9 Feb 2019 20:36:50 +0100
> > Cc: Richard Wordingham <richard.wordingham at ntlworld.com>, 
> > 	unicode Unicode Discussion <unicode at unicode.org>
> > 
> > On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii <eliz at gnu.org> wrote:
> > 
> > > That's the application's problem, not the terminal's.  An application
> > > that wants its column to line up _and_ wants to support complex text
> > > scripts will need to move cursor to certain coordinates, not to assume
> > > that 7 codepoints always take 7 columns on display.

It must know that those particular 7 codepoints take, say, 5 columns when
written together in a sequence.  And it can't possibly ask the terminal,
either -- it might be on a link that doesn't allow metadata to pass, its
output might be broadcast, or it might be recorded many years before being
displayed.  Much of the time the program even runs on a different
distribution/release/OS than the terminal.
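
A minimal Python sketch of what such an application does on its own: it pads
a column by computed display width rather than by codepoint count, without
asking the terminal anything.  `char_width` is a simplified, hypothetical
stand-in for wcwidth().  It assumes combining marks and format characters
take 0 cells, East Asian Wide/Fullwidth characters take 2, and everything
else takes 1.  Real implementations use per-Unicode-version tables.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Simplified, hypothetical wcwidth(): 0, 1 or 2 cells per codepoint."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format controls get no cell of their own
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two cells
    return 1      # everything else (printable text only) takes one cell

def display_width(s: str) -> int:
    """Sum of per-codepoint widths, in the spirit of wcswidth()."""
    return sum(char_width(ch) for ch in s)

def pad_cell(s: str, columns: int) -> str:
    """Pad s with spaces so it occupies exactly `columns` terminal cells."""
    return s + " " * max(0, columns - display_width(s))

# Padding by len() would misalign both the first and the second row.
rows = ["cafe\u0301", "日本語", "plain"]
for r in rows:
    print(pad_cell(r, 8) + "|")
```

Here "cafe\u0301" is 5 codepoints but 4 cells, and "日本語" is 3 codepoints
but 6 cells.  The result only lines up if the terminal uses the same width
table.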

Obviously, a program running with system libraries might suffer misalignment,
and thus visual corruption, if those libraries only know Unicode up to, say,
version 13 while the terminal expects Unicode 17 -- but that's no different
from any other property changing incompatibly.  Property changes for
established characters are pretty rare, so no significant loss of
interoperability should be expected over time.

> > In order to do that, an application needs to know how wide a text will
> > appear, which depends on the font. How will it know it?
> 
> I don't know.  Maybe it keeps a database of character combinations
> that need shaping, each one with the maximum width on display the
> result can occupy.  Or maybe it does something else.  If it cannot,
> and the terminal cannot either, then what you say is that some scripts
> can never be supported by text terminals.

That's doable even within the current rules, where every codepoint has a
wcwidth of 0, 1 or 2.  A cluster made of codepoints a ' b c d " ^ (where a,
b, c, d have width 1 and ', ", ^ have width 0) needs to be rendered in
exactly 4 cells.  This may force stretching or condensing the shaped cluster
compared to what usual typography would demand, but that's in no way
different from stretching a Latin "i" or condensing a "W".
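
Under these per-codepoint rules, the cell count of a cluster is just the sum
of its codepoint widths.  An application and a terminal that share a width
table therefore agree on it without any knowledge of the font.  The sketch
below uses concrete codepoints for the abstract a ' b c d " ^.  It assumes
U+0301, U+0308 and U+0302 as the zero-width marks, and `char_width` is again
a simplified, hypothetical wcwidth().  `fit_scale` is a hypothetical helper
showing the horizontal stretch or squeeze a renderer could apply.  Its
`natural_px` input stands for the advance width the shaper reports for the
cluster.

```python
import unicodedata

def char_width(ch: str) -> int:
    # Simplified, hypothetical wcwidth(); not a complete implementation.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def cluster_cells(cluster: str) -> int:
    """Cells the terminal must fit the shaped cluster into."""
    return sum(char_width(ch) for ch in cluster)

def fit_scale(natural_px: float, cells: int, cell_px: float) -> float:
    """Horizontal scale factor that makes the shaped run fill exactly `cells`.

    > 1.0 means stretching (like a Latin "i"), < 1.0 condensing (like "W").
    """
    return (cells * cell_px) / natural_px

# a ' b c d " ^  ->  base letters a, b, c, d plus three combining marks
cluster = "a\u0301bcd\u0308\u0302"
print(len(cluster), cluster_cells(cluster))  # prints: 7 4
```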


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄⠀⠀⠀⠀

