Proposal for BiDi in terminal emulators

Kent Karlsson via Unicode unicode at unicode.org
Fri Feb 1 17:38:04 CST 2019


Den 2019-02-01 19:57, skrev "Richard Wordingham via Unicode"
<unicode at unicode.org>:

> On Fri, 1 Feb 2019 13:02:45 +0200
> Khaled Hosny via Unicode <unicode at unicode.org> wrote:
> 
>> On Thu, Jan 31, 2019 at 11:17:19PM +0000, Richard Wordingham via
>> Unicode wrote:
>>> On Thu, 31 Jan 2019 12:46:48 +0100
>>> Egmont Koblinger <egmont at gmail.com> wrote:
>>> 
>>> No.  How many cells do CJK ideographs occupy?  We've had a strong
>>> hint that a medial BEH should occupy one cell, while an isolated
>>> BEH should occupy two.
>> 
>> Monospaced Arabic fonts (there are not that many of them) are designed
>> so that all forms occupy just one cell (most even including the
>> mandatory lam-alef ligatures), unlike CJK fonts.
>> 
>> I can imagine the terminal restricting itself to monspaced fonts,
>> disable ³liga² feature just in case, and expect the font to well
>> behave. Any other magic is likely to fail.
> 
> Of course, strictly speaking, a monospaced font cannot support harakat
> as Egmont has proposed.
> 
> Richard.

(harakat: non-spacing vowel mark in Arabic)

"Monospaced font" is really a concept with modification. Even for
"plain old ASCII" there are two advance widths, not just one: 0 for
control characters (and escape/control sequences, neither of which
should directly consult the font; even such things as OSC sequences,
but the latter are a bad idea to have in any line one might wish to
edit (vi/emacs/...) via a terminal emulator window). But terminals
(read terminal emulators) can deal with mixed single width and double
width characters (which is, IIUC, the motivation for the datafile
EastAsianWidth.txt). Likewise non-spacing combining characters should
be possible to deal reasonably with.

It is a lot more difficult to deal with BiDi in a terminal emulator,
also shaping may be hard to do, as well as reordering (or even
splitting) combining characters. All sorts of problems arise; feeding
the emulator a character (or "short" strings) at a time not allowed
to buffer for display (causing reshaping or movement of already
displayed characters, edit position movement even within a single
line, etc.). Even if solvable for a "GUI" text editor (not via a
terminal), they do not seem to be workable in a terminal (emulator)
setting. Esp. not if one also wants to support multiline editing
(vi/emacs/...) or even single-line editing.

As long as editing is limited to a single line (such as the system
line editor, or an "enhanced functionality" line editor (such as
that used for bash; moving in the history sets the edit position
at EOL) even variable width ("proportional) fonts should not pose
a major problem. But for multiline editors (à la vi/emacs) it would
not be possible to synch nicely (unless one accepts strange jums)
the visual edit position and the actual edit position in the edit
buffer: The program would not have access to the advance width data
from the font that the terminal emulator uses, unless one
revolutionise what terminal emulators do... (And I don't see a
case for doing that.) But both a terminal emulator and multiline
editing programs (for terminal emulators) still can have access
to EastAsianWidth data as well as which characters are non-spacing;
those are not font dependent. (There might be some glitches if
the Unicode versions used do not match (the terminal emulator
and the program being run are most often on different systems),
but only for characters where these properties have changed,
e.g. newly allocated non-spacing marks.)

/Kent K

PS
No, I have not done extensive testing of various terminal emulators
on how well the handle the stuff above.





More information about the Unicode mailing list