Proposal for BiDi in terminal emulators

Benjamin Riefenstahl via Unicode unicode at unicode.org
Sat Feb 2 13:58:06 CST 2019


Hi Egmont, hi all,


This is an interesting discussion here.  If only because I would have
thought that there is only minimal interest by the actual target
audience in supporting these scripts in a terminal, given the severe
limitations of that environment.  The most important limitation seems to
me that a monospaced font must be used, which does not suit most
scripts that do shaping.  On the script-level I am familiar with Arabic,
Syriac and Mandaic (I don't actually speak any of these languages, so if
you want a real expert, I am not that person).  Monospaced Arabic
struggles and is not very elegant.  I have not seen solutions for
monospaced Syriac or Mandaic, and I have trouble even imagining them.

OTOH, that inelegance can maybe be an excuse (or a guide if you prefer)
to make the implementation simpler in other respects, because
expectations should be lower than for a graphical application.

Anyway, as a concrete addition to the discussion, I have a simple Arabic
shaping solution for Emacs on the terminal, especially on the Linux
console, and this discussion finally made me make it public on Gitlab,
see https://gitlab.com/cc_benny/termshape.  The Gitlab CD is activated,
so (mostly) ready-made Emacs packages can be downloaded as build
artifacts.  If anybody wants to discuss this implementation, we should
probably move that discussion somewhere else, like to the Emacs mailing
list (https://lists.gnu.org/mailman/listinfo/emacs-devel).

Some specific technical points from thinking about the problem on my
side:

Presentation forms: Termshape uses the Arabic presentation forms
available and so it is somewhat limited as mentioned by Eli.  Given that
we need to keep the implementation simple anyway, I am not sure that
significantly more is really needed, at least given what Emacs provides
already.  Additional character forms could be added, where the Unicode
repertoire is not sufficient.  This could use PUA characters or other
means like terminal control sequences.  In both cases a common
understanding would be needed between the terminal (or the font used by
it) and the application, outside of Unicode.
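
To make the presentation-forms approach concrete, here is a minimal
sketch in Python.  It is not Termshape's actual code (that is Emacs
Lisp), and the names build_forms, joins_next, joins_prev and shape are
made up for this illustration.  It assumes logical-order input without
diacritics, and it derives the joining behaviour from which forms exist
in Arabic Presentation Forms-B, which is a simplification.

```python
import unicodedata

FORM_TAGS = ("<isolated>", "<initial>", "<medial>", "<final>")


def build_forms():
    """Map (base letter, form name) to its Presentation Forms-B character."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-letter forms only; ligatures such as lam-alif
        # decompose into two letters and are skipped here.
        if len(parts) == 2 and parts[0] in FORM_TAGS:
            base = chr(int(parts[1], 16))
            forms[(base, parts[0].strip("<>"))] = chr(cp)
    return forms


FORMS = build_forms()


def joins_next(ch):
    # Assumption: a letter connects to the following letter exactly
    # when it has an initial form in the table.
    return (ch, "initial") in FORMS


def joins_prev(ch):
    # Assumption: a letter connects to the preceding letter exactly
    # when it has a final form in the table.
    return (ch, "final") in FORMS


def shape(text):
    """Replace each Arabic letter by its contextual presentation form."""
    out = []
    for i, ch in enumerate(text):
        before = i > 0 and joins_next(text[i - 1]) and joins_prev(ch)
        after = (i + 1 < len(text) and joins_next(ch)
                 and joins_prev(text[i + 1]))
        if before and after:
            form = "medial"
        elif before:
            form = "final"
        elif after:
            form = "initial"
        else:
            form = "isolated"
        # Characters without a presentation form pass through unchanged.
        out.append(FORMS.get((ch, form), ch))
    return "".join(out)
```

Every output character still occupies exactly one cell, which is what
keeps this kind of shaping workable on a terminal.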

Ligatures: With most shaping one character is transformed into a
character form that still only occupies one cell.  A ligature like
lam-alif OTOH only occupies one cell for two characters, so for
justification etc. the application will have to know that the two
characters together have a width of 1 on the screen.  This is easier if
the application does the selection of ligatures.  If you want to do this
in the terminal, the application would probably need to have some way to
measure the display width of a string, so that it can handle the
situation.  Be prepared though for the application to make quite a lot
of these requests.  For my own main use case for Emacs on a terminal,
display over SSH, that could become a problem.
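
As a rough sketch of the bookkeeping this implies when the application
selects the ligatures itself, the Python below counts cells for a
logical-order string.  The function name display_width and the set of
alif variants are my own choices for the illustration.  It assumes that
the only ligature is lam followed by an alif variant, and that the text
contains no diacritics or wide characters.

```python
LAM = "\u0644"
# Alif variants assumed to form a one-cell ligature with a preceding lam.
ALIFS = {"\u0622", "\u0623", "\u0625", "\u0627"}


def display_width(text):
    """Count terminal cells, treating each lam-alif pair as one cell."""
    width = 0
    i = 0
    while i < len(text):
        if text[i] == LAM and i + 1 < len(text) and text[i + 1] in ALIFS:
            i += 2  # two characters share a single cell
        else:
            i += 1
        width += 1
    return width
```

With this, display_width of lam + alif is 1, where a count of
characters would give 2.  No round trip to the terminal is needed,
which matters over SSH.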

Diacritics: The application can know what is a non-spacing character and
what is not.  So it can know that diacritics do not occupy their own
cell and it should be able to ignore whether the terminal supports a
specific diacritic or not.  If the terminal does not support a diacritic
the terminal can either just leave it out or the terminal can mess up
the display more or less irreparably.  In the first case, the worst is
that the user does not see the character, in the second case the
application cannot do anything about it with reasonable effort IMO.
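
For illustration, this is roughly how an application can find out
which characters take no cell of their own, sketched in Python with the
standard unicodedata module.  The helper names is_nonspacing and
cell_clusters are made up here, and treating general categories Mn and
Me as "non-spacing" is a simplification.

```python
import unicodedata


def is_nonspacing(ch):
    """True for marks that are drawn on top of the preceding character."""
    return unicodedata.category(ch) in ("Mn", "Me")


def cell_clusters(text):
    """Group each base character with the marks attached to it.

    Each returned cluster is assumed to occupy one terminal cell.
    """
    clusters = []
    for ch in text:
        if is_nonspacing(ch) and clusters:
            clusters[-1] += ch  # attach the mark to the previous base
        else:
            clusters.append(ch)
    return clusters
```

The number of clusters is the width the application can rely on,
whether or not the terminal manages to draw a given diacritic.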

A real problem is a combination of diacritics and ligatures.  Any
diacritic applies to only one character in the ligature, and between the
application and the terminal it is currently not possible to determine
which one.  This is one area where an implementation in the terminal
would clearly have the advantage.  But a terminal control sequence could
also help.  IMO we are talking about a luxury problem here, though.  Do
we want to set as our first goal showing complete Quranic verses in all
their glory, or are we satisfied with everyday Arabic like say the
website of a modern Arabic newspaper?


Thanks for your effort and for starting this discussion,
benny

