Proposal for BiDi in terminal emulators

Richard Wordingham via Unicode unicode at unicode.org
Sat Feb 2 15:49:40 CST 2019


On Sat, 02 Feb 2019 20:58:06 +0100
Benjamin Riefenstahl via Unicode <unicode at unicode.org> wrote:

> Hi Egmont, hi all,
> 
> 
> This is an interesting discussion here.  If only because I would have
> thought that there is only minimal interest by the actual target
> audience in supporting these scripts in a terminal, given the severe
> limitations of that environment.

Eli will probably tell me I'm behind the times, but there are a few
places where a Gnome-terminal is better than an Emacs GUI window.  One
is colour highlighting of text found by grep.  Another is that screen
overwriting doesn't work in an Emacs window.

My main interest in this, though, is in improving the general run of
Indic terminal cell editors.  If we can get Gnome-terminal working for
Kharoshthi, things should improve for LTR Indic.  Even working on the
false assumption that Indic scripts are like Devanagari would be an
improvement, despite my comments about Khmer.

> Presentation forms: Termshape uses the Arabic presentation forms
> available and so it is somewhat limited as mentioned by Eli.  Given
> that we need to keep the implementation simple anyway, I am not sure
> that significantly more is really needed, at least given what Emacs
> provides already.  Additional character forms could be added, where
> the Unicode repertoire is not sufficient.  This could use PUA
> characters or other means like terminal control sequences.  In both
> cases a common understanding would be needed between the terminal (or
> the font used by it) and the application, outside of Unicode.

You do not need PUA.  For U+0756 ARABIC LETTER BEH WITH SMALL V, we
can select each shape with U+200C ZWNJ and U+200D ZWJ:

Initial form:   200C 0756 200D
Medial form:    200D 0756 200D
Final form:     200D 0756 200C
Isolated form:  200C 0756 200C

The tricky bit is to get the terminal to accept them as cell contents.
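
For anyone who wants to experiment, here is a minimal Python sketch that
builds the four sequences in the table above.  The helper names
(positional_forms, as_hex) are made up for illustration and are not part
of any terminal or library API.  It only constructs the strings; whether a
given terminal stores such a sequence in a single cell is exactly the open
question.

```python
# Sketch only: build the joiner-wrapped sequences for one Arabic letter.
# positional_forms() and as_hex() are hypothetical helper names.

ZWNJ = "\u200C"  # U+200C ZERO WIDTH NON-JOINER: blocks joining on that side
ZWJ = "\u200D"   # U+200D ZERO WIDTH JOINER: requests joining on that side


def positional_forms(letter):
    """Return the four sequences in logical (memory) order."""
    return {
        "initial": ZWNJ + letter + ZWJ,    # joins only to what follows
        "medial": ZWJ + letter + ZWJ,      # joins on both sides
        "final": ZWJ + letter + ZWNJ,      # joins only to what precedes
        "isolated": ZWNJ + letter + ZWNJ,  # joins on neither side
    }


def as_hex(text):
    """Format a string as space-separated 4-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    # U+0756 ARABIC LETTER BEH WITH SMALL V, as in the table above.
    for name, seq in positional_forms("\u0756").items():
        print(f"{name:9}{as_hex(seq)}")
```

Each sequence is three code points, of which only the middle one has
non-zero width, so the cell logic would have to treat the joiners as
attached to the letter rather than dropping them.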

> A real problem is a combination of diacritics and ligatures.  Any
> diacritic applies to only one character in the ligature, and between
> the application and the terminal it is currently not possible to
> determine which one.  This is one area where an implementation in the
> terminal would clearly have the advantage.  But a terminal control
> sequence could also help.  IMO we are talking about a luxury problem
> here, though.  Do we want to set as our first goal showing complete
> quranic verses in all their glory, or are we satisfied with everyday
> Arabic like say the website of a modern Arabic newspaper?

Just get Kharoshthi working :-)

Some of the Arabic 'mark-up' characters might be tricky.

Richard.

