Proposal for BiDi in terminal emulators
Egmont Koblinger via Unicode
unicode at unicode.org
Sat Feb 2 16:02:10 CST 2019
On Sat, Feb 2, 2019 at 9:57 PM Richard Wordingham
<richard.wordingham at ntlworld.com> wrote:
> Seriously, you need to give a definition of 'visual order' for this
> context. Not everyone shares your chiralist view.
When I look at the Unicode BiDi algorithm, or go to an online demo at
https://unicode.org/cldr/utility/bidic.jsp, or look at the FriBidi API
etc., their very basic functionality is that I pass the logical order
(as the string is expected to be stored in text files etc.), and the
result of the algorithm is the visual order.
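
As a rough illustration of "logical order in, visual order out", here is a minimal sketch. It is deliberately not the Unicode BiDi algorithm: it assumes an LTR paragraph, only reverses each unbroken run of strong RTL letters, and ignores embedding levels, neutrals, numbers and mirroring. The function names are hypothetical, not part of FriBidi or any other library.

```python
import unicodedata

def is_strong_rtl(ch: str) -> bool:
    # Bidi classes R (e.g. Hebrew letters) and AL (Arabic letters) are strong RTL.
    return unicodedata.bidirectional(ch) in ("R", "AL")

def toy_visual_order(logical: str) -> str:
    """Toy stand-in for a BiDi engine: reverse each maximal run of RTL letters."""
    out, run = [], []
    for ch in logical:
        if is_strong_rtl(ch):
            run.append(ch)              # still inside an RTL run
        else:
            out.extend(reversed(run))   # flush the finished RTL run, reversed
            run = []
            out.append(ch)              # LTR and neutral characters stay in place
    out.extend(reversed(run))           # flush a run that ends the string
    return "".join(out)

def codepoints(s: str) -> list[str]:
    # Show code points instead of glyphs so the order is unambiguous on any display.
    return [f"U+{ord(c):04X}" for c in s]

logical = "abc \u05d0\u05d1\u05d2 def"  # Latin, then Hebrew alef-bet-gimel, then Latin
visual = toy_visual_order(logical)
print(codepoints(visual[4:7]))          # ['U+05D2', 'U+05D1', 'U+05D0']
```

The stored (logical) string is untouched; only the left-to-right sequence handed to the display differs.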
On top of this, I make the clarification that combining marks need to
be reordered to be sent out to the terminal emulator _after_ their
base letter, because that's how terminal emulators work. The BiDi
problem area can only be reasonably addressed in the display layer, by
leaving the emulation layer pretty much unchanged. I find it
unreasonable to introduce a new mode where the combining accents are
sent to the terminal emulator _before_ their base letter. (As an
off-topic aside, I wish that were the only mode in Unicode; it would
simplify a couple of things in the handling of streams. But that ship
sailed decades ago.)
This reordering, which makes the combining accents come after (that is,
to the right of) the base letter in the LTR visual order, is what e.g.
FriBidi does out of the box, because its REORDER_NSM flag is set by
default.
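
The effect of that reordering can be sketched without calling FriBidi at all. The snippet below is an illustration under stated assumptions, not FriBidi's implementation: it treats every character of bidi class NSM as belonging to the preceding base letter, and the helper names are hypothetical. It compares a naive character-by-character reversal of an RTL word with a reversal that keeps each mark after its base.

```python
import unicodedata

def clusters(logical: str) -> list[str]:
    """Group each base character with the non-spacing marks (NSM) that follow it."""
    groups: list[str] = []
    for ch in logical:
        if unicodedata.bidirectional(ch) == "NSM" and groups:
            groups[-1] += ch        # attach the mark to the preceding base
        else:
            groups.append(ch)       # start a new cluster
    return groups

def reverse_naive(run: str) -> str:
    # Plain reversal: a combining mark ends up *before* its base letter.
    return run[::-1]

def reverse_keep_marks(run: str) -> str:
    # Reverse cluster by cluster: each mark still follows its base letter.
    return "".join(reversed(clusters(run)))

def codepoints(s: str) -> list[str]:
    return [f"U+{ord(c):04X}" for c in s]

# Hebrew alef, bet + dagesh (U+05BC, a combining mark), gimel, in logical order.
word = "\u05d0\u05d1\u05bc\u05d2"

print(codepoints(reverse_naive(word)))       # ['U+05D2', 'U+05BC', 'U+05D1', 'U+05D0']
print(codepoints(reverse_keep_marks(word)))  # ['U+05D2', 'U+05D1', 'U+05BC', 'U+05D0']
```

In the naive result the dagesh arrives before the bet, so a terminal emulator that attaches combining marks to the previously received character would put it on the gimel. The second form is the one such an emulator can consume unchanged.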
Essentially, the "explicit mode" that my specification introduces is
exactly the behavior that most terminal emulators exhibit now, and the
one that e.g. Emacs requires: they lay out the codepoints they
receive, from left to right. Nothing is going to change there. What I
add is another mode (the technically less problematic "implicit" mode
where the terminal displays the contents just as any BiDi-aware
graphical text editor, browser etc. would do) for the sake of
"cat"-like simple utilities, while being unsuitable for Emacs and
friends. My work also specifies how/when exactly to toggle back and
forth between these two modes.
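
The difference between the two modes can be modelled in a few lines. This is only a sketch: `BidiMode`, `display_row` and the `reorder` callback are hypothetical names, the actual mode-switching mechanism defined by the specification is not shown, and the string reversal used in the usage lines is a stand-in for a real BiDi engine that is only valid for a single run of RTL letters.

```python
from enum import Enum
from typing import Callable

class BidiMode(Enum):
    EXPLICIT = "explicit"  # the application already sent visual order (Emacs-like)
    IMPLICIT = "implicit"  # the terminal runs BiDi itself (for cat-like utilities)

def display_row(stored: str, mode: BidiMode, reorder: Callable[[str], str]) -> str:
    """Return the left-to-right sequence to paint for one row.

    `stored` is what the emulation layer received; it is never modified.
    Only the display layer decides whether to reorder it.
    """
    if mode is BidiMode.EXPLICIT:
        return stored           # lay out the codepoints exactly as received
    return reorder(stored)      # implicit: logical order in, visual order out

# Stand-in for a real BiDi engine; only meaningful for one pure-RTL word.
def reverse(s: str) -> str:
    return s[::-1]

row = "\u05d0\u05d1\u05d2"      # Hebrew alef-bet-gimel as received
explicit_view = display_row(row, BidiMode.EXPLICIT, reverse)
implicit_view = display_row(row, BidiMode.IMPLICIT, reverse)
```

With the same bytes received, `explicit_view` equals the stored row while `implicit_view` is reordered, which is why an application has to know which mode is active before it decides what to send.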
What else do I need to further specify in the concept of "visual order"?
> A visible U+17D2 has no rôle in the Khmer writing system. On
> computers, it is a warning that the input of a subscript consonant is
> only half done. There are three units of the writing system in that
> word - KHMER LETTER PO, KHMER CONSONANT SIGN COENG RO*, and KHMER SIGN
> [and I could quote a whole lot more]
Richard, you are obviously far more savvy about shaping and such
than I am, and I can't quickly pick up your knowledge to properly answer
all the issues you mentioned.
What you probably still haven't realized is that I aimed to address a
much lower level issue than the ones you keep bringing up. Currently,
no matter what terminal emulator you pick, as soon as you start doing
BiDi (vim, emacs, cat, echo...), you end up with words being written
backwards. I mean, maybe they show up correctly with emacs, but they
show up incorrectly with vim and cat. Then you switch to a different
emulator, or toggle a setting, and suddenly vim and cat will be okay,
and emacs won't. This is bad.
This is the low level issue I'm trying to address, to make sure that
letters of words are always shown in the correct order. There's no way
you could do shaping underneath this level; it makes no sense to talk
about shaping, zero-width (non-)joining, special Khmer symbols and
whatnot on reversed words, right? The order of the letters needs to be
fixed first, which is what I'm doing, and then all the bells and
whistles needed for shaping might come on top of this.
Right now I'm doing all of this BiDi work on a voluntary basis. As much as
I'd love to solve all the problems of the world, I don't have the capacity for
that. As for shaping, chances are that I'm not going to get there,
unless someone offers a decent paid job :P. What I'm looking for right
now is feedback on whether the low-level BiDi work makes sense, and
whether it really creates proper grounds for building shaping etc. on
top of it one day.
Hope this clarifies a lot. And again, thanks for all your precious
input, but we've heavily diverged from the scope of my work.