Bidi paragraph direction in terminal emulators
Eli Zaretskii via Unicode
unicode at unicode.org
Thu Feb 7 08:14:40 CST 2019
> From: Egmont Koblinger <egmont at gmail.com>
> Date: Wed, 6 Feb 2019 22:01:59 +0100
> Cc: Richard Wordingham <richard.wordingham at ntlworld.com>, unicode at unicode.org
> - Emacs running in a terminal shows an underscore wherever there's a
> BiDi control in the source file – while the graphical one doesn't.
> This looks like a simple bug to me, right?
Not a bug, a feature. Emacs doesn't remove the bidi controls from
display (that's another deviation allowed by the UBA, see section
5.2). On GUI displays, these controls are displayed as thin 1-pixel
spaces, but on text-mode terminals they are shown as a space. The
underscore you see is a special typeface used to indicate that this is
not really a space. (This is the default; Emacs being Emacs, it
lets you customize how these characters are displayed, and in
particular lets you not display them at all.)
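
To make the two display choices concrete, here is a minimal Python
sketch. It is not how Emacs implements this: the function names and the
"_" placeholder are made up for illustration, and the character set is
the one carrying the Unicode Bidi_Control property.

```python
# Characters with the Unicode Bidi_Control property:
# ALM, LRM, RLM, LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = frozenset(
    [0x061C, 0x200E, 0x200F]
    + list(range(0x202A, 0x202F))   # U+202A..U+202E
    + list(range(0x2066, 0x206A))   # U+2066..U+2069
)


def show_controls(text, placeholder="_"):
    """Keep every bidi control visible as a one-cell placeholder.

    Hypothetical helper: a stand-in for 'display the control as a
    marked space' on a text-mode terminal.
    """
    return "".join(placeholder if ord(ch) in BIDI_CONTROLS else ch
                   for ch in text)


def strip_controls(text):
    """Drop the bidi controls from the displayed text entirely."""
    return "".join(ch for ch in text if ord(ch) not in BIDI_CONTROLS)
```

With this sketch, show_controls("a\u200fb") gives "a_b", while
strip_controls("a\u200fb") gives "ab".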
> - Line 1007, the copyright line of this file uses visual indentation,
> and Emacs detects LTR paragraph for that line. I think it should
> rather use BiDi controls to have an overall RTL paragraph direction
> detected, and within that BiDi controls to force LTR for the text.
Why? As I said, the tutorial was written in part to demonstrate the
UBA implementation, including the dynamic detection of base paragraph
direction, and this is exactly one example of how it works in
> To recap: The _paragraph direction_ is determined in Emacs for
> emptyline-delimited segments of data, which I honestly find a great
> thing, and would love to do in terminals too, alas at this point it's
> blocked by some really nontrivial technical issues. But once you have
> decided on a direction, each _line_ within that data is passed
> separately to the BiDi algorithm to get reshuffled
Yes and no. You could keep your mental model if you like, but
actually the UBA explicitly says that each line is to be reordered for
display separately, see section 3.4 of UAX#9.
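
As a rough illustration of that split, the Python sketch below picks a
base direction per emptyline-delimited paragraph from its first strong
character (a simplified reading of rules P2 and P3 of UAX#9) and then
pairs each line with that direction. It does not resolve embedding
levels or reorder anything, it ignores the isolate-skipping part of P2,
and the function names are invented for this example.

```python
import re
import unicodedata


def base_direction(paragraph):
    """Return 'RTL' or 'LTR' from the first strong character.

    Simplified: text inside isolates is not skipped, unlike rule P2.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "RTL"
        if bidi_class == "L":
            return "LTR"
    return "LTR"  # no strong character found: level 0, per rule P3


def lines_with_direction(text):
    """Pair every line with the direction of its paragraph.

    Paragraphs are separated by empty (or whitespace-only) lines; this
    delimiter choice is an assumption of the sketch.
    """
    result = []
    for paragraph in re.split(r"\n[ \t]*\n", text):
        direction = base_direction(paragraph)
        for line in paragraph.split("\n"):
            # Each line would be reordered for display on its own.
            result.append((direction, line))
    return result
```

For the input "\u05d0\u05d1 abc\nxyz\n\nhello \u05d0", both lines of the
first paragraph come out as RTL, including "xyz", which has no RTL
character of its own; the second paragraph is LTR.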
> Let's make a thought experiment. Let's assume that for running the
> BiDi algorithm, we'd still stick to the emptyline-delimited paragraph
> definition. This is not what you do, this is not what I do, but I
> misunderstood that this is what you did, and I also thought this was a
> good idea as a potential extension for the BiDi specs – I no longer
> think so. This definition is truly problematic, as I'll show below.
Which is why it is not what the UBA says one should do.