Bidi paragraph direction in terminal emulators

Egmont Koblinger via Unicode unicode at unicode.org
Thu Feb 7 11:12:37 CST 2019


On Thu, Feb 7, 2019 at 3:14 PM Eli Zaretskii <eliz at gnu.org> wrote:

> Not a bug, a feature.  Emacs doesn't remove the bidi controls from
> display (that's another deviation allowed by the UBA, see section
> 5.2).  On GUI displays, these controls are displayed as thin 1-pixel
> spaces, but on text-mode terminals they are shown as space.

Thanks for the clarification!

> Why?  As I said, the tutorial was written in part to demonstrate the
> UBA implementation, including the dynamic detection of base paragraph
> direction, and this is exactly one example of how it works in
> practice.

Fair enough, then.

> > To recap: The _paragraph direction_ is determined in Emacs for
> > emptyline-delimited segments of data, which I honestly find a great
> > thing, and would love to do in terminals too, alas at this point it's
> > blocked by some really nontrivial technical issues. But once you have
> > decided on a direction, each _line_ within that data is passed
> > separately to the BiDi algorithm to get reshuffled
>
> Yes and no.  You could keep your mental model if you like, but
> actually the UBA explicitly says that each line is to be reordered for
> display separately, see section 3.4 of UAX#9.

The very first step of the BiDi algorithm is to split at "paragraphs",
however that's defined, and then do the rest for each paragraph.

For one particular paragraph, there's a lot going on: determining
embedding levels and such. At one point, at the very beginning of 3.4,
a caller may split a paragraph into lines. Then the rest (the actual
reordering) happens on lines.

This is _not_ the same as splitting into lines upfront (that is,
defining UBA's "paragraphs" as the text file's "lines"), and then
determining embedding levels and reshuffling on these smaller units.

Emacs does the latter, and so does my specification.

I believe it's not my mental model that's weird; what confused me was
your use of terminology that doesn't match UBA's. It's admittedly hard
to use the proper terminology here, since Emacs's definition and the
user-perceived notion of a "paragraph" both differ from what becomes a
"paragraph" according to UBA's definition.

Both in Emacs and in my spec, a "line" of the text file maps to a
"paragraph" according to UBA's phrasing. (Except when determining the
paragraph direction, where Emacs uses its own human-perceived
emptyline-separated paragraph, rather than lines. Which is a nice
thing to do.)
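
To make the mapping concrete, here is a minimal Python sketch of the
scheme as I described it above. It is not Emacs's actual code, and all
function names are made up for illustration. It assumes the direction
is picked by a simplified first-strong rule in the spirit of UBA's
P2/P3 (isolates are ignored), evaluated over the whole
emptyline-separated block, and that each line is then handed to the
UBA as its own "paragraph" with that direction.

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Simplified P2/P3: return the direction of the first strong character.

    Assumption: isolate initiators and their contents are not skipped,
    unlike in the real rule P2.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default


def split_blocks(lines):
    """Group lines into emptyline-separated blocks (human-perceived paragraphs)."""
    blocks, current = [], []
    for line in lines:
        if line.strip() == "":
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def uba_paragraphs(lines):
    """Return (line, direction) pairs: one UBA "paragraph" per line,
    with the direction decided once per emptyline-separated block."""
    result = []
    for block in split_blocks(lines):
        direction = first_strong_direction("\n".join(block))
        for line in block:
            result.append((line, direction))
    return result


sample = [
    "\u05e9\u05dc\u05d5\u05dd world",   # starts with Hebrew
    "hello \u05e2\u05d5\u05dc\u05dd",   # starts with Latin, same block
    "",
    "plain english",
    "123 \u05d0",                       # digits, then a Hebrew letter
]
for line, direction in uba_paragraphs(sample):
    print(direction, repr(line))
```

With this sample, the second line comes out as "rtl" because its block
starts with a Hebrew letter, whereas detecting the direction on that
line alone would give "ltr". Likewise the last line inherits "ltr" from
its block, although on its own it would resolve to "rtl".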

Anyway, I'm glad it turned out we're on the same page; it's just the
terminology that's truly confusing.


cheers,
egmont

