Bidi paragraph direction in terminal emulators (was: BiDi in terminal emulators)

Eli Zaretskii via Unicode unicode at unicode.org
Tue Feb 5 10:05:07 CST 2019


> From: Egmont Koblinger <egmont at gmail.com>
> Date: Tue, 5 Feb 2019 00:08:10 +0100
> Cc: unicode at unicode.org
> 
> every single newline character starts a new paragraph. The result of
> printf "Hello\nWorld\n" > world.txt
> is a text file consisting of two paragraphs, with 5 characters in each. Correct?

Yes.
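
For concreteness, here is a minimal Python sketch of that default rule.
It assumes that "paragraph separator" means any character of bidi
class B (which includes the newline), as in UAX#9; the function name
split_paragraphs is made up for this illustration, and the sketch does
not treat CR LF as a single separator the way the UBA does.

```python
import unicodedata

def split_paragraphs(text):
    """Split text at every character of bidi class B (hypothetical helper)."""
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            # A paragraph separator ends the current paragraph.
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        # Text after the last separator forms a final paragraph.
        paragraphs.append("".join(current))
    return paragraphs

paragraphs = split_paragraphs("Hello\nWorld\n")
print(len(paragraphs), [len(p) for p in paragraphs])
```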

> > Actually, Emacs implements the rule that paragraphs are separated by
> > empty lines. This is documented in the Emacs manuals.
> 
> That is, Emacs overrides UAX#9 and comes up with a different
> definition?

Yes, Emacs uses the "higher-level protocols" clause in HL1, when the
paragraph direction is to be determined from the text.  (There's also
a way for the user or a Lisp program to force a certain base paragraph
direction on all paragraphs in a window that displays some text.)
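
To illustrate the rule, here is a rough Python sketch. It is not the
actual Emacs code: the function names are invented for this message,
whitespace-only lines are assumed to count as empty, and the direction
test is a simplified first-strong-character check in the spirit of
rules P2 and P3 of UAX#9 (it ignores isolates).

```python
import unicodedata

def base_direction(paragraph):
    """Return 'LTR' or 'RTL' from the first strong character found."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return "LTR"  # no strong character at all: fall back to LTR

def split_on_empty_lines(text):
    """Group lines into paragraphs separated by empty lines."""
    paragraphs, current = [], []
    for line in text.split("\n"):
        if line.strip() == "":
            # An empty line closes the paragraph collected so far.
            if current:
                paragraphs.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs

SHALOM = "\u05e9\u05dc\u05d5\u05dd"  # a Hebrew word, strong RTL letters
text = SHALOM + " world\nmore text here\n\nhello " + SHALOM + "\n"

for paragraph in split_on_empty_lines(text):
    print(base_direction(paragraph), len(paragraph.split("\n")), "line(s)")
```

Here the second line, "more text here", inherits the direction decided
at the start of its paragraph instead of getting a decision of its own.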

> Furthermore, you argue that in terminals I should follow
> Emacs's definition rather than Unicode's?

IME, what Emacs uses gives much better results, yes.

> I believe I understand your concerns with the per-line paragraph
> definition, but this interpretation that I've just shown most likely
> leads to even more broken behavior.

I don't see how the result could be more broken, when the decisions
about base paragraph direction are made much more rarely.  The places
in the text where the paragraph direction is determined under my
proposal are a small subset of the places where it is determined
by the default UBA rules.  So it will make the same kinds of mistakes
as the each-line-is-a-new-paragraph method, but far fewer of them.
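
The subset argument can be checked with a small Python sketch. The
function names and the sample text are made up for this illustration;
it only counts non-empty lines as decision points and assumes that
whitespace-only lines count as empty.

```python
def per_line_decisions(text):
    """Offsets where direction is decided if each line is a paragraph."""
    starts, offset = set(), 0
    for line in text.split("\n"):
        if line.strip():
            starts.add(offset)
        offset += len(line) + 1  # +1 for the newline itself
    return starts

def empty_line_decisions(text):
    """Offsets where direction is decided if empty lines separate paragraphs."""
    starts, offset, previous_was_empty = set(), 0, True
    for line in text.split("\n"):
        if line.strip():
            # Only the first line after an empty line (or at the very
            # start of the text) begins a new paragraph.
            if previous_was_empty:
                starts.add(offset)
            previous_was_empty = False
        else:
            previous_was_empty = True
        offset += len(line) + 1
    return starts

sample = "a\nb\n\nc\nd\ne\n"
per_line = per_line_decisions(sample)
by_empty_line = empty_line_decisions(sample)
print(sorted(per_line), sorted(by_empty_line), by_empty_line <= per_line)
```

Every offset in the second set is also in the first, so any wrong
decision made under the empty-line rule would also have been made
under the per-line rule at the same place.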

In addition to this theoretical argument, I have 10 years of
experience using this in Emacs to back me up.  The only difference
between Emacs and your example is the very first paragraph.

> It's a really nontrivial technical problem to let the terminal
> emulator know where each prompt, and/or each command's output begins
> and ends. There's work going on for letting the terminal emulator
> recognize the prompts, but even if it's successful, it'll probably
> take 5-10 years to reach the majority of the users. And it probably
> still wouldn't solve the case of knowing the boundary between the two
> outputs if a "cat file1.txt; cat file2.txt" is executed, let alone if
> they're concatenated with "cat file1.txt file2.txt".

I think you are trying to find a perfect solution, and because it
probably doesn't exist, or at least is hard to come by, you conclude
that a solution that is imperfect should be rejected.

But I'm not saying my proposal is the perfect solution, just that it
is better (sometimes, way better) than the default of considering each
line a paragraph.

> So, what you're arguing for, is that the default behavior should be
> something that's:
> - currently not implementable in a semantically correct way (to stop
> around shell prompts) due to technical limitations, and
> - isn't what Unicode says.

The first point has to do with the search for a perfect solution.  My
advice is to settle for something reasonable even if it is not
perfect.

The second point is incorrect: the UBA explicitly allows the
implementation to apply higher-level protocols for paragraph
direction, see HL1 in UAX#9.

> You have not convinced me that the pros outweigh the cons.

There are no cons in my proposal that aren't already present in the
default each-line-is-a-new-paragraph rule.  So even if the pros don't
outweigh the cons, the balance should be better than under the default.

> That being said, I'm more than open to see such a behavior as a
> future extension, subject of course to the semantic prompt stuff
> being available.

I think the default should provide reasonably good display, and
each-line-is-a-new-paragraph doesn't.

