Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

Egmont Koblinger via Unicode unicode at unicode.org
Wed Feb 6 15:29:36 CST 2019


Hi Philippe,

Thanks a lot for your input!

Another fundamental difficulty with terminal emulators is: These
controls (CR, LF...) are control instructions that move the cursor in
some ways, and then are forgotten. You cannot do BiDi on the
instructions the terminal receives. You can only do BiDi on the
result, the contents of the canvas after these instructions are
executed. Here these controls are either lost, or you have to give a
specification how exactly they need to be remembered, i.e. converted
to being part of the canvas's data.

Let's also mention that trying to get apps to use them is quite
hopeless. The best you can do is design BiDi around what you already
have, which pretty much means hard vs. soft line endings, and
hopefully forthcoming semantic marks around shell prompts. (To
overcomplicate the story, a received LF doesn't convert the line
ending to hard wrapped in most terminal emulators. In some it does. I
don't think there's an exact specification anywhere. Maybe the BiDi
spec needs to create one. Lines are hard wrapped by default, turned to
soft wrapped when the text gets wrapped at the end of the line, and a
few random control functions turn them back to hard wrapped, but in
most terminals, a newline is not such a control function.)
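
To make the above concrete, here is a rough Python sketch of the
model I have in mind. It is not taken from any real emulator: the
names (Canvas, Row, soft_wrapped, paragraphs) are made up for
illustration, and it assumes the "most terminals" behavior where LF
only moves the cursor and leaves the wrap flag alone. The control
functions that turn a row back to hard wrapped are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Row:
    cells: list                 # one character per column
    soft_wrapped: bool = False  # rows are hard wrapped by default


class Canvas:
    """Toy canvas: CR and LF move the cursor and are not stored."""

    def __init__(self, width):
        self.width = width
        self.rows = [Row([" "] * width)]
        self.row = 0
        self.col = 0

    def _ensure_row(self):
        # Grow the canvas downwards as the cursor moves past the end.
        while len(self.rows) <= self.row:
            self.rows.append(Row([" "] * self.width))

    def feed(self, text):
        for ch in text:
            if ch == "\r":
                self.col = 0
            elif ch == "\n":
                # Assumption: LF does not touch the wrap flag.
                self.row += 1
                self._ensure_row()
            else:
                if self.col == self.width:
                    # Text ran past the right edge: this row becomes
                    # soft wrapped and printing continues below.
                    self.rows[self.row].soft_wrapped = True
                    self.row += 1
                    self.col = 0
                    self._ensure_row()
                self.rows[self.row].cells[self.col] = ch
                self.col += 1

    def paragraphs(self):
        # Join each run of soft-wrapped rows with the hard-wrapped row
        # that ends it. BiDi would then run on these strings, not on
        # the byte stream the terminal received.
        result, buf = [], ""
        for row in self.rows:
            text = "".join(row.cells)
            if row.soft_wrapped:
                buf += text
            else:
                # Trailing blanks of the final row are dropped here,
                # which is a simplification of this sketch.
                result.append(buf + text.rstrip())
                buf = ""
        if buf:
            result.append(buf)
        return result
```

With a canvas of width 5, feeding "abcdefg\r\nxy" leaves three rows
("abcde" soft wrapped, "fg" and "xy" hard wrapped), and paragraphs()
returns ["abcdefg", "xy"]. The CR and LF themselves are nowhere in
the canvas; only the wrap flags survive.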

Anyway, please also see my previous email; I hope that clarifies a lot
for you, too.


cheers,
egmont

On Tue, Feb 5, 2019 at 5:53 PM Philippe Verdy via Unicode
<unicode at unicode.org> wrote:
>
> I think that before making any decision we must make some decision about what we mean by "newlines". There are in fact 3 different functions:
> - (1) soft line breaks (which are used to enforce a maximum display width between paragraph margins): these are equivalent to breakable and compressible whitespace and do not change the logical paragraph direction. They don't insert any additional vertical gap between lines, so the logical line height is preserved and continues uninterrupted. If text justification applies, this whitespace will be entirely collapsed into the end margin, and any text before it will still be justified to match the end margin (until the maximum expansion of other whitespace in the middle is reached and the maximum intercharacter gap is also reached, in which case that line will no longer be expanded). This does not apply to terminal emulators, which normally never use text justification, so the text will just be aligned to the start margin, and whitespace before it on the same line is preserved and collapsed only at the end of the line (just before the soft line break itself).
> - (2) hard line breaks: they break to a new line but continue the paragraph within its same logical direction, but they are not compressible whitespace (and do not depend on the logical end margin of the paragraph).
> - (3) paragraph breaks: generally they introduce an additional vertical gap with top and bottom margins.
>
> The problem in terminals is that they usually cannot distinguish types (1) and (2); they are simply encoded by a single CR, or LF, or CR+LF, or NEL. Type (1) exists only within the framework of a higher-level protocol which gives additional interpretation to these "newlines". The special control LS is almost never used but may be used for type (1), i.e. soft line breaks, and will fall back to type (2), which is represented by the legacy "simple" newlines (single CR, or single LF, or single CR+LF, or single NEL). I have seen very little or no use of the LS (line separator) special control.
>
> Type (3) may be encoded with PS (paragraph separator), but in terminals (and common protocols like MIME) it is usually encoded using a pair of newlines (CR+CR, or LF+LF, or CR+LF+CR+LF, or NL+NL), possibly with additional whitespace (and additional presentation characters such as ">" in quotations inserted in mail responses) between them (needed for MIME and HTTP), which may be collapsed when rendering or interpreting them.
>
> Some terminal protocols can also use other legacy ASCII separators such as FS, GS, RS, US for grouping units containing multiple paragraphs, or STX/EOT pairs for encapsulating whole text documents in a protocol-specific envelope format (and will also use some escaping mechanism for special controls found in the middle, such as DLE+control to escape the control, or DLE+0 to escape a NUL, or DLE+# to escape a DEL, or DEL+x+NN where N are a fixed number of hexadecimal, decimal or octal digits). There's a wide variety of escaping mechanisms used by various higher-layer protocols (including transport protocols or encoding syntaxes used just below the plain-text layer, in a lower layer than the transport protocol layer).
>
> On Mon, Feb 4, 2019 at 21:46, Eli Zaretskii via Unicode <unicode at unicode.org> wrote:
>>
>> > Date: Mon, 4 Feb 2019 19:45:13 +0000
>> > From: Richard Wordingham via Unicode <unicode at unicode.org>
>> >
>> > Yes.  If one has a text composed of LTR and RTL paragraphs, one has to
>> > choose how far apart their starting margins are.  I think that could
>> > get complicated for plain text if the terminal has unbounded width.
>>
>> But no real-life terminal does.  The width is always bounded.


