Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

Philippe Verdy via Unicode unicode at unicode.org
Tue Feb 5 10:51:51 CST 2019


I think that before making any decision we must settle what we mean by
"newlines". There are in fact 3 different functions:
- (1) soft line breaks (which are used to enforce a maximum display width
between paragraph margins): these are equivalent to breakable and
compressible whitespace. They do not change the logical paragraph
direction and do not insert any additional vertical gap between lines, so
the logical line-height is preserved and continues uninterrupted. If text
justification applies, this whitespace is entirely collapsed into the
end margin, and any text before it is still justified to match the
end margin (until the maximum expansion of other whitespace in the middle
is reached and the maximum intercharacter gap is also reached, in which
case that line is not expanded any further). This does not apply
to terminal emulators, which normally never use text justification: there
the text is just aligned to the start margin, whitespace before the break
on the same line is preserved, and whitespace is collapsed only at the end
of the line (just before the soft line break itself).
- (2) hard line breaks: they break to a new line but continue the paragraph
in its same logical direction; they are not compressible
whitespace (and do not depend on the logical end margin of the paragraph).
- (3) paragraph breaks: generally they introduce an additional vertical gap
with top and bottom margins.
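
To make the difference between (1) and (2) concrete, here is a minimal
Python sketch. It assumes LF is the only hard break in the input and uses
the standard textwrap module to place the soft breaks; the function name
wrap_soft is made up for this illustration and is not part of any terminal
emulator.

```python
import textwrap

def wrap_soft(text, width):
    """Split text on hard breaks (LF assumed), then soft-wrap each line.

    Returns one list of display rows per hard line. Whitespace at a soft
    break is dropped; whitespace inside a row is kept as-is.
    """
    rows_per_line = []
    for hard_line in text.split("\n"):
        # textwrap.wrap returns [] for an empty string, so keep one empty
        # row to preserve blank lines.
        rows = textwrap.wrap(hard_line, width=width) or [""]
        rows_per_line.append(rows)
    return rows_per_line
```

Resizing the terminal would only change the inner lists (the soft breaks);
the outer list, which follows the hard breaks, stays the same.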

The problem in terminals is that they usually cannot distinguish types (1)
and (2): both are simply encoded by a single CR, or LF, or CR+LF, or NEL.
Type (1) exists only within the framework of a higher-level protocol
that gives additional interpretation to these "newlines". The special
control LS is almost never used but may be used for type (1), i.e. soft
line breaks, and falls back to type (2), which is represented by the
legacy "simple" newlines (single CR, or single LF, or single CR+LF, or
single NEL). I have seen very little or no use of the LS (line separator)
special control.
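
As a rough illustration of that mapping, the following Python sketch
labels each break sequence found in a string. The labels and the function
name classify_breaks are hypothetical; treating LS as a soft break follows
the reading suggested above and is an assumption, not something a given
terminal implements.

```python
import re

# Hypothetical labels for the three functions discussed in this message.
SOFT, HARD, PARA = "soft", "hard", "paragraph"

# CR+LF is listed first so that it is matched as one break, not two.
# \x85 is NEL, \u2028 is LS, \u2029 is PS.
_BREAK = re.compile("\r\n|[\r\n\x85\u2028\u2029]")

def classify_breaks(text):
    """Return a list of (offset, sequence, kind) for each break in text."""
    result = []
    for m in _BREAK.finditer(text):
        seq = m.group()
        if seq == "\u2029":
            kind = PARA
        elif seq == "\u2028":
            kind = SOFT  # assumption: LS is interpreted as a soft break
        else:
            kind = HARD  # legacy newlines: CR, LF, CR+LF, NEL
        result.append((m.start(), seq, kind))
    return result
```

Without a higher-level protocol, every legacy newline ends up in the
"hard" bucket, which is exactly the ambiguity described above.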

Type (3) may be encoded with PS (paragraph separator), but in terminals
(and common protocols like MIME) it is usually encoded using a pair of
newlines (CR+CR, or LF+LF, or CR+LF+CR+LF, or NEL+NEL), possibly with
additional whitespace (and additional presentation characters such as ">"
in quotations inserted in mail responses) between them (needed for MIME and
HTTP), which may be collapsed when rendering or interpreting them.
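
A minimal Python sketch of that convention, under stated assumptions: the
legacy newlines are first normalized to LF, and the only characters allowed
between the two newlines are space, tab and ">". The name split_paragraphs
and that exact character set are choices made for this example only.

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs on PS or on doubled legacy newlines."""
    # Normalize CR+LF, CR and NEL to LF first, so that a single CR+LF is
    # never mistaken for two newlines.
    normalized = re.sub("\r\n|\r|\x85", "\n", text)
    # A paragraph break is PS, or a newline followed by one or more lines
    # that contain only spaces, tabs or ">" quote markers.
    parts = re.split("\u2029|\n(?:[ \t>]*\n)+", normalized)
    return [p for p in parts if p]
```

Note that a quoted blank line (a line holding only ">") is consumed as part
of the break, while the ">" prefix of the following paragraph is kept.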

Some terminal protocols can also use other legacy ASCII separators such as
FS, GS, RS, US for grouping units containing multiple paragraphs, or
STX/EOT pairs for encapsulating whole text documents in a
protocol-specific envelope format. These will also use some escaping
mechanism for special controls found in the middle, such as DLE+control to
escape the control, or DLE+0 to escape a NUL, or DLE+# to escape a DEL, or
DEL+x+NN where the N are a fixed number of hexadecimal, decimal or octal
digits. There's a wide variety of escaping mechanisms used by various
higher-layer protocols (including transport protocols or encoding syntaxes
used just below the plain-text layer, in a lower layer than the transport
protocol layer).
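
To show the general shape of such an envelope, here is a Python sketch of
the simplest variant (DLE+control). It is a made-up scheme for
illustration, not the framing of any particular protocol: the choice of
which bytes are reserved, and the names frame and unframe, are assumptions.

```python
STX, EOT, DLE = 0x02, 0x04, 0x10

# Assumption: only the envelope bytes and DLE itself need escaping.
_RESERVED = {STX, EOT, DLE}

def frame(payload: bytes) -> bytes:
    """Wrap payload in STX ... EOT, prefixing reserved bytes with DLE."""
    out = bytearray([STX])
    for b in payload:
        if b in _RESERVED:
            out.append(DLE)
        out.append(b)
    out.append(EOT)
    return bytes(out)

def unframe(data: bytes) -> bytes:
    """Reverse frame(); raise ValueError on a malformed envelope."""
    if len(data) < 2 or data[0] != STX or data[-1] != EOT:
        raise ValueError("missing STX/EOT envelope")
    out = bytearray()
    escaped = False
    for b in data[1:-1]:
        if escaped:
            out.append(b)       # byte after DLE is taken literally
            escaped = False
        elif b == DLE:
            escaped = True
        elif b in (STX, EOT):
            raise ValueError("unescaped control inside envelope")
        else:
            out.append(b)
    if escaped:
        raise ValueError("dangling DLE at end of payload")
    return bytes(out)
```

The point for this discussion is that newlines inside the payload pass
through untouched, so their interpretation as type (1), (2) or (3) is left
entirely to the layer above the envelope.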

On Mon, 4 Feb 2019 at 21:46, Eli Zaretskii via Unicode <unicode at unicode.org>
wrote:

> > Date: Mon, 4 Feb 2019 19:45:13 +0000
> > From: Richard Wordingham via Unicode <unicode at unicode.org>
> >
> > Yes.  If one has a text composed of LTR and RTL paragraphs, one has to
> > choose how far apart their starting margins are.  I think that could
> > get complicated for plain text if the terminal has unbounded width.
>
> But no real-life terminal does.  The width is always bounded.
>