Bidi paragraph direction in terminal emulators (BiDi in terminal emulators)
Richard Wordingham via Unicode
unicode at unicode.org
Mon Feb 4 18:05:47 CST 2019
On Tue, 5 Feb 2019 00:08:10 +0100
Egmont Koblinger via Unicode <unicode at unicode.org> wrote:
> Hi Eli,
>
> > Actually, UAX#9 defines "paragraph" as the chunk of text delimited
> > by paragraph separator characters. This means characters whose bidi
> > category is B, which includes Newline, the CR-LF pair on Windows,
> > U+0085 NEL, and U+2029 PARAGRAPH SEPARATOR.
It actually gives two different definitions. Table 4 of UAX#9 restricts
type B to *appropriate* newline functions; not all newlines are
paragraph separators.
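
For reference, the raw Bidi_Class values of the characters under discussion can be read from the Unicode Character Database with Python's standard unicodedata module. This is only an illustrative sketch: the names CANDIDATES and bidi_class_table are made up for this example, and the values shown are the per-character property values, before any judgement about which newline functions are 'appropriate'.

```python
import unicodedata

# Characters mentioned in this thread, keyed by their usual abbreviations.
CANDIDATES = {
    "LF": "\n",        # U+000A LINE FEED
    "CR": "\r",        # U+000D CARRIAGE RETURN
    "NEL": "\u0085",   # U+0085 NEXT LINE
    "LS": "\u2028",    # U+2028 LINE SEPARATOR
    "PS": "\u2029",    # U+2029 PARAGRAPH SEPARATOR
}

def bidi_class_table(chars=CANDIDATES):
    """Map each abbreviation to the Bidi_Class of its character."""
    return {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}

if __name__ == "__main__":
    for name, cls in bidi_class_table().items():
        print(f"{name}: {cls}")
```

Note that LS comes out as WS rather than B, which is why treating a newline function as LS keeps the text in one paragraph.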
> Indeed, this was an oversight on my side. So, with this definition,
> every single newline character starts a new paragraph. The result of
> printf "Hello\nWorld\n" > world.txt
> is a text file consisting of two paragraphs, with 5 characters in
> each. Correct?
No, it depends on when a newline function is 'appropriate'. TUS 5.8
Rule R2b applies - 'In simple text editors, interpret any NLF the same
as LS'.
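
To make the difference concrete, here is a minimal sketch that splits the contents of world.txt under the two readings. The function names are hypothetical, and the R2a reading (word processing: interpret any NLF the same as PS) is included only for contrast with R2b.

```python
import re

# Newline functions considered here: CR-LF, LF, CR and NEL.
# PS (U+2029) always ends a paragraph.
NLF_OR_PS = re.compile(r"\r\n|[\n\r\u0085\u2029]")
PS = "\u2029"

def _drop_trailing_empty(parts):
    """A final separator terminates the last paragraph; it does not start a new one."""
    if parts and parts[-1] == "":
        parts.pop()
    return parts

def paragraphs_nlf_as_ps(text):
    """R2a-style reading: every NLF (and PS) ends a paragraph."""
    return _drop_trailing_empty(NLF_OR_PS.split(text))

def paragraphs_nlf_as_ls(text):
    """R2b-style reading: NLFs only break lines; only PS ends a paragraph."""
    return _drop_trailing_empty(text.split(PS))

if __name__ == "__main__":
    sample = "Hello\nWorld\n"
    print(len(paragraphs_nlf_as_ps(sample)))  # two paragraphs
    print(len(paragraphs_nlf_as_ls(sample)))  # one paragraph of two lines
```

Under the R2b reading the file is a single paragraph containing two lines, so the answer to "two paragraphs?" depends on which rule the application follows.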
> > Actually, Emacs implements the rule that paragraphs are separated by
> > empty lines. This is documented in the Emacs manuals.
>
> That is, Emacs overrides UAX#9 and comes up with a different
> definition? Furthermore, you argue that in terminals I should follow
> Emacs's definition rather than Unicode's? Or please clarify if I
> misunderstood you here.
He's deriving 'B' from a higher-level protocol, which Table 4 of UAX#9
also allows as a source of paragraph separation.
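
An empty-line rule of the kind described in the quoted text could be sketched as follows. This is not Emacs's actual implementation; the function name is hypothetical, and treating a line that contains only spaces, tabs or form feeds as "empty" is an assumption made for this example.

```python
import re

# Assumption: a separator line is one with nothing but spaces, tabs or form feeds.
BLANK_LINE = re.compile(r"[ \t\f]*")

def paragraphs_by_blank_lines(text):
    """Group consecutive non-blank lines into paragraphs; blank lines separate them."""
    paragraphs, current = [], []
    for line in text.split("\n"):
        if BLANK_LINE.fullmatch(line):
            # A blank line closes the paragraph collected so far, if any.
            if current:
                paragraphs.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs

if __name__ == "__main__":
    print(paragraphs_by_blank_lines("Hello\nWorld\n"))
    print(paragraphs_by_blank_lines("Hello\n\nWorld\n"))
```

With this rule "Hello\nWorld\n" is one paragraph, and only the version with an empty line between the two words yields two.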
Richard.