Emacs' implementation of the bidirectional algorithm
Eli Zaretskii via Unicode
unicode at unicode.org
Sat Jul 1 11:39:47 CDT 2017
> Date: Sat, 1 Jul 2017 16:36:52 +0300
> From: Itai Berli via Unicode <unicode at unicode.org>
> Emacs claims to fully conform to the Unicode Bidirectional Algorithm
> 8.0.0 (see sections 22.19 'Bidirectional Editing' and 37.26
> 'Bidirectional Display' of the Emacs manual), yet I have noticed some
> behavior that makes me question this claim.
> I'd appreciate the opinions of others, one way or the other.
> For each of the following three situations, I wish to know: Is Emacs'
> behavior consistent with the UBA? If it is, I'd like to know whether
> you find this behavior in line with the 'spirit' of the UBA, and with
> common sense.
> 1. Paragraph boundaries. According to the Emacs manual (section 22.19)
> "Paragraph boundaries are empty lines, i.e., lines consisting entirely
> of whitespace characters." The following screenshot shows this
> behavior in action: http://imgur.com/3eyrUfA
> 2. Visualization of explicit bidi characters. According to the Emacs
> manual (section 22.19): "In a GUI session, the lrm and rlm characters
> display as very thin blank characters; on text terminals they display
> as blanks." The following screenshot shows this behavior in action.
> There are three bidi marks (LRI,PDI,LRM) between the two left-most
> x's. http://imgur.com/VD3Lvsn
> 3. Line wrapping. The following screenshot shows the line-breaking
> algorithm in action. The paragraph starts with two Hebrew words
> followed by the beginning of Abraham Lincoln's Gettysburg Address. The
> English text flows from the bottom to the top.
Item 3 doesn't conform to what section 3.4 of the UBA says. The
reason is that meeting this requirement would need the Emacs display
engine to be redesigned.
The other items don't violate the UBA, IMO. They follow the
high-level protocols clause in HL1, and section 5.2 which describes
the optional retaining of directional control characters in the buffer
and on display.
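
To make the difference in item 1 concrete, here is a minimal Python
sketch (not Emacs code; the function names are made up for this
illustration). It contrasts the rule quoted from the Emacs manual,
where only whitespace-only lines end a paragraph, with a simplified
reading of the UBA default in which every newline is a paragraph
separator. Only newline is handled here; other separator characters
are ignored for brevity.

```python
def split_on_blank_lines(text):
    """Emacs-manual-style rule (hypothetical helper): a paragraph ends
    only at a line consisting entirely of whitespace."""
    paragraphs, current = [], []
    for line in text.split("\n"):
        if line.strip() == "":
            # Whitespace-only line: close the paragraph collected so far.
            if current:
                paragraphs.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


def split_on_newlines(text):
    """Simplified default rule (hypothetical helper): every newline
    separates paragraphs; whitespace-only segments are dropped."""
    return [line for line in text.split("\n") if line.strip() != ""]


sample = "abc\ndef\n  \nghi"
blank_line_paragraphs = split_on_blank_lines(sample)
newline_paragraphs = split_on_newlines(sample)
```

With the sample above, the first function yields two paragraphs
("abc\ndef" and "ghi") and the second yields three. Under the first
rule, consecutive non-empty lines share one paragraph and hence one
base direction.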