Emacs' implementation of the bidirectional algorithm
Itai Berli via Unicode
unicode at unicode.org
Sat Jul 1 08:36:52 CDT 2017
Emacs claims to fully conform to version 8.0.0 of the Unicode
Bidirectional Algorithm (UBA); see sections 22.19 'Bidirectional
Editing' and 37.26 'Bidirectional Display' of the Emacs manual. Yet I
have noticed some behavior that makes me question this claim.
I'd appreciate the opinions of others, one way or the other.
For each of the following three situations, I wish to know: Is Emacs'
behavior consistent with the UBA? If it is, I'd like to know whether
you find this behavior in line with the 'spirit' of the UBA, and with
1. Paragraph boundaries. According to the Emacs manual (section 22.19),
"Paragraph boundaries are empty lines, i.e., lines consisting entirely
of whitespace characters." The following screenshot shows this
behavior in action: http://imgur.com/3eyrUfA
2. Visualization of explicit bidi characters. According to the Emacs
manual (section 22.19): "In a GUI session, the lrm and rlm characters
display as very thin blank characters; on text terminals they display
as blanks." The following screenshot shows this behavior in action.
There are three bidi marks (LRI,PDI,LRM) between the two left-most
3. Line wrapping. The following screenshot shows the line-breaking
algorithm in action. The paragraph starts with two Hebrew words
followed by the beginning of Abraham Lincoln's Gettysburg Address. The
English text flows from the bottom to the top.
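To make the question in item 1 concrete, here is a minimal Python sketch
of how the choice of paragraph boundary changes the base direction that
the "first strong character" rules (P2 and P3 of the UBA) assign. The
function names and the sample text are made up for illustration, the
isolate-skipping refinement of P2 is left out, and this is not a
description of how Emacs implements it.

```python
import re
import unicodedata

def split_at_newlines(text):
    # Every newline (Bidi_Class B) ends a paragraph.
    # Whitespace-only paragraphs are dropped to keep the comparison simple.
    return [p for p in text.split("\n") if p.strip()]

def split_at_blank_lines(text):
    # Paragraphs are separated by lines consisting entirely of whitespace,
    # which is the rule quoted from the Emacs manual.
    return [p for p in re.split(r"\n(?:[ \t]*\n)+", text) if p.strip()]

def base_direction(paragraph):
    # Simplified P2/P3: the first strong character decides the direction.
    # Assumption: characters inside isolates are NOT skipped here.
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return "LTR"  # P3: default to left-to-right when no strong character exists

# Two Hebrew words, a single newline, then English text (hypothetical sample).
sample = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd\nFour score and seven years ago"

per_newline = [base_direction(p) for p in split_at_newlines(sample)]
per_blank_line = [base_direction(p) for p in split_at_blank_lines(sample)]
```

With newline boundaries the English line is its own paragraph and
resolves to LTR; with blank-line boundaries it belongs to the paragraph
opened by the Hebrew words and inherits RTL.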
Possible reasons why these behaviors are reasonable and consistent
with the standard:
1. Paragraph boundaries. The UBA allows applications to employ
higher-level protocols when deciding on base paragraph direction. See
section 4.3 and specifically clause HL1 there.
2. Visualization of explicit bidi characters. (a) The UBA also allows
the bidi characters to be displayed. See section 5.2. (b) This is just
the default; it can be customized like every other character's glyph.
3. Line wrapping. The remedy is simple: break long lines into shorter
ones by inserting newlines.
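When checking point 2, it can help to see exactly which bidi marks a
piece of text contains, since they are nearly invisible by default. The
following is a minimal Python sketch that replaces each mark with a
visible label; the function name and the label format are my own
invention, and it works outside Emacs rather than through Emacs' own
display customization.

```python
# Bidi formatting characters and their usual abbreviations.
BIDI_CONTROLS = {
    "\u200e": "LRM", "\u200f": "RLM", "\u061c": "ALM",
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

def reveal_bidi_controls(text):
    # Replace each bidi formatting character with a visible <NAME> label;
    # every other character is passed through unchanged.
    return "".join(
        f"<{BIDI_CONTROLS[ch]}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )

# Hypothetical input containing LRI, PDI and LRM.
revealed = reveal_bidi_controls("a\u2066b\u2069\u200ec")
```

Here `revealed` is the string `a<LRI>b<PDI><LRM>c`.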