Line wrapping of mixed LTR/RTL text

Philippe Verdy via Unicode unicode at unicode.org
Tue Aug 28 12:07:51 CDT 2018


The space encoded just before the logical end of the line or linewrap (which
may fall in the middle of the displayed line) has to be moved to the end of
the physical line (in the paragraph direction); it should not be kept in the
middle.
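
A toy sketch of how this works out on Cosmin's example quoted below. It is not a conforming bidi implementation. It assumes only three character types (strong L, strong R and U+0020 SPACE), with no numbers, explicit embeddings or other neutrals, and all function names are hypothetical. Levels are resolved once for the whole paragraph, the row is cut in logical order, and then UAX #9 rule L1 (trailing whitespace is reset to the paragraph level) and rule L2 (reverse runs from the highest level down to the lowest odd level) are applied to that row only.

```python
# Toy model of UAX #9 rules L1 and L2 (assumption: only strong L, strong R
# and U+0020 SPACE occur; all names here are hypothetical).

def bidi_class(ch):
    """Simplified classifier: Arabic block -> R, space -> WS, anything else -> L."""
    if ch == " ":
        return "WS"
    if "\u0600" <= ch <= "\u06ff":
        return "R"
    return "L"

def resolve_levels(text, para_level):
    """Resolve levels for the whole paragraph (toy version of N1/N2 and I1/I2)."""
    classes = [bidi_class(c) for c in text]
    para_dir = "R" if para_level % 2 else "L"
    opposite = "L" if para_level % 2 else "R"
    levels = []
    for i, cls in enumerate(classes):
        if cls == "WS":
            # N1: a space between strong types of the same direction takes that
            # direction; N2: otherwise it takes the paragraph direction.
            left = next((classes[j] for j in range(i - 1, -1, -1)
                         if classes[j] != "WS"), para_dir)
            right = next((classes[j] for j in range(i + 1, len(classes))
                          if classes[j] != "WS"), para_dir)
            cls = left if left == right else para_dir
        # I1/I2: text opposite to the paragraph direction goes one level up.
        levels.append(para_level + 1 if cls == opposite else para_level)
    return levels

def apply_rule_l1(row, levels, para_level):
    """L1: whitespace at the end of the row is reset to the paragraph level."""
    levels = list(levels)
    i = len(row) - 1
    while i >= 0 and bidi_class(row[i]) == "WS":
        levels[i] = para_level
        i -= 1
    return levels

def apply_rule_l2(row, levels):
    """L2: from the highest level down to the lowest odd level, reverse runs."""
    chars, lv = list(row), list(levels)
    lowest_odd = min(lv) | 1  # round the lowest level up to an odd number
    for level in range(max(lv), lowest_odd - 1, -1):
        i = 0
        while i < len(lv):
            if lv[i] < level:
                i += 1
                continue
            j = i
            while j < len(lv) and lv[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]  # reverse the run at >= this level
            lv[i:j] = lv[i:j][::-1]        # keep levels aligned with chars
            i = j
    return "".join(chars)

def display_row(paragraph, start, end, para_level=1, use_l1=True):
    """Visual (left-to-right) order of paragraph[start:end] after wrapping."""
    row = paragraph[start:end]
    if not row:
        return ""
    levels = resolve_levels(paragraph, para_level)[start:end]
    if use_l1:
        levels = apply_rule_l1(row, levels, para_level)
    return apply_rule_l2(row, levels)

arabic = "لمفاتيح"
paragraph = arabic + " ABC DEF"      # logical order, RTL paragraph
cut = len(arabic) + len(" ABC ")     # the wrap breaks after "ABC "
print(repr(display_row(paragraph, 0, cut)))                # with rule L1
print(repr(display_row(paragraph, 0, cut, use_l1=False)))  # without rule L1
print(repr(display_row(paragraph, cut, len(paragraph))))
```

With rule L1 the first row comes out as a space, "ABC", a space, then the Arabic word: the trailing space sits at the left edge, which is the visual end of an RTL row, and only one space separates the two scripts. Without L1 the doubled space between "ABC" and the Arabic text reappears, as in the quoted report.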

If you need to force a linewrap at a non-breaking space (because there's no
other break opportunity to wrap the line elsewhere), then treat that
non-breaking space as a regular breaking space, which will also be moved to
the end of the row (past the margin on the ending side of the paragraph), and
choose the last non-breaking space on the row. Usually, all spaces present
at linewraps (including non-breaking spaces) are compacted. But other style
policies will force the linewrap preferably after a trailing punctuation or a
separator punctuation, or before a leading punctuation, or just after the last
unbreakable cluster that fits in the row (including in the middle of words at
an arbitrary position if there's no hyphenation process or the script does not
support hyphenation, such as sinograms and kanas).
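
A minimal sketch of the break-selection order described above, under heavy assumptions: one code point counts as one cluster and one column (no real font metrics), only U+0020 and U+00A0 are considered, and neither the punctuation or hyphenation policies nor the UAX #14 line-break classes are modelled. The function names are hypothetical. It prefers the last ordinary space that keeps the row within the width, falls back to the last no-break space on the row, and otherwise breaks after the last code point that fits. The whole run of spaces at the wrap stays at the end of the row, where the renderer can let it hang past the margin.

```python
SPACE, NBSP = " ", "\u00a0"

def last_break_at(text, start, limit, breakers):
    """Last index in start+1..limit whose character is in `breakers`.

    Index `start` itself is excluded so that a row is never empty.
    """
    for i in range(limit, start, -1):
        if text[i] in breakers:
            return i
    return None

def row_end(text, start, width):
    """Exclusive end of the row starting at `start`, for a toy width metric."""
    limit = start + width            # first code point that would overflow the row
    if limit >= len(text):
        return len(text)             # the rest fits on this row
    i = last_break_at(text, start, limit, {SPACE})     # preferred: ordinary space
    if i is None:
        i = last_break_at(text, start, limit, {NBSP})  # forced: last no-break space
    if i is None:
        return limit                 # emergency break after the last code point that fits
    # Compact the spaces at the wrap: the whole run stays at the end of this row.
    while i < len(text) and text[i] in (SPACE, NBSP):
        i += 1
    return i

def wrap(text, width):
    """Greedy wrap; rows keep their trailing spaces so they can hang in the margin."""
    if width < 1:
        raise ValueError("width must be at least 1")
    rows, start = [], 0
    while start < len(text):
        end = row_end(text, start, width)
        rows.append(text[start:end])
        start = end
    return rows

print(wrap("hello world foo", 11))
print(wrap("ab\u00a0cd\u00a0ef", 5))
print(wrap("abcdefgh", 3))
```

For example, wrap("hello world foo", 11) gives ["hello world ", "foo"]. The first row keeps its trailing space but its visible width is 11. In the second call there is no ordinary space, so the row is cut at the last no-break space. In the third call there is no space at all, so the rows are cut after every third code point.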

Where to insert linewraps is very fuzzy and depends on the rendering
context and capabilities of the target device (you cannot scroll a piece of
printed paper, but you can scroll a display with a scrollbar or by using
navigation cursors in a width-restricted input field).

On Tue, Aug 28, 2018 at 16:34, Cosmin Apreutesei via Unicode <
unicode at unicode.org> wrote:

> Hello everyone,
>
> I'm having a bit of trouble implementing line wrapping with bidi and I
> would like to ask for some advice or hints on what is the proper way
> to do this.
>
> UAX#9 section 3.4 says that bidi reordering should be done after line
> wrapping. But in order to do line wrapping correctly I need to be able
> to visually ignore some whitespace, and I'm not sure exactly which
> whitespace must be ignored.
>
> There is this sentence in UAX#9 which provides a clue: "[...] trailing
> whitespace will appear at the visual end of the line (in the paragraph
> direction).". I'm not sure what that means, but by doing some tests
> with fribidi and libunibreak I noticed that the whitespace always
> sticks to the logical end of the word (so visually to the right for
> LTR runs and to the left for RTL runs), regardless of the base
> paragraph direction. Is it safe to use this assumption and always
> remove the whitespace at the logical end of the last word of the line?
> Or is it more complicated than that?
>
> Quick example showing the problem. The following text:
>
> لمفاتيح ABC DEF
>
> with RTL base direction would wrap (for a certain line width) as:
>
> ABC  لمفاتيح
> DEF
>
> with two spaces between the Latin and Arabic text, one from the Latin
> text and one from the Arabic text. Since the line logically ends with
> the "C" in an LTR run, I should probably remove the space after the
> "C" (and, as a rule, just remove the whitespace at the logical end of
> the word, regardless of the paragraph's direction or the word's
> direction). Is this the right way to do it?
>
> Screenshots attached.
>
> Thanks!
>