Bidi paragraph direction in terminal emulators
Egmont Koblinger via Unicode
unicode at unicode.org
Sat Feb 9 15:31:37 CST 2019
On Sat, Feb 9, 2019 at 10:02 PM Asmus Freytag (c) <asmusf at ix.netcom.com> wrote:
> are you excluding CJK because of the difficulty handling a large
> repertoire with mechanical means?
No, I excluded CJK because they're pretty well solved in terminals,
and nowhere near along the lines of how they work with typewriters.
I should've probably said "letter based" scripts or whatever, I'm not
familiar with the exact terminologies.
> To force Hindi crosswords mode you need to segment the string into syllables,
> each having a variable number of characters [...]
Thanks a lot to you too for your detailed explanation!
> Are you defining as your goal to have some kind of "line by line" display that
> can survive any Unicode text thrown at it, or are you trying to extend a given
> design with rather specific limitations, so that it survives / can be used with,
> just a few more scripts than European + CJK?
I don't have a clearly defined goal. I find fun in developing VTE (and
slightly improving other terminal emulators too by spreading ideas,
knowledge, comments etc.), addressing various kinds of goals, whatever
happens to come next. At this point it's BiDi, with a bit of
Devanagari improvement sneaking in the other day.
What is clear to me: I cannot redefine the basics of terminal
emulation. I can only add incremental improvements to whatever it
already is, and I have to make sure that the ecosystem built around it
during decades (all the screen handling libraries and applications)
doesn't break. I'm limited by these constraints.
> The discrepancies would be more like throwing random blank spaces in the
> middle of every word, writing letters out of order, or overprinting. So, more
> fundamental, not just "not perfect".
Let's take the Devanagari improvement of the other day. Until now,
there were plenty of dotted circles shown, and combining spacing marks
that should've been placed before the letter were placed after the
letter, before a placeholder dotted circle. Now they are displayed as
expected: the combininig spacing mark shows up before the letter (if
it's of that kind), and no dotted circle. The letter + spacing marks
now shows up correctly. The entire word still doesn't, e.g. there are
often spaces between letters where the upper line connecting them
should be continuous.
Eventually HarfBuzz could help, but it's just not yet clear how
exactly. I cannot essentially change the underlying model of fixed
width cells. On top of this model, though, we can experiment with
various ideas about displaying. For example, if a word occupies 7
columns in the model, then HarfBuzz renders it, and the rendered
version occupies the width of 8.6 columns, maybe we can squeeze it
using a trivial linear transformation? I'm not sure, but maybe it's an
idea worth investigating. Won't look perfect, but probably will look
better than what we do currently. We already have column spacing
implemented, to pull the columns further apart from each other by a
fixed amount (mostly for accessibility purposes), maybe a user can use
this feature to make more room for a nicely rendered, non-squeezed
> To give you an idea, here is an Arabi crossword. It uses the isolated shape of
> all letters and writes all words unconnected. That's two things that may be
> acceptable for a puzzle, but not for text output.
You can't get nice Arabic without first making sure the order of the
letters is the correct one, not reversed. :-) That's what my current
work is about.
As per Richard's feedback, I also see that shaping needs to be done
differently than I had thought. Mind you, my visual inspection of what
the non-preferred shaping approach gave to me vs. what a proper
HarfBuzz rendering gave (for Arabic) were extremely close to each
other, something that I'd probably consider "good enough" if I spoke
the language and were aware of the terminal's constraints. Well,
definitely a major improvement over what we have.
> You may begin to see the limitations and that they may well prevent you from
> reaching even your limited goal for speakers of at least three of the top ten languages
If the goal is to have perfect rendering without compromises: sure I
won't reach that. (It's not a goal for me. For perfect rendering,
users should get away from terminals.) If the goal is to have
something reasonably good, better than what we have currently, I can't
see why not.
More information about the Unicode