Bidi paragraph direction in terminal emulators
Asmus Freytag (c) via Unicode
unicode at unicode.org
Sat Feb 9 15:02:55 CST 2019
On 2/9/2019 11:48 AM, Egmont Koblinger wrote:
> Hi Asmus,
>> On quick reading this appears to be a strong argument why such emulators will
>> never be able to be used for certain scripts. Effectively, the model described works
>> well with any scripts where characters are laid out (or can be laid out) in fixed
>> width cells that are linearly adjacent.
> I'm wondering if you happen to know:
> Are there any (non-CJK) scripts for which a mechanical typewriter does
> not exist due to the complexity of the script?
Are you excluding CJK because of the difficulty of handling a large
repertoire by mechanical means? However, see:
> Are there any (non-CJK) scripts for which crossword puzzles don't exist?
> For scripts where these do exist, is it perhaps an acceptable tradeoff
> to keep their limitations in the terminal emulator world as well, to
> combine the terminal emulator's power with these scripts?
I agree with you that crossword puzzles and scrabble have a similar
limitation to the design that you sketched for us. However, take a script
that is written in syllables (each composed of 1-5 characters, say).
In a "crossword" I could write this script so that each syllable occupies
a cell. It would be possible to read such a puzzle, but trying to use
such a draconian technique for running text would be painful, to say
the least. (We are talking about pretty, here).
Here's an example for Hindi:
I don't read Hindi, but 5 vertical in the top puzzle, cell 2, looks like
both a consonant and a vowel.
To force Hindi into crossword mode, you need to segment the string into
syllables, each having a variable number of characters, and then assign a
single position to each. Now some syllables are wider than others, so you
could use the single/double width paradigm. The result may be somewhat
legible for Devanagari, but even some of the closely related scripts may
not fit.
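The segmentation step described above can be sketched roughly as follows. This is a minimal illustration, not a real terminal algorithm: the clustering rule (combining marks attach to the preceding cell; a letter after a virama joins it as a conjunct) and the width rule (conjuncts get a double-width cell) are hypothetical assumptions chosen for the example, and the function names are made up.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def crossword_cells(text: str) -> list[str]:
    """Split Devanagari text into rough syllable clusters, one per cell.

    Hypothetical heuristic: a combining mark (Mn/Mc/Me) attaches to the
    previous cluster, and a letter that follows a virama joins it as a
    conjunct. Real segmentation (UAX #29 grapheme clusters plus
    script-specific rules) is more involved.
    """
    cells: list[str] = []
    for ch in text:
        cat = unicodedata.category(ch)
        is_mark = cat in ("Mn", "Mc", "Me")
        after_virama = bool(cells) and cells[-1].endswith(VIRAMA) and cat == "Lo"
        if cells and (is_mark or after_virama):
            cells[-1] += ch  # extend the current syllable cell
        else:
            cells.append(ch)  # start a new cell
    return cells


def cell_width(cluster: str) -> int:
    """Hypothetical single/double width rule: conjuncts get two cells."""
    return 2 if VIRAMA in cluster else 1


word = "\u0939\u093F\u0928\u094D\u0926\u0940"  # "हिन्दी" (Hindi)
cells = crossword_cells(word)
print(cells, [cell_width(c) for c in cells])
```

For "हिन्दी" this yields two cells, हि and न्दी, the second marked double-width, so the six code points occupy three columns under these made-up rules.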
Now there are some scripts where the same syllable can be written in more
than one form, the forms differing by how the elements are fused (or not
fused) into a single shape. Sometimes these differences are more like an
'fi' ligature in English; sometimes they really indicate [...], or one of
the forms is simply not correct (like trying to spell lam-alif in Arabic
using two separate letters).
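To make the lam-alif point concrete: Unicode encodes the obligatory ligature as a compatibility character (U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM), and NFKC normalization maps it back to the two underlying letters. A quick check, assuming only Python's standard `unicodedata` module (the helper name is illustrative):

```python
import unicodedata

LAM_ALEF = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM


def logical_letters(presentation: str) -> str:
    """Map presentation-form characters back to the underlying letters (NFKC)."""
    return unicodedata.normalize("NFKC", presentation)


letters = logical_letters(LAM_ALEF)
for ch in letters:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# One obligatory ligature, but two letters: two cells in a one-letter-per-cell grid.
print(len(LAM_ALEF), "ligature ->", len(letters), "letters")
```

The two letters are LAM and ALEF; a grid that gives each letter its own cell would draw them apart, which is the "simply not correct" spelling.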
I'm sure there are scripts that work rather poorly (effectively not at
all) in crossword mode. The question then becomes one of goals.
Is your goal to have some kind of "line by line" display that
can survive any Unicode text thrown at it, or are you trying to extend a
design with rather specific limitations so that it survives / can be
used for just a few more scripts than European + CJK?
> Honestly, even with English, all I have to do is "cat some_text_file",
> and chances are that a word is split in half at some random place
> where it hits the right margin. Even with just English, a terminal
> emulator isn't something that gives me a grammatically and
> typographically super pleasing or correct environment. It gives me
> something that I personally find grammatically and typographically
> "good enough", and in the meantime a powerful tool to get my work done.
The discrepancies would be more like throwing random blank spaces in the
middle of every word, writing letters out of order, or overprinting. So,
fundamental, not just "not perfect".
To give you an idea, here is an Arabic crossword. It uses the isolated
forms of all letters and writes all words unconnected. Those are two
things that may be acceptable for a puzzle, but not for text output.
(Try typing 3 vertical as a word to see the difference; it's 4x U+062A.)
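The difference can be sketched with the Arabic Presentation Forms-B code points for TEH. This is a simplification under stated assumptions: it handles only a word made entirely of TEH (a dual-joining letter), and real renderers choose contextual glyphs through the font's shaping tables rather than by substituting these compatibility code points. The helper names are hypothetical.

```python
import unicodedata

TEH = "\u062A"  # ARABIC LETTER TEH

# Presentation Forms-B code points for TEH. These are compatibility
# characters; real shaping engines pick glyphs from the font instead.
TEH_FORMS = {
    "isolated": "\uFE95",
    "final": "\uFE96",
    "initial": "\uFE97",
    "medial": "\uFE98",
}


def _check(word: str) -> None:
    # This sketch only knows the contextual forms of TEH.
    if not word or set(word) != {TEH}:
        raise ValueError("sketch only handles non-empty runs of U+062A")


def crossword_form(word: str) -> str:
    """Puzzle style: every letter in its isolated form, nothing connected."""
    _check(word)
    return TEH_FORMS["isolated"] * len(word)


def joined_form(word: str) -> str:
    """Running-text style for a dual-joining letter: initial, medial..., final."""
    _check(word)
    if len(word) == 1:
        return TEH_FORMS["isolated"]
    middle = TEH_FORMS["medial"] * (len(word) - 2)
    return TEH_FORMS["initial"] + middle + TEH_FORMS["final"]


word = TEH * 4  # "3 vertical" in the puzzle: four U+062A in a row
for label, shaped in (("crossword", crossword_form(word)),
                      ("running text", joined_form(word))):
    print(label, [unicodedata.name(c) for c in shaped])
```

Both results normalize (NFKC) back to the same four letters; only the visual forms differ, and that is exactly the information a one-letter-per-cell grid throws away.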
> Obviously the more complex the script, the more tradeoffs there will
> be. I think it's a call each user has to make whether they prefer a
> terminal emulator or a graphical app for a certain kind of task. And
> if terminal emulators have a lower usage rate in these scripts, that's
> not necessarily a problem. If we can improve by small incremental
> changes, sure, let's do. If we'd need to heavily redesign plenty of
> fundamentals in order to improve, it most likely won't happen.
You may begin to see the limitations, and that they may well prevent you
from reaching even your limited goal for speakers of at least three of
the top ten languages.