Proposing new arrow characters with Bidi_Mirrored=Yes

Mark E. Shoulson mark at kli.org
Tue Apr 8 13:48:32 CDT 2025


On 4/8/25 1:56 PM, NeatNit via Unicode wrote:

>> Users just type what gives them the correct appearance.
>> Even then, the problem with encoding duplicate characters based on layout properties is that "users just type what gives them the correct appearance" at the time they enter the character. The only context a user has is the text being typed. If that happens to give the correct direction, a user wouldn't know to shift to a different character, just in case the context might change.
>> wouldn't whoever enters the arrow just use the right^wcorrect one? Does text get converted from LTR to RTL? If so, isn't that part of the translator's responsibility?
> You guys are mostly right: in a context of users typing in text and manually choosing to insert an arrow, they would choose the arrow that looks correct, and it doesn't matter if they use a mirroring or non-mirroring arrow. This is not the issue I mean to solve.
>
>> The question then is "what software processes are unavoidable and known to interfere with this user choice" for arrows in a bidirectional context?
(The above quoted quotes are from Asmus.)
>> The issue is with software that programmatically inserts arrows in text that comes from unpredictable sources. Developers usually never think of this case, causing the arrow to point in the wrong direction. Real world examples:
>>
>> https://github.com/deevroman/better-osm-org/issues/241 - solved by bidi-isolating both sides of the arrow, and programmatically selecting the correct arrow based on the layout direction
>> https://github.com/OSMCha/osmcha-frontend/issues/765 - solved by bidi-isolating both sides of the arrow, and relying on the fact that the interface is always LTR
>> https://meta.discourse.org/t/wrong-arrow-direction-in-rtl-text-contexts/360760 - which I've already mentioned, **no simple way to solve it** without mirroring arrows!
>>
>> Obviously I don't expect developers to suddenly know to switch to the mirroring arrows overnight, if they are added. But I would love to be able to tell them "all you have to do to fix it is replace this character with that one".
>>
Ah!  OK, now we're talking.  I see the use case.  I haven't read the details 
of the software in question, but I take it the point is that you're 
presenting a route as a list of waypoints, shown as "And now go from 
point A → point B", and that this needs to be 
localized/internationalized.  This actually... sounds like a reasonable 
use?  I mean, I can see why this wouldn't be served by the current 
situation and why people would want something smarter.
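For the concrete shape of the workaround described in the first two linked issues, here is a minimal sketch in Python. It is an illustration, not code from either project, and the names `isolate` and `join_with_arrow` are hypothetical. Each side is wrapped in FIRST STRONG ISOLATE (U+2068) ... POP DIRECTIONAL ISOLATE (U+2069), and the arrow is picked from the layout direction, which the caller has to know:

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE
LEFT_ARROW, RIGHT_ARROW = "\u2190", "\u2192"  # ← →

def isolate(text: str) -> str:
    """Wrap text in FSI...PDI so its direction cannot reorder its neighbors."""
    return f"{FSI}{text}{PDI}"

def join_with_arrow(source: str, target: str, layout_is_rtl: bool) -> str:
    """Hypothetical helper: 'source → target' with both sides isolated.

    Neither arrow is Bidi_Mirrored, so the caller must say which way the
    surrounding layout runs: in RTL the source is displayed on the right,
    so an arrow pointing at the target has to point left.
    """
    arrow = LEFT_ARROW if layout_is_rtl else RIGHT_ARROW
    return f"{isolate(source)} {arrow} {isolate(target)}"

print(repr(join_with_arrow("Main St", "Elm St", layout_is_rtl=False)))
```

The weak point is the `layout_is_rtl` parameter: it is presumably exactly the information that is missing in the Discourse case, which is where a mirroring arrow would help.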
>> If replacing "->" by an arrow character can change its direction, isn't it up to the autocorrect software to analyze the bidi context and select the correct arrow? The rule should be to select whatever substitution gives the same appearance (direction) as what the user would see for the string they typed.
> The problem is this replacement is done (as far as I know) outside of any rendering context, when the text is just a sequence of character codes. It's not reasonable to expect the software to know which direction the text goes. Sometimes it's completely impossible, if the text direction depends on context that isn't available at the time of replacement.
This gets back to the problem that some arrows should be mirrored and 
some should not ("and then turn left (←)").  That would require some 
user-smarts.
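The suggestion quoted above, that autocorrect software should analyze the bidi context, would need at least something like the following. This is a rough sketch, not any real autocorrect implementation, and the helper names are hypothetical: it guesses the paragraph direction from the first strong character, loosely in the spirit of rule P2 of UAX #9, and gives up when there is none.

```python
import unicodedata
from typing import Optional

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}

def first_strong_direction(text: str) -> Optional[str]:
    """Guess a paragraph's direction from its first strong character.

    Loosely follows UAX #9 rule P2: characters between an isolate
    initiator and its matching PDI are skipped. Returns 'ltr', 'rtl',
    or None when the text contains no strong character at all.
    """
    depth = 0  # current isolate nesting depth
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ISOLATE_INITIATORS:
            depth += 1
        elif bidi == "PDI":
            depth = max(depth - 1, 0)
        elif depth == 0:
            if bidi == "L":
                return "ltr"
            if bidi in ("R", "AL"):
                return "rtl"
    return None

def replace_ascii_arrow(text: str) -> str:
    """Hypothetical autocorrect step: turn '->' into an arrow that roughly
    keeps the direction the ASCII form appears to point in this paragraph."""
    direction = first_strong_direction(text)
    if direction is None:
        return text  # no context to decide from; leave '->' alone
    arrow = "\u2190" if direction == "rtl" else "\u2192"  # ← or →
    return text.replace("->", arrow)

print(replace_ascii_arrow("a -> b"))        # a → b
print(replace_ascii_arrow("שלום -> עולם"))  # שלום ← עולם
print(replace_ascii_arrow("1 -> 2"))        # 1 -> 2
```

The `None` case is the situation NeatNit describes: when the replacement happens before the surrounding text is known, there may be nothing to go on. Using only the paragraph direction is also a simplification, since how `->` actually renders depends on its neighboring characters.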
>> Here's a possibly disastrous idea: arrows mirror when they are within the domain of a Directional Override character (U+202D, U+202E).
> Let's say this was implemented... Would it help solve the issues linked above in some way?

(This quoted quote is from me.)

Now that I see your intended situation, I think what I was imagining 
would not, in fact, help you.  Just as there are 
directionality isolates and embeddings, there are also directionality 
overrides, so you can force ordinarily LTR text to be RTL or vice-versa,
‮like this‬. (the last two words in the last sentence were typed and are 
encoded in the same order the letters would be in English, but probably 
show up reversed for you.)  And I was thinking that with a right-to-left 
override region, arrows would be reversed.  But that wouldn't help you 
here, except if you sorta joined the two halves of your expression by 
having them start and end an override region.  But that would be messy 
and defeat the purpose of having them in different spans and generally 
treating the two parts as independent pieces of information that are 
being joined by an arrow.
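To make the override example concrete, the snippet below builds the same kind of string with RIGHT-TO-LEFT OVERRIDE (U+202E) and POP DIRECTIONAL FORMATTING (U+202C), lists its code points in storage order, and applies a toy display model. The model is a deliberate simplification (it handles only non-nested RLO...PDF spans inside otherwise LTR text and does no mirroring), not the Unicode Bidirectional Algorithm, and the function names are made up for illustration.

```python
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

def code_points(text: str) -> str:
    """List the code points of text in storage (logical) order."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

def toy_display_override(text: str) -> str:
    """Toy visual order for LTR text containing non-nested RLO...PDF spans.

    Inside a span every character is forced to R, so the span is shown
    reversed. Nesting, other controls and mirroring are not modeled, and
    the RLO/PDF characters themselves are dropped.
    """
    out, span, in_override = [], [], False
    for ch in text:
        if ch == RLO:
            in_override = True
        elif ch == PDF:
            out.extend(reversed(span))  # flush the overridden span
            span, in_override = [], False
        elif in_override:
            span.append(ch)
        else:
            out.append(ch)
    out.extend(reversed(span))  # an unterminated override runs to the end
    return "".join(out)

sample = f"vice-versa, {RLO}like this{PDF}."
print(code_points(f"{RLO}like{PDF}"))
# U+202E U+006C U+0069 U+006B U+0065 U+202C
print(toy_display_override(sample))
# vice-versa, siht ekil.
```

Under the override idea described above, an arrow inside such a span would additionally be mirrored; this toy leaves every character's glyph unchanged.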

In retrospect, my original thought was a pretty stupid idea, since it 
essentially winds up assuming that the writer knows when the arrow 
should point this way or that... in which case they could have used the 
correct arrow in the first place!  The advantage of what you're 
proposing is that the decision would be handled by the BiDi/mirroring 
algorithm, the same algorithm that decides which way your 
parentheses face.
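The parenthesis behavior mentioned above comes from the Bidi_Mirrored property together with rule L4 of UAX #9: a Bidi_Mirrored character whose resolved direction is right-to-left (an odd embedding level) is drawn with its mirrored glyph. The sketch below illustrates this with Python's `unicodedata.mirrored`. The small glyph table is a hand-written stand-in for the Bidi_Mirroring_Glyph data in BidiMirroring.txt, which `unicodedata` does not expose, and `display_glyph` is a hypothetical helper. Note that U+2192 RIGHTWARDS ARROW reports 0, which is why an arrow never flips on its own.

```python
import unicodedata

# Hand-written stand-in for a few Bidi_Mirroring_Glyph pairs
# (BidiMirroring.txt); Python's unicodedata does not expose that property.
MIRROR_GLYPH = {"(": ")", ")": "(", "[": "]", "]": "[",
                "{": "}", "}": "{", "<": ">", ">": "<"}

def display_glyph(ch: str, embedding_level: int) -> str:
    """Sketch of UBA rule L4: at an odd (right-to-left) level, a
    Bidi_Mirrored character is drawn with its mirrored glyph."""
    if embedding_level % 2 == 1 and unicodedata.mirrored(ch):
        return MIRROR_GLYPH.get(ch, ch)  # not in this toy table: unchanged
    return ch

for ch in "(→":
    print(repr(ch), unicodedata.mirrored(ch), display_glyph(ch, 1))
# '(' 1 )
# '→' 0 →
```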

>> A similar[ly bad] idea might be to have markup-type characters, something like <MIRRORED SELECTOR> or some such, to indicate that an attached character should be mirrored (or a pair of them that indicate direction).
> I actually love that idea! It would solve the issue for all arrows (and any other glyphs in ExtraMirroring.txt), while only introducing one or two new code points. Maybe also <NON MIRRORED SELECTOR> to disable mirroring even on characters with Bidi_Mirrored=Yes.

And this would work better if we take it to mean "the character this is 
attached to is _subject_ to mirroring."  But markup-type characters in 
Unicode are a grey area, and those which exist are not widely loved 
either.  As Markus Scherer writes:

> Encoding characters that look the same but behave differently is a bad 
> idea. We have tried this, for example with letter-behavior clones of 
> some of the typographic quotes (U+02BB, U+02BC). People use them 
> inconsistently, because they can't tell the difference while typing or 
> reading, and so we get problems with having to treat both equally in 
> some places, text search, spoofing, "why does it say I am using an 
> invalid character?", etc.
>
> Unicode also has some magic invisible control characters that were 
> supposed to change the behavior of affected characters in ways that 
> violated their identity. These control codes are Deprecated with 
> prejudice.

The directionality isolates and overrides and such are in this category 
of control characters.  I don't think they're actually deprecated, since 
they're needed(?), but they are still looked at a bit askance, and you 
don't want your kids playing with them...

And Markus' point about "Encoding characters that look the same but 
behave differently is a bad idea" is an extremely good one, too.
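For concreteness, here is a toy model of the selector idea quoted earlier, read as "the character this is attached to is subject to mirroring." No such character exists: a Private Use code point (U+E000) stands in for the hypothetical MIRRORED SELECTOR, and `render_run` is an invented helper that models only glyph choice within one directional run, not reordering.

```python
# HYPOTHETICAL: Unicode has no MIRRORED SELECTOR. A Private Use code
# point stands in for it here purely for illustration.
MIRRORED_SELECTOR = "\uE000"
ARROW_MIRROR = {"\u2190": "\u2192", "\u2192": "\u2190"}  # ← and → swap

def render_run(text: str, rtl: bool) -> str:
    """Toy renderer for one directional run (reordering not modeled).

    An arrow followed by the hypothetical selector is treated as if it
    were Bidi_Mirrored: in an RTL run it is swapped for its mirror.
    The selector itself is invisible and is dropped from the output.
    """
    out = []
    for i, ch in enumerate(text):
        if ch == MIRRORED_SELECTOR:
            continue
        selected = i + 1 < len(text) and text[i + 1] == MIRRORED_SELECTOR
        if rtl and selected:
            ch = ARROW_MIRROR.get(ch, ch)
        out.append(ch)
    return "".join(out)

print(render_run("A\u2192\uE000B", rtl=True))  # A←B
print(render_run("A\u2192B", rtl=True))        # A→B
```

In the stored text, `→` with and without the selector look identical, which is the look-alike problem raised in the quotation above.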

>> I don't even want to know about handling this in TTB contexts...
> What is TTB? Couldn't quickly find it.

Top-To-Bottom.  Vertical text.  Just one more way for things to be confused.

~mark
