Proposing new arrow characters with Bidi_Mirrored=Yes
Nitai Sasson
unicode.org at sl.neatnit.net
Thu Apr 10 13:04:22 CDT 2025
-------- Original Message --------
On 10/04/2025 10:22, Eli Zaretskii via Unicode <unicode at corp.unicode.org> wrote:
> > Date: Wed, 09 Apr 2025 22:02:53 +0000
> > From: Nitai Sasson via Unicode <unicode at corp.unicode.org>
> >
> > Okay, this tangent about ligatures is totally off-topic. There are other cases where arrows are used as operators or relations within text, so mirroring arrows are still needed even if they aren't the best solution for the specific issue of showing "->" as an arrow. Eli, we can continue in private emails if you want, I don't want to spam this thread with it.
>
> I only mentioned ligatures because you brought up the use case of
> replacing the likes of "->" with arrows, and that is nowadays done in
> text-editing environments by using ligatures.
Yes, your response was valid and useful, but I felt it was leading us away from the topic. I agree that ligatures are an appropriate solution in many circumstances. That said, I see the Discourse implementation as a markup feature. In that interpretation, when users type "-->" they literally mean that an arrow character should be inserted. (Whether this is a good idea, BiDi issue aside, is not important here.)
> > Arrows are the only exception.
>
> IME, the arrows are not the only exception. It is possible to bump
> into other similar cases in certain situations.
You're absolutely right. I was using "arrows" as a catch-all term for these cases, but this wasn't the best choice of words. The proposal would be applicable in all those cases.
> As one example, Emacs displays the backslash character '\' at the
> right edge of a line that is too long to fit and is continued in the
> next screen line. When Emacs displays RTL paragraphs, with lines
> starting at the right edge, such continued lines are indicated with
> the forward slash character '/' at the left edge of the continued
> line. Thus, Emacs considers '/' to be the mirrored image of '\' in
> this case.
I was not aware of this strange Emacs behavior. I'm not sure I approve of it, but that is beside the point. The proposed solution would be applicable in that scenario too, but it would not deprecate higher-level solutions such as the one used in Emacs.
> The UAX#9 recognizes such eventualities:
> [...]
I will have to read UAX#9 thoroughly and meticulously before making a proper proposal. Right now I'm just gauging reactions and requesting feedback for the idea. (I was going to propose dozens/hundreds of new arrow characters, but thanks to feedback we're down to *a single* control character!)
> The file BidiMirroring.txt doesn't mention the arrow characters in the
> last section; perhaps it should?
Yes. Of major relevance is this document, whose status I don't yet fully understand: https://www.unicode.org/L2/L2022/22026r-non-bidi-mirroring.pdf
It defines mirrors for non-mirroring characters (like arrows) in a new file called ExtraMirroring.txt. I don't see any reason why this data couldn't be appended to BidiMirroring.txt instead.
I don't see ExtraMirroring.txt in https://www.unicode.org/Public/UCD/latest/ucd/ so I take it this proposal(?) has not been accepted into Unicode at this time.
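As a rough illustration of the "append to BidiMirroring.txt" idea, the sketch below parses data in the `code; mirror # name` layout that BidiMirroring.txt uses and merges two such tables into one lookup. That ExtraMirroring.txt would share this layout is only an assumption here, not something taken from L2/22-026, and the arrow pair in `EXTRA_SAMPLE` is an illustrative entry rather than data copied from that document. The function names are made up for this example.

```python
def parse_mirroring(text):
    """Parse lines like '0028; 0029 # LEFT PARENTHESIS' into {code: mirror}."""
    pairs = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank line or comment-only line
        source, mirror = (field.strip() for field in data.split(";"))
        pairs[int(source, 16)] = int(mirror, 16)
    return pairs


# Two entries in the layout used by BidiMirroring.txt.
BIDI_SAMPLE = """\
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
"""

# Hypothetical extra entries (assumed layout, illustrative pair).
EXTRA_SAMPLE = """\
2190; 2192 # LEFTWARDS ARROW
2192; 2190 # RIGHTWARDS ARROW
"""


def merged_table(bidi_text, extra_text):
    """Combine both tables; existing BidiMirroring data wins on conflict."""
    table = parse_mirroring(extra_text)
    table.update(parse_mirroring(bidi_text))
    return table


table = merged_table(BIDI_SAMPLE, EXTRA_SAMPLE)
print(f"U+2192 -> U+{table[0x2192]:04X}")  # prints "U+2192 -> U+2190"
```

Whether the extra pairs should apply unconditionally or only in some opt-in context is a separate question that this sketch does not try to answer.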
> > The Proposal Idea
> > =================
> >
> > [...]
>
> I don't understand how, under your proposal, would the implementation
> know which character to display as the mirrored image of U+2192
> RIGHTWARDS ARROW when in RTL context. What did I miss?
Kinda repeating the above, but for total clarity: this is not a complete proposal, just a simplified description to receive feedback before creating a more complete proposal. A full proposal would spell out the answer to this question.
The ExtraMirroring.txt data linked above would be a major part of the answer. I think it might be reasonable to actually mirror the original glyph if a mirror is not defined - which I think is already the case for a few Bidi_Mirrored characters, but I am having trouble verifying it right now (not familiar enough with Unicode tools to filter characters effectively).
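One way to do that filtering without dedicated Unicode tools is sketched below, under a few assumptions. Python's standard `unicodedata.mirrored()` reports the Bidi_Mirrored property for the Unicode version bundled with the interpreter. `SAMPLE_PAIRS` stands in for the set of code points that have an entry in BidiMirroring.txt; here it is a tiny hand-typed sample, and a real check would load the full file. The helper name `mirrored_without_pair` is made up for this example.

```python
import unicodedata


def mirrored_without_pair(mirror_pairs, start=0x2200, stop=0x2300):
    """Return code points in [start, stop) that are Bidi_Mirrored=Yes
    but are missing from the given set of BidiMirroring.txt code points."""
    return [
        cp
        for cp in range(start, stop)
        if unicodedata.mirrored(chr(cp)) and cp not in mirror_pairs
    ]


# Tiny stand-in for the full data: ELEMENT OF <-> CONTAINS AS MEMBER.
SAMPLE_PAIRS = {0x2208, 0x220B}

for cp in mirrored_without_pair(SAMPLE_PAIRS)[:5]:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

With only the sample set, the list is much longer than the true answer, since most mirrored characters in that block do have a pair in the real file. Fed the complete BidiMirroring.txt data, what remains should be characters such as U+2211 N-ARY SUMMATION, which are Bidi_Mirrored=Yes but have no mirrored counterpart character.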