Proposing new arrow characters with Bidi_Mirrored=Yes

Eli Zaretskii eliz at gnu.org
Wed Apr 9 08:29:59 CDT 2025


> Date: Wed, 09 Apr 2025 13:00:40 +0000
> From: Nitai Sasson via Unicode <unicode at corp.unicode.org>
> 
> On Wednesday, 9 April 2025 at 14:45, Eli Zaretskii via Unicode unicode at corp.unicode.org wrote:
> 
> > > The problem is this replacement is done (as far as I know) outside of any rendering context, when the
> > > text is just a sequence of character codes. It's not reasonable to know which direction the text goes.
> > > Sometimes it's completely impossible, if the text direction depends on context that isn't available at the
> > > time of replacement.
> > 
> > The above is strictly speaking inaccurate. Any serious text rendering nowadays requires a shaping
> > engine, such as HarfBuzz, and ligation of "->" into "→" would be done by such a shaper in cooperation
> > with a font that supports ligatures. The shaping engine is aware of the bidi context and the script of the
> > text it shapes, so it could in principle mirror the arrow.
> 
> I have seen this somewhere once, and it's pretty cool, but if you're a web developer and not a font designer,
> this isn't how you'd implement this. Instead you'd do this:
>   // Custom arrows
>   .replace(/(^|\s)-{1,2}>(\s|$)/gm, "\u0020\u2192\u0020")
>   .replace(/(^|\s)<-{1,2}(\s|$)/gm, "\u0020\u2190\u0020")
>   .replace(/(^|\s)<-{1,2}>(\s|$)/gm, "\u0020\u2194\u0020")
> https://github.com/discourse/discourse/blob/70998b73dbe29aa2ff7017d0f4c18a57d278e209/app/assets/javascripts/discourse-markdown-it/src/features/custom-typographer-replacements.js#L48-L51

Programs which make such replacements should do it correctly.  We
cannot possibly find a solution that will fix every bidi-related bug
out there.

Unicode is about plain text, and plain text is nowadays rendered
mostly by shaping engines.  Ligation is produced by these shapers, not
by programs which blindly replace characters with other characters.
We should IMO focus on the mainstream and not on the marginal, niche
use cases.

> I think you'll agree that within this scope, it's difficult or impossible to determine the direction of the text. I do
> not think this approach for arrow replacement is inherently wrong or invalid.

I don't agree that it's impossible or difficult to determine the text
direction, no.  It's just that the replacement should consider the
context, not just the couple of characters being replaced.
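A minimal sketch of what considering the context could look like for the replacement quoted above, in the same JavaScript.  The assumptions are mine, not Discourse's: the input string is treated as a single paragraph; a character's strong direction is approximated from its script, because JavaScript regexes cannot query Bidi_Class; and the arrow's resolved direction follows a simplified reading of UAX #9 (P2/P3 for the base direction, N1/N2 for the neutral arrow), ignoring numbers, explicit embeddings and isolates.  All function names are hypothetical.

```javascript
// Hypothetical sketch: context-aware "->", "<-", "<->" replacement.
// Approximation: any letter (\p{L}) is strong; letters from common RTL
// scripts count as right-to-left, all other letters as left-to-right.
const LETTER = /\p{L}/u;
const RTL_SCRIPT = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;

// Returns "rtl", "ltr", or null for characters with no strong direction.
function strongDir(ch) {
  if (!LETTER.test(ch)) return null;
  return RTL_SCRIPT.test(ch) ? "rtl" : "ltr";
}

// Paragraph base direction from the first strong character (UAX #9 P2/P3),
// defaulting to LTR when there is none.
function baseDirection(text) {
  for (const ch of text) {
    const dir = strongDir(ch);
    if (dir) return dir;
  }
  return "ltr";
}

// Direction the neutral arrow at [start, end) would resolve to: if the
// nearest strong characters on both sides agree, take theirs (N1, with the
// paragraph edges counting as the base direction); otherwise take the base
// direction (N2).
function resolvedDirection(text, start, end) {
  const base = baseDirection(text);
  const prev = Array.from(text.slice(0, start)).reverse().map(strongDir).find(Boolean) ?? base;
  const next = Array.from(text.slice(end)).map(strongDir).find(Boolean) ?? base;
  return prev === next ? prev : base;
}

// [pattern, arrow in LTR context, arrow in RTL context]. In RTL context the
// visually correct arrow is the opposite one, because U+2190/U+2192 are not
// mirrored while the ">" or "<" of the original sequence would have been.
const RULES = [
  [/(^|\s)<-{1,2}>(?=\s|$)/gmu, "\u2194", "\u2194"],
  [/(^|\s)-{1,2}>(?=\s|$)/gmu, "\u2192", "\u2190"],
  [/(^|\s)<-{1,2}(?=\s|$)/gmu, "\u2190", "\u2192"],
];

function replaceArrows(text) {
  for (const [pattern, ltrArrow, rtlArrow] of RULES) {
    text = text.replace(pattern, (match, lead, offset, whole) => {
      const dir = resolvedDirection(whole, offset + lead.length, offset + match.length);
      return lead + (dir === "rtl" ? rtlArrow : ltrArrow);
    });
  }
  return text;
}
```

With this sketch "A -> B" gets U+2192, while "א -> ב" gets U+2190, which displays pointing from א toward ב just as the mirrored ">" of the unreplaced "->" would.  Latin text inside a Hebrew paragraph ("שלום A -> B") still gets U+2192, since both neighbours of the arrow are left-to-right.  A real implementation would use a proper bidi library instead of the script heuristic.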

> > The question is whether the cases where mirroring is the correct rendering by far outweigh the
> > non-mirroring cases. If yes, the shaper could mirror and expect text which wants to avoid that use
> > directional override controls. If not, mirroring will not make sense.
> 
> I'm afraid I'm not following this - are you talking about in the
> shaper, with ligatures?

Yes.

> There is no question
> about the desired correct output: the string "->" with ligature should appear as an arrow pointing in the same
> direction as the non-ligature "arrow" does. This means it should mirror if the greater-than sign is mirrored.

Yes, if this makes sense in the vast majority of cases.
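To make the property involved concrete: UAX #9 (rule L4) displays a character mirrored when it resolves to right-to-left and its Bidi_Mirrored property is Yes, and modern JavaScript exposes that property through the `\p{Bidi_Mirrored}` regex escape.  A small check, with a hypothetical helper name:

```javascript
// Characters with Bidi_Mirrored=Yes are the ones UAX #9 (rule L4) shows
// mirrored when they resolve to right-to-left.
const MIRRORED = /\p{Bidi_Mirrored}/u;

// Returns the characters of `text` that carry Bidi_Mirrored=Yes.
function mirroredChars(text) {
  return Array.from(text).filter((ch) => MIRRORED.test(ch));
}

console.log(mirroredChars("->"));     // [ '>' ]
console.log(mirroredChars("\u2192")); // []  (U+2192 RIGHTWARDS ARROW)
console.log(mirroredChars("<->"));    // [ '<', '>' ]
```

So the two-character sequence contains a mirrorable character while the single arrow does not; replacing one with the other drops the mirroring, and a ligature in the shaper would have to carry it over explicitly.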

> I have never looked into the fine details of how these things are implemented, and I can't find a font that
> explicitly supports both arrow ligatures and bidi text. But I did find Fira Code, an ASCII font that supports these
> ligatures and I was horrified by the results for "א -> ב":

Please report that to the font developers, and possibly also to the
developers of the shaping engine you used.

> https://fonts.google.com/specimen/Fira+Code?preview.text=A%20-%3E%20B,%20%D7%90%20-%3E%20%D7%91
> 
> This renders differently between Firefox and Chromium. Neither result is good. I've embedded screenshots
> here: https://meta.discourse.org/t/wrong-arrow-direction-in-rtl-text-contexts/360760/11
> Ligatures are not a viable solution, and even if they were, this would not make an explicit forward-arrow
> character (or character combination) redundant.

I think nowadays ligatures are _the_ viable solution.  The sheer
number of editors and terminal emulators which provide ligatures by
default is large enough to consider this a de facto standard.

> This is getting off-topic, but what could be the source of this difference? I see these possibilities:
> 
> * Both implementations have (different) bugs and don't implement the Unicode spec correctly.
> * Unicode specs are vague, and leave room for interpretation in this case
> * Unicode has precise specs about this, one of the implementations is accurate, and the Unicode spec did
>  not properly account for this case
> * This just isn't supposed to work if the font doesn't explicitly support bidi (I hope this is not the case)

I think all of the applications you tried use HarfBuzz as their
shaping library, so the difference may lie in how each of them uses it.

