Proposing new arrow characters with Bidi_Mirrored=Yes

Eli Zaretskii eliz at gnu.org
Thu Apr 10 02:22:20 CDT 2025


> Date: Wed, 09 Apr 2025 22:02:53 +0000
> From: Nitai Sasson via Unicode <unicode at corp.unicode.org>
> 
> Okay, this tangent about ligatures is totally off-topic. There are other cases where arrows are used as operators or relations within text, so mirroring arrows are still needed even if they aren't the best solution for the specific issue of showing "->" as an arrow. Eli, we can continue in private emails if you want, I don't want to spam this thread with it.

I only mentioned ligatures because you brought up the use case of
replacing the likes of "->" with arrows, and that is nowadays done in
text-editing environments by using ligatures.

> Arrows are the only exception.

In my experience, arrows are not the only exception.  Other, similar
cases do come up in certain situations.

As one example, Emacs displays the backslash character '\' at the
right edge of a line that is too long to fit and is continued in the
next screen line.  When Emacs displays RTL paragraphs, with lines
starting at the right edge, such continued lines are indicated with
the forward slash character '/' at the left edge of the continued
line.  Thus, Emacs considers '/' to be the mirrored image of '\' in
this case.

UAX#9 recognizes such eventualities:

  Some of the characters that do not have the Bidi_Mirrored property
  may be rendered with mirrored glyphs, according to a higher level
  protocol that adds mirroring: see Section 4.3, Higher-Level
  Protocols, especially HL6. Except in such cases, mirroring must be
  done according to rule L4, to ensure that the correct character is
  used to express the intended semantic, and to avoid interoperability
  and security problems.

  Implementing rule L4 calls for mirrored glyphs. These glyphs may not
  be exact graphical mirror images. For example, clearly an italic
  parenthesis is not an exact mirror image of another— “(” is not the
  mirror image of “)”. Instead, mirror glyphs are those acceptable as
  mirrors within the normal parameters of the font in which they are
  represented.

  In implementation, sometimes pairs of characters are acceptable
  mirrors for one another—for example, U+0028 “(” LEFT PARENTHESIS and
  U+0029 “)” RIGHT PARENTHESIS or U+22E0 “⋠” DOES NOT PRECEDE OR EQUAL
  and U+22E1 “⋡” DOES NOT SUCCEED OR EQUAL. Other characters such as
  U+2231 “∱” CLOCKWISE INTEGRAL do not have corresponding characters
  that can be used for acceptable mirrors. The informative
  BidiMirroring.txt data file [Data9], lists the paired characters
  with acceptable mirror glyphs. The formal property name for this
  data in the Unicode Character Database [UCD] is
  Bidi_Mirroring_Glyph. A comment in the file indicates where the
  pairs are “best fit”: they should be acceptable in rendering,
  although ideally the mirrored glyphs may have somewhat different
  shapes.

The file BidiMirroring.txt doesn't mention the arrow characters in the
last section; perhaps it should?
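
For illustration, the current property values can be checked with
Python's standard unicodedata module.  This is only a minimal sketch:
the helper name is_bidi_mirrored is made up for the example, and the
results depend on the UCD version bundled with the Python build.  Note
that unicodedata exposes Bidi_Mirrored but not Bidi_Mirroring_Glyph,
so the pairs in BidiMirroring.txt cannot be queried this way.

```python
import unicodedata


def is_bidi_mirrored(ch: str) -> bool:
    """Return True if ch has Bidi_Mirrored=Yes in the bundled UCD."""
    return unicodedata.mirrored(ch) == 1


# Characters mentioned in this message and in the UAX#9 excerpt.
samples = ["\u0028", "\u0029", "\u22E0", "\u2192", "\u002F", "\u005C"]

for ch in samples:
    name = unicodedata.name(ch)
    print(f"U+{ord(ch):04X} {name}: Bidi_Mirrored={is_bidi_mirrored(ch)}")
```

The parentheses and U+22E0 report Bidi_Mirrored=Yes, while U+2192
RIGHTWARDS ARROW and both slashes do not, which is why mirroring them
today needs a higher-level protocol.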

> The Proposal Idea
> =================
> 
> (with credit to Mark E. Shoulson)
> 
> Define a new combining character:
> 
> <BDM> Bi-Directional Mirror (working title)
> 
> Binds to the preceding character, and effectively gives it the property Bidi_Mirrored=Yes.
> Only has an effect on characters with Neutral directionality. Does nothing to characters with strong or weak LTR or RTL directionality.
> 
> Examples:
> Within RTL text:
> U+05D0 א HEBREW LETTER ALEF
> U+2192 → RIGHTWARDS ARROW
> <BDM>    Bi-Directional Mirror
> U+05D1 ב HEBREW LETTER BET
> 
> Renders as: א←ב
> (Without <BDM>: א→ב)
> Arrow direction is flipped because it's resolved RTL
> 
> Within LTR text:
> U+0041 A LATIN CAPITAL LETTER A
> U+2192 → RIGHTWARDS ARROW
> <BDM>    Bi-Directional Mirror
> U+0042 B LATIN CAPITAL LETTER B
> 
> Renders as: A→B
> (Without <BDM>: A→B)
> Arrow direction is maintained because it's resolved LTR

I don't understand how, under your proposal, the implementation would
know which character to display as the mirrored image of U+2192
RIGHTWARDS ARROW when in an RTL context.  What did I miss?
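
To make the question concrete, here is a minimal sketch of what an
implementation of <BDM> would seem to need.  Everything in it is an
assumption for the sake of the example: the private-use code point
standing in for <BDM>, the function name apply_bdm, and above all the
ASSUMED_MIRRORS table, which is exactly the data the proposal does not
say where to get.  It does no bidi resolution; the caller passes in the
resolved direction, and only class ON is treated as neutral.

```python
import unicodedata

# Hypothetical stand-in for the proposed <BDM> character (private use).
BDM = "\uE000"

# Assumed pairing table; nothing in the proposal or the UCD defines it.
ASSUMED_MIRRORS = {
    "\u2192": "\u2190",  # RIGHTWARDS ARROW -> LEFTWARDS ARROW
    "\u2190": "\u2192",  # LEFTWARDS ARROW -> RIGHTWARDS ARROW
}


def apply_bdm(text: str, resolved_rtl: bool) -> str:
    """Drop each BDM; in RTL, mirror the neutral character before it."""
    out = []
    for ch in text:
        if ch == BDM:
            if out and resolved_rtl:
                prev = out[-1]
                # Simplification: only bidi class ON counts as neutral here.
                if unicodedata.bidirectional(prev) == "ON":
                    # Characters missing from the table stay unchanged.
                    out[-1] = ASSUMED_MIRRORS.get(prev, prev)
            continue
        out.append(ch)
    return "".join(out)


rtl = apply_bdm("\u05D0\u2192" + BDM + "\u05D1", resolved_rtl=True)
ltr = apply_bdm("A\u2192" + BDM + "B", resolved_rtl=False)
print(rtl, ltr)
```

With that table the sketch reproduces both quoted examples, but any
character absent from it is left as-is, so the table (or a font-level
equivalent) has to come from somewhere.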

