Proposing new arrow characters with Bidi_Mirrored=Yes
Eli Zaretskii
eliz at gnu.org
Thu Apr 10 02:22:20 CDT 2025
> Date: Wed, 09 Apr 2025 22:02:53 +0000
> From: Nitai Sasson via Unicode <unicode at corp.unicode.org>
>
> Okay, this tangent about ligatures is totally off-topic. There are other cases where arrows are used as operators or relations within text, so mirroring arrows are still needed even if they aren't the best solution for the specific issue of showing "->" as an arrow. Eli, we can continue in private emails if you want, I don't want to spam this thread with it.
I only mentioned ligatures because you brought up the use case of
replacing the likes of "->" with arrows, and that is nowadays done in
text-editing environments by using ligatures.
> Arrows are the only exception.
IME, arrows are not the only exception. Similar cases do come up in
certain situations.
As one example, Emacs displays the backslash character '\' at the
right edge of a line that is too long to fit and is continued in the
next screen line. When Emacs displays RTL paragraphs, with lines
starting at the right edge, such continued lines are indicated with
the forward slash character '/' at the left edge of the continued
line. Thus, Emacs considers '/' to be the mirrored image of '\' in
this case.
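A rough Python sketch of that kind of higher-level choice (this is not
Emacs code; the function name and return shape are made up for
illustration):

```python
import unicodedata

def continuation_indicator(paragraph_is_rtl):
    """Return (glyph, edge) marking a line continued on the next screen line.

    Hypothetical helper: it mirrors the behavior described above, where the
    display layer picks the glyph itself instead of consulting Bidi_Mirrored.
    """
    if paragraph_is_rtl:
        return "/", "left"
    return "\\", "right"

# Neither character has Bidi_Mirrored=Yes, so the swap is purely a
# decision of the application (a higher-level protocol).
for ch in "\\/":
    print(f"U+{ord(ch):04X} Bidi_Mirrored={bool(unicodedata.mirrored(ch))}")

print(continuation_indicator(False))
print(continuation_indicator(True))
```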
UAX#9 recognizes such eventualities:
Some of the characters that do not have the Bidi_Mirrored property
may be rendered with mirrored glyphs, according to a higher level
protocol that adds mirroring: see Section 4.3, Higher-Level
Protocols, especially HL6. Except in such cases, mirroring must be
done according to rule L4, to ensure that the correct character is
used to express the intended semantic, and to avoid interoperability
and security problems.
Implementing rule L4 calls for mirrored glyphs. These glyphs may not
be exact graphical mirror images. For example, clearly an italic
parenthesis is not an exact mirror image of another— “(” is not the
mirror image of “)”. Instead, mirror glyphs are those acceptable as
mirrors within the normal parameters of the font in which they are
represented.
In implementation, sometimes pairs of characters are acceptable
mirrors for one another—for example, U+0028 “(” LEFT PARENTHESIS and
U+0029 “)” RIGHT PARENTHESIS or U+22E0 “⋠” DOES NOT PRECEDE OR EQUAL
and U+22E1 “⋡” DOES NOT SUCCEED OR EQUAL. Other characters such as
U+2231 “∱” CLOCKWISE INTEGRAL do not have corresponding characters
that can be used for acceptable mirrors. The informative
BidiMirroring.txt data file [Data9], lists the paired characters
with acceptable mirror glyphs. The formal property name for this
data in the Unicode Character Database [UCD] is
Bidi_Mirroring_Glyph. A comment in the file indicates where the
pairs are “best fit”: they should be acceptable in rendering,
although ideally the mirrored glyphs may have somewhat different
shapes.
The file BidiMirroring.txt doesn't mention the arrow characters in the
last section; perhaps it should?
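For reference, the distinction drawn in the quoted text can be checked
with a few lines of Python. This is only a sketch: unicodedata.mirrored()
reports the Bidi_Mirrored property, but the standard library does not
expose Bidi_Mirroring_Glyph, so the MIRROR_PAIRS table below is a
hand-copied excerpt of the pairs named above, and describe() is a
made-up helper name.

```python
import unicodedata

# Hand-copied excerpt of pairs mentioned in the UAX#9 text (an assumption
# standing in for the real BidiMirroring.txt data file).
MIRROR_PAIRS = {
    "\u0028": "\u0029",  # LEFT PARENTHESIS <-> RIGHT PARENTHESIS
    "\u0029": "\u0028",
    "\u22E0": "\u22E1",  # DOES NOT PRECEDE OR EQUAL <-> DOES NOT SUCCEED OR EQUAL
    "\u22E1": "\u22E0",
}

def describe(ch):
    """Return (has Bidi_Mirrored, paired mirror character or None)."""
    return bool(unicodedata.mirrored(ch)), MIRROR_PAIRS.get(ch)

# Parenthesis, a paired relation, an unpaired integral, and an arrow.
for ch in "\u0028\u22E0\u2231\u2192":
    mirrored, partner = describe(ch)
    partner_text = f"U+{ord(partner):04X}" if partner else "none"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"Bidi_Mirrored={mirrored}, pair={partner_text}")
```

U+2231 has the property but no paired character, while U+2192 does not
have the property at all.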
> The Proposal Idea
> =================
>
> (with credit to Mark E. Shoulson)
>
> Define a new combining character:
>
> <BDM> Bi-Directional Mirror (working title)
>
> Binds to the preceding character, and effectively gives it the property Bidi_Mirrored=Yes.
> Only has an effect on characters with Neutral directionality. Does nothing to characters with strong or weak LTR or RTL directionality.
>
> Examples:
> Within RTL text:
> U+05D0 א HEBREW LETTER ALEF
> U+2192 → RIGHTWARDS ARROW
> <BDM> Bi-Directional Mirror
> U+05D1 ב HEBREW LETTER BET
>
> Renders as: א←ב
> (Without <BDM>: א→ב)
> Arrow direction is flipped because it's resolved RTL
>
> Within LTR text:
> U+0041 A LATIN CAPITAL LETTER A
> U+2192 → RIGHTWARDS ARROW
> <BDM> Bi-Directional Mirror
> U+0042 B LATIN CAPITAL LETTER B
>
> Renders as: A→B
> (Without <BDM>: A→B)
> Arrow direction is maintained because it's resolved LTR
I don't understand how, under your proposal, the implementation would
know which character to display as the mirrored image of U+2192
RIGHTWARDS ARROW in an RTL context. What did I miss?
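To make the question concrete, here is a minimal Python sketch of what
an implementation would seem to need. Everything in it is an
assumption: <BDM> has no code point, so a private-use character stands
in for it; the ARROW_MIRRORS table and the apply_bdm() name are
invented; and the sketch takes the resolved direction as an input
rather than running the bidi algorithm.

```python
# Placeholder for the proposed <BDM>; U+E000 is a private-use code point
# chosen only for this sketch.
BDM = "\uE000"

# Hypothetical pairing table. The proposal as quoted does not say where
# an implementation would get this information.
ARROW_MIRRORS = {
    "\u2192": "\u2190",  # RIGHTWARDS ARROW -> LEFTWARDS ARROW
    "\u2190": "\u2192",  # LEFTWARDS ARROW -> RIGHTWARDS ARROW
}

def apply_bdm(text, resolved_rtl):
    """Swap a BDM-marked arrow for its pair when resolved RTL; drop markers.

    The result stays in logical order; reordering for display is out of
    scope here.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if i + 1 < len(text) and text[i + 1] == BDM:
            if resolved_rtl:
                ch = ARROW_MIRRORS.get(ch, ch)
            i += 1  # consume the marker that follows this character
        if ch != BDM:
            out.append(ch)
        i += 1
    return "".join(out)

print(apply_bdm("\u05D0\u2192" + BDM + "\u05D1", resolved_rtl=True))
print(apply_bdm("A\u2192" + BDM + "B", resolved_rtl=False))
```

Without some table like ARROW_MIRRORS, the only option I can see is for
the font or renderer to flip the glyph itself.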