BidiMirrored property and ancient scripts

Richard Wordingham richard.wordingham at ntlworld.com
Sat Jul 25 12:27:26 CDT 2015


On Sat, 25 Jul 2015 17:26:14 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> > Date: Sat, 25 Jul 2015 14:36:51 +0100
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > 
> > > > The issue lies with the wording of condition (1).  One might
> > > > expect it to apply only to characters with a bidirectional type
> > > > of L.
> > 
> > > I see no reason to restrict this to L characters.  I'd be
> > > interested to hear your rationale for that.
> > 
> > A) A strong character's form in the corresponding directional
> > context is the form identified by the Unicode charts.  If it is of
> > type AL or R, it will, by definition, not be mirrored.
> > 
> > B) A weak or neutral character's form in the charts is the form that
> > occurs in the left-to-right direction.  Such a character has
> > Bidi-mirrored set to Yes if it has different forms for
> > left-to-right and right-to-left.  By rule L4, it will be mirrored
> > if it receives a resolved direction of R.
> > 
> > C) A character of type L may need to be mirrored if it receives a
> > resolved directionality of R.  The most notable example is Egyptian
> > hieroglyphs, but the same applies to Greek.
> 
> Mirroring is not changing a character's shape.  It is a replacement of
> a character's glyph with a glyph of a different character.

Mirroring is changing a glyph to one suitable for reading in the other
direction.  Note the following extract from BidiMirroring.txt in the
Unicode Character Database:

<quote>
# The following characters have no appropriate mirroring character.
# For these characters it is up to the rendering system 
#   to provide mirrored glyphs.

# 2140; DOUBLE-STRUCK N-ARY SUMMATION
# 2201; COMPLEMENT
# 2202; PARTIAL DIFFERENTIAL
<snip/>
</quote>
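The distinction between mirroring by character substitution and mirroring at the glyph level can be seen with Python's standard `unicodedata` module. It exposes the Bidi_Mirrored property (`unicodedata.mirrored`) but not the BidiMirroring.txt pairs, so the sketch below assumes a small hand-copied subset of pairs; a real renderer would load the whole file. The helper name `mirroring_strategy` is hypothetical.

```python
import unicodedata

# Illustrative subset of BidiMirroring.txt pairs (an assumption: the real
# file lists many more, so characters outside this subset that do have a
# mirror partner would be misreported below).
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "<": ">", ">": "<",
}

def mirroring_strategy(ch: str) -> str:
    """Classify how a renderer could show `ch` in a right-to-left run."""
    if not unicodedata.mirrored(ch):
        return "not mirrored"
    if ch in MIRROR_PAIRS:
        # A mirroring character exists: substitute its glyph.
        return f"substitute U+{ord(MIRROR_PAIRS[ch]):04X}"
    # Bidi_Mirrored=Yes but no mirroring character: the rendering
    # system itself has to supply a mirrored glyph.
    return "glyph-level mirroring"

for ch in "(A\u2140\u2201\u2202":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mirroring_strategy(ch)}")
```

The three characters quoted from BidiMirroring.txt all come out as "glyph-level mirroring", which is the case where mirroring cannot be described as replacing one character's glyph with another's.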

> Thus, your reasons make no sense to me, because a character's shape,
> any character's shape, be it L, R, AL, or anything else, is immutable.

So go back and reread.
 
> > There is a definite hole in my argument for non-spacing marks; marks
> > used primarily in the Arabic script are shown in a form they take
> > in a right-to-left context.
 
> I don't think it's a hole.  I think your interpretation of this is
> entirely wrong.

 
> > > > My surmise is that it attempts to address text whose
> > > > directionality is not known before rendering.
> > > 
> > > Indeed, UBA mirroring is only relevant to neutral characters.
> > 
> > Then how do you explain condition (2):
> > 
> > "Characters with a resolved directionality of L and whose
> > bidirectional type is R or AL"
> 
> I never saw an example of it.  Can you show something like that?

Frédéric gave the example of Old North Arabian - there are samples at
http://www.mnh.si.edu/epigraphy/e_pre-islamic/safaitic.htm


> Note that those conditions are "at least one of", so they are not all
> required to be true at the same time.

Obviously, since a character cannot simultaneously have both resolved
directions.

> > Obviously these characters are not neutral characters.  The only way
> > they can acquire a resolved directionality of R is by application of
> > RLO.
> 
> You mean, resolved directionality of L and LRO, right?

Sorry, you're correct.

> Anyway, let's talk about a concrete example of applying this rule,
> shall we?  I'm guessing this is for some very specific characters in a
> script I never used.

I rather suspect it's for all current characters in a script you never
used.  Given half a chance, a script with weak directionality will be
encoded with Bidi-class L letters.  Old North Arabian has squeezed in
as a right-to-left script.
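The conditions under discussion can be written as a predicate. This is a hypothetical sketch: `needs_mirrored_form` is an illustrative name, condition (1) is encoded under the reading proposed above (restricted to bidirectional type L), since its exact wording is not quoted in this message, and condition (2) follows the text quoted earlier. Egyptian hieroglyphs (type L) and Old North Arabian (type R) serve as test cases.

```python
import unicodedata

def needs_mirrored_form(ch: str, resolved: str) -> bool:
    """Return True if `ch`, with resolved direction `resolved` ("L" or "R"),
    should be drawn differently from its form in the code charts.

    Hypothetical encoding of the conditions discussed in this thread.
    """
    bidi_class = unicodedata.bidirectional(ch)
    if resolved == "R" and unicodedata.mirrored(ch):
        return True   # rule L4: Bidi_Mirrored=Yes at a right-to-left level
    if resolved == "R" and bidi_class == "L":
        return True   # condition (1), read as applying to type L only
    if resolved == "L" and bidi_class in ("R", "AL"):
        return True   # condition (2): RTL letter forced LTR, e.g. by LRO
    return False

print(needs_mirrored_form("\U00013000", "R"))  # Egyptian hieroglyph, type L
print(needs_mirrored_form("\U00010A80", "L"))  # Old North Arabian, type R
```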

> > > I don't think so.  I agree with those who maintain that
> > > boustrophedon is unidirectional text, and so out of scope for the
> > > UBA.
> > 
> > There are three main parts to the UBA:
> > 
> > 1) Interpreting the text as nested runs of text in the same order.
> 
> I take it that by this you mean resolving the level of each
> character.  To me, that is the main part of the UBA; all the rest is
> almost trivial.

The nesting is implied by the levels, but the levels are just a means
to store the nesting and an elegant way of storing the direction.
There is a distressing tendency of Unicode algorithms to just record
the algorithm, rather than to explain what is being done.  Perfectly
intelligible steps can end up looking like an arcane dance.
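The point that levels are just a compact encoding of nesting can be illustrated by rebuilding the nesting explicitly. The sketch below uses hypothetical helpers `nest` and `visual_order`: it turns a list of resolved levels into nested runs, each with a direction given by the parity of its level, and then writes each run out in its own direction. For a single line this should give the same order as the repeated reversals of rule L2.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Run:
    """A run of text at one embedding level; items are character
    indices or nested runs at deeper levels."""
    level: int
    items: List[Union[int, "Run"]] = field(default_factory=list)

    @property
    def direction(self) -> str:
        # Even levels are left-to-right, odd levels right-to-left.
        return "R" if self.level % 2 else "L"

def nest(levels: List[int], base: int = 0) -> Run:
    """Rebuild the nesting that a list of resolved levels encodes."""
    if levels and min(levels) < base:
        raise ValueError("levels must not be below the base level")

    def build(start: int, end: int, level: int) -> Run:
        run = Run(level)
        i = start
        while i < end:
            if levels[i] == level:
                run.items.append(i)
                i += 1
            else:
                # Characters at deeper levels form a run nested one level down.
                j = i
                while j < end and levels[j] > level:
                    j += 1
                run.items.append(build(i, j, level + 1))
                i = j
        return run

    return build(0, len(levels), base)

def visual_order(run: Run) -> List[int]:
    """Write each run in its own direction: right-to-left runs reversed."""
    parts = reversed(run.items) if run.direction == "R" else run.items
    out: List[int] = []
    for item in parts:
        out.extend(visual_order(item) if isinstance(item, Run) else [item])
    return out

print(visual_order(nest([0, 1, 1, 2, 2, 1, 0])))  # [0, 5, 3, 4, 2, 1, 6]
```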

> > 2) Sorting out the left-to-right order in which to write them (L2)
> > 
> > 3) Sorting out mirroring (L4)
> > 
> > Interpreting LRO and RLO is part of (1).  I'd like to know what the
> > justification for having directionality overrides is.
> 
> One justification is when you want to present characters in some
> particular order that overrides their innate bidirectional properties.
> For example, imagine you want to tell your readers what some
> bidirectional text will look like after reordering by the UBA, and you want
> to do that without relying on the UBA implementation of whatever
> software is used to view your presentation.

Brute force layout!  That makes it seem that overriding strong types
was an error that leaves people hoping for support for switching text
direction.

> > Where we may part company is in our view of Hebrew text (no Arabic
> > numbers) with parentheses in a right-to-left paragraph.  I think
> > such text is really just as unidirectional as equivalent Latin text
> > in a left-to-right paragraph.
> 
> No, not as soon as numbers or Latin characters are involved, IMO.

My example, which your e-mail client may take as being in a
left-to-right paragraph, is:

כרטיס אשראי / דביט (לא אמריקן אקספרס ולא דיינרס)

[In English: "Credit card / debit (not American Express and not Diners)".]

> > However, one needs the UBA to sort out the rendering of the
> > parentheses in the Hebrew text.

> Not really, you can short-cut it, the same as in strictly
> left-to-right text.

It's the UBA that mandates that the opening and closing parentheses be
rendered like right and left parentheses respectively rather than like
left and right parentheses.  I think it may be compatible with the 
character identity for the U+0028 glyph to be marked with a tiny 'o'
regardless of whether it broadly looks like a left or a right
parenthesis. 
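For a line like the Hebrew example, where every character resolves to level 1, the UBA reduces to two steps: reverse the line (rule L2) and give each Bidi_Mirrored character the glyph of its mirror partner (rule L4). That is what makes U+0028 look like a right parenthesis. A minimal sketch, assuming a right-to-left paragraph containing only Hebrew letters and neutrals, and a hand-picked subset of mirror pairs rather than the full BidiMirroring.txt; `render_rtl_line` is a hypothetical name.

```python
# Illustrative subset of BidiMirroring.txt pairs (assumption, not the full file).
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "["}

def render_rtl_line(logical: str) -> str:
    """Return the glyphs of a right-to-left line in left-to-right visual order.

    Assumes the line holds only RTL letters and neutrals (no digits, no
    Latin), so every character resolves to level 1 in an RTL paragraph.
    """
    # Rule L2: with every character at level 1, the whole line is reversed.
    visual = reversed(logical)
    # Rule L4: at an odd level, Bidi_Mirrored characters take the glyph
    # of their mirror partner.
    return "".join(MIRROR_PAIRS.get(ch, ch) for ch in visual)

line = "\u05d0 (\u05d1)"   # alef, space, (, bet, )
print(ascii(render_rtl_line(line)))  # '(\u05d1) \u05d0'
```

The logical opening parenthesis ends up rightmost and drawn as ")", so it still opens towards the text it encloses.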

> > Indeed, one may rely on the bidi algorithm to declare the Latin
> > example unidirectional.
> 
> One might, but to what purpose and goal?

A right-to-left paragraph consisting of the two characters "(a" would
be bidirectional and have a parenthesis on the right; a left-to-right
paragraph with the same content would have a parenthesis on the left.
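The "(a" example can be checked by hand. In a right-to-left paragraph, 'a' (type L) is raised to level 2 by rule I2. The unpaired '(' lies between sos (R) and 'a' (L), which disagree, so it takes the embedding direction by rule N2 and stays at level 1. In a left-to-right paragraph both characters are at level 0. The sketch below applies rules L2 and L4 to those hand-derived levels. It does not resolve levels itself; `reorder_line` and its two-entry mirror table are illustrative assumptions.

```python
from typing import List

MIRROR_PAIRS = {"(": ")", ")": "("}  # illustrative subset of BidiMirroring.txt

def reorder_line(text: str, levels: List[int]) -> str:
    """Apply rules L4 (mirroring) and L2 (reordering) to one line,
    given already-resolved embedding levels."""
    # L4 depends only on each character's resolved level: odd means R.
    chars = [MIRROR_PAIRS.get(c, c) if lvl % 2 else c
             for c, lvl in zip(text, levels)]
    order = list(range(len(text)))
    lowest_odd = min(levels) | 1
    # L2: from the highest level down to the lowest odd level, reverse
    # every contiguous sequence at that level or higher.
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = order[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars[k] for k in order)

print(reorder_line("(a", [1, 2]))  # a)
print(reorder_line("(a", [0, 0]))  # (a
```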

The e-mail client I'm using has no higher-level protocol to determine
whether a paragraph is left-to-right or right-to-left, but uses the
first strong character.  Notepad (Windows 7, at least) seems to have two
options - all paragraphs are left-to-right, or all paragraphs are
right-to-left.  

> > If one can determine that text to be rendered boustrophedon is
> > genuinely 'unidirectional', it seems entirely reasonable to call
> > upon the Bidi algorithm to sort out the mirroring of glyphs on a
> > *line* once one has chosen the direction of a line.
> 
> No, not as soon as characters of different or weak/neutral
> directionality are involved, IMO.

If the paragraph contains any digits, it is not genuinely
unidirectional.  If it is unidirectional and there are no unmatched PDF
characters, one can just prefix LRO or RLO to each line to get the right
directionality.  If there are strong characters of different
directionalities, then it is unlikely that the paragraph is genuinely
unidirectional.  The full tridirectional (left, right and
boustrophedon) algorithm is likely to be extremely fiddly, as well as
dependent on non-existent information.
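The recipe above for boustrophedon can be sketched as follows. `is_genuinely_unidirectional` and `boustrophedon` are hypothetical helpers. The digit and mixed-direction checks follow the conditions stated above. Two choices are simplifying assumptions of this sketch: it rejects every PDF rather than only unmatched ones, and it closes each line with a PDF so that an override cannot leak into the next line. Mirroring of brackets on the right-to-left lines is then left to rule L4.

```python
import unicodedata
from typing import List

LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"

def is_genuinely_unidirectional(text: str) -> bool:
    """Heuristic: no digits, no PDF, and no mix of left-to-right and
    right-to-left strong characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    if classes & {"EN", "AN", "PDF"}:
        return False
    has_ltr = "L" in classes
    has_rtl = bool(classes & {"R", "AL"})
    return not (has_ltr and has_rtl)

def boustrophedon(lines: List[str], first: str = "L") -> List[str]:
    """Force alternate lines to LTR and RTL with directional overrides."""
    if not all(is_genuinely_unidirectional(line) for line in lines):
        raise ValueError("paragraph is not genuinely unidirectional")
    out = []
    for n, line in enumerate(lines):
        ltr = (n % 2 == 0) == (first == "L")
        # Terminate each override with PDF (an assumption of this sketch).
        out.append((LRO if ltr else RLO) + line + PDF)
    return out

print([ascii(s) for s in boustrophedon(["abc", "def"])])
```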

Richard.


