Bidi: inserting Japanese paragraphs in Arabic/Farsi document

Philippe Verdy verdy_p at wanadoo.fr
Sun Nov 20 13:58:58 CST 2016


2016-11-20 19:19 GMT+01:00 Eli Zaretskii <eliz at gnu.org>:

> > From: Philippe Verdy <verdy_p at wanadoo.fr>
> > Date: Sun, 20 Nov 2016 18:51:01 +0100
> > Cc: Simon Cozens <simon at simon-cozens.org>,
> >       unicode Unicode Discussion <unicode at unicode.org>
> >
> > Correction: I expect to see:
> >
> > OWT-CIBARA Japanese2" 【Japanese1】" ENO-CIBARA
>
> I don't understand why.
>
> What do you expect with the brackets removed?  I expect this:
>
>  OWT-CIBARA "Japanese1 Japanese2" ENO-CIBARA
>
> because N0 and N1 are no-ops, and N2 clearly says that a neutral
> character that is surrounded by text of different directionalities
> takes the embedding direction.
>

ASCII quotes are hard to match unambiguously in pairs, so when they cannot
be paired they would normally inherit the direction of their prior context.
So the first quotation mark would take the RTL direction of ARABIC-ONE. The
second quotation mark would likewise inherit the LTR direction of
"Japanese2" and would be placed to its right.

The final effect would be that the quotes appear glued side by side. But
note that the two Japanese brackets match each other, so no quotation
mark can be between them: the whole bracketed section, brackets included,
should form its own isolate. A quotation mark between the brackets occurs
only with the old Bidi algorithm, which did not take bracket pairs into
account.

So the 【Japanese1】 bracketed section should be OK with new renderers (this
is not the case with Chrome, which still uses the old algorithm), placed
just after the ARABIC-ONE and the leading quotation mark of the Japanese
section.

But probably the correct rendering should rather be:

   OWT-CIBARA 【Japanese1】 Japanese2"" ENO-CIBARA

unless ASCII quotation marks are paired, in which case you'll get:

   OWT-CIBARA "【Japanese1】 Japanese2" ENO-CIBARA

which is most probably what is expected.

All this is about deciding whether a quotation mark is "leading" or
"trailing". This is not clear at all for ASCII quotation marks, and it has
a consequence on the final rendering made by the Bidi algorithm.
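
As a rough illustration of that ambiguity, the character properties can be
inspected with Python's standard unicodedata module. This is only a sketch:
the helper name describe is invented here, and the module exposes the
general category, Bidi class and Bidi_Mirrored flag but not the
Bidi_Paired_Bracket property itself.

```python
import unicodedata


def describe(ch):
    """Return (general category, Bidi class, mirrored flag) for one character."""
    return (
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        bool(unicodedata.mirrored(ch)),
    )


ascii_quote = describe('"')        # U+0022 QUOTATION MARK
curly_open = describe("\u201c")    # U+201C LEFT DOUBLE QUOTATION MARK
curly_close = describe("\u201d")   # U+201D RIGHT DOUBLE QUOTATION MARK
bracket_open = describe("\u3010")  # U+3010 LEFT BLACK LENTICULAR BRACKET
bracket_close = describe("\u3011") # U+3011 RIGHT BLACK LENTICULAR BRACKET

for name, props in [
    ("ascii quote", ascii_quote),
    ("curly open", curly_open),
    ("curly close", curly_close),
    ("bracket open", bracket_open),
    ("bracket close", bracket_close),
]:
    print(name, props)
```

All five characters are Bidi-neutral (class ON), but only the ASCII quote has
the general category Po, which carries no opening or closing role; the curly
quotes are Pi and Pf, and the lenticular brackets are Ps and Pe and are
mirrored.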