Bidi: inserting Japanese paragraphs in Arabic/Farsi document
verdy_p at wanadoo.fr
Sun Nov 20 14:19:40 CST 2016
Note that if you get:
OWT-CIBARA "Japanese2 【Japanese1】" ENO-CIBARA
this means that the first quotation mark is "transparent" and preserves the
And I don't see then how you can pair the final quotation mark, unless you
consider it as "leading" the ARABIC-TWO part (meaning that you don't pair
these quotation marks at all: only brackets are paired and the
is correct (you are using the new Bidi algorithm).
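
To make the "only brackets are paired" point concrete, here is a minimal sketch (not part of the original exchange) that prints the relevant properties from Python's standard unicodedata module. One assumption is labeled loudly: that module does not expose the Bidi_Paired_Bracket data the new algorithm actually uses, so the hypothetical helper looks_like_bracket() falls back on general category Ps/Pe plus the mirrored flag as a rough proxy only.

```python
import unicodedata

# Characters discussed in this thread: the ASCII quotation mark,
# the lenticular brackets and the corner brackets.
CHARS = ['"', '\u3010', '\u3011', '\u300c', '\u300d']

def describe(ch):
    """Return (general category, bidi class, mirrored flag) for one character."""
    return (
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        bool(unicodedata.mirrored(ch)),
    )

def looks_like_bracket(ch):
    # Rough proxy only (an assumption for illustration): the real rule
    # relies on Bidi_Paired_Bracket data, which unicodedata does not expose.
    cat, bidi, mirrored = describe(ch)
    return cat in ('Ps', 'Pe') and bidi == 'ON' and mirrored

for ch in CHARS:
    print('U+%04X' % ord(ch), unicodedata.name(ch),
          describe(ch), looks_like_bracket(ch))
```

The ASCII quotation mark is a neutral (ON) with no opening or closing category, so nothing in the character data says which one of a pair it is; the CJK brackets are also neutrals, but they carry explicit opening/closing categories.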
There are still ambiguities in handling pairs of quotation marks (this is
not evident at all and it is language-dependent when some languages do not
distinguish the glyph for the leading and trailing marks, or swap them, for
example with »Deutsch« as opposed to «Italiano» or « français», and it is a
difficult problem in multilingual documents not only mixing RTL and LTR
scripts and needing the Bidi algorithm, and different LTR languages are
For citation of Japanese in Arabic text, I would suggest using Asian
quotation marks by encoding:
ARABIC-ONE 「【Japanese1】 Japanese2」 ARABIC-TWO
so that Asian quotation marks will unambiguously pair together and you'll
OWT-CIBARA 「Japanese2 【Japanese1】」 ENO-CIBARA
Or, since 「」 (like 【】) are unambiguously LTR, if you give them a strong LTR
direction you would then get the best result:
OWT-CIBARA 「【Japanese1】 Japanese2」 ENO-CIBARA
But if there are line wraps in the middle of the Japanese section:
notably if you can't mirror the CJK quotation marks
Otherwise, if you can mirror these marks:
or without any line break in the middle of the Japanese quotation:
OWT-CIBARA └Japanese2【Japanese1】┐ ENO-CIBARA
(here I use └ ┐ only as stand-ins for the mirrored 「」, which are not encoded)
2016-11-20 20:58 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:
> 2016-11-20 19:19 GMT+01:00 Eli Zaretskii <eliz at gnu.org>:
>> > From: Philippe Verdy <verdy_p at wanadoo.fr>
>> > Date: Sun, 20 Nov 2016 18:51:01 +0100
>> > Cc: Simon Cozens <simon at simon-cozens.org>,
>> > unicode Unicode Discussion <unicode at unicode.org>
>> > Correction: I expect to see:
>> > OWT-CIBARA Japanese2" 【Japanese1】" ENO-CIBARA
>> I don't understand why.
>> What do you expect with the brackets removed? I expect this:
>> OWT-CIBARA "Japanese1 Japanese2" ENO-CIBARA
>> because N0 and N1 are no-ops, and N2 clearly says that a neutral
>> character that is surrounded by text of different directionalities
>> takes the embedding direction.
> With ASCII quotes that are hard to match unambiguously in pairs, they
> would normally inherit what is in their prior context if they cannot be
> So the first quotation mark would take the RTL direction of ARABIC-ONE.
> The second quotation mark would also inherit the LTR direction of
> "Japanese2" and would go to its right.
> The final effect would be that quotes would appear glued side-by-side. But
> note that the two Japanese brackets are matching together, so no quotation
> mark can be between them: the whole bracketed section including brackets
> should be creating its own isolate: this occurs only with the old Bidi
> algorithm that did not take bracket pairs into account.
> So the [Japanese1] bracketed section should be OK with new renderers (this
> is not the case with Chrome that still uses the old algorithm), just after
> the ARABIC-ONE and the leading quotation mark of the Japanese section.
> But probably the correct rendering should rather be:
> OWT-CIBARA 【Japanese1】 Japanese2"" ENO-CIBARA
> unless ASCII quotation marks are paired, in which case you'll get:
> OWT-CIBARA "【Japanese1】 Japanese2" ENO-CIBARA
> which is most probably what is expected.
> All this is about deciding if a quotation mark is "leading" or "trailing",
> and this is not clear at all for ASCII quotation marks and it has a
> consequence on the final rendering made by the Bidi algorithm.
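
For reference, the N1/N2 behaviour Eli describes in the quoted text (a neutral between two strong characters of the same direction takes that direction; otherwise it takes the embedding direction) can be sketched as below. This is a deliberately simplified illustration, not the full algorithm: the function name resolve_neutrals and the three-letter class alphabet ('L', 'R', 'N') are assumptions made for the example, numbers, explicit embeddings and the bracket-pair rule N0 are ignored, and the paragraph direction stands in for the start and end of the run.

```python
def resolve_neutrals(classes, embedding):
    """Simplified N1/N2 resolution.

    classes:   list of 'L', 'R' or 'N' (neutral), in logical order
    embedding: 'L' or 'R', the paragraph (embedding) direction
    """
    out = list(classes)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != 'N':
            i += 1
            continue
        # Find the end of this run of neutrals.
        j = i
        while j < n and out[j] == 'N':
            j += 1
        # Strong context on each side; the edges use the embedding direction.
        before = out[i - 1] if i > 0 else embedding
        after = out[j] if j < n else embedding
        # N1: same direction on both sides; N2: otherwise the embedding direction.
        resolved = before if before == after else embedding
        for k in range(i, j):
            out[k] = resolved
        i = j
    return out

# ARABIC-ONE "Japanese1 Japanese2" ARABIC-TWO in an RTL paragraph:
# R, space, quote, L, space, L, quote, space, R
print(resolve_neutrals(list('RNNLNLNNR'), 'R'))
```

In this toy model both ASCII quotation marks resolve to R, the embedding direction, while the space between the two Japanese words resolves to L.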