Bidi: inserting Japanese paragraphs in Arabic/Farsi document

Philippe Verdy verdy_p at wanadoo.fr
Sun Nov 20 10:20:49 CST 2016


So this is a Chrome issue: it still does not apply the new rules. I thought
it already did.

Aligning the paragraph to the right is optional and less essential. It
would still be satisfactory to see:

Japanese2 【Japanese1】

That alignment is preferred only when it is a separate paragraph, but if the
Japanese citation is within an Arabic paragraph encoded as:

ARABIC-ONE "【Japanese1】Japanese2" ARABIC-TWO

I expect to see

                         OWT-CIBARA Japanese2 【Japanese1】"" ENO-CIBARA

aligned to the right margin, or:

   OWT-CIBARA Japanese2 【Japanese1】"" ENO-CIBARA

if it occurs in an Arabic document.

There's still the problem of surrounding quotation marks that don't form
matching pairs (unlike brackets). That's why authors will likely use
mirrorable quotation marks, or will need to isolate the Japanese citation
together with its quotation marks using <bdi>...</bdi> or equivalent bidi
isolate controls, or apply an LTR override control to the leading quotation
mark, to get:

   OWT-CIBARA "Japanese2 【Japanese1】" ENO-CIBARA
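As a rough sketch of that isolation in plain text (Python, purely for
illustration; the helper name isolate() is made up for this example), the
citation and its quotation marks can be wrapped in the isolate controls
U+2066 LRI, U+2067 RLI or U+2068 FSI, closed by U+2069 PDI. FSI is the
plain-text counterpart of <bdi> without a dir attribute. How a renderer
then displays the result still depends on its own bidi support.

```python
# Unicode bidi isolate controls (introduced in Unicode 6.3).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(text, direction=None):
    """Wrap text in a bidi isolate; hypothetical helper, not a library API."""
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return opener + text + PDI

# Placeholders from the example above stand in for real Arabic/Japanese text.
citation = '"【Japanese1】Japanese2"'
paragraph = "ARABIC-ONE " + isolate(citation, "ltr") + " ARABIC-TWO"

# ascii() makes the invisible control characters visible as escapes.
print(ascii(paragraph))
```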

Maybe some bidi processors will opt to match quotation mark pairs such
as "..." or “...” or „...‟ or «...» or »...« or 「...」, but it is well known
that this won't work if quotation marks are not paired, or if the same
character is used for both the leading and trailing quotation marks, as in
”...”.
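To see which of these marks carry any pairing hint in the character
database at all, here is a small Python sketch. It only reads the bidi
class and the Bidi_Mirrored flag through the standard unicodedata module.
As far as I know that module does not expose Bidi_Paired_Bracket_Type,
which is what the bracket-pair rule actually consults, so Bidi_Mirrored is
used here only as an approximation. The describe() helper is made up for
this example, and the values come from whatever Unicode version your Python
bundles.

```python
import unicodedata

# Opening and closing marks taken from the examples in this message.
MARKS = ['【', '】', '「', '」', '«', '»', '"', '“', '”', '„', '‟']

def describe(ch):
    """Return (code point, bidi class, Bidi_Mirrored flag) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.bidirectional(ch),
        bool(unicodedata.mirrored(ch)),
    )

for ch in MARKS:
    print(ch, describe(ch))
```

All of them are bidi neutrals (class ON). The lenticular brackets have
Bidi_Mirrored set, while the ASCII and curly quotation marks do not, so
nothing in these two properties tells a processor which curly quotation
mark opens and which one closes.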

The same problem arises when a quotation spans multiple paragraphs, where
an additional quotation mark leads each additional paragraph of the same
quotation (to signal that the quotation continues) and a single quotation
mark closes the last paragraph. These marks cannot be paired easily without
ambiguity, or without a more complex resolution that will be language
dependent and would probably require additional markup of the language used
in the citation text itself, or for the whole container including the
quotation marks. An example of this complex case is:

   « CITATION1
   » CITATION2
   » CITATION3 », Author

The style above can be parsed by considering that a "trailing" quotation
mark leading a line cannot really be a trailing mark (it is a continuation
mark), and that to find the matching trailing quotation mark you need to
look further, possibly across multiple paragraphs.
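That heuristic can be sketched in a few lines of Python. This is only an
illustration under simplifying assumptions: the quotation opens on the
first line, there is no nesting, and each list item is one paragraph. The
function name find_quotation_end() is made up for this example.

```python
def find_quotation_end(lines, opener="«", closer="»"):
    """Return (line_index, column) of the mark that closes the quotation
    opened on lines[0], or None if no closing mark is found.

    A closer that leads a line (ignoring indentation) is treated as a
    continuation mark and skipped, not as the end of the quotation.
    """
    if not lines or opener not in lines[0]:
        return None
    for i, line in enumerate(lines):
        if i == 0:
            # Start searching just after the opening mark.
            pos = line.index(opener) + len(opener)
        else:
            stripped = line.lstrip()
            indent = len(line) - len(stripped)
            # Skip a continuation mark leading this line, if any.
            pos = indent + len(closer) if stripped.startswith(closer) else 0
        found = line.find(closer, pos)
        if found != -1:
            return (i, found)
    return None

lines = [
    "   « CITATION1",
    "   » CITATION2",
    "   » CITATION3 », Author",
]
print(find_quotation_end(lines))
```

For the three-line example above, the two leading » marks are skipped and
the mark just before ", Author" is reported as the real closing mark.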

As far as I know, there's no easy way to encode in plain-text Unicode alone
(without markup) that continuation marks should be ignored by bidi
processors when matching pairs, except by putting these continuation marks
in isolates (e.g. the continuation marks just before CITATION2 and
CITATION3 above would be encoded as <LRI,»,PDI>, or in HTML as
<bdi>»</bdi>).
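A minimal Python sketch of that encoding, assuming one paragraph per list
item and » as the continuation mark (the function name
isolate_continuation_marks() is made up for this example):

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate_continuation_marks(lines, closer="»"):
    """Wrap a closer that leads any line after the first in LRI...PDI.

    Other occurrences of the closer, including the real closing mark at
    the end of the quotation, are left untouched.
    """
    out = []
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if i > 0 and stripped.startswith(closer):
            indent = line[:len(line) - len(stripped)]
            line = indent + LRI + closer + PDI + stripped[len(closer):]
        out.append(line)
    return out

lines = [
    "   « CITATION1",
    "   » CITATION2",
    "   » CITATION3 », Author",
]
for line in isolate_continuation_marks(lines):
    # ascii() shows the inserted controls as \u2066 and \u2069 escapes.
    print(ascii(line))
```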

There's no easy solution for this case except to surround the whole
quotation with an isolate that sets an explicit direction
(<bdi dir="ltr">...</bdi> or LRI...PDI). It is notable that most quotation
marks are also not mirrorable, but pseudo-mirroring by replacing these
marks may be done in language-dependent processors.

2016-11-20 16:29 GMT+01:00 Eli Zaretskii <eliz at gnu.org>:

> > From: Simon Cozens <simon at simon-cozens.org>
> > Date: Sun, 20 Nov 2016 21:22:46 +1100
> >
> > On 20/11/2016 19:46, Philippe Verdy wrote:
> > > Why don't the Japanese brackets pair together to avoid having one
> > > mirrored and not the other one ?
> >
> > Isn't this the classic bidi brackets problem? The 【 is assumed to belong
> > to the base level because it's bidi neutral, but the 】 is assumed to be
> > part of the LTR text, so they end up in different isolating runs.
>
> The UBA was changed in Unicode 6.3 to process mirrored bracket pairs
> specially, to avoid this issue.  But not all browsers caught up with
> that yet.
>