Unclear text in the UBA (UAX#9) of Unicode 6.3
verdy_p at wanadoo.fr
Wed Apr 23 21:37:11 CDT 2014
Thanks for the clear reply. Now I know that my example in a prior message
would work appropriately with the UBA:
This is an [«] ARABIC EXAMPLE [»] for demonstration only.
- the opening guillemet is not stripped from the context stack when the
first closing bracket is matched with the first opening bracket;
- later, the closing guillemet matches the opening guillemet remaining on
the stack, even though the second opening bracket was pushed on top of it:
the pair of guillemets is matched and the opening guillemet is dropped from
the stack, but the second bracket on top of it remains there and can still
match the following closing bracket.
So bracket pairs can effectively overlap non-hierarchically.
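The matching described above can be sketched in a few lines of Python. This is only an illustration of the model as stated in this message: it assumes that guillemets are treated as a bracket pair (as the example does) and that matching removes only the matched opener from the stack. The names match_pairs, OPEN_TO_CLOSE and CLOSE_TO_OPEN are made up for the sketch, and the normative wording of BD16 in UAX #9 should be checked before relying on it.

```python
# Assumption for illustration: guillemets are treated as a bracket pair,
# as in the example sentence of this message.
OPEN_TO_CLOSE = {"[": "]", "«": "»"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}

def match_pairs(text):
    """Return sorted (open_index, close_index) pairs.

    Follows the model described in the message: a closer is matched with
    the nearest matching opener still on the stack, and only that opener
    is removed; anything pushed above it stays available.
    """
    stack = []   # (opening character, index) entries still unmatched
    pairs = []
    for index, ch in enumerate(text):
        if ch in OPEN_TO_CLOSE:
            stack.append((ch, index))
        elif ch in CLOSE_TO_OPEN:
            # Search from the top of the stack downwards.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == CLOSE_TO_OPEN[ch]:
                    pairs.append((stack[depth][1], index))
                    del stack[depth]   # drop only the matched opener
                    break
    return sorted(pairs)

example = "This is an [«] ARABIC EXAMPLE [»] for demonstration only."
for start, end in match_pairs(example):
    print(start, end, example[start], example[end])
```

With this model the guillemet pair and the second bracket pair overlap: the second "[" opens inside the guillemets, and its "]" closes outside them.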
But still there's a problem:
- The first pair of brackets starts immediately after an LTR Latin context,
so its direction is LTR too and consistent: these brackets won't be
mirrored.
- The pair of guillemets also starts after the first opening bracket has
been resolved as LTR, so both guillemets will be LTR and won't be mirrored.
- However, the second pair of brackets starts just after an ARABIC context:
these brackets will BOTH be mirrored.
But we get:
- "This is an ": strong LTR at start; the last weak space inherits from
the last letter "n"; no mirroring anywhere
- "[": resolved as LTR by inheritance from the previous resolved space, no
mirroring
- "«": ditto
- "]": found match, is LTR like the matching opening bracket.
- " ARABIC EXAMPLE ": the first weak space inherits from the bracket, but
then the direction switches to RTL up to the last space.
- "[": resolved as RTL, mirrored
- "»": resolved as LTR due to pair matching, no mirroring
- "]": resolved as RTL due to pair matching, mirrored
- " for demonstration only.": the first weak space inherits from the
previous RTL bracket, but then the direction switches to LTR from the first
Latin letter up to the end of the string
And we then have the following runs with resolved directions:
- "This is an [«] " : LTR
- "ARABIC EXAMPLE [" : RTL
- "»" : LTR
- "] " : RTL
- "for demonstration only." : LTR
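The runs listed above can be reproduced with a small Python sketch of the simplified resolution used in this message. It is not the full set of UBA rules (W1-W7, N0-N2): it assumes that lowercase letters are strong LTR and uppercase letters stand in for Arabic, that spaces, punctuation and opening brackets inherit the previously resolved direction, and that a closing bracket takes the direction of its matched opener. For that reason the first word is lower-cased in the sample string, and all function names are hypothetical.

```python
from itertools import groupby

# Assumption for illustration: guillemets are treated as a bracket pair.
OPEN_TO_CLOSE = {"[": "]", "«": "»"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}

def strong_direction(ch):
    """Stand-in classification: uppercase letters play the role of Arabic."""
    if ch.isalpha():
        return "RTL" if ch.isupper() else "LTR"
    return None   # spaces, brackets and punctuation carry no direction

def resolve(text, paragraph="LTR"):
    """Resolve one direction per character under the simplified model."""
    resolved = []
    open_stack = []        # (opening character, index) still unmatched
    previous = paragraph
    for index, ch in enumerate(text):
        # Strong letters decide for themselves; everything else inherits.
        direction = strong_direction(ch) or previous
        if ch in OPEN_TO_CLOSE:
            open_stack.append((ch, index))
        elif ch in CLOSE_TO_OPEN:
            for depth in range(len(open_stack) - 1, -1, -1):
                if open_stack[depth][0] == CLOSE_TO_OPEN[ch]:
                    # A closer takes the direction of its matched opener.
                    direction = resolved[open_stack[depth][1]]
                    del open_stack[depth]
                    break
        resolved.append(direction)
        previous = direction
    return resolved

def split_runs(text, resolved):
    """Group consecutive characters that share a resolved direction."""
    runs, position = [], 0
    for direction, group in groupby(resolved):
        length = len(list(group))
        runs.append((text[position:position + length], direction))
        position += length
    return runs

sample = "this is an [«] ARABIC EXAMPLE [»] for demonstration only."
for run_text, direction in split_runs(sample, resolve(sample)):
    print(repr(run_text), direction)
```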
Now we can apply the possible Bidi line wraps (where appropriate, if this
does not fit on a single line) and then the reordering of each line. Assuming
that everything fits on the same line I get (without mirroring applied):
"This is an [«] [ ELPMAXE CIBARA»[for demonstration only."
And with mirroring applied:
"This is an [«] ] ELPMAXE CIBARA»[for demonstration only."
Ugly, isn't it?
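For anyone who wants to replay the reordering, here is a minimal Python sketch. It assumes an LTR paragraph containing only the five runs listed earlier, so that reordering reduces to reversing each RTL run in place, and it mirrors characters only inside RTL runs. RUNS, MIRROR and display are names invented for the sketch. The exact spacing next to the last bracket depends on which run the neighbouring space is assigned to, so the output need not agree character for character with the strings quoted above.

```python
# Mirrored counterparts used when a character is displayed in an RTL run.
MIRROR = {"[": "]", "]": "[", "«": "»", "»": "«"}

# The runs and resolved directions as listed in this message.
RUNS = [
    ("This is an [«] ", "LTR"),
    ("ARABIC EXAMPLE [", "RTL"),
    ("»", "LTR"),
    ("] ", "RTL"),
    ("for demonstration only.", "LTR"),
]

def display(runs, mirror=False):
    """Build the visual string for an LTR paragraph.

    Runs keep their left-to-right sequence; the characters of each RTL
    run are reversed, and optionally replaced by their mirrored form.
    """
    pieces = []
    for text, direction in runs:
        if direction == "RTL":
            text = text[::-1]
            if mirror:
                text = "".join(MIRROR.get(ch, ch) for ch in text)
        pieces.append(text)
    return "".join(pieces)

print(display(RUNS))
print(display(RUNS, mirror=True))
```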
I still don't see any solution without using Bidi controls in the string.
But if we use Bidi controls to force a change of direction, this only
applies distinct directions to the elements of a pair.
In my opinion such a pair should NOT match (this is the case here for the
second pair of brackets).
What I would like to see (including with mirroring applied where needed)
should rather be:
"This is an [«] ELPMAXE CIBARA [»] for demonstration only."
There are still some tweaks to do to the algorithm.