Bidi Parenthesis Algorithm and BidiCharacterTest.txt
Whistler, Ken
ken.whistler at sap.com
Tue Oct 14 17:14:02 CDT 2014
Eli asked in response to Andrew:
> > · Since 2-17 is now R and not neutral, the resolution of 3-9 is R because the
> > check for context finds the opening parenthesis at 2 (now R) before the a at 1.
> > Therefore 2-17 is R under N0c2.
>
> But there's nothing about this in the UAX#9 language! How did you
> arrive at this dependency, using just what the UBA says?
See below.
> > Perhaps this should be emended to include that N0 can also update the type for
> > subsequent tests under N0, which is the case here.
>
> There's a big difference between X6 and N0. X6 is about the explicit
> override, and is applied before N0. Your interpretation makes N0 a
> recursive rule, something that is not even hinted at by the UBA spec.
I disagree that this makes N0 a "recursive" rule. It is a rule with repeatedly
applicable subparts. And like nearly all the rules in the UBA (the exceptions
being those which explicitly state that they apply to *original* Bidi_Class
values, which thus have to be stored across the life of the processing of
the string in question), N0 applies to the *current* Bidi_Class
values of the examined context.
In this sense, the UBA, for most rules, operates as a set of
"change and forget" steps. Thus in the case of N0, if you are
processing a sequential list of bracket pairs, you just process
each pair, one at a time, and each pair sees as its input whatever
the *current* state is -- which may be (and often is) changed by
the previous step.
What you do *not* need to do for N0 is preserve the starting
state when N0 was initiated, and independently check each
bracket pair against *that* array of Bidi_Class values while you
are busy setting them to new values.
>
> Of course! And so Example 1 is very different from what we are
> discussing, because each stage of the algorithm is applied to the
> results of the previous stage. But there's no other place, AFAICS,
> where the same stage is applied recursively. So I really don't see
> how this interpretation could be gleaned from the UBA description.
I agree that this could (and should) be made more explicit, as
it is apparent that people can run into problems of interpretation
here.
An examination of the functioning of the N0 rule in the bidi
reference implementations could, however, also be used to
help explain what is intended here. For example, in the particular
test case in question, the bidiref C implementation can have its
debug diagnostics cranked up, and you find:
Trace: Entering br_UBA_ResolveEN [W7]
Current State: 13
Text: 0061 0028 0028 007B 0062 2680 005B 005D 0029 007D 005B 0063 005B 005D 005D 05D0 0029
Bidi_Class: L ON ON ON L ON ON ON ON ON ON L ON ON ON R ON
Levels: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Runs: <R--------------------------------------------------------------------------------R>
…
Trace: Exiting br_SortPairList
Pair list: {1,16} {2,8} {6,7} {10,14} {12,13}
Debug: Strong direction e between brackets
Debug: Strong direction o between brackets
Debug: No strong direction between brackets
Debug: Strong direction o between brackets
Debug: No strong direction between brackets
Current State: 14
Text: 0061 0028 0028 007B 0062 2680 005B 005D 0029 007D 005B 0063 005B 005D 005D 05D0 0029
Bidi_Class: L R R ON L ON ON ON R ON R L ON ON R R R
Levels: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Runs: <R--------------------------------------------------------------------------------R>
That is the clue needed to track down why the interpretation
based on comparing against Bidi_Class values retained from the
initiation of rule N0 is incorrect.
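To make the difference between the two readings concrete, here is a minimal
Python sketch of the pair-by-pair processing described above, run on the
Bidi_Class values and pair list from the trace. It is not the bidiref code:
the function name resolve_bracket_pairs and its parameters are made up for
illustration, only L, R and ON are modelled (EN/AN and NSM handling are
omitted), the bracket pairs are supplied rather than computed, and sos is
assumed to be R for this level-1 run.

```python
# Simplified sketch of rule N0. Assumptions: only L, R and ON are modelled,
# bracket pairs are given as (opener, closer) index tuples sorted by opener,
# and sos is passed in by the caller.

def resolve_bracket_pairs(classes, pairs, embedding_dir, sos, use_snapshot=False):
    current = list(classes)    # updated as each pair is resolved
    snapshot = list(classes)   # frozen copy of the state on entry to N0
    opposite = "L" if embedding_dir == "R" else "R"

    for opener, closer in pairs:
        # The two readings differ only in which array is inspected here.
        view = snapshot if use_snapshot else current
        strong_inside = {c for c in view[opener + 1:closer] if c in ("L", "R")}
        if not strong_inside:
            continue                      # N0d: no strong type inside, leave as is
        if embedding_dir in strong_inside:
            resolved = embedding_dir      # N0b: embedding direction found inside
        else:
            # N0c: only the opposite direction inside, so look backward from
            # the opening bracket for the first strong type (sos if none).
            context = sos
            for c in reversed(view[:opener]):
                if c in ("L", "R"):
                    context = c
                    break
            # N0c1 if the context is also opposite, otherwise N0c2.
            resolved = opposite if context == opposite else embedding_dir
        current[opener] = current[closer] = resolved
    return current


# Values taken from the "Current State: 13" row and the pair list in the trace.
classes = "L ON ON ON L ON ON ON ON ON ON L ON ON ON R ON".split()
pairs = [(1, 16), (2, 8), (6, 7), (10, 14), (12, 13)]

in_place = resolve_bracket_pairs(classes, pairs, "R", "R")
frozen = resolve_bracket_pairs(classes, pairs, "R", "R", use_snapshot=True)

print(" ".join(in_place))  # L R R ON L ON ON ON R ON R L ON ON R R R
print(" ".join(frozen))    # L R L ON L ON ON ON L ON L L ON ON L R R
```

The in-place result matches the Bidi_Class row under "Current State: 14" in
the trace. The frozen-snapshot variant instead resolves the brackets at
offsets 2, 8, 10 and 14 to L, because its context check for pair {2,8} walks
past the still-ON bracket at offset 1 and finds the L at offset 0.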
--Ken
>
> Thanks for explaining, but it is really frustrating to find out about
> these untold subtleties at this late stage. (And yes, I've read the
> proposed changes in tr9-32.html, and not even they say anything about
> this.) How can we be sure that your interpretation is indeed correct,
> if it is not even hinted anywhere?
>