Bidi Parenthesis Algorithm and BidiCharacterTest.txt

Whistler, Ken ken.whistler at sap.com
Tue Oct 14 17:14:02 CDT 2014


Eli asked in response to Andrew:



> > · Since 2-17 is now R and not neutral, the resolution of 3-9 is R because the
> > check for context finds the opening parenthesis at 2 (now R) before the a
> > at 1.
> > Therefore 2-17 is R under N0c2.
>
> But there's nothing about this in the UAX#9 language!  How did you
> arrive at this dependency, using just what the UBA says?



See below.



> > Perhaps this should be emended to include that N0 can also update the
> > type for
> > subsequent tests under N0, which is the case here.
>
> There's a big difference between X6 and N0.  X6 is about the explicit
> override, and is applied before N0.  Your interpretation makes N0 a
> recursive rule, something that is not even hinted at by the UBA spec.



I disagree that this makes N0 a "recursive" rule. It is a rule with
subparts that apply repeatedly, once per bracket pair. And like nearly
all the rules in the UBA, it applies to the *current* Bidi_Class values
of the examined context. The exceptions are the rules which explicitly
state that they apply to *original* Bidi_Class values, which thus have
to be stored for the whole time the string in question is being
processed.



In this sense, the UBA, for most rules, operates as a set of
"change and forget" steps. Thus in the case of N0, if you are
processing a sequential list of bracket pairs, you just process
each pair, one at a time, and each pair sees as its input whatever
the *current* state is -- which may have been (and often was) changed
by the previous step.



What you do *not* need to do for N0 is preserve the starting
state when N0 was initiated, and independently check each
bracket pair against *that* array of Bidi_Class values while you
are busy setting them to new values.
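
A minimal Python sketch of this "change and forget" processing, for
illustration only. It is not the bidiref code: the names resolve_n0 and
strong are made up here, and it assumes a single isolating run sequence,
ignores the handling of NSM after brackets, and takes the bracket pairs
as already identified and sorted. The Bidi_Class values and the pair
list are those of the test case under discussion, as printed in the
bidiref trace in this message (0-based positions).

```python
# Bidi_Class values on entry to N0 (trace state 13), 0-based positions.
BEFORE = "L ON ON ON L ON ON ON ON ON ON L ON ON ON R ON".split()
# Bracket pairs as printed in the trace, sorted by opening position.
PAIRS = [(1, 16), (2, 8), (6, 7), (10, 14), (12, 13)]


def strong(bc):
    """Map a Bidi_Class to 'L', 'R' or None; N0 treats EN and AN as R."""
    if bc == "L":
        return "L"
    if bc in ("R", "EN", "AN"):
        return "R"
    return None


def resolve_n0(classes, pairs, embedding, sos):
    """Resolve bracket pairs one at a time against the current values."""
    cur = list(classes)  # the only working array; no saved starting state
    opposite = "L" if embedding == "R" else "R"
    for open_i, close_i in pairs:
        inside = {strong(bc) for bc in cur[open_i + 1:close_i]}
        if embedding in inside:
            new = embedding  # strong type matching the embedding direction
        elif opposite in inside:
            # Only the opposite direction inside: check the context before
            # the opening bracket, reading the *current* values in cur.
            context = sos
            for bc in reversed(cur[:open_i]):
                if strong(bc):
                    context = strong(bc)
                    break
            new = opposite if context == opposite else embedding
        else:
            continue  # no strong type inside: leave the pair unchanged
        cur[open_i] = cur[close_i] = new  # visible to all later pairs
    return cur


# Level 1 means the embedding direction is R; sos is R in the trace.
AFTER = resolve_n0(BEFORE, PAIRS, embedding="R", sos="R")
print(" ".join(AFTER))
```

Under these assumptions the printed values agree with the Bidi_Class
row of state 14 in the trace.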



>
> Of course!  And so Example 1 is very different from what we are
> discussing, because each stage of the algorithm is applied to the
> results of the previous stage.  But there's no other place, AFAICS,
> where the same stage is applied recursively.  So I really don't see
> how this interpretation could be gleaned from the UBA description.



I agree that this could (and should) be made more explicit, as
it is apparent that people can run into problems of interpretation
here.



An examination of the functioning of the N0 rule in the bidi
reference implementations could, however, also be used to
help explain what is intended here. For example, in the particular
test case in question, the bidiref C implementation can have its
debug diagnostics cranked up, and you find:



Trace: Entering br_UBA_ResolveEN [W7]
Current State: 13
  Text:        0061 0028 0028 007B 0062 2680 005B 005D 0029 007D 005B 0063 005B 005D 005D 05D0 0029
  Bidi_Class:     L   ON   ON   ON    L   ON   ON   ON   ON   ON   ON    L   ON   ON   ON    R   ON
  Levels:         1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
  Runs:        <R--------------------------------------------------------------------------------R>
…
Trace: Exiting br_SortPairList
Pair list:  {1,16} {2,8} {6,7} {10,14} {12,13}
Debug: Strong direction e between brackets
Debug: Strong direction o between brackets
Debug: No strong direction between brackets
Debug: Strong direction o between brackets
Debug: No strong direction between brackets
Current State: 14
  Text:        0061 0028 0028 007B 0062 2680 005B 005D 0029 007D 005B 0063 005B 005D 005D 05D0 0029
  Bidi_Class:     L    R    R   ON    L   ON   ON   ON    R   ON    R    L   ON   ON    R    R    R
  Levels:         1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
  Runs:        <R--------------------------------------------------------------------------------R>



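That is the clue needed to see why the interpretation based on
comparing against Bidi_Class values retained from the initiation of
rule N0 is incorrect. Pair {1,16} is resolved to R first, so when
pair {2,8} checks the context before its opening bracket, it finds
the R now at position 1 (0-based, as in the trace) rather than the
L at position 0.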
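
The difference between the two readings can be shown for pair {2,8}
alone. The following Python sketch is illustrative only; the helper
preceding_strong is a made-up name, not a bidiref function, and it
assumes the same single run with sos = R as in the trace.

```python
def preceding_strong(classes, open_i, sos):
    """Return the first strong type before open_i, or sos if none.

    EN and AN count as R, as N0 requires.
    """
    for bc in reversed(classes[:open_i]):
        if bc == "L":
            return "L"
        if bc in ("R", "EN", "AN"):
            return "R"
    return sos


# Values on entry to N0 (trace state 13), 0-based positions.
entry = "L ON ON ON L ON ON ON ON ON ON L ON ON ON R ON".split()

# Working values after pair {1,16} alone has been resolved to R.
current = list(entry)
current[1] = current[16] = "R"

# Reading the saved entry values skips the ON at 1 and finds the L at 0.
stale_context = preceding_strong(entry, 2, "R")
# Reading the current values stops at the bracket at 1, which is now R.
live_context = preceding_strong(current, 2, "R")

print(stale_context, live_context)
```

With the embedding direction R, a context of L would set pair {2,8} to
L, while a context of R sets it to R. State 14 of the trace shows R at
positions 2 and 8, which is what reading the current values gives.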



--Ken





>
> Thanks for explaining, but it is really frustrating to find out about
> these untold subtleties at this late stage.  (And yes, I've read the
> proposed changes in tr9-32.html, and not even they say anything about
> this.)  How can we be sure that your interpretation is indeed correct,
> if it is not even hinted anywhere?
>

