Bidi Parenthesis Algorithm and BidiCharacterTest.txt

Eli Zaretskii eliz at gnu.org
Wed Oct 15 00:36:36 CDT 2014


> From: "Whistler, Ken" <ken.whistler at sap.com>
> Date: Tue, 14 Oct 2014 22:14:02 +0000
> Cc: "Whistler, Ken" <ken.whistler at sap.com>,
>         "unicode at unicode.org" <unicode at unicode.org>
> 
> I disagree that this makes N0 a "recursive" rule. It is a rule with repeatedly
> applicable subparts. And like nearly all the rules in the UBA (except ones
> which explicitly state that they apply to *original* Bidi_Class values,
> which thus have to be stored across the life of the processing of
> the string in question), all rules apply to the *current* Bidi_Class
> values of the examined context.

Can you point out where this is stated in the UBA?

According to my reading of the UBA, only W7 could qualify as something
similar to the "recursive" interpretation of N0.  All the other rules
are either defined in a way that the "recursion" cannot happen
(because the conditions for applying the rule disappear after it is
applied once), or explicitly speak about a sequence of similar
characters whose bidi types are modified in the same manner.
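
To make the W7 case concrete, here is a minimal sketch in Python of
one possible reading of that rule.  It assumes W1 through W6 have
already been applied, and it makes a single left-to-right pass over
the *current* types, so that an EN already changed to L is what the
backward search from a later EN finds first.  The names apply_w7 and
STRONG, and the representation of bidi types as plain strings, are
hypothetical and are not taken from the UBA text or from any
implementation.

```python
# Strong types that the backward search of W7 stops at (AL is assumed
# to have been changed to R by W3 already).
STRONG = {"L", "R"}

def apply_w7(types, sos):
    """Apply a sketch of UBA rule W7 to one run of bidi types.

    types: list of bidi type names, e.g. ["L", "ON", "EN"]
    sos:   the start-of-sequence type, "L" or "R"
    """
    out = list(types)
    for i, t in enumerate(out):
        if t != "EN":
            continue
        # Search backward over the *current* values for the first
        # strong type; fall back to sos if none is found.
        strong = sos
        for j in range(i - 1, -1, -1):
            if out[j] in STRONG:
                strong = out[j]
                break
        if strong == "L":
            out[i] = "L"
    return out

print(apply_w7(["L", "ON", "EN", "ON", "EN"], "R"))
```

In this sketch the second EN stops its backward search at the first
EN, which by then has become L; that is the only sense in which the
rule feeds on its own output.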

> Trace: Exiting br_SortPairList
> Pair list: {1,16} {2,8} {6,7} {10,14} {12,13}
> Debug: Strong direction e between brackets
> Debug: Strong direction o between brackets
> Debug: No strong direction between brackets
> Debug: Strong direction o between brackets
> Debug: No strong direction between brackets

This doesn't explain _why_ the direction between brackets was decided
one way or the other, which is the core of the issue at hand.  So this
debugging output doesn't really help here.

In any case, when designing an implementation, one normally expects to
read some formal requirements, not learn those requirements from
another implementation.

Anyway, I'm glad we all agree that, once again, the new additions to
the UBA, and the BPA-related ones in particular, are not described
well enough to avoid misunderstandings such as this one, and that the
language should be improved and clarified, hopefully sooner rather
than later.  I've just lost 20 hours of work due to that.

