Unclear text in the UBA (UAX#9) of Unicode 6.3

Asmus Freytag asmusf at ix.netcom.com
Tue Apr 22 11:06:27 CDT 2014

On 4/22/2014 2:19 AM, Ilya Zakharevich wrote:
> I think the crucial problem is with
>    1(  2[  3(  4]  5) 5b]  6)
> I have two possible interpretations: one matches 2 with 5b, another
> leaves 2 unmatched.


If you read UAX#9, the algorithm works by pushing openers onto a
stack. On finding a closer, it goes down the stack and attempts to
locate a match. On finding a match, it discards any enclosed openers;
on not finding a match, it discards the closer.

(discard = ignore for further matching, don't treat as bracket any longer).

So, when we reach 4] we have

   1(  2[  3(

on the stack. The match is with 2[ and 3( is ignored. 1( remains and can
be matched later to 5).

Ultimately 5b] and 6) are ignored.
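
The stack procedure described above can be sketched in a few lines of
Python. This is only an illustration, not the text of UAX#9: the
two-entry OPENERS table and the function name resolve_pairs_stack are
assumptions made for the example, and the input is the sequence from
the question with the labels stripped.

```python
# Illustrative table only; real data would come from Bidi_Paired_Bracket.
OPENERS = {"(": ")", "[": "]"}
CLOSERS = set(OPENERS.values())


def resolve_pairs_stack(text):
    """Return (opener_index, closer_index) pairs in the order they are found."""
    stack = []  # entries are (index, opener character)
    pairs = []
    for i, ch in enumerate(text):
        if ch in OPENERS:
            stack.append((i, ch))
        elif ch in CLOSERS:
            # Walk down the stack from the most recent opener.
            for depth in range(len(stack) - 1, -1, -1):
                j, opener = stack[depth]
                if OPENERS[opener] == ch:
                    pairs.append((j, i))
                    # Drop the matched opener and every opener above it.
                    del stack[depth:]
                    break
            # If nothing matched, the closer is simply discarded.
    return pairs


# 1( 2[ 3( 4] 5) 5b] 6) with the labels removed:
EXAMPLE = "([(])])"
print(resolve_pairs_stack(EXAMPLE))  # [(1, 3), (0, 4)]
```

Indices 1 and 3 are 2[ and 4]; indices 0 and 4 are 1( and 5). The
opener at index 2 and the closers at indices 5 and 6 end up unpaired.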

I believe that your scheme does not match the PBA in that it assumes
that brackets are hierarchical and attempts to preserve the best
hierarchy, whereas the PBA assumes that pairs that are closer together
are more likely to be correct matches. For non-mathematical texts,
hierarchies are not the norm, and they are shallow at best.

What the PBA actually does can now be put into a definition plus a rule,
neither of which uses "stack" or other implementation details, such as
"variables" or "lists".

D  A bracket pair is a pair of characters, an opening paired bracket
   and a closing paired bracket, within the same isolating run
   sequence, such that the Bidi_Paired_Bracket property value of the
   former character or its canonical equivalent equals the latter
   character or its canonical equivalent.
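
To make the "or its canonical equivalent" clause concrete, here is a
rough Python sketch of the test in D. The BIDI_PAIRED_BRACKET table is
a small hand-written excerpt, not the full property data, and the
helper names canon and can_form_bracket_pair are made up for this
example. Canonical equivalence is approximated by comparing NFD forms,
and the isolating run sequence condition is not modelled.

```python
import unicodedata

# Hand-written excerpt for illustration; the full mapping is the
# Bidi_Paired_Bracket property in the Unicode Character Database.
BIDI_PAIRED_BRACKET = {
    "(": ")",
    "[": "]",
    "\u2329": "\u232A",  # LEFT-/RIGHT-POINTING ANGLE BRACKET
    "\u3008": "\u3009",  # LEFT/RIGHT ANGLE BRACKET
}


def canon(ch):
    """Canonical form used for comparison (NFD of a single character)."""
    return unicodedata.normalize("NFD", ch)


def can_form_bracket_pair(opener, closer):
    """True if opener and closer satisfy D, modulo canonical equivalence."""
    for candidate in (opener, canon(opener)):
        expected = BIDI_PAIRED_BRACKET.get(candidate)
        if expected is not None and canon(expected) == canon(closer):
            return True
    return False


print(can_form_bracket_pair("(", ")"))            # True
print(can_form_bracket_pair("(", "]"))            # False
print(can_form_bracket_pair("\u2329", "\u3009"))  # True
```

The last call pairs U+2329 with U+3009, which works only because
U+2329 and U+232A are canonically equivalent to U+3008 and U+3009.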

R  Characters are resolved into resolved bracket pairs as follows:
   Starting at the beginning of the text, when a closing bracket is
   encountered, find the nearest preceding opening character that is
   not part of a resolved pair, is not ignored for pair resolution,
   and can form a bracket pair with it. If one exists, resolve the
   pair, and mark any enclosed brackets of any kind as ignored.
   Otherwise, if no pair can be resolved, mark the closing bracket as
   ignored.
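
Rule R can also be written down almost literally, with per-character
marks instead of a stack. The Python below is a sketch under
assumptions: the OPENERS table is a two-entry stand-in for the real
property data, the name resolve_pairs_by_rule is invented here, and
"mark any enclosed brackets as ignored" is read as applying only to
brackets that are not already part of a resolved pair.

```python
# Stand-in for the Bidi_Paired_Bracket data, for illustration only.
OPENERS = {"(": ")", "[": "]"}
CLOSERS = set(OPENERS.values())


def resolve_pairs_by_rule(text):
    """Apply rule R left to right; return the resolved (open, close) pairs."""
    mark = {}  # index -> "resolved" or "ignored"
    resolved = []
    for i, ch in enumerate(text):
        if ch not in CLOSERS:
            continue
        # Search backwards for the nearest usable opener.
        for j in range(i - 1, -1, -1):
            if text[j] in OPENERS and j not in mark and OPENERS[text[j]] == ch:
                resolved.append((j, i))
                mark[j] = mark[i] = "resolved"
                # Enclosed brackets that are still unmarked become ignored.
                for k in range(j + 1, i):
                    is_bracket = text[k] in OPENERS or text[k] in CLOSERS
                    if is_bracket and k not in mark:
                        mark[k] = "ignored"
                break
        else:
            # No opener could be paired with this closer.
            mark[i] = "ignored"
    return resolved


# 1( 2[ 3( 4] 5) 5b] 6) with the labels removed:
print(resolve_pairs_by_rule("([(])])"))  # [(1, 3), (0, 4)]
```

On the sequence from the question this resolves 2[ with 4] and 1( with
5), and leaves 3(, 5b] and 6) ignored.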

What this shows is that the text in BD16 of UAX#9 tries to cover both a
definition and a rule, which is what makes it so difficult to follow.

I think what should be proposed is such a breakdown: a smaller
definition that speaks to the matching of properties (modulo canonical
equivalence), separate from the strategy for resolving actual pairs,
which is better stated as a rule.

The rule does not need to use implementation language to be definite.

A "resolved" bracket pair is simply the actual pair resolved by rule "R",
and the rest of the PBA acts on "resolved" pairs.

