Unclear text in the UBA (UAX#9) of Unicode 6.3

Asmus Freytag asmusf at ix.netcom.com
Thu Apr 24 02:28:50 CDT 2014

On 4/23/2014 7:37 PM, Philippe Verdy wrote:
> Thanks for the clear reply, now I know that my example in a prior 
> message would work appropriately with UBA:
>   This is an [«] ARABIC EXAMPLE [»] for demonstration only.
> Because:
> - the opening guillemet is not stripped out of the context stack when 
> the first closing bracket is matched with the first opening bracket,
This is _*incorrect*_; see the emphasized text in the definition 
copied below.
The second bullet of step 3 in BD16 (under "If a closing paired bracket 
is found") clearly says that all elements that are above the matched 
element are popped together with it.
> - later the closing guillemet matches the opening guillemet remaining 
> on the stack,
No, this is _*incorrect*_, because the stack has been popped.

The problem with the "stack" in this algorithm is that it isn't a stack. 
A stack is a data structure that allows you to manipulate only the top 
element. This data structure is simply a list, to which elements are 
appended as opening brackets are found, and which is then scanned (from 
the tail) for a match; on meeting a match, the tail is trimmed.

Item "4" is the one that does the iteration in scanning the tail. After 
one or more iterations, item "3" no longer operates on what would have 
been the "top" element of a "stack", but deep in the tail of a list. 
When the items are "popped" it's equivalent to dropping the tail. 
(Unlike your interpretation, which would remove individual elements, 
this language clearly refers to multiple elements.)
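To make the "trim the tail" reading concrete, here is a minimal sketch of 
that single step, assuming the list holds (expected closing bracket, text 
position) entries. The name trim_on_match is hypothetical, and ASCII 
parentheses stand in for the guillemets of the example.

```python
# State after reading "[" and then "(": one entry per opening bracket,
# stored as (expected closing bracket, text position).
pending = [("]", 0), (")", 1)]


def trim_on_match(pending, closer):
    """Scan from the tail; on a match, drop the match and everything after it."""
    for i in range(len(pending) - 1, -1, -1):
        if pending[i][0] == closer:
            matched_pos = pending[i][1]
            del pending[i:]  # "pop through the current element", inclusive
            return matched_pos
    return None  # no match anywhere: the list is left untouched


opener_pos = trim_on_match(pending, "]")
print(opener_pos, pending)
```

After the call, the entry for "(" is gone as well, so a later ")" has 
nothing left to match against.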
> even if the second opening bracket was pushed on top of it: the pair of 
> guillemets is matched, the opening guillemet is dropped from the 
> stack but the second bracket on top of it remains there and can also 
> match now the following closing bracket.
> So brackets pairs can effectively overlap non hierarchically.

BD16. A /bracket pair/ is a pair of characters consisting of an /opening 
paired bracket/ and a /closing paired bracket/ such that the 
Bidi_Paired_Bracket property value of the former or its canonical 
equivalent equals the latter or its canonical equivalent and which are 
algorithmically identified at specific text positions within an 
/isolating run sequence/. The following algorithm identifies all of the 
/bracket pairs/ in a given /isolating run sequence/:

  * Create a stack for elements each consisting of a bracket character
    and a text position. Initialize it to empty.
  * Create a list for elements each consisting of two text positions,
    one for an opening paired bracket and the other for a corresponding
    closing paired bracket. Initialize it to empty.
  * Inspect each character in the isolating run sequence in logical order.
      o If an opening paired bracket is found, push its
        Bidi_Paired_Bracket property value and its text position onto
        the stack.
      o If a closing paired bracket is found, do the following:
         1. Declare a variable that holds a reference to the current
            stack element and initialize it with the top element of the
            stack.
         2. Compare the closing paired bracket being inspected or its
            canonical equivalent to the bracket in the current stack
            element.
         3. If the values match, meaning the two characters form a
            bracket pair, then
              + Append the text position in the current stack element
                together with the text position of the closing paired
                bracket to the list.
              + *Pop the stack _through the current stack element
                inclusively_.*
         4. Else, if the current stack element is not at the bottom of
            the stack, advance it to the next element deeper in the
            stack and go back to step 2.
         5. Else, continue with inspecting the next character without
            popping the stack.
  * Sort the list of pairs of text positions in ascending order based on
    the text position of the /opening paired bracket/.
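The steps of BD16 can be sketched as follows. This is an illustration 
under stated assumptions, not a conformant implementation: PAIRED is a 
small hand-written excerpt standing in for the Bidi_Paired_Bracket 
property, the names canon and bracket_pairs are hypothetical, the input 
string stands in for an isolating run sequence, and the requirement that 
a bracket's current bidi type be ON is ignored.

```python
import unicodedata

# Hypothetical excerpt of the Bidi_Paired_Bracket mapping (opening -> closing).
PAIRED = {"(": ")", "[": "]", "{": "}", "\u2329": "\u232A", "\u3008": "\u3009"}
CLOSERS = set(PAIRED.values())


def canon(ch):
    # NFC maps U+2329/U+232A to their canonical equivalents U+3008/U+3009.
    return unicodedata.normalize("NFC", ch)


def bracket_pairs(seq):
    stack = []  # (expected closing bracket, text position)
    pairs = []  # (opening position, closing position)
    for pos, ch in enumerate(seq):
        if ch in PAIRED:
            stack.append((canon(PAIRED[ch]), pos))
        elif ch in CLOSERS:
            # Steps 1-4: start at the top element and walk deeper.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == canon(ch):
                    pairs.append((stack[i][1], pos))
                    del stack[i:]  # pop through the matched element
                    break
            # Step 5: no match found, stack left as it was.
    pairs.sort()  # ascending by position of the opening bracket
    return pairs


# Parentheses stand in for the guillemets of the example in this thread.
print(bracket_pairs("[(][)]"))  # [(0, 2), (3, 5)]
```

In the output, the "(" at position 1 and the ")" at position 4 are never 
paired, because the "(" entry was discarded when "]" matched the first "[".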

> But still there's a problem:

The remainder of the problems can't be discussed, because the premise is 
wrong (see above).
