Unclear text in the UBA (UAX#9) of Unicode 6.3
asmusf at ix.netcom.com
Thu Apr 24 02:28:50 CDT 2014
On 4/23/2014 7:37 PM, Philippe Verdy wrote:
> Thanks for the clear reply, now I know that my example in a prior
> message would work appropriately with UBA:
> This is an [«] ARABIC EXAMPLE [»] for demonstration only.
> - the opening guillemet is not stripped out of the context stack when
> the first closing bracket is matched with the first opening bracket,
This is _*incorrect*_; see the emphasized text (blue/bold in the
original message) in the definition of BD16 quoted below.
The second bullet in item 3 of the second second-level bullet of the
third top-level bullet of BD16 clearly says that all elements that are
above the matched element are popped together with it.
> - later the closing guillemet matches the opening guillemet remaining
> on the stack,
No, this is _*incorrect*_, because the stack has been popped.
The problem with the "stack" in this algorithm is that it isn't a stack.
A stack is a data structure that allows you to manipulate only the top
element. This data structure is simply a list, to which elements are
appended, as opening brackets are found, and which then is scanned (from
the tail) for a match, and, on meeting a match, the tail is trimmed.
Item "4" is the one that does the iteration in scanning the tail. After
one or more iterations, item "3" no longer operates on what would have
been the "top" element of a "stack", but deep in the tail of a list.
When the items are "popped" it's equivalent to dropping the tail.
(Unlike your interpretation, which would remove individual elements,
this language clearly refers to multiple elements.)
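
To make the difference between the two readings concrete, here is a
minimal Python sketch. The function names pop_through and remove_only are
invented for illustration and do not come from UAX#9; the list holds
(expected closing bracket, text position) entries, with the bottom of the
"stack" at index 0.

```python
# Illustrative sketch only; the function names are hypothetical, not from UAX#9.

def pop_through(stack, index):
    """BD16 reading: drop the matched element and everything above it."""
    return stack[:index]

def remove_only(stack, index):
    """The other reading: remove just the matched element, keep the rest."""
    return stack[:index] + stack[index + 1:]

# "[" was found at position 0, then "(" at position 2.
stack = [("]", 0), (")", 2)]

# A "]" arrives: the scan from the tail skips ")" and matches index 0.
after_spec = pop_through(stack, 0)   # tail trimmed: nothing is left
after_alt = remove_only(stack, 0)    # ")" would survive and could still match
```

Under the first reading the later ")" has nothing left to match, which is
why overlapping pairs cannot arise.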
> even if the second opening bracket was pushed on top of it: the pair of
> guillemets is matched, the opening guillemet is dropped from the
> stack but the second bracket on top of it remains there and can also
> now match the following closing bracket.
> So bracket pairs can effectively overlap non-hierarchically.
BD16. A /bracket pair/ is a pair of characters consisting of an /opening
paired bracket/ and a /closing paired bracket/ such that the
Bidi_Paired_Bracket property value of the former or its canonical
equivalent equals the latter or its canonical equivalent and which are
algorithmically identified at specific text positions within an
/isolating run sequence/. The following algorithm identifies all of the
/bracket pairs/ in a given /isolating run sequence/:
  * Create a stack for elements each consisting of a bracket character
    and a text position. Initialize it to empty.
  * Create a list for elements each consisting of two text positions,
    one for an opening paired bracket and the other for a corresponding
    closing paired bracket. Initialize it to empty.
  * Inspect each character in the isolating run sequence in logical order.
      o If an opening paired bracket is found, push its
        Bidi_Paired_Bracket property value and its text position onto
        the stack.
      o If a closing paired bracket is found, do the following:
         1. Declare a variable that holds a reference to the current
            stack element and initialize it with the top element of the
            stack.
         2. Compare the closing paired bracket being inspected or its
            canonical equivalent to the bracket in the current stack
            element.
         3. If the values match, meaning the two characters form a
            bracket pair, then
              + Append the text position in the current stack element
                together with the text position of the closing paired
                bracket to the list.
              + _*Pop the stack through the current stack element
                inclusively.*_
         4. Else, if the current stack element is not at the bottom of
            the stack, advance it to the next element deeper in the
            stack and go back to step 2.
         5. Else, continue with inspecting the next character without
            popping the stack.
  * Sort the list of pairs of text positions in ascending order based on
    the text position of the /opening paired bracket/.
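
The steps quoted above can be sketched in Python roughly as follows. This
is only an illustration under stated assumptions: PAIRS is a tiny stand-in
for the real Bidi_Paired_Bracket data, canonical equivalents are ignored,
and the whole input string is treated as a single isolating run sequence.
The name find_bracket_pairs is made up for this sketch.

```python
# Sketch of the BD16 pairing steps; not a conformant UBA implementation.
# Assumption: PAIRS stands in for the Bidi_Paired_Bracket property data.
PAIRS = {"(": ")", "[": "]", "{": "}"}   # opening bracket -> closing bracket
CLOSERS = set(PAIRS.values())

def find_bracket_pairs(text):
    stack = []   # elements: (expected closing bracket, text position)
    pairs = []   # elements: (opening position, closing position)
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            # Opening bracket: push its paired value and its position.
            stack.append((PAIRS[ch], pos))
        elif ch in CLOSERS:
            # Steps 1-4: scan from the top of the stack toward the bottom.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == ch:
                    pairs.append((stack[i][1], pos))
                    del stack[i:]   # pop through the matched element, inclusively
                    break
            # Step 5: if nothing matched, the stack is left untouched.
    pairs.sort()   # ascending by position of the opening bracket
    return pairs

print(find_bracket_pairs("a [ b ( c ] d ) e"))   # prints [(2, 10)]
```

In the sample string, "]" matches "[" and the "(" pushed above it is
discarded at the same time, so the later ")" finds nothing to pair with
and no overlapping pair is produced.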
> But still there's a problem:
The remainder of the problems can't be discussed, because the premise is
wrong (see above).