Limits in UBA

Eli Zaretskii eliz at
Wed Oct 22 10:53:27 CDT 2014


I have two questions about the Unicode Bidirectional Algorithm (UBA),
both regarding the limits it places, or fails to place, on its stacks.

First, I'd like to ask about the 127 entries of the directional status
stack; it had 63 entries in the version of the UBA before Unicode 6.3.
Where and why are such deep embeddings/isolates needed?  Does anyone
know of practical examples of text that requires such a depth?

I have personally never seen a situation where one or two
embeddings/overrides were not enough, which is a far cry from the
UAX#9 numbers.  Implementing such a deep stack requires non-trivial
memory-management solutions and adds complexity to an already complex
algorithm, but if I implement only a small fraction of that depth, I
cannot claim conformance to the UBA.  So I wonder if there's a
practical justification for such a deep UBA stack.
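
To make the size of the problem concrete, here is a minimal sketch of
a directional status stack preallocated at its full depth.  It assumes
the UAX#9 values max_depth = 125 and a stack of max_depth + 2 = 127
entries, and it follows the overflow counters of rules X2-X5 and X7
for embeddings and overrides only; isolates (X5a-X5c, X6a) are left
out.  The class and method names are hypothetical, chosen for
illustration, and are not taken from any existing implementation.

```python
MAX_DEPTH = 125             # max_depth in UAX#9 since Unicode 6.3
STACK_SIZE = MAX_DEPTH + 2  # 127 entries

NEUTRAL, OVERRIDE_L, OVERRIDE_R = 0, 1, 2


class StatusStack:
    """Fixed-capacity directional status stack (hypothetical sketch)."""

    def __init__(self, paragraph_level=0):
        # Three parallel byte arrays, allocated once: 3 * 127 bytes.
        self.levels = bytearray(STACK_SIZE)
        self.overrides = bytearray(STACK_SIZE)
        self.isolates = bytearray(STACK_SIZE)  # always 0 in this sketch
        self.top = 0
        self.levels[0] = paragraph_level
        self.overflow_isolates = 0    # stays 0: isolates not modelled
        self.overflow_embeddings = 0

    def current_level(self):
        return self.levels[self.top]

    def push(self, rtl, override=NEUTRAL):
        """Handle RLE/LRE/RLO/LRO; return True if an entry was pushed."""
        cur = self.levels[self.top]
        # Least odd (RTL) or least even (LTR) level greater than cur.
        new = (cur + 1) | 1 if rtl else (cur + 2) & ~1
        if (new <= MAX_DEPTH and self.overflow_isolates == 0
                and self.overflow_embeddings == 0):
            self.top += 1
            self.levels[self.top] = new
            self.overrides[self.top] = override
            self.isolates[self.top] = 0
            return True
        # Too deep: count the overflow instead of growing the stack.
        if self.overflow_isolates == 0:
            self.overflow_embeddings += 1
        return False

    def pop(self):
        """Handle PDF."""
        if self.overflow_isolates > 0:
            return
        if self.overflow_embeddings > 0:
            self.overflow_embeddings -= 1
        elif not self.isolates[self.top] and self.top >= 1:
            self.top -= 1
```

In this sketch, embeddings beyond the limit only bump a counter, so
the stack never grows past its preallocated size no matter what the
input text contains.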

The second question is about the stack required for implementing the
BPA resolution of brackets, as described in BD16 and N0.  The UBA
doesn't place any limits on the depth of that stack.  This means that
text with a large enough number of opening bracket characters and no
closing brackets could exhaust the entire memory space of an
application.  What is the implementation supposed to do in this
situation?  Crashing or exiting with a fatal error code is clearly
inappropriate in some applications.

Is it even reasonable not to have any limits for this stack?
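
For illustration only, one way an implementation could protect itself
is to cap the bracket stack and give up on pairing for the rest of the
run once the cap is reached.  The sketch below assumes a hypothetical
cap of 63 entries, which is not something the text of BD16 discussed
above provides, and it simplifies BD16 heavily: it knows only three
ASCII bracket pairs instead of the Bidi_Paired_Bracket property, and
it ignores canonical equivalents.  All names are made up for the
example.

```python
MAX_BRACKET_DEPTH = 63  # hypothetical cap, an assumption of this sketch

# Simplified stand-in for the Bidi_Paired_Bracket property data.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def find_bracket_pairs(text, max_depth=MAX_BRACKET_DEPTH):
    """Return sorted (open_pos, close_pos) pairs, BD16-style.

    Memory use is bounded by max_depth regardless of the input: if an
    opening bracket arrives while the stack is full, pairing stops and
    the pairs found so far are returned.
    """
    stack = []  # entries are (expected_closer, open_pos)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            if len(stack) == max_depth:
                break  # no room: stop instead of growing
            stack.append((PAIRS[ch], pos))
        elif ch in CLOSERS:
            # Search from the top of the stack for a matching opener.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == ch:
                    pairs.append((stack[i][1], pos))
                    del stack[i:]  # pop through the matched entry
                    break
    pairs.sort()
    return pairs
```

With this policy, a text made of 100000 opening brackets and no
closing ones yields no pairs and never holds more than 63 stack
entries, so there is nothing to exhaust and no need for a fatal error.
Whether a conforming implementation may behave this way is exactly the
question.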

Thanks in advance for any insights.
