Limits in UBA
ken.whistler at sap.com
Wed Oct 22 14:42:06 CDT 2014
I think you are correct that the BidiCharacterTest.txt data currently
does not go beyond 3 nesting levels for testing the BPA part of UBA.
I agree with Andrew that that is a reasonable guide to the normal limit
of meaningful bracket embeddings one might find in text. However,
I don't think it is safe to assume that 3 is the deepest nesting the
conformance test data will ever contain.
Unlike the bidi format control embeddings, which are hard to visualize
and involve special input or programming, it is *easy* for people to
generate strings with deeply embedded bracket pairs.
So it might make sense to add test cases with data like that to
BidiCharacterTest. In such cases, fallback behavior when hitting
the implementation limit is presumably o.k., but it is advisable to
check implementations to ensure that they don't actually fall over
if they *do* hit their limit.
In the C BidiRef reference implementation I wrote, the limit I
picked was simply half the maximum string length it would process,
on the assumption that the worst case it would have to deal with would
be a string consisting of *nothing but* bracket pairs.
If supporting 1024 bracket pair levels is "cheap" for Emacs support,
that seems like a defensible limit choice to me.
> The BPA is not as subject to the extremes of generated text, and therefore
> brackets should follow a natural limit such that it is possible for a human to
> parse and track the bracketed levels. As such, the max depth is going to be
> quite low in normal text. Most cases of the BPA involve one pair. Nested
> pairs beyond three become quite artificial - except in languages such as LISP.
> However, supporting correct display of Bidi LISP code is not a goal of the
> BPA. I'm not sure what the maximum depth used by the test data is - I think
> that is the best current guide unless we introduce something.
> The test data doesn't have more than 3 nested levels, I think.
> For Emacs, I limited the BPA stack at 1024 levels, which is probably
> way too much, but it was cheap, so I saw no reason forcing an
> arbitrary lower limit.