Fault in Bidi Algorithm at BD16

Richard Wordingham richard.wordingham at ntlworld.com
Sun Mar 20 12:58:26 CDT 2022


There is a fault in BD16 of the Bidi Algorithm, at least as of Unicode 14.0.

The problem lies in this part of the definition:

"If an opening paired bracket is found and there is room in the stack,
push its Bidi_Paired_Bracket property value and its text position onto
the stack.

If an opening paired bracket is found and there is no room
in the stack, stop processing BD16 for the remainder of the isolating
run sequence.

If a closing paired bracket is found, do the following:

1.  Declare a variable that holds a reference to the current stack
    element and initialize it with the top element of the stack.

2.  Compare the closing paired bracket being inspected or its
    canonical equivalent to the bracket in the current stack element."

The fault was picked up by line 312 of BidiCharacterTest.txt:

0061 0020 2329 0062 002E 0031 3009;1;1;2 2 2 2 2 2 2;0 1 2 3 4 5 6
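For readers who want to inspect the test line directly, here is a minimal Python sketch that splits it into fields.  The field meanings (code points; paragraph direction; resolved paragraph level; resolved levels; visual order) are assumed from the header comments of the test file, and the function name parse_test_line is purely illustrative.

```python
def parse_test_line(line):
    """Split one data line of the test file into its five fields.

    Field meanings are an assumption taken from the file's header comments.
    """
    cps, direction, para_level, levels, order = line.strip().split(";")
    return {
        # Field 0: hexadecimal code points separated by spaces
        "text": "".join(chr(int(cp, 16)) for cp in cps.split()),
        # Field 1: requested paragraph direction
        "paragraph_direction": int(direction),
        # Field 2: resolved paragraph embedding level
        "paragraph_level": int(para_level),
        # Field 3: resolved levels, kept as strings because the file
        # may use a non-numeric marker for removed characters
        "levels": levels.split(),
        # Field 4: visual ordering, as indices into the text
        "order": [int(i) for i in order.split()],
    }


LINE_312 = "0061 0020 2329 0062 002E 0031 3009;1;1;2 2 2 2 2 2 2;0 1 2 3 4 5 6"
parsed = parse_test_line(LINE_312)
```

The parsed text has U+2329 at index 2 and U+3009 at index 6, which are the two characters that must be recognised as a pair.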

This line primarily checks that U+2329 and U+3009 are identified as a
'bracket pair'.  bpb(U+2329) is U+232A, whose canonical decomposition
is U+3009.  However, the step *numbered* '2' is non-deterministic; it
contains the word 'or', which leaves open which of the two is to be
compared.  The simple, robust solution is to change 'or its canonical
equivalent' to 'and its canonical equivalents'.  That also avoids the
risk of 'its canonical equivalent' being interpreted as the result of
the function to_NFC or to_NFD.
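The difference between the two readings can be illustrated with a minimal Python sketch.  It uses the standard unicodedata module; the names exact_match and equivalent_match are hypothetical, and comparing NFD forms is only one assumed way of testing whether two single characters are canonically equivalent.

```python
import unicodedata

STACKED_BPB = "\u232A"  # bpb(U+2329), the value pushed for the opening bracket
CLOSER = "\u3009"       # the closing bracket that ends the test line


def exact_match(closer, stacked):
    """Reading 1: compare only the closing bracket itself."""
    return closer == stacked


def equivalent_match(closer, stacked):
    """Reading 2: accept any canonical equivalent of the closing bracket.

    Two characters are treated as canonically equivalent here when their
    NFD forms are identical (an assumption made for this sketch).
    """
    return (unicodedata.normalize("NFD", closer)
            == unicodedata.normalize("NFD", stacked))


pair_found_exact = exact_match(CLOSER, STACKED_BPB)
pair_found_equivalent = equivalent_match(CLOSER, STACKED_BPB)
```

Under the first reading U+2329 and U+3009 are not paired; under the second they are, which is what the test line expects.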

It feels simpler to work with the NFC or NFD equivalents of the
candidate opening and closing brackets at both the first and last of
the quoted steps, that is, both when pushing onto the stack and when
comparing against it.
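A rough Python sketch of that approach follows.  It is not the reference implementation: the BRACKETS table is a small hand-copied excerpt standing in for BidiBrackets.txt, the function name bracket_pairs is hypothetical, the whole string is treated as a single isolating run sequence, and the check that a bracket's current bidi class is still ON is omitted.

```python
import unicodedata

# Illustrative excerpt: character -> (Bidi_Paired_Bracket value, type)
BRACKETS = {
    "(": (")", "o"), ")": ("(", "c"),
    "[": ("]", "o"), "]": ("[", "c"),
    "\u2329": ("\u232A", "o"), "\u232A": ("\u2329", "c"),
    "\u3008": ("\u3009", "o"), "\u3009": ("\u3008", "c"),
}
MAX_STACK = 63  # stack size used by BD16


def nfd(ch):
    return unicodedata.normalize("NFD", ch)


def bracket_pairs(text):
    """Return (opening position, closing position) pairs, sorted by opener."""
    stack = []  # entries: (NFD of the bpb value, text position)
    pairs = []
    for pos, ch in enumerate(text):
        if ch not in BRACKETS:
            continue
        partner, kind = BRACKETS[ch]
        if kind == "o":
            if len(stack) == MAX_STACK:
                break  # no room: stop processing, as in the quoted text
            # First quoted step: push the normalised bpb value.
            stack.append((nfd(partner), pos))
        else:
            # Last quoted step: compare the normalised closing bracket,
            # walking down from the top of the stack.
            closer = nfd(ch)
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == closer:
                    pairs.append((stack[i][1], pos))
                    del stack[i:]  # pop through the matched element
                    break
    return sorted(pairs)


pairs_312 = bracket_pairs("a \u2329b.1\u3009")
```

Because both sides are normalised before they meet, the comparison itself stays a plain equality test and no 'or' is needed.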

I admit that part of the problem was that I was using a tool that
assumed that canonically equivalent characters had the same Unicode
properties.

Richard.

