Re: Unicode Bidi Algorithm – Java reference implementation

Ken Whistler kenwhistler at att.net
Sun Sep 18 19:16:50 CDT 2016


On 9/17/2016 10:26 AM, Deepak Jois wrote:
> I now need to make the updates to support the changes in Unicode 8.0,
> and I am finding it a bit hard to grok the changes in C at a glance.
>

The UBA 7.0 --> UBA 8.0 changes were rather subtle. They did not change 
much about the gross behavior of the algorithm, but they did fix edge 
cases in a couple of rules. Also, the specification of behavior on 
stack overflow became exact, rather than implementation-defined.

The C bidi reference code is a bit complicated, because it supports 
*all* UBA versions from 6.2 through 8.0, which means it has to 
special-case rule processing by version wherever the specification 
itself changes.
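
As a rough illustration of what "special-casing by version" can look 
like in a port, here is a minimal sketch. It is not taken from bidiref: 
the names UbaVersion, OverflowPolicy, uses_bracket_pairs and 
overflow_policy_for are all hypothetical, and the only facts it encodes 
are that bracket pairing first appears in UBA 6.3 and that exact 
overflow behavior first appears in UBA 8.0.

```c
#include <stdbool.h>

/* Hypothetical version tags for a port; bidiref uses its own identifiers. */
typedef enum {
    UBA_V62 = 620,
    UBA_V63 = 630,
    UBA_V70 = 700,
    UBA_V80 = 800
} UbaVersion;

/* Hypothetical policy names for what to do when the bracket stack is full. */
typedef enum {
    OVERFLOW_LEGACY,       /* pre-8.0: whatever the implementation already did */
    OVERFLOW_STOP_PAIRING  /* 8.0: behavior is spelled out by the specification */
} OverflowPolicy;

/* Bracket pair processing (rule N0) was introduced in UBA 6.3. */
static bool uses_bracket_pairs(UbaVersion v)
{
    return v >= UBA_V63;
}

/* Select the overflow behavior from the UBA version being emulated. */
static OverflowPolicy overflow_policy_for(UbaVersion v)
{
    return (v >= UBA_V80) ? OVERFLOW_STOP_PAIRING : OVERFLOW_LEGACY;
}
```

Keeping each version check in one small function like this means the 
rule code itself only asks "which variant applies?" and never compares 
version numbers directly.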

If you diff the 7.0 version of brrule.c and the 8.0 version of brrule.c 
you'll find the heart of the differences there, along with explanations 
in comments for the changes. The new function br_SetBracketPairBC 
handles an edge case for combining marks following a bracket. The code 
that uses the new flag testONisNotRequired deals with an edge case 
involving the current Bidi_Class of brackets being tested for pairing. 
The changes in br_PushBracketStack keep the pre-8.0 behavior as it was 
in earlier versions of bidiref, while adding the explicit stack 
overflow behavior that 8.0 specifies.
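
To make the overflow case concrete, here is a minimal sketch of a 
bounded bracket stack for a port. It is not the bidiref code: 
BracketStack, bracket_stack_push and push_openers are hypothetical 
names, and the depth of 63 and the "stop pairing for the rest of the 
isolating run sequence" behavior are my reading of BD16 in the 8.0 
text, so check them against UAX #9 before relying on them. The pre-8.0 
path is deliberately left out, since its behavior was 
implementation-defined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stack depth as read from BD16 in UAX #9 (assumption; verify against the spec). */
#define BRACKET_STACK_MAX 63

typedef struct {
    unsigned long bracket; /* paired bracket code point to match later */
    size_t pos;            /* text position of the opening bracket */
} BracketEntry;

typedef struct {
    BracketEntry items[BRACKET_STACK_MAX];
    size_t count;
} BracketStack;

static void bracket_stack_init(BracketStack *s)
{
    s->count = 0;
}

/* Push one opening bracket. Returns false when the stack is full; under the
 * 8.0 behavior the caller should then stop bracket pairing for the rest of
 * the isolating run sequence rather than continue scanning. */
static bool bracket_stack_push(BracketStack *s, unsigned long bracket, size_t pos)
{
    if (s->count >= BRACKET_STACK_MAX) {
        return false;
    }
    s->items[s->count].bracket = bracket;
    s->items[s->count].pos = pos;
    s->count++;
    return true;
}

/* Demo driver: feeds n_openers consecutive '(' characters (whose paired
 * bracket is U+0029) and returns how many were accepted before overflow. */
static size_t push_openers(BracketStack *s, size_t n_openers)
{
    size_t accepted = 0;
    for (size_t i = 0; i < n_openers; i++) {
        if (!bracket_stack_push(s, 0x0029UL, i)) {
            break; /* overflow: stop processing, as the 8.0 text requires */
        }
        accepted++;
    }
    return accepted;
}
```

Returning a status from the push, instead of silently dropping the 
bracket, is what lets the caller implement the exact 8.0 behavior in 
one place.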

It may also help to compare the 7.0 and 8.0 versions of UAX #9 itself, 
so you can see the textual changes in the specification of the rules. 
Try diffing:

http://www.unicode.org/reports/tr9/tr9-31.html (7.0)
http://www.unicode.org/reports/tr9/tr9-33.html (8.0)

The significant changes there are in BD11, BD14, BD15, BD16, and in 
rules X5a, X5b, X6a, and N0. (The rest of the changes in the updated 
document are cosmetic.)

--Ken


