Unclear text in the UBA (UAX#9) of Unicode 6.3

Eli Zaretskii eliz at gnu.org
Mon Apr 21 03:33:39 CDT 2014


> Date: Sun, 20 Apr 2014 23:03:20 -0700
> From: Asmus Freytag <asmusf at ix.netcom.com>
> CC: Eli Zaretskii <eliz at gnu.org>, unicode at unicode.org, 
>  Kenneth Whistler <ken at unicode.org>
> 
> >>         Note that the current embedding level is not changed by this rule.
> >>
> >>     What does this last sentence mean by "the current embedding level"?
> >>     The first bullet of X6 mandates that "the current character’s
> >>     embedding level" _is_ changed by this rule, so what other "current
> >>     embedding level" is alluded to here?
> >     I'm punting on that one - can someone else answer this?
> >
> >
> > I assume "current embedding level" here meant "the embedding level of 
> > the last entry on the directional status stack". (This is a natural 
> > slip to make if you think in terms of an optimized implementation that 
> > stores each component of the top of the directional status stack in a 
> > variable, as suggested in 3.3.2.)
> >
> > James
> >
> In general, I heartily dislike "specifications" that just narrate a 
> particular implementation...

I cannot agree more.

In fact, my main gripe about the UBA additions in 6.3 is that some of
their crucial parts are not formally defined, except by an algorithm
that narrates a specific implementation.  The two worst examples of
that are the "definitions" of the isolating run sequence and of the
bracket pair.  I didn't ask about those because I managed to figure
them out, but it took many readings of the corresponding parts of the
document.  It is IMO a pity that the two main features added in 6.3
are based on definitions that are so hard to penetrate, and which
actually all but force you to use the specific implementation
described by the document.

My working definition that replaces BD13 is this:

  An isolating run sequence is the maximal sequence of level runs of
  the same embedding level that can be obtained by removing all the
  characters between an isolate initiator and its matching PDI (or
  paragraph end, if there is no matching PDI) within those level runs.
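For comparison, the chaining construction that UAX#9 uses for BD13 can be sketched in Python roughly as follows.  This is a minimal sketch under stated assumptions: the embedding levels have already been computed by rules X1-X8, characters removed by rule X9 are absent from the input, bidi classes are given as strings such as "LRI" and "PDI", and the function names are hypothetical.

```python
# Sketch of BD13 isolating run sequences (hypothetical helper names).
# Assumes: embedding levels already come from rules X1-X8, characters
# removed by rule X9 are absent, and bidi classes are strings like "L".

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def match_isolates(types):
    """BD9: map each isolate initiator's index to its matching PDI's index."""
    matches = {}
    open_initiators = []  # indices of initiators still waiting for a PDI
    for i, t in enumerate(types):
        if t in ISOLATE_INITIATORS:
            open_initiators.append(i)
        elif t == "PDI" and open_initiators:
            matches[open_initiators.pop()] = i
    return matches


def level_runs(levels):
    """BD7: maximal runs of equal embedding level, as lists of indices."""
    runs = []
    start = 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append(list(range(start, i)))
            start = i
    return runs


def isolating_run_sequences(types, levels):
    """Chain level runs across matched isolate initiator/PDI pairs."""
    matches = match_isolates(types)
    matched_pdis = set(matches.values())
    runs = level_runs(levels)
    run_of = {}  # character index -> index of the level run containing it
    for r, run in enumerate(runs):
        for i in run:
            run_of[i] = r

    sequences = []
    for run in runs:
        # A run starting with a matched PDI continues an earlier sequence.
        if types[run[0]] == "PDI" and run[0] in matched_pdis:
            continue
        seq = list(run)
        last = run[-1]
        # While the sequence ends with an initiator that has a matching PDI,
        # append the level run that contains that PDI.
        while types[last] in ISOLATE_INITIATORS and last in matches:
            next_run = runs[run_of[matches[last]]]
            seq.extend(next_run)
            last = next_run[-1]
        sequences.append(seq)
    return sequences


# "a LRI b PDI c" at paragraph level 0; the isolated "b" is at level 2.
print(isolating_run_sequences(["L", "LRI", "R", "PDI", "L"], [0, 0, 2, 0, 0]))
# prints [[0, 1, 3, 4], [2]]
```

The first sequence, [0, 1, 3, 4], is exactly what remains of the level-0 runs once the isolated character is removed, which agrees with the working definition above.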

As for the bracket pair (BD16), I'm really amazed that a concept as
simple and as widely known and used as this one needs such an obscure
definition, one that includes an algorithm as its necessary part.  How
about this instead:

  A bracket pair is a pair consisting of an opening paired bracket
  character and a closing paired bracket character within the same
  isolating run sequence, such that the Bidi_Paired_Bracket property
  value of the former character or its canonical equivalent equals the
  latter character or its canonical equivalent, and all the opening and
  closing bracket characters in between these two are balanced.

Then we could use the algorithm to explain what it means for brackets
to be balanced (for those readers who somehow don't already know
that).
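The stack procedure that BD16 uses can be sketched in Python as follows.  This is a simplified sketch under stated assumptions: the input is the text of a single isolating run sequence, OPENING_TO_CLOSING is a hypothetical excerpt of the Bidi_Paired_Bracket data from BidiBrackets.txt rather than the full table, the requirement that candidate brackets have Bidi_Class ON is skipped, and canonical equivalence is handled by NFC-normalizing each character.

```python
import unicodedata

# Hypothetical excerpt of the Bidi_Paired_Bracket data in BidiBrackets.txt:
# opening bracket -> its paired closing bracket. Real code loads the file.
OPENING_TO_CLOSING = {"(": ")", "[": "]", "{": "}", "\u3008": "\u3009"}
CLOSING = set(OPENING_TO_CLOSING.values())
MAX_DEPTH = 63  # size of the BD16 stack in Unicode 6.3


def canonical(ch):
    # U+2329/U+232A are canonically equivalent to U+3008/U+3009; NFC maps
    # these singleton decompositions to U+3008/U+3009.
    return unicodedata.normalize("NFC", ch)


def bracket_pairs(chars):
    """Return sorted (opening, closing) positions for one isolating run sequence."""
    stack = []  # entries: (closing bracket expected, position of the opener)
    pairs = []
    for pos, ch in enumerate(chars):
        c = canonical(ch)
        if c in OPENING_TO_CLOSING:
            if len(stack) == MAX_DEPTH:
                # No room: stop processing. Keeping the pairs found so far
                # is an assumption of this sketch.
                break
            stack.append((OPENING_TO_CLOSING[c], pos))
        elif c in CLOSING:
            # Search from the top of the stack down for a matching opener.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == c:
                    pairs.append((stack[depth][1], pos))
                    del stack[depth:]  # pop through the matching element
                    break
            # If nothing matched, the closing bracket is ignored and the
            # stack is left untouched.
    pairs.sort()
    return pairs


print(bracket_pairs("a(b[c]d)e"))  # prints [(1, 7), (3, 5)]
print(bracket_pairs("([)]"))       # prints [(0, 2)]
```

Note that for "([)]" this procedure yields only the pair (0, 2): the closing parenthesis matches the opener deeper in the stack, and popping through it discards the unmatched "[" as well.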

Again, thanks for clarifying these subtle issues.  I can now proceed
to update the Emacs bidirectional display with the changes in
Unicode 6.3.
