Unclear text in the UBA (UAX#9) of Unicode 6.3

Mon Apr 21 02:55:59 CDT 2014

> Date: Sun, 20 Apr 2014 12:58:23 -0700
> From: Asmus Freytag <asmusf at ix.netcom.com>
> 
> On 4/20/2014 3:24 AM, Eli Zaretskii wrote:
> > Would someone please help understand the following subtleties and
> > obscure language in the UBA document found at
> > http://www.unicode.org/reports/tr9/?  Thanks in advance.
> 
> Eli,
> 
> I've tried to give you some explanations

Thanks!

> in some places, I concur with you that the wording could be improved
> and that such improved wording should be proposed to the UTC (or its
> editorial committee) for incorporation into a future update.

How do we do that?

> For details, see below.
> >
> > 1. In paragraph 3.1.2, near its very end, we have this sentence (with
> > my emphasis):
> >
> >    As rule X10 will specify, an isolating run sequence is the unit to
> >    which the rules following it are applied, and the last character of
> >          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >    one level run in the sequence is considered to be immediately
> >    followed by the first character of the next level run in the
> >    sequence during this phase of the algorithm.
> >
> > What does it mean here by "the rules following it"?  Following what?
> 
> That looks like a bad referent,  but from context, this "it" must be X10

Ah, so simply saying "the following rules" or "rules following X10"
would be enough.

> Bullet 1 could be changed to
> 
>    . Create a stack for elements each consisting of a *code point* (Bidi_Paired_Bracket property value)
>      and a text position. Initialize it to empty.
> 
> to make things more clear. And a slight wording change might help the 
> reader with item 2:
> 
>        2. Compare the *code point for the* closing paired bracket being inspected or its
>           canonical equivalent to the *code point* (Bidi_Paired_Bracket property value) in the current stack
>           element.
> 
> 
> And, to continue
> 
>        3. If the values match, meaning *the character being inspected and the character
>           at the text position in the stack* form a bracket pair, then [...]

Right, this makes the description a whole lot more clear.
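
To make the reworded steps concrete, here is a minimal Python sketch of the
stack-based pairing they describe. It is an illustration under stated
assumptions, not the reference implementation: PAIRED_BRACKET is a
hypothetical stand-in for the Bidi_Paired_Bracket property that covers only
three ASCII pairs, find_bracket_pairs is an invented name, and the "or its
canonical equivalent" part of step 2 is left out.

```python
# Hypothetical stand-in for the Bidi_Paired_Bracket property: maps an
# opening bracket to the code point of its paired closing bracket.
PAIRED_BRACKET = {"(": ")", "[": "]", "{": "}"}
CLOSING = set(PAIRED_BRACKET.values())


def find_bracket_pairs(text):
    """Return (open_pos, close_pos) pairs, sorted by opening position."""
    # Each stack element is (code point, text position), as in the reworded
    # bullet 1: the code point is the Bidi_Paired_Bracket value of the opener.
    stack = []
    pairs = []
    for pos, ch in enumerate(text):
        if ch in PAIRED_BRACKET:
            stack.append((PAIRED_BRACKET[ch], pos))
        elif ch in CLOSING:
            # Step 2: compare the closing bracket with each stack element,
            # starting from the top of the stack.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == ch:
                    # Step 3: the two characters form a bracket pair; drop
                    # this element and everything above it.
                    pairs.append((stack[i][1], pos))
                    del stack[i:]
                    break
            # No match anywhere in the stack: the closing bracket is ignored.
    return sorted(pairs)


print(find_bracket_pairs("a(b[c)d]"))
```

In "a(b[c)d]" only the parentheses pair up: matching ")" discards the
unmatched "[" above it on the stack, so the later "]" finds nothing to
match.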

>     Apply rules W1–W7, N0–N2, and I1–I2 to each of the isolating run sequences.
>     For each sequence, [completely] apply each rule in the order in which they appear below.
>     The order that one isolating run sequence is treated relative to another does not matter.
> 
> I believe the above restatement expresses the same thing in fewer words.

It does, thanks.

> > 5. Rule N0 says:
> >
> >     . For each bracket-pair element in the list of pairs of text positions
> >
> >       a. Inspect the bidirectional types of the characters enclosed
> >       	within the bracket pair.
> >       b. If any strong type (either L or R) matching the embedding
> >       	direction is found, set the type for both brackets in the pair
> >       	to match the embedding direction.
> >
> > First, what is meant here by "strong type [...] matching the embedding
> > direction"?  Does the "match" here consider only the odd/even value of
> > the current embedding level vs R/L type, in the sense that odd levels
> > "match" R and even levels "match" L?  Or does this mean some other
> > kind of matching?  Table 3, which is the only place that seems to refer
> > to the issue, is not entirely clear, either:
> >
> >    e   The text ordering type (L or R) that matches the embedding level
> >        direction (even or odd).
> >
> > Again, the sense of the "match" here is not clear.
> 
> even/odd <-> L/R match (even levels match L, odd levels match R); might be made more explicit

I agree this should be made more explicit, as this is a somewhat
subtle issue that might trip the reader.
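
The parity reading of "match" can be written down directly. This is a
minimal Python sketch; the function names are illustrative and do not come
from UAX#9.

```python
def embedding_direction(level):
    """Return the type 'e' of Table 3: L for even levels, R for odd levels."""
    return "R" if level % 2 else "L"


def matches_embedding_direction(strong_type, level):
    """True if a strong type (L or R) matches the embedding direction."""
    return strong_type == embedding_direction(level)


print(embedding_direction(0), embedding_direction(1))
```

So at embedding level 0 only L matches, and at level 1 only R matches.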

> > Next, what is meant here by "the characters enclosed within the
> > bracket pair"?  If the bracket pair encloses another bracket pair,
> > which is inner to it, do the characters inside the inner pair count
> > for the purposes of resolving the level of the outer pair?
> They do, so there's no need to change the text.

It might be a good idea to say that explicitly, e.g. as a note, or at
least provide another example where the strong characters are only
inside an inner bracket pair, which will send the same message to the
reader.
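
An example of that kind can be sketched in Python. This covers only step b
of N0 as quoted above, under simplifying assumptions: the types are given
as a plain list of strings, only L and R are treated as strong, and the
function name is invented for illustration.

```python
def pair_takes_embedding_direction(types, open_pos, close_pos, level):
    """Step b of N0 (simplified): is there a strong type matching the
    embedding direction anywhere between the two brackets?"""
    e = "R" if level % 2 else "L"  # even levels match L, odd levels match R
    # Everything strictly between the brackets is inspected, including
    # characters that sit inside a nested bracket pair.
    return any(t == e for t in types[open_pos + 1:close_pos])


# Types for "([a])": the only strong character is inside the inner pair.
types = ["ON", "ON", "L", "ON", "ON"]
print(pair_takes_embedding_direction(types, 0, 4, level=0))  # outer pair
print(pair_takes_embedding_direction(types, 1, 3, level=0))  # inner pair
```

At level 0 both calls return True: the L inside the inner pair also counts
for the outer pair. At level 1 neither pair would satisfy step b, since no
R is enclosed.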

Thanks again for the clarifications.