Unclear text in the UBA (UAX#9) of Unicode 6.3

Asmus Freytag asmusf at ix.netcom.com
Tue Apr 22 13:41:39 CDT 2014


On 4/22/2014 10:11 AM, Eli Zaretskii wrote:
>> Date: Tue, 22 Apr 2014 09:52:43 -0700
>> From: Asmus Freytag <asmusf at ix.netcom.com>
>> CC: nospam-abuse at ilyaz.org, verdy_p at wanadoo.fr, ken at unicode.org,
>>   jjc at jclark.com, unicode at unicode.org
>>
>>> I agree, but let me try to say the same more concisely:
>>>
>>>      A bracket pair is a pair of an opening paired bracket and a closing
>>>      paired bracket characters within the same isolating run sequence,
>>>      such that the Bidi_Paired_Bracket property value of the former
>>>      character or its canonical equivalent equals the latter character
>>>      or its canonical equivalent, and provided that a closing bracket is
>>>      matched to the closest match candidate, disregarding any candidates
>>>      that either already have a closer match, or are enclosed in a
>>>      matched pair of other 2 bracket characters.
>>>
>>>
>> I think that this (or something like this) might work, but that we are
>> better off splitting this into a definition and a rule as I have
>> proposed in my previous message.
> Why not have the above _and_ a rule?  The rule should be worded so as
> to help understanding the definition.  But IMO it is not a good idea
> to have a rule as an integral part of the definition, because the two
> serve different purposes.

Not everything needs to be in a single definition.

The specification, to uplevel the discussion at this point, is composed 
of definitions and rules.

What I am proposing is that the natural unit for definition is
the paired bracket as defined by the match in properties, in other
words ( with ) and not ( with ].

The part that picks among the possible pairs in a span of text is
really better handled as a rule - it describes an action to be
performed.

We really have two concepts and an action here.

1) matching bracket characters (a pair, or if you want, a possible or
   putative pair)
2) specific bracket characters in a given piece of text that match
   (resolved pair)
3) the act of resolving pairs given a specific sequence of characters
   (defined by a rule)

So, if you wanted to, BD16 could be split into BD16a, defining the
(putative) pair based on matching properties, and BD16b, defining the
term "resolved pair".

A rule, Rx, could then be specified to describe the resolution process,
relying on the definition of the (putative) pair only. After Rx has been
applied, all identified pairs are 'resolved pairs', and the remainder of
the algorithm can be stated using that term.

That structure matches the rest of the specification.
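
As a rough illustration of the BD16a side of this split, here is a
minimal Python sketch of the (putative) pair test. It looks only at the
two characters and their properties, not at the surrounding text. The
PAIRED table is a tiny hand-written stand-in for the real
Bidi_Paired_Bracket data, and the names canon and is_bracket_pair are
made up for this example.

```python
import unicodedata

# Hypothetical excerpt of Bidi_Paired_Bracket data: opening -> closing.
# A real implementation would load the full property data instead.
PAIRED = {"(": ")", "[": "]", "{": "}", "\u3008": "\u3009"}


def canon(ch):
    """Map a character to its canonical decomposition (assumed to be a
    single character for the brackets used here)."""
    return unicodedata.normalize("NFD", ch)


def is_bracket_pair(opener, closer):
    """BD16a-style test: do the two characters form a putative pair?

    Uses only properties and canonical equivalence; nothing about where
    the characters occur in the text.
    """
    return PAIRED.get(canon(opener)) == canon(closer)
```

With this, is_bracket_pair("(", ")") holds and is_bracket_pair("(", "]")
does not, and U+2329 pairs with U+3009 because U+2329 is canonically
equivalent to U+3008.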

>
> And I think we should also point out explicitly that the brackets
> match non-hierarchically, as many readers will expect that they are,
> and will be confused.
That's a good note, as we have seen that some people make that assumption
(and it's a natural one from a mathematical point of view).
>
>> In the rest of the bidi algorithm, rules are used to describe actions
>> taken on scanning text, and "resolving" bracket pairs is such a scan.
> Yes, but other definitions don't use rules as their integral parts.
> Why should this one be an exception?
>
>
I agree - I would not put what I call "Rx" inside any definitions. It
goes into the rules section.

Now, BD16b seems to depend on performing Rx, but not really. A
"resolved" pair is simply the result of any kind of resolution process.
That the PBA uses Rx to do the resolution isn't part of the definition
of the term: the definition says only that a resolved pair represents a
choice among possible pairings, while Rx serves to identify which pairs
are the resolved pairs. The whole mess in UAX#9 stems from the fact that
the authors thought that they had to mix these two levels.

As you asked me to compare my statement with yours, I'm adding it here
again. I do believe both would lead to the same implementation, but my
preference is to move the description of how to resolve into a rule.
This would be the kind of text I'd suggest adding to UAX#9.

//---------

BD16a  A bracket pair is a pair of an opening paired bracket and a closing
    paired bracket characters
    such that the Bidi_Paired_Bracket property value of the former
    character or its canonical equivalent equals the latter character or
    its canonical equivalent.

BD16b  A resolved bracket pair is a bracket pair that has been
    selected from among possible bracket pairs in an isolating run
    sequence.

Note: for the PBA this selection is performed according to Rx (below).


Rx  For each isolating run sequence, bracket characters are selected
    into resolved bracket pairs as follows:
    Starting at the beginning of the run sequence, when a closing
    bracket character is encountered, find the nearest preceding
    opening character that forms a bracket pair, but is not already
    part of a resolved bracket pair, and not ignored for bracket pair
    selection.
    If one exists, resolve the pair, and mark any enclosed
    opening brackets of any kind as not part of a bracket pair and
    ignored for further bracket pair selection. Otherwise, if no pair
    can be selected, mark the closing bracket as not part of a pair
    and ignored for further pair selection.

Note: the outcome of Rx is a list of resolved pairs and their
locations. Selected pairs can nest, but can't otherwise overlap.
The rule prefers the closest pair for matching as opposed
to attempting to select for the most hierarchical set of nested
pairs. (See examples).

------------
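
To make the behaviour of Rx concrete, here is a minimal Python sketch of
the scan as worded above. It is an illustration under assumptions, not
the UAX#9 reference code: PAIRED is a small hand-written stand-in for
the Bidi_Paired_Bracket data, the names canon and resolve_pairs are
made up, the input string stands for a single isolating run sequence,
and any further conditions or limits that UAX#9 imposes are left out.

```python
import unicodedata

# Hypothetical excerpt of Bidi_Paired_Bracket data: opening -> closing.
PAIRED = {"(": ")", "[": "]", "{": "}", "\u3008": "\u3009"}
CLOSERS = set(PAIRED.values())


def canon(ch):
    """Canonical decomposition (single character for these brackets)."""
    return unicodedata.normalize("NFD", ch)


def resolve_pairs(run):
    """Return resolved pairs as sorted (open_index, close_index) tuples."""
    candidates = []  # openers still eligible: (index, canonical char)
    resolved = []
    for i, ch in enumerate(run):
        c = canon(ch)
        if c in PAIRED:
            candidates.append((i, c))
        elif c in CLOSERS:
            # Search for the nearest preceding eligible opener that matches.
            for k in range(len(candidates) - 1, -1, -1):
                j, opener = candidates[k]
                if PAIRED[opener] == c:
                    resolved.append((j, i))
                    # The matched opener and every opener enclosed by the
                    # new pair are dropped from further selection.
                    del candidates[k:]
                    break
            # If nothing matched, the closer is simply ignored.
    return sorted(resolved)


print(resolve_pairs("a(b[c)d]"))  # [(1, 5)]
print(resolve_pairs("a(b(c)d"))   # [(3, 5)]
print(resolve_pairs("a(b{c}d)"))  # [(1, 7), (3, 5)]
```

The first call shows the non-hierarchical outcome: once ( and ) are
resolved, the enclosed [ is ignored, so the later ] finds no partner.
The second shows the preference for the closest opener, and the third
shows that resolved pairs can nest.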

What I have called Rx here would become N0a, with the part of N0 that is
the second bullet renumbered as N0b.

I would move the existing examples into the rules section, not leave them
in the definitions as they are today.

A./