Unclear text in the UBA (UAX#9) of Unicode 6.3
asmusf at ix.netcom.com
Tue Apr 22 13:41:39 CDT 2014
On 4/22/2014 10:11 AM, Eli Zaretskii wrote:
>> Date: Tue, 22 Apr 2014 09:52:43 -0700
>> From: Asmus Freytag <asmusf at ix.netcom.com>
>> CC: nospam-abuse at ilyaz.org, verdy_p at wanadoo.fr, ken at unicode.org,
>> jjc at jclark.com, unicode at unicode.org
>>> I agree, but let me try to say the same more concisely:
>>> A bracket pair is a pair of an opening paired bracket and a closing
>>> paired bracket characters within the same isolating run sequence,
>>> such that the Bidi_Paired_Bracket property value of the former
>>> character or its canonical equivalent equals the latter character
>>> or its canonical equivalent, and provided that a closing bracket is
>>> matched to the closest match candidate, disregarding any candidates
>>> that either already have a closer match, or are enclosed in a
>>> matched pair of other 2 bracket characters.
>> I think that this (or something like this) might work, but that we are
>> better off splitting this into a definition and a rule as I have
>> proposed in my previous message.
> Why not have the above _and_ a rule? The rule should be worded so as
> to help understanding the definition. But IMO it is not a good idea
> to have a rule as an integral part of the definition, because the two
> serve different purposes.
Not everything needs to be in a single definition.
The specification, to uplevel the discussion at this point, is composed
of definitions and rules.
What I am proposing is that the natural unit for definition is
the paired bracket as defined by the match in properties, in other
words ( with ) and not ( with ].
The part that picks out of the possible pairs in a span of text is
really better handled as a rule - it describes an action to be
We really have two concepts and an action here.
1) matching bracket characters (a pair, or if you want, a possible or
2) specific bracket characters in a given piece of text that match
3) the act of resolving pairs given a specific sequence of characters
(defined by a rule)
So if you wanted to
BD 16 could be split into BD16a defining the (putative) pair based on
properties and 16b defining the term "resolved pair".
A rule, Rx, could then be specified to describe the resolution process,
on the definition of the (putative) pair only. After Rx has been
applied, all identified
pairs are 'resolved pairs', and the remainder of the algorithm can be
stated using that
That structure matches the rest of the specification.
> And I think we should also point out explicitly that the brackets
> match non-hierarchically, as many readers will expect that they are,
> and will be confused.
That's a good note, as we have seen that some people make that assumption
(and it's a natural one from a mathematical point of view).
>> In the rest of the bidi algorithm, rules are used to describe actions
>> taken on scanning text, and "resolving" bracket pairs is such a scan.
> Yes, but other definitions don't use rules as their integral parts.
> Why should this one be an exception?
I agree - I would not put what I call "Rx" inside any definitions. It
goes into the rules
Now, BD16b seems to depend on performing Rx, but not really. A resolved pair
is simply the result of any kind of resolution process. That the PBA
uses Rx to do
the resolution isn't part of the definition of the term - Rx serves to
which pairs are resolved pairs, not that a resolved pair represents a choice
among possible pairings. The whole mess in UAX#9 is based on the fact that the
authors thought that they had to mix these two levels.
As you asked me to compare my statement with yours, I'm adding it here again
I do believe both would lead to the same implementation, but my preference
is to take the description of how to resolve into a rule. This would be
of text I'd suggest to add to UAX#9.
BD16a A bracket pair is a pair of an opening paired bracket and a closing
paired bracket characters
such that the Bidi_Paired_Bracket property value of the former
character or its canonical equivalent equals the latter character or
its canonical equivalent.
BD16b A resolved bracket pair is a bracket pair that has been
selected from among possible bracket pairs in an isolating run sequence.
Note: for the PBA this selection is performed according to Rx (below).
Rx For each isolating run sequence, bracket characters are selected
into resolved bracket pairs as follows:
Starting at the beginning of the run sequence, when a closing
bracket character is encountered, find the nearest preceding
opening character that forms a bracket pair, but is not already
part of a resolved bracket pair, and not ignored for bracket pair
selection.
If one exists, resolve the pair, and mark any enclosed
opening brackets of any kind as not part of a bracket pair and
ignored for further bracket pair selection. Otherwise, if no pair
can be selected, mark the closing bracket as not part of a pair
and ignored for further pair selection.
Note: the outcome of Rx is a list of resolved pairs and their
locations. Selected pairs can nest, but can't otherwise overlap.
The rule prefers the closest pair for matching as opposed
to attempting to select for the most hierarchical set of nested
pairs. (See examples).
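To make the split between BD16a and Rx concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not proposed text for UAX#9: PAIRED_BRACKET is a tiny hypothetical stand-in for the real Bidi_Paired_Bracket property data, NFD normalization of a single character is used to approximate "or its canonical equivalent", and the names is_bracket_pair and resolve_pairs are invented for this example. The input string stands for the text of one isolating run sequence.

```python
import unicodedata

# Hypothetical, deliberately tiny stand-in for the Bidi_Paired_Bracket
# property (opening bracket -> closing bracket). The real data is larger.
PAIRED_BRACKET = {"(": ")", "[": "]", "{": "}", "\u3008": "\u3009"}
CLOSERS = set(PAIRED_BRACKET.values())


def canon(ch):
    """Map a character to its canonical decomposition (assumption: NFD
    is good enough here, e.g. U+2329 -> U+3008)."""
    return unicodedata.normalize("NFD", ch)


def is_bracket_pair(opener, closer):
    """BD16a as sketched above: a match on properties only, no context."""
    return PAIRED_BRACKET.get(canon(opener)) == canon(closer)


def resolve_pairs(run):
    """Rx as sketched above: return resolved pairs as (open, close)
    positions, sorted by the position of the opening bracket."""
    candidates = []  # positions of openers neither resolved nor ignored
    resolved = []
    for pos, ch in enumerate(run):
        c = canon(ch)
        if c in PAIRED_BRACKET:
            candidates.append(pos)
        elif c in CLOSERS:
            # Look for the nearest preceding opener that forms a pair.
            for k in range(len(candidates) - 1, -1, -1):
                if is_bracket_pair(run[candidates[k]], ch):
                    resolved.append((candidates[k], pos))
                    # The matched opener is now resolved; openers enclosed
                    # by the new pair are ignored from here on.
                    del candidates[k:]
                    break
            # If nothing matched, the closer is simply ignored.
    return sorted(resolved)


print(resolve_pairs("a(b[c)d]"))  # the '[' and ']' stay unpaired
print(resolve_pairs("((a)"))      # the closest '(' wins
```

Keeping is_bracket_pair free of any context mirrors BD16a, while resolve_pairs carries all of the scanning, which is the part I would put into a rule. The list of still-available openers is just one way to track "not already resolved and not ignored"; any bookkeeping with the same outcome would do.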
What I have called Rx here, would become N0a with the part of N0 that is
the second bullet numbered N0b.
I would move the existing examples into the rules section, not leave them
in the definitions as they are today.