Unclear text in the UBA (UAX#9) of Unicode 6.3

Asmus Freytag asmusf at ix.netcom.com
Wed Apr 23 20:15:44 CDT 2014

On 4/23/2014 4:41 PM, Ilya Zakharevich wrote:
> On Wed, Apr 23, 2014 at 09:21:04AM -0700, Asmus Freytag wrote:
>>>   a parsing is good if it satisfies all conditions below:
>>>     0) Some delimiters in the string are marked as “non-matching”; the rest
>>>        is broken into disjoint “matched” pairs;
>>>     MATCH) A “matched” pair consists of an open-delimiter and matching close-
>>>            delimiter (in this order in the string).
>>>     NEST) “Matched” pairs are properly nested (meaning that 2 pairs cannot be
>>>           positioned as Open1 Open2 Close1 Close2 in the string order).
>>>     MINLEN) “Inside” a “matched” pair, every delimiter which could match elements
>>>             of the pair but is marked as “non-matching” must nest inside
>>>             some deeper-nested “matched” pair.
>>> (I hope that the meaning of the word “inside” in MINLEN is clear.)
>>>     GREED) Given any close-delimiter marked as “non-matching”, its
>>>            pre-context does not contain any open-delimiter which could
>>>            match it.
>>>       Here pre-context of a position is a concatenation of substrings of the
>>>       initial string:
>>>       • Take the most deeply nested “matched pair” containing the position
>>>         (if none, the whole string);
>>>       • take the part of the string inside this pair AND before the position;
>>>       • remove all “matched” pairs completely contained inside this substring
>>>         together with what they enclose.
>> This is a very nice formal definition. I'm surprised that your "GREED"
>> statement needs such a complex auxiliary concept (pre-context).
>> Can you explain why, if you make "pre-context" simply the part of the
>> whole string that precedes the unmatched close-delimiter, the words
>> "which could match it" are insufficient?
> Aha, this means that my description is INCOMPLETE: you got a wrong
> impression of what “match” means!  Everywhere, this word means exactly
> the same as in the MATCH rule: that Unicode codepoints match according
> to their Unicode properties.
> This is a non-recursive definition.  All rules are independent.

That explains why you repeat most of the other constraints in your 

>   Without
> the complicated notion of pre-context, matching [] in
>    ( [ ) ]
> would be an acceptable match.
> Thanks for your corrections,
> Ilya
For a static definition, would it have been simpler to break the
definition into two - say a "tentative parsing" (all conditions but greed)
and a "selected parsing", which then could be defined as the parsing that
starts closest to the left?

(I don't have the time as I write this to work out whether that's the
condition, as I am about to board a ride, but just as a trigger to
thought what a split definition might achieve).
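
As a point of comparison for the static conditions, here is a minimal
Python sketch of the procedural pairing, assuming the conditions are
meant to describe the stack-based bracket pairing of UAX #9 (BD16). The
toy delimiter table PAIRS and the function name pair_brackets are
illustrative and not taken from the standard, and the sketch omits
refinements of the actual rule.

```python
# Simplified sketch of stack-based bracket pairing (not the full BD16 rule).
# PAIRS is an assumed toy delimiter set, not the Unicode bracket data.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def pair_brackets(text):
    """Return sorted (open_index, close_index) pairs from a left-to-right scan."""
    stack = []  # entries are (expected closer, index of the opener)
    found = []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((PAIRS[ch], i))
        elif ch in CLOSERS:
            # Search from the top of the stack for an opener this closer matches.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == ch:
                    found.append((stack[depth][1], i))
                    # Discard the matched opener and every opener above it.
                    del stack[depth:]
                    break
            # If no opener matched, the closer simply stays unmatched.
    return sorted(found)


print(pair_brackets("([)]"))  # prints [(0, 2)]
```

On ( [ ) ] written without spaces, this scan pairs the parentheses and
leaves both square brackets unmatched, which is presumably the parsing
that the selection step of a split definition would have to pick.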
