Unclear text in the UBA (UAX#9) of Unicode 6.3
nospam-abuse at ilyaz.org
Wed Apr 23 18:41:15 CDT 2014
On Wed, Apr 23, 2014 at 09:21:04AM -0700, Asmus Freytag wrote:
> > a parsing is good if it satisfies all conditions below:
> > 0) Some delimiters in the string are marked as “non-matching”; the rest
> > is broken into disjoint “matched” pairs;
> > MATCH) A “matched” pair consists of an open-delimiter and matching close-
> > delimiter (in this order in the string).
> > NEST) “Matched” pairs are properly nested (meaning that 2 pairs cannot be
> > positioned as Open1 Open2 Close1 Close2 in the string order).
> > MINLEN) “Inside” a “matched” pair, every delimiter which could match elements
> > of the pair but is marked as “non-matching” must nest inside
> > some deeper-nested “matched” pair.
> > (I hope that the meaning of the word “inside” in MINLEN is clear.)
> > GREED) Given any close-delimiter marked as “non-matching”, its
> > pre-context does not contain any open-delimiter which could
> > match it.
> > Here pre-context of a position is a concatenation of substrings of the
> > initial string:
> > • Take the most deeply nested “matched pair” containing the position
> > (if none, the whole string);
> > • take the part of the string inside this pair AND before the position;
> > • remove all “matched” pairs completely contained inside this substring
> > together with what they enclose.
> This is a very nice formal definition. I'm surprised that your "GREED"
> statement needs such a complex auxiliary concept (pre-context).
> Can you explain why, if you make "pre-context" simply the part of the
> whole string that precedes the unmatched close-delimiter, the words
> "which could match it" are insufficient?
Aha, this means that my description is INCOMPLETE: you got the wrong
impression of what “match” means! Everywhere, this word means exactly
the same as in the MATCH rule: that Unicode codepoints match following
This is a non-recursive definition. All rules are independent. Without the
complicated notion of pre-context, matching  in
( [ ) ]
would be an acceptable match.
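
To make the difference between the two notions of pre-context concrete, here is a minimal Python sketch of the GREED check alone. It is one reading of the quoted definition, not a reference implementation of the UBA. The names BRACKETS, pre_context and greed_ok are hypothetical, only ( ) and [ ] are treated as delimiters, a parsing is represented as a set of (open index, close index) pairs, and the example string is written without spaces.

```python
# Sketch of the GREED rule under two notions of "pre-context".
# Assumptions (not from the original message): only () and [] are
# delimiters, and a parsing is a set of (open_index, close_index) pairs.
BRACKETS = {"(": ")", "[": "]"}


def pre_context(pairs, pos):
    """Indices of the pre-context of `pos`, following the three bullet steps."""
    # Step 1: innermost matched pair strictly containing pos (else whole string).
    enclosing = [(i, j) for (i, j) in pairs if i < pos < j]
    start = max(i for i, _ in enclosing) + 1 if enclosing else 0
    # Step 2: the part inside that pair and before pos.
    indices = range(start, pos)
    # Step 3: drop matched pairs lying entirely in that part, with contents.
    inner = [(i, j) for (i, j) in pairs if start <= i and j < pos]
    return [k for k in indices if not any(i <= k <= j for i, j in inner)]


def greed_ok(s, pairs, use_pre_context=True):
    """True if no non-matching closer has a matching opener in its context."""
    matched = {k for pair in pairs for k in pair}
    for pos, ch in enumerate(s):
        if ch not in BRACKETS.values() or pos in matched:
            continue  # only non-matching close-delimiters are constrained
        # Either the pre-context, or simply everything before pos.
        context = pre_context(pairs, pos) if use_pre_context else range(pos)
        if any(BRACKETS.get(s[k]) == ch for k in context):
            return False
    return True


s = "([)]"
parsing = {(0, 2)}  # "(" and ")" matched; "[" and "]" marked non-matching
print(greed_ok(s, parsing, use_pre_context=True))   # True
print(greed_ok(s, parsing, use_pre_context=False))  # False
```

With the pre-context as defined, the matched pair ( ... ) is removed together with the [ it encloses, so the final ] sees no candidate opener. If the pre-context is simply everything before the ], the enclosed [ still counts and this parsing fails GREED.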
Thanks for your corrections,