Unclear text in the UBA (UAX#9) of Unicode 6.3

Ilya Zakharevich nospam-abuse at ilyaz.org
Wed Apr 23 18:41:15 CDT 2014


On Wed, Apr 23, 2014 at 09:21:04AM -0700, Asmus Freytag wrote:
> >  a parsing is good if it satisfies all conditions below:
> >
> >    0) Some delimiters in the string are marked as “non-matching”; the rest
> >       is broken into disjoint “matched” pairs;
> >
> >    MATCH) A “matched” pair consists of an open-delimiter and matching close-
> >           delimiter (in this order in the string).
> >
> >    NEST) “Matched” pairs are properly nested (meaning that 2 pairs cannot be
> >          positioned as Open1 Open2 Close1 Close2 in the string order).
> >
> >    MINLEN) “Inside” a “matched” pair, every delimiter which could match elements
> >            of the pair but is marked as “non-matching” must nest inside
> >            some deeper-nested “matched” pair.
> >
> >(I hope that the meaning of the word “inside” in MINLEN is clear.)
> >
> >    GREED) Given any close-delimiter marked as “non-matching”, its
> >           pre-context does not contain any open-delimiter which could
> >           match it.
> >
> >      Here the pre-context of a position is a concatenation of substrings of the
> >      initial string:
> >      • Take the most deeply nested “matched” pair containing the position
> >        (if none, the whole string);
> >      • take the part of the string inside this pair AND before the position;
> >      • remove all “matched” pairs completely contained inside this substring
> >        together with what they enclose.
> 
> This is a very nice formal definition. I'm surprised that your "GREED"
> statement needs such a complex auxiliary concept (pre-context).
> 
> Can you explain why, if you make "pre-context" simply the part of the
> whole string that precedes the unmatched close-delimiter, the words
> "which could match it" are insufficient?

Aha, this means that my description is INCOMPLETE: you got the wrong
impression of what “match” means!  Everywhere, this word means exactly
the same as in the MATCH rule: that the codepoints match according to
their Unicode properties.
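
A minimal Python sketch of this reading of “match”: a fixed test on two codepoints that never consults how the rest of the string has been parsed.  The table BRACKET_PAIRS and the function name are hypothetical; real data would come from the Bidi_Paired_Bracket property, which Python's unicodedata module does not expose, so a few ASCII pairs stand in for it here.

```python
# Hypothetical stand-in for the Bidi_Paired_Bracket data: open -> close.
BRACKET_PAIRS = {"(": ")", "[": "]", "{": "}"}


def could_match(open_ch: str, close_ch: str) -> bool:
    """True if close_ch is the close-delimiter paired with open_ch.

    Only the two characters are consulted: no recursion, no look at the
    rest of the string or at how it has been parsed."""
    return BRACKET_PAIRS.get(open_ch) == close_ch


print(could_match("(", ")"))  # True
print(could_match("(", "]"))  # False
```

Because the predicate is a pure function of two characters, each of the rules MATCH, MINLEN and GREED can be checked on its own, without first deciding which other pairs are “matched”.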

This is a non-recursive definition: all the rules are independent.
Without the complicated notion of pre-context, matching the [] in

  ( [ ) ]

would be acceptable.
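
The pre-context can be computed mechanically.  Below is a rough Python sketch, not taken from the definition's author, under these assumptions: the string consists only of delimiters, a parsing is given as a list of (open index, close index) pairs that already satisfies MATCH and NEST, and BRACKET_PAIRS, pre_context() and greed_holds() are hypothetical names.  The plain_prefix flag swaps in the simpler alternative (everything before the position) for comparison.

```python
from typing import List, Tuple

# Hypothetical open -> close table standing in for the Unicode properties.
BRACKET_PAIRS = {"(": ")", "[": "]", "{": "}"}

Pair = Tuple[int, int]  # (index of open-delimiter, index of close-delimiter)


def pre_context(pos: int, pairs: List[Pair]) -> List[int]:
    """Return the indices that make up the pre-context of position pos."""
    # Most deeply nested "matched" pair containing pos; with properly nested
    # pairs this is the enclosing pair whose open-delimiter comes last.
    enclosing = [(o, c) for (o, c) in pairs if o < pos < c]
    start = max(enclosing)[0] + 1 if enclosing else 0  # if none: whole string
    # Part inside that pair and before pos, minus every "matched" pair lying
    # completely within it, together with what that pair encloses.
    removed = set()
    for o, c in pairs:
        if start <= o and c < pos:
            removed.update(range(o, c + 1))
    return [i for i in range(start, pos) if i not in removed]


def greed_holds(s: str, pairs: List[Pair], plain_prefix: bool = False) -> bool:
    """GREED: no "non-matching" close-delimiter has an open-delimiter that
    could match it in its pre-context (or, if plain_prefix, anywhere before it)."""
    matched = {i for pair in pairs for i in pair}
    closers = set(BRACKET_PAIRS.values())
    for pos, ch in enumerate(s):
        if ch not in closers or pos in matched:
            continue
        context = range(pos) if plain_prefix else pre_context(pos, pairs)
        if any(BRACKET_PAIRS.get(s[i]) == ch for i in context):
            return False
    return True


s = "([)]"        # the example above, without spaces
paren = [(0, 2)]  # the parsing that matches ( with )
print(pre_context(3, paren))                     # []
print(greed_holds(s, paren))                     # True
print(greed_holds(s, paren, plain_prefix=True))  # False
```

For the parsing that matches ( with ), the unmatched ] has an empty pre-context, since the whole "( [ )" pair is removed, so GREED holds.  With the plain prefix, the [ before it stays visible and the same parsing fails GREED.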

Thanks for your corrections,
Ilya
