Unclear text in the UBA (UAX#9) of Unicode 6.3

Ilya Zakharevich nospam-abuse at ilyaz.org
Wed Apr 23 02:35:02 CDT 2014

On Tue, Apr 22, 2014 at 09:06:27AM -0700, Asmus Freytag wrote:
> if you read UAX#9, the way the algorithm works is by pushing openers
> on a stack, then, on finding the first closer, going down the stack
> and attempting to locate a match, then, on finding a match,
> discarding any enclosed openers, on not finding a match, discarding
> the closer.

I think I LOVE this definition.  Simple, beautiful, and IMO following
people’s expectations very closely.
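
For concreteness, the quoted procedure could be sketched roughly as below.
This is only my reading of the description above, not the text of UAX#9:
the table PAIRS and the name match_brackets are made up for the example,
and other details of the real algorithm are ignored.

```python
# Hypothetical delimiter table for illustration; the real UBA uses the
# Unicode bracket-pair data instead.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def match_brackets(text):
    """Return (pairs, unmatched): matched (open, close) index pairs and
    the indices of delimiters left as non-matching."""
    stack = []      # (index, opener) for openers still waiting for a closer
    pairs = []
    unmatched = []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((i, ch))
        elif ch in CLOSERS:
            # Go down the stack from the top, looking for a matching opener.
            for depth in range(len(stack) - 1, -1, -1):
                j, opener = stack[depth]
                if PAIRS[opener] == ch:
                    pairs.append((j, i))
                    # Openers above the match are enclosed by the new pair:
                    # discard them as non-matching.
                    unmatched.extend(k for k, _ in stack[depth + 1:])
                    del stack[depth:]
                    break
            else:
                # No opener on the stack matches: discard the closer.
                unmatched.append(i)
    # Openers that never met a closer.
    unmatched.extend(k for k, _ in stack)
    return sorted(pairs), sorted(unmatched)
```

On "a(b[c)d]" this pairs the parentheses and leaves both square brackets
non-matching.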

Here is what “theoretizing” gives:

 A parsing is good if it satisfies all of the conditions below:

   0) Some delimiters in the string are marked as “non-matching”; the rest
      are partitioned into disjoint “matched” pairs;

   MATCH) A “matched” pair consists of an open-delimiter and matching close-
          delimiter (in this order in the string).

   NEST) “Matched” pairs are properly nested (meaning that 2 pairs cannot be
         positioned as Open1 Open2 Close1 Close2 in the string order).

   MINLEN) “Inside” a “matched” pair, every delimiter that could match an element
           of the pair but is marked as “non-matching” must lie inside
           some more deeply nested “matched” pair.

(I hope that the meaning of the word “inside” in MINLEN is clear.)

   GREED) Given any close-delimiter marked as “non-matching”, its
          pre-context does not contain any open-delimiter which could
          match it.

     Here the pre-context of a position is a concatenation of substrings of the
     initial string, built as follows:
     • Take the most deeply nested “matched pair” containing the position
       (if none, the whole string);
     • take the part of the string inside this pair AND before the position;
     • remove all “matched” pairs completely contained inside this substring
       together with what they enclose.
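
The three steps of the pre-context, and the GREED check built on it, can be
sketched as follows. Again this is just an illustration under assumptions:
DELIMS, pre_context and greed_ok are names invented for the example, the
matched pairs are given as (open_index, close_index) tuples, and they are
assumed to satisfy NEST already.

```python
# Hypothetical delimiter table for illustration only.
DELIMS = {"(": ")", "[": "]", "{": "}"}
CLOSE_DELIMS = set(DELIMS.values())


def pre_context(text, pairs, pos):
    """Indices of text that make up the pre-context of position pos."""
    # Most deeply nested matched pair containing pos; with proper nesting
    # this is the containing pair whose opener comes last.
    enclosing = [(o, c) for o, c in pairs if o < pos < c]
    start = max(enclosing)[0] + 1 if enclosing else 0
    close_of = dict(pairs)
    keep = []
    i = start
    while i < pos:
        c = close_of.get(i)
        if c is not None and c < pos:
            i = c + 1          # drop a matched pair and all it encloses
        else:
            keep.append(i)
            i += 1
    return keep


def greed_ok(text, pairs, unmatched):
    """True if no non-matching closer has a possible opener in its pre-context."""
    for pos in unmatched:
        ch = text[pos]
        if ch not in CLOSE_DELIMS:
            continue
        if any(DELIMS.get(text[k]) == ch for k in pre_context(text, pairs, pos)):
            return False
    return True
```

For "a(b[c)d]" with the parentheses matched, the pre-context of the final
"]" consists of the characters "a" and "d" only, so GREED holds for it.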


P.S.  Judging by another message of yours, for you “theoretizing” is a
      4-letter word…  Oh well…
