Unclear text in the UBA (UAX#9) of Unicode 6.3

Ilya Zakharevich nospam-abuse at ilyaz.org
Fri Apr 25 03:11:26 CDT 2014

On Wed, Apr 23, 2014 at 06:15:44PM -0700, Asmus Freytag wrote:
> On 4/23/2014 4:41 PM, Ilya Zakharevich wrote:
> >>>    GREED) Given any close-delimiter marked as “non-matching”, its
> >>>           pre-context does not contain any open-delimiter which could
> >>>           match it.
> >>>
> >>>      Here pre-context of a position is a concatenation of substrings of the
> >>>      initial string:
> >>>      • Take the most deeply nested “matched pair” containing the position
> >>>        (if none, the whole string);
> >>>      • take the part of the string inside this pair AND before the position;
> >>>      • remove all “matched” pairs completely contained inside this substring
> >>>        together with what they enclose.
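
(For readers who prefer code to words: below is a minimal sketch of the
pre-context and of the GREED check, under my own simplifying assumptions.
The names PAIRS, pre_context and greed_holds are hypothetical; PAIRS is a
tiny stand-in for the real Unicode paired-bracket data; "matched" is a list
of (open_index, close_index) pairs assumed to be properly nested.  It
illustrates the definition quoted above and is not the algorithm of UAX#9.)

```python
# Toy stand-in for the Unicode paired-bracket property (an assumption,
# not the real data): maps each open-delimiter to its close-delimiter.
PAIRS = {"(": ")", "[": "]", "{": "}"}


def pre_context(matched, pos):
    """Indices of the characters forming the pre-context of `pos`.

    `matched` holds (open_index, close_index) pairs, assumed properly nested.
    """
    # Innermost matched pair containing pos: with proper nesting it is the
    # enclosing pair with the largest open index.  If there is none, the
    # pre-context starts at the beginning of the string.
    enclosing_opens = [o for o, c in matched if o < pos < c]
    start = max(enclosing_opens, default=-1) + 1
    # Matched pairs lying completely inside text[start:pos].
    inner = [(o, c) for o, c in matched if start <= o and c < pos]
    # Drop those pairs together with everything they enclose.
    return [i for i in range(start, pos)
            if not any(o <= i <= c for o, c in inner)]


def greed_holds(text, matched):
    """True if no non-matching close-delimiter has, in its pre-context,
    an open-delimiter which could match it (by the PAIRS table alone)."""
    matched_idx = {i for pair in matched for i in pair}
    closers = set(PAIRS.values())
    for pos, ch in enumerate(text):
        if ch in closers and pos not in matched_idx:
            for i in pre_context(matched, pos):
                if PAIRS.get(text[i]) == ch:
                    return False
    return True
```

With text "[(a])" and the single matched pair (0, 3), the final ")" is
non-matching, and its pre-context is empty: the "(" at index 1 is swallowed
together with the matched "[...]".  So GREED holds, although the plain
"everything before the position" prefix does contain a "(".  With text "(a)"
and no pairs marked as matched, GREED fails.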

> >>Can you explain why, if you make "pre-context" simply the part of the
> >>whole string that precedes the unmatched close-delimiter, the words
> >>"which could match it" are insufficient?
> >Aha, this means that my description is INCOMPLETE: you got a wrong
> >impression of what “match” means!  Everywhere, this word means exactly
> >the same as in the MATCH rule: that Unicode codepoints match following
> >Unicode properties.

> >This is a non-recursive definition.  All rules are independent.

> That explains why you repeat most of the other constraints in your
> pre-context.

Frankly speaking, I do not see any such repetition.

> For a static definition, would it have been simpler to break the
> definition into two, say a "tentative parsing" (all conditions but
> greed) and a "selected parsing", which could then be defined as the
> parsing that starts closest to the left?

I do not see how: to know whether a closing delimiter may be matched
or not, it is not enough to know the “tentative” parsing of what precedes
it; one must know the **actual** parsing.  Eventually, you would end up
with either a recursive definition, or a definition of a “process” of

Anyway, I’ve written my portion of definitions which combine
“tentative” stuff with “best choice” of tentative variants.  One ends up
with monsters like
(and, Eli, the fact that I wrote it does not imply that I must like it :-[ ).

In the case of Perl RExes, there is no alternative.  IMO, if there IS
a way to define what a “standalone” GOOD THING is, it is __much__
better than the “best of many” way.  Defining it as “the best of
potentially good things” requires the reader to imagine first ALL the
potentially good things; only when this (otherwise not very useful)
universe has settled down in the reader’s mind they would be able to
pick up the best guy…

