Unclear text in the UBA (UAX#9) of Unicode 6.3
nospam-abuse at ilyaz.org
Fri Apr 25 03:11:26 CDT 2014
On Wed, Apr 23, 2014 at 06:15:44PM -0700, Asmus Freytag wrote:
> On 4/23/2014 4:41 PM, Ilya Zakharevich wrote:
> >>> GREED) Given any close-delimiter marked as “non-matching”, its
> >>> pre-context does not contain any open-delimiter which could
> >>> match it.
> >>> Here pre-context of a position is a concatenation of substrings of the
> >>> initial string:
> >>> • Take the most deeply nested “matched pair” containing the position
> >>> (if none, the whole string);
> >>> • take the part of the string inside this pair AND before the position;
> >>> • remove all “matched” pairs completely contained inside this substring
> >>> together with what they enclose.
> >>Can you explain why, if you make "pre-context" simply the part of the
> >>whole string that precedes the unmatched close-delimiter, the words
> >>"which could match it" are insufficient?
> >Aha, this means that my description is INCOMPLETE: you got the wrong
> >impression of what “match” means! Everywhere, this word means exactly
> >the same as in the MATCH rule: that the Unicode code points match
> >according to their Unicode properties.
> >This is a non-recursive definition. All rules are independent.
> That explains why you repeat most of the other constraints in your
Frankly speaking, I do not see any such repetition.
> For a static definition, would it have been simpler to break the
> definition into two - say a "tentative parsing" (all conditions but
> greed) and "selected parsing", which then could be defined as the
> parsing that starts closest to the left.
I do not see how: to know whether a closing delimiter may be matched
or not, it is not enough to know “tentative” parsing of what precedes
it; one must know the **actual** parsing. Eventually, you would end
with either a recursive definition, or a definition of a “process” of
Anyway, I’ve written my portion of definitions which combine
“tentative” stuff with “best choice” of tentative variants. One ends
with monsters like
(and, Eli, the fact that I wrote it does not imply that I must like it :-[ ).
In the case of Perl RExes, there is no alternative. IMO, if there IS
a way to define what a “standalone” GOOD THING is, it is __much__
better than the “best of many” way. Defining it as “the best of
potentially good things” requires the reader to imagine first ALL the
potentially good things; only when this (otherwise not very useful)
universe has settled down in the reader’s mind they would be able to
pick up the best guy…
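
For readers who prefer code to prose, the pre-context from the quoted
GREED rule can be sketched roughly as below. This is only an
illustration under stated assumptions: the names PAIRS, pre_context and
satisfies_greed are hypothetical, the small PAIRS table stands in for
the Unicode paired-bracket properties that the MATCH rule refers to, and
the already-matched pairs are assumed to be given and properly nested.

```python
# Hypothetical bracket table standing in for the Unicode paired-bracket
# properties used by the MATCH rule; an assumption for illustration only.
PAIRS = {"(": ")", "[": "]", "{": "}"}


def pre_context(text, matched, pos):
    """Return the indices of `text` that form the pre-context of `pos`.

    `matched` is a list of (open_index, close_index) tuples for the pairs
    marked as matched; the pairs are assumed to be properly nested.
    """
    # Step 1: the most deeply nested matched pair containing the position
    # is the enclosing pair whose opener lies furthest to the right.
    enclosing = [o for o, c in matched if o < pos < c]
    start = max(enclosing) + 1 if enclosing else 0

    # Step 2: the part of the string inside that pair and before the position.
    keep = set(range(start, pos))

    # Step 3: drop every matched pair lying wholly inside that part,
    # together with everything it encloses.
    for o, c in matched:
        if start <= o and c < pos:
            keep -= set(range(o, c + 1))
    return sorted(keep)


def satisfies_greed(text, matched, pos):
    """Check GREED for a close-delimiter at `pos` marked as non-matching."""
    return all(
        PAIRS.get(text[i]) != text[pos]
        for i in pre_context(text, matched, pos)
    )
```

With the string "[(a])" and the single matched pair (0, 3), the
pre-context of the final ")" is empty, so GREED holds even though a "("
precedes it in the raw string. With "(a)" and no matched pairs, the "("
is in the pre-context of ")", so marking that ")" as non-matching would
violate GREED.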