UAX44: loose matching of symbolic values and the `is` prefix
kenwhistler at att.net
Mon Jun 6 10:04:36 CDT 2016
On 6/6/2016 12:58 AM, Mathias Bynens wrote:
> Backwards compatibility seems to be the only good reason to continue supporting the `is` prefix *for existing implementations*, such as the one in Perl. But why is it still a requirement for new engines to support it as part of UAX44-LM3?
> I’d like to propose changing UAX44-LM3 to make supporting the `is` prefix optional for new implementations.
I think the target of concern here is wrong. UAX #44 doesn't *require*
any regex engine to include this "is prefix" handling. What UAX #44 does
is recommend that all property and property value aliases be correctly
recognized, and then specifies a clear statement (in UAX44-LM3) of the
loose matching rule for recognizing the various forms of those aliases
that could be considered equivalent. I don't think messing with that
rule statement (which has been in place since 2010) would be helpful.
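
For concreteness, here is a minimal sketch of how I read the folding in
UAX44-LM3: ignore case, whitespace, underscores, hyphens, and any initial
prefix string "is". The function name `lm3_key` is hypothetical, and the
guard that leaves a bare "is" alone is my own assumption, not part of the
rule text.

```python
import re


def lm3_key(alias: str) -> str:
    """Fold an alias roughly as UAX44-LM3 describes (illustrative sketch)."""
    # Drop whitespace, underscores, and hyphens, then fold case.
    key = re.sub(r"[\s_\-]", "", alias).lower()
    # Drop an initial "is" prefix. Assumption: only when something remains,
    # so that an alias consisting of just "is" does not fold to "".
    if key.startswith("is") and len(key) > 2:
        key = key[2:]
    return key


def loosely_equal(a: str, b: str) -> bool:
    """Two aliases match loosely when their folded keys are identical."""
    return lm3_key(a) == lm3_key(b)
```

Under this sketch, `loosely_equal("Greek", "Is_Greek")` is true, which is
exactly the Greek/IsGreek equivalence being questioned.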
The target instead should be UTS #18, which, happily, has a proposed
update available for comment right now:
The relevant point is:
That is the conformance part that requires that conformant Unicode regex
implementations "must follow the Matching rules from [UAX44]".
If you are seeking indulgences for new engine implementations, that
seems like the correct point to be adding clarifications and exceptions.
Note that the following text in that section already includes wording
about exceptions and compatibility issues. There is also a following
section specifically about regex for the Script and Script Extensions
properties that seems like it would be the appropriate place to talk
about the Greek/IsGreek issue as pertains to regex support.
I would suggest you make specific suggestions about the text of UTS #18
as part of the ongoing public review for the proposed update of that