UAX44: loose matching of symbolic values and the `is` prefix

Ken Whistler kenwhistler at att.net
Mon Jun 6 10:04:36 CDT 2016


On 6/6/2016 12:58 AM, Mathias Bynens wrote:
> Backwards compatibility seems to be the only good reason to continue supporting the `is` prefix *for existing implementations*, such as the one in Perl. But why is it still a requirement for new engines to support it as part of UAX44-LM3?
>
> I’d like to propose changing UAX44-LM3 to make supporting the `is` prefix optional for new implementations.
>

I think the target of concern here is wrong. UAX #44 doesn't *require* 
any regex engine to include this "is prefix" handling. What UAX #44 does 
is recommend that all property and property value aliases be correctly 
recognized, and then gives a clear statement (in UAX44-LM3) of the 
loose matching rule for recognizing the various forms of those aliases 
that could be considered equivalent. I don't think messing with that 
rule statement (which has been in place since 2010) would be helpful.
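To illustrate the loose matching rule discussed above, here is a minimal Python sketch of one reading of UAX44-LM3: ignore case, whitespace, underscores, hyphens, and any initial "is" prefix. The names `loose_key` and `loose_equal` are hypothetical, and the order of operations (folding case and dropping separators before stripping "is") is an assumption of this sketch, not something stated in this thread.

```python
import re

# Separators ignored by this sketch: whitespace, underscore, hyphen-minus.
_IGNORABLE = re.compile(r"[\s_\-]+")


def loose_key(name: str) -> str:
    """Return a comparison key for a property or property value alias.

    Assumption: separators are removed and case is folded first, and only
    then is a leading "is" stripped, so "IS_Greek" and "is-greek" behave
    like "isGreek".
    """
    key = _IGNORABLE.sub("", name).lower()
    if key.startswith("is"):
        key = key[2:]
    return key


def loose_equal(a: str, b: str) -> bool:
    """True if two aliases are equivalent under the loose matching sketch."""
    return loose_key(a) == loose_key(b)
```

Under this reading, `Line_Break`, `line-break` and `LINE BREAK` all compare equal, and so do `Greek` and `IsGreek`.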

The target instead should be UTS #18, which, happily, has a proposed 
update available for comment right now:

http://www.unicode.org/review/pri325/

The relevant point is:

http://www.unicode.org/reports/tr18/tr18-18.html#RL1.2

That is the conformance part that requires that conformant Unicode regex 
implementations "must follow the Matching rules from [UAX44]".

If you are seeking indulgences for new engine implementations, that 
seems like the correct point to be adding clarifications and exceptions. 
Note that the following text in that section already includes wording 
about exceptions and compatibility issues. There is also a following 
section specifically about regex for the Script and Script Extensions 
properties that seems like it would be the appropriate place to talk 
about the Greek/IsGreek issue as it pertains to regex support.
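To make the Greek/IsGreek compatibility question concrete, the following sketch shows how a new engine might resolve a Script name written inside `\p{...}`, with acceptance of the `is` prefix controlled by a switch. Everything here is hypothetical: the `resolve_script` function, the `allow_is_prefix` flag, and the four-entry alias table, which is only a tiny subset of the Script value aliases (long and short names) that a real engine would load from the Unicode Character Database. Unlike a strict reading of UAX44-LM3, this sketch tries the name as written before trying it with "is" removed.

```python
import re
from typing import Optional

# Whitespace, underscores, and hyphens are ignored when comparing names.
_IGNORABLE = re.compile(r"[\s_\-]+")

# Hypothetical subset of Script value aliases, keyed by loose-matched form.
SCRIPT_ALIASES = {
    "grek": "Greek",
    "greek": "Greek",
    "latn": "Latin",
    "latin": "Latin",
}


def resolve_script(name: str, allow_is_prefix: bool = True) -> Optional[str]:
    """Map a loosely written Script name to its long alias, or None."""
    key = _IGNORABLE.sub("", name).lower()
    # First try the name as written (after case folding and separator removal).
    if key in SCRIPT_ALIASES:
        return SCRIPT_ALIASES[key]
    # Compatibility path: accept "IsGreek", "is_latn", etc. only when enabled.
    if allow_is_prefix and key.startswith("is"):
        return SCRIPT_ALIASES.get(key[2:])
    return None
```

With `allow_is_prefix=False`, `IsGreek` is rejected while `Greek` and `Grek` still resolve, which is the kind of behavior an exception in UTS #18 would need to permit explicitly.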

I would encourage you to make specific suggestions about the text of UTS #18 
as part of the ongoing public review for the proposed update of that 
specification.

--Ken



More information about the Unicode mailing list