UAX44: loose matching of symbolic values and the `is` prefix

Nova Patch patch.nova at
Mon Jun 6 16:39:05 CDT 2016

On Monday, June 6, 2016, Doug Ewell wrote:
> Mathias Bynens wrote:
> > The `is` prefix doesn’t provide any functionality that would otherwise
> > be unavailable. It doesn’t add any value, yet causes incompatibility,
> > author confusion, and it increases implementation complexity.
> I don't see any evidence that it adds no value. Support for existing
> implementations is value.

Markus has now confirmed that ICU doesn’t support this syntax, and I can
confirm that even Perl, which probably offers more ways to write the same
regex than any other engine, doesn’t support any form of the `is` prefix
for property values when the property name is provided.

$ perl -Mutf8 -E 'say "π" =~ /\p{Script=Greek}/'
$ perl -Mutf8 -E 'say "π" =~ /\p{Script=IsGreek}/'
Can't find Unicode property definition "Script=IsGreek" at -e line 1.
$ perl -Mutf8 -E 'say "π" =~ /\p{Script=Is_Greek}/'
Can't find Unicode property definition "Script=Is_Greek" at -e line 1.

Perl does optionally support the `is` prefix in two other places, on
property names and on standalone property values:

$ perl -Mutf8 -E 'say "π" =~ /\p{IsScript=Greek}/'
$ perl -Mutf8 -E 'say "π" =~ /\p{IsGreek}/'

However, this syntax is notoriously inconsistent among different regex
engines. Perl’s specific rules are documented in *perluniprops*: \p{Is_*}
(case- and underscore-insensitive) is a synonym for \p{*}, which explains
the behavior above. Based on my past research for *Unicode Regular Expression
Engines* at IUC38, I suspect that there might not be any regex engine that
actually supports syntax like Script=IsGreek as described in UAX44-LM3! If
anybody knows otherwise, I’d love to hear about it.
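
For reference, the loose-matching rule under discussion can be sketched roughly as follows. This is a minimal Python sketch, assuming the UAX44-LM3 wording "ignore case, whitespace, underscore, hyphens, and any initial prefix string `is`"; the names `loose_key` and `loose_match` are hypothetical helpers for illustration and do not come from any regex engine or library.

```python
import re


def loose_key(value: str) -> str:
    """Reduce a symbolic value to a comparison key (assumed reading of UAX44-LM3)."""
    # Drop whitespace, underscores, and hyphens, then fold case.
    key = re.sub(r"[\s_\-]+", "", value).lower()
    # Drop one initial "is" prefix, if present.
    if key.startswith("is"):
        key = key[2:]
    return key


def loose_match(a: str, b: str) -> bool:
    """Two symbolic values match loosely when their keys are equal."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    for candidate in ("Greek", "IsGreek", "Is_Greek", "is-greek"):
        print(candidate, loose_match(candidate, "Greek"))
```

Under this reading, Greek, IsGreek, and Is_Greek all compare equal as values of Script, which is exactly the behavior that neither ICU nor Perl implements for the Script=... form shown above.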
