UAX44: loose matching of symbolic values and the `is` prefix

Nova Patch patch.nova at gmail.com
Mon Jun 6 16:39:05 CDT 2016


On Monday, 6 June 2016, Doug Ewell wrote:
>
> Mathias Bynens wrote:
>
> > The `is` prefix doesn’t provide any functionality that would otherwise
> > be unavailable. It doesn’t add any value, yet causes incompatibility,
> > author confusion, and it increases implementation complexity.
>
> I don't see any evidence that it adds no value. Support for existing
> implementations is value.

Markus has now confirmed that ICU doesn’t support this syntax, and I can
confirm that even Perl, which probably offers more ways to write the same
regex than any other engine, doesn’t support any form of the `is` prefix
for property values when the property name is provided.

$ perl -Mutf8 -E 'say "π" =~ /\p{Script=Greek}/'
1
$ perl -Mutf8 -E 'say "π" =~ /\p{Script=IsGreek}/'
Can't find Unicode property definition "Script=IsGreek" at -e line 1.
$ perl -Mutf8 -E 'say "π" =~ /\p{Script=Is_Greek}/'
Can't find Unicode property definition "Script=Is_Greek" at -e line 1.

Perl does, though, optionally support the `is` prefix for property names
and for standalone property values:

$ perl -Mutf8 -E 'say "π" =~ /\p{IsScript=Greek}/'
1
$ perl -Mutf8 -E 'say "π" =~ /\p{IsGreek}/'
1

However, this syntax is notoriously inconsistent among different regex
engines. Perl’s specific rules are documented in *perluniprops*
(http://perldoc.perl.org/perluniprops.html): \p{Is_*} (case- and
underscore-insensitive) is a synonym for \p{*}, which explains the
behavior above. Based on my past research for *Unicode Regular Expression
Engines* at IUC38, I suspect that there might not be any regex engine that
actually supports syntax like Script=IsGreek as described in UAX44-LM3! If
anybody knows otherwise, I’d love to hear about it.
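
For reference, the loose-matching rule under discussion can be sketched in a
few lines. This is only a minimal Python illustration, assuming UAX44-LM3 is
read as "ignore case, whitespace, underscores, hyphens, and any initial `is`
prefix". The function names `loose_key` and `loose_equal` are hypothetical
and are not taken from ICU, Perl, or any other engine.

```python
import re


def loose_key(symbolic_value: str) -> str:
    """Reduce a symbolic value to a comparison key (sketch of UAX44-LM3).

    Assumption: the rule is applied literally, so whitespace, underscores
    and hyphens are dropped, case is folded, and a leading "is" is removed.
    """
    key = re.sub(r"[\s_\-]+", "", symbolic_value).lower()
    # Blindly stripping "is" also affects names that merely begin with
    # those letters; a real implementation would need to handle that.
    if key.startswith("is"):
        key = key[2:]
    return key


def loose_equal(a: str, b: str) -> bool:
    """Return True if two symbolic values match under the sketch above."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    for candidate in ("Greek", "IsGreek", "Is_Greek", "is greek", "Latin"):
        print(candidate, loose_equal(candidate, "Greek"))
```

Under this reading, `IsGreek` and `Is_Greek` would both match the value
`Greek`, which is exactly what the Perl examples above reject when the
property name is given.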

Nova