Name Property in Regular Expressions
Martin J. Dürst
duerst at it.aoyama.ac.jp
Fri May 10 03:11:57 CDT 2024
Dear Unicoders,
I hope this is more on-topic than the most recent discussions.
I have some questions regarding name properties in regular expressions,
i.e. about
https://www.unicode.org/reports/tr18/#Name_Properties
1) When matching (see also
https://www.unicode.org/reports/tr44/#Matching_Rules), it's clear that
"zero-width space" is equivalent to "ZERO WIDTH SPACE" or
"zerowidthspace", but should something like
"Ze-rowi-dThsp ace" (hyphens or spaces in the wrong places) also be
equivalent?
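To make the question concrete, here is a minimal Python sketch of one possible reading of UAX44-LM2 (ignore case, whitespace, underscores, and medial hyphens), applied the same way to both the pattern and the name. The function name loose_key is my own invention, and the sketch deliberately skips the special case for U+1180 HANGUL JUNGSEONG O-E, so it is an illustration of the question rather than a conformant implementation.

```python
import re

def loose_key(name: str) -> str:
    """Hypothetical helper: reduce a character name to a loose-matching key.

    Assumption: a "medial" hyphen is one sitting directly between two
    letters or digits in the string being processed. The exception for
    U+1180 HANGUL JUNGSEONG O-E is NOT handled here.
    """
    # Drop hyphens that are directly between two alphanumerics.
    s = re.sub(r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])", "", name)
    # Drop whitespace and underscores.
    s = re.sub(r"[\s_]+", "", s)
    # Ignore case.
    return s.upper()

for candidate in ("zero-width space", "zerowidthspace", "Ze-rowi-dThsp ace"):
    print(candidate, loose_key(candidate) == loose_key("ZERO WIDTH SPACE"))
```

Under this reading all three candidates compare equal to "ZERO WIDTH SPACE", including the one with hyphens and spaces in the wrong places, while a non-medial hyphen such as the one in "TIBETAN LETTER -A" is kept. Whether that is the intended outcome is exactly what I am asking.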
2) TR 18 suggests wildcards such as \p{name=/ALIEN/}. This looks very
convenient, but I have doubts that implementation was really considered
when writing this down. In essence, this would have to run a regular
expression over close to one megabyte of name data (plus some additional
processing for the algorithmically defined names), just to compile the
regular expression. (It's possible to speed that up with some clever
indexing, but this would only add additional complexity and space.)
So my question is whether anybody actually knows about some
implementation of this name wildcard feature.
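For illustration, the brute-force approach I have in mind looks roughly like the following Python sketch. It uses the standard unicodedata module, which also covers the algorithmically defined names but not name aliases; the function name chars_with_name_matching is hypothetical, and the result depends on the Unicode version of the Python build. It is not meant as a proposal for how an engine should do this, only to show that every code point's name gets visited for a single \p{name=/.../}.

```python
import re
import sys
import unicodedata

def chars_with_name_matching(pattern: str) -> list[str]:
    """Hypothetical helper: all characters whose name matches `pattern`.

    Naive linear scan over every code point; no indexing, no caching.
    """
    rx = re.compile(pattern)
    hits = []
    for cp in range(sys.maxunicode + 1):
        # Returns "" for unnamed code points (unassigned, surrogates, ...).
        name = unicodedata.name(chr(cp), "")
        if name and rx.search(name):
            hits.append(chr(cp))
    return hits

aliens = chars_with_name_matching("ALIEN")
for ch in aliens:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The set could then be compiled into an ordinary character class.
alien_class = re.compile("[" + "".join(re.escape(c) for c in aliens) + "]")
```

The scan has to be repeated for every name wildcard in every pattern unless the engine caches results or keeps an index over the names, which is the extra complexity and space mentioned above.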
Regards, Martin.