Name Property in Regular Expressions

Martin J. Dürst duerst at it.aoyama.ac.jp
Fri May 10 03:11:57 CDT 2024


Dear Unicoders,

I hope this is more on-topic than the most recent discussions.

I have some questions regarding name properties in regular expressions, 
i.e. about
https://www.unicode.org/reports/tr18/#Name_Properties

1) When matching (see also 
https://www.unicode.org/reports/tr44/#Matching_Rules), it's clear that 
"zero-width space" is equivalent to "ZERO WIDTH SPACE" or 
"zerowidthspace", but should something like
"Ze-rowi-dThsp ace" (hyphens or spaces in the wrong places) also be 
equivalent?
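
To make the question concrete, here is a minimal Python sketch of one
possible reading of the loose matching rule (UAX44-LM2). It assumes that
a "medial hyphen" can be recognized in the input alone, as a hyphen
directly between two letters or digits. The function name loose_key is
made up for illustration, and the exception for U+1180 HANGUL JUNGSEONG
O-E is deliberately left out. Under this reading, all four spellings
above collapse to the same key; whether the rule is meant to be that
permissive is exactly what I am asking.

```python
import re

def loose_key(name):
    """Hypothetical loose-matching key; one reading of UAX44-LM2.

    Assumption: a medial hyphen is any hyphen in the *input* that sits
    directly between two ASCII letters or digits. The U+1180 exception
    is not handled here.
    """
    # Drop hyphens that sit directly between two alphanumerics.
    s = re.sub(r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])", "", name)
    # Drop whitespace and underscores, then fold case.
    s = re.sub(r"[\s_]+", "", s)
    return s.upper()

spellings = [
    "zero-width space",
    "ZERO WIDTH SPACE",
    "zerowidthspace",
    "Ze-rowi-dThsp ace",
]
# All four spellings produce the same key under this reading.
keys = {loose_key(s) for s in spellings}
```

A non-medial hyphen survives, so "TIBETAN LETTER -A" and "TIBETAN LETTER A"
still get different keys in this sketch.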

2) TR 18 suggests wildcards such as \p{name=/ALIEN/}. This looks very 
convenient, but I doubt that implementation was really considered 
when this was written down. In essence, an implementation would have to 
run a regular expression over close to one megabyte of name data (plus 
some additional processing for the algorithmically defined names), just 
to compile the outer regular expression. (It's possible to speed that up 
with some clever indexing, but this would only add complexity and space.)
So my question is whether anybody actually knows of an 
implementation of this name wildcard feature.
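
For illustration, the brute-force approach I have in mind looks roughly
like the following Python sketch. It is not taken from any existing
implementation; the function name codepoints_with_name_matching is
hypothetical. It relies on Python's unicodedata.name, which covers the
algorithmically defined names but reflects whatever Unicode version the
Python build ships with, and which does not return name aliases.

```python
import re
import unicodedata

def codepoints_with_name_matching(pattern):
    """Return all code points whose character name matches `pattern`.

    Brute force: every code point is looked up and tested, which is the
    cost that would be paid each time \\p{name=/.../} is compiled.
    """
    rx = re.compile(pattern)
    hits = []
    for cp in range(0x110000):
        # Unassigned code points, surrogates, and controls have no name;
        # the default "" keeps the lookup from raising ValueError.
        name = unicodedata.name(chr(cp), "")
        if name and rx.search(name):
            hits.append(cp)
    return hits

# Rough equivalent of resolving \p{name=/ALIEN/} into a set of code points.
aliens = codepoints_with_name_matching(r"ALIEN")
```

The resulting list would then have to be turned into a character class
inside the compiled regular expression, so the full scan happens at
compile time rather than at match time.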

Regards,   Martin.

