Name Property in Regular Expressions
Asmus Freytag
asmusf at ix.netcom.com
Fri May 10 03:25:54 CDT 2024
On 5/10/2024 1:11 AM, Martin J. Dürst via Unicode wrote:
> Dear Unicoders,
>
> I hope this is more on-topic than the most recent discussions.
>
> I have some questions regarding name properties in regular
> expressions, i.e. about
> https://www.unicode.org/reports/tr18/#Name_Properties
>
> 1) When matching (see also
> https://www.unicode.org/reports/tr44/#Matching_Rules), it's clear that
> "zero-width space" is equivalent to "ZERO WIDTH SPACE" or
> "zerowidthspace", but should something like
> "Ze-rowi-dThsp ace" (hyphens or spaces in the wrong places) also be
> equivalent?
YES.
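
To illustrate why: the loose matching rule for character names in UAX #44 (UAX44-LM2) ignores case, whitespace, underscores, and medial hyphens, with an exception for U+1180 HANGUL JUNGSEONG O-E. The following is only a minimal sketch of that rule, not a reference implementation. The function name loose_key is made up for this example, and treating "medial" as "directly between two letters or digits" is an assumption about how to read the rule.

```python
import re


def loose_key(name: str) -> str:
    """Reduce a character name to a comparison key, roughly per UAX44-LM2.

    Hypothetical helper for illustration only.
    """
    upper = name.upper()
    # Drop medial hyphens first, i.e. hyphens directly between two
    # alphanumerics (assumption). This has to happen before whitespace
    # is removed, so that "LETTER -A" keeps its hyphen.
    no_medial = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", upper)
    # Then drop whitespace and underscores.
    key = re.sub(r"[\s_]+", "", no_medial)
    # Exception: the hyphen in U+1180 HANGUL JUNGSEONG O-E is significant,
    # because U+116C is named HANGUL JUNGSEONG OE.
    if key == "HANGULJUNGSEONGOE" and re.search(r"O-E\s*$", upper):
        return "HANGULJUNGSEONGO-E"
    return key


def names_match(a: str, b: str) -> bool:
    """True if two name spellings are equivalent under the sketch above."""
    return loose_key(a) == loose_key(b)
```

With this sketch, names_match("Ze-rowi-dThsp ace", "ZERO WIDTH SPACE") is true, since both hyphens sit between letters, while "TIBETAN LETTER -A" and "TIBETAN LETTER A" stay distinct.
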
>
> 2) TR 18 suggests wildcards such as \p{name=/ALIEN/}. This looks very
> convenient, but I have doubts that implementation was really
> considered when writing this down. In essence, this would have to run
> a regular expression over close to one megabyte of name data (+some
> additional processing for the algorithmically defined names), just to
> compile the regular expression. (It's possible to speed that up with
> some clever indexing, but this would only add additional complexity
> and space.)
> So my question is whether anybody actually knows about some
> implementation of this name wildcard feature.
>
> Regards, Martin.
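
For concreteness, the brute-force approach described in 2) can be sketched as below. This is not a known implementation of the TR 18 feature, only an illustration of the cost: it runs a regular expression over every character name available through Python's standard unicodedata module (which also supplies the algorithmically derived names) to expand something like \p{name=/ALIEN/} into a set of code points. The function name chars_with_name_matching is made up for this example.

```python
import re
import unicodedata


def chars_with_name_matching(pattern: str) -> list[int]:
    """Return all code points whose character name matches the pattern.

    Hypothetical helper: a linear scan over the whole code space, which is
    the expensive step a regex compiler would have to do once per wildcard.
    """
    rx = re.compile(pattern)
    hits = []
    for cp in range(0x110000):
        # Unnamed code points (unassigned, surrogates, controls, ...)
        # yield the empty default and are skipped.
        name = unicodedata.name(chr(cp), "")
        if name and rx.search(name):
            hits.append(cp)
    return hits


if __name__ == "__main__":
    for cp in chars_with_name_matching(r"ALIEN"):
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

The result depends on the Unicode version of the Python build in use. An index over name words could avoid the full scan, at the price of the extra complexity and space mentioned above.
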