Name Property in Regular Expressions

Asmus Freytag asmusf at ix.netcom.com
Fri May 10 03:25:54 CDT 2024


On 5/10/2024 1:11 AM, Martin J. Dürst via Unicode wrote:
> Dear Unicoders,
>
> I hope this is more on-topic than the most recent discussions.
>
> I have some questions regarding name properties in regular 
> expressions, i.e. about
> https://www.unicode.org/reports/tr18/#Name_Properties
>
> 1) When matching (see also 
> https://www.unicode.org/reports/tr44/#Matching_Rules), it's clear that 
> "zero-width space" is equivalent to "ZERO WIDTH SPACE" or 
> "zerowidthspace", but should something like
> "Ze-rowi-dThsp ace" (hyphens or spaces in the wrong places) also be 
> equivalent?
YES.
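
For illustration, here is a minimal Python sketch of the loose matching rule UAX44-LM2 as I read it: ignore case, whitespace, underscores, and all medial hyphens, except the hyphen in U+1180 HANGUL JUNGSEONG O-E. The function name loose_key and the regex-based definition of "medial" (a hyphen directly between two letters or digits) are my own assumptions for this sketch, not anything taken from TR 44 or from an existing implementation.

```python
import re

# Assumption for this sketch: a "medial" hyphen is one that sits directly
# between two letters or digits, with no space on either side.
_MEDIAL_HYPHEN = re.compile(r"(?<=[a-z0-9])-(?=[a-z0-9])")


def loose_key(name: str) -> str:
    """Return a comparison key for a character name, in the spirit of UAX44-LM2."""
    # Ignore case, and treat underscores like spaces.
    s = name.strip().lower().replace("_", " ")
    # Drop medial hyphens first, while spaces still mark word boundaries,
    # so that the hyphen in "TIBETAN LETTER -A" is kept.
    without_hyphens = _MEDIAL_HYPHEN.sub("", s)
    # Then drop all whitespace.
    key = re.sub(r"\s+", "", without_hyphens)
    # Documented exception: U+1180 HANGUL JUNGSEONG O-E must stay distinct
    # from U+116C HANGUL JUNGSEONG OE, so its hyphen is put back.
    if key == "hanguljungseongoe" and s.endswith("o-e"):
        key = "hanguljungseongo-e"
    return key


if __name__ == "__main__":
    for candidate in ("zero-width space", "ZERO WIDTH SPACE", "Ze-rowi-dThsp ace"):
        print(candidate, "->", loose_key(candidate))
```

With this key, all three spellings from the question compare equal, while "TIBETAN LETTER -A" and "TIBETAN LETTER A" remain distinct because the hyphen there is not medial.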
>
> 2) TR 18 suggests wildcards such as \p{name=/ALIEN/}. This looks very 
> convenient, but I have doubts that implementation was really 
> considered when writing this down. In essence, this would have to run 
> a regular expression over close to one megabyte of name data (+some 
> additional processing for the algorithmically defined names), just to 
> compile the regular expression. (It's possible to speed that up with 
> some clever indexing, but this would only add additional complexity 
> and space.)
> So my question is whether anybody actually knows about some 
> implementation of this name wildcard feature.
>
> Regards,   Martin.
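
To make the cost in question 2 concrete, here is a rough Python sketch of the brute-force approach: run the wildcard pattern over the name of every code point once, at pattern compile time, and collect the matches into a set. It relies on the standard unicodedata module, which also supplies the algorithmically derived names; the function name code_points_with_name_matching is hypothetical. This is not a description of any existing regex engine, and it ignores name aliases entirely.

```python
import re
import sys
import unicodedata


def code_points_with_name_matching(pattern: str) -> set[int]:
    """Brute-force expansion of a name wildcard such as \\p{name=/ALIEN/}.

    Scans every code point and tests its character name against `pattern`.
    """
    rx = re.compile(pattern)
    matches = set()
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (unassigned, surrogates, controls) yield "".
        name = unicodedata.name(chr(cp), "")
        if name and rx.search(name):
            matches.add(cp)
    return matches


if __name__ == "__main__":
    for cp in sorted(code_points_with_name_matching(r"ALIEN")):
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

The full scan touches more than a million code points for each wildcard, which is the compile-time cost described above; an implementation could cache the resulting set per pattern, or index the names, at the price of the extra complexity and space already mentioned.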



