Name Property in Regular Expressions

Giacomo Catenazzi cate at cateee.net
Fri May 10 03:54:02 CDT 2024


On 10.05.2024 10:11, Martin J. Dürst via Unicode wrote:

> 2) TR 18 suggests wildcards such as \p{name=/ALIEN/}. This looks very 
> convenient, but I have doubts that implementation was really considered 
> when writing this down. In essence, this would have to run a regular 
> expression over close to one megabyte of name data (+some additional 
> processing for the algorithmically defined names), just to compile the 
> regular expression. (It's possible to speed that up with some clever 
> indexing, but this would only add additional complexity and space.)
> So my question is whether anybody actually knows about some 
> implementation of this name wildcard feature.
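
To make the cost described above concrete, here is a minimal sketch, not
taken from TR 18 or from any known implementation, of the naive approach:
scan every code point's name and expand the wildcard into an ordinary
character class. It assumes Python's standard unicodedata module as the
source of name data; the function names chars_with_name_matching and
name_wildcard_to_class are hypothetical, chosen only for illustration.

```python
import re
import sys
import unicodedata


def chars_with_name_matching(pattern):
    """Return the code points whose Unicode character name matches `pattern`.

    This is the brute-force approach: one regex search per named code point.
    """
    name_re = re.compile(pattern)
    matches = []
    for cp in range(sys.maxunicode + 1):
        # unicodedata.name() also covers algorithmically derived names
        # (e.g. CJK unified ideographs); unnamed code points yield "".
        name = unicodedata.name(chr(cp), "")
        if name and name_re.search(name):
            matches.append(cp)
    return matches


def name_wildcard_to_class(pattern):
    """Expand a name wildcard into a plain regex character class (hypothetical helper)."""
    cps = chars_with_name_matching(pattern)
    if not cps:
        return "(?!)"  # a pattern that never matches, since "[]" is invalid
    return "[" + "".join("\\U%08x" % cp for cp in cps) + "]"


if __name__ == "__main__":
    # Roughly what \p{name=/ALIEN/} would have to do at compile time.
    for cp in chars_with_name_matching("ALIEN"):
        print("U+%04X %s" % (cp, unicodedata.name(chr(cp))))
```

Every call walks the whole code space before the outer regular expression
can even be compiled, which is the overhead the quoted paragraph is
worried about; an index over the names would trade that time for extra
space and complexity.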

You write that this looks *very convenient*. Could you give an example?

I think such an extension falls in the gray area between semantics and 
rendering. Unicode also works in that area (text segmentation), but 
I'm not sure regexps can handle it (with the metadata and complexity 
involved).

On the rendering side I do not see much of a problem with having all the 
data available, but I'm not sure it is so useful on a server or database, 
where regexps may be used for search, security, and tokenization. So my 
question: how do you find it useful (other than for us Unicode standard 
lovers)?

giacomo


