Unicode Regex Question

Mark Davis ☕️ mark at macchiato.com
Tue Dec 30 13:40:36 CST 2014


In transform rules, $ has a special meaning: it matches the boundary of
the string (either the start or the end). Unlike in ordinary regular
expressions, however, it can also occur inside character classes, e.g.
[[a$b][:script=greek:]]
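
To make the contrast concrete, here is a rough sketch in Python's re
module. It is only an emulation under my own assumptions, not how the
transform engine is implemented: the pattern names and the matches_at
helper are made up for illustration, and the transform set is
approximated by "a member character, or a zero-width match at either end
of the text".

```python
import re

# Ordinary regex: "$" inside a character class is a literal dollar sign.
literal = re.compile(r"[a$b]")

# Hypothetical emulation of a transform set such as [a$b]: match one of
# the member characters, or match (zero-width) at the start or end of
# the text. The "$" is not a literal character here.
set_or_boundary = re.compile(r"(?:[ab]|\A|\Z)")


def matches_at(pattern, text, pos):
    """Return True if pattern matches starting at index pos of text."""
    return pattern.match(text, pos) is not None


if __name__ == "__main__":
    # Literal reading: matches a real "$" but never the end of the text.
    print(matches_at(literal, "x$", 1))
    print(matches_at(literal, "xy", 2))
    # Boundary reading: matches at either end of the text, not a real "$".
    print(matches_at(set_or_boundary, "xy", 0))
    print(matches_at(set_or_boundary, "xy", 2))
    print(matches_at(set_or_boundary, "x$", 1))
```

Under that reading, a rule like $makeRight = [[:Z:][:Ps:][:Pi:]$] would
apply after a separator, an opening or initial-quote punctuation mark,
or at the edge of the text.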


Mark <https://google.com/+MarkDavis>

*— Il meglio è l’inimico del bene —*

On Tue, Dec 30, 2014 at 8:21 PM, Cameron Dutro <cameron at lumoslabs.com>
wrote:

> Hey cldr-users,
>
> I'm looking at this entry
> <http://unicode.org/cldr/trac/browser/trunk/common/transforms/Any-Publishing.xml#L21>
> in CLDR transforms. I'm curious why that "$" character is inside the
> character class. Here's the line reproduced:
>
> <tRule>$makeRight = [[:Z:][:Ps:][:Pi:]$] ;</tRule>
>
> I see an outer character class that contains three internal Unicode
> character sets and a literal dollar sign. In regular expressions, the
> dollar sign usually matches the end of the string. Inside a character
> class, however, it should be interpreted as a literal character.
>
> Was including the dollar sign in the character class intentional? Should
> it be treated as an end-of-string anchor or as a literal character?
>
> -Cameron