Unicode Regex Question

Cameron Dutro cameron at lumoslabs.com
Tue Dec 30 17:26:00 CST 2014


Also, would it be fair to say that simply removing the outer set of square
brackets and treating the entire thing as a regex is correct? It doesn't
make sense to me for these transform rules to be "almost" regexes except
for this one "$" exception, especially given the special significance "$"
already has in regexes.
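
To make the difference concrete, here is a rough Python sketch of the two
readings as I understand them. STAND_IN_SET and in_transform_set are
hypothetical names I made up for illustration; the handful of characters in
the set only approximate [[:Z:][:Ps:][:Pi:]], and none of this is taken from
the actual CLDR or ICU implementation.

```python
import re

# Ordinary regex reading: "$" inside a character class is a literal
# dollar sign, so this class never matches at the end of a string.
LITERAL_CLASS = re.compile(r"[\s(\[$]")

# Hypothetical stand-in for [[:Z:][:Ps:][:Pi:]]: a space, a tab, and a few
# opening brackets and opening quotes. The real properties cover far more.
STAND_IN_SET = set(" \t([{\u00ab\u2018\u201c")


def in_transform_set(text, index):
    """Transform-style reading of [...$], assuming "$" means either end.

    An index that falls off the start or the end of the text counts as
    matching "$"; any other index matches only if the character at that
    index is in the stand-in set.
    """
    if index < 0 or index >= len(text):
        return True
    return text[index] in STAND_IN_SET
```

Under the regex reading, LITERAL_CLASS matches the string "$" itself but
finds nothing in "ab". Under the transform reading, in_transform_set("ab", -1)
and in_transform_set("ab", 2) are both true because they sit at the edges of
the text, while a literal "$" character is not in the set at all.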

-Cameron

On Tue, Dec 30, 2014 at 3:22 PM, Cameron Dutro <cameron at lumoslabs.com>
wrote:

> Thanks Mark. Is that documented anywhere?
>
> -Cameron
>
> On Tue, Dec 30, 2014 at 11:40 AM, Mark Davis <mark at macchiato.com> wrote:
>
>> $ has a special meaning in the transforms: it means the end of the string
>> (either end). Unlike in normal regex, however, it can occur in character
>> classes, e.g. [[a$b][:script=greek:]]
>>
>>
>> Mark <https://google.com/+MarkDavis>
>>
>> *— Il meglio è l’inimico del bene —*
>>
>> On Tue, Dec 30, 2014 at 8:21 PM, Cameron Dutro <cameron at lumoslabs.com>
>> wrote:
>>
>>> Hey cldr-users,
>>>
>>> I'm looking at this entry
>>> <http://unicode.org/cldr/trac/browser/trunk/common/transforms/Any-Publishing.xml#L21>
>>> in CLDR transforms. I'm curious why that "$" character is inside the
>>> character class. Here's the line reproduced:
>>>
>>> <tRule>$makeRight = [[:Z:][:Ps:][:Pi:]$] ;</tRule>
>>>
>>> I see an outer character class that contains three internal Unicode
>>> character sets and a literal dollar sign. Usually in regular expressions,
>>> the dollar sign is used to match the end of the string. When it's included
>>> in a character class, however, it should be interpreted as a literal
>>> character.
>>>
>>> Was including the dollar sign in the character class intentional? Should
>>> it be treated as an end-of-string anchor or as a literal character?
>>>
>>> -Cameron
>>>
>>> _______________________________________________
>>> CLDR-Users mailing list
>>> CLDR-Users at unicode.org
>>> http://unicode.org/mailman/listinfo/cldr-users
>>>
>>>
>>
>