Unicode Regex Question

Philippe Verdy verdy_p at wanadoo.fr
Wed Dec 31 04:02:31 CST 2014


No, as written it really means a literal $, or a, or b, or a Greek
character.
And yes, you used a notation that embeds two character classes within another
character class to create a union. However, $ (if it means end of
string) cannot be part of that union, and cannot even be part of a character
class, because it is then not a character itself but a boundary condition.
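
To illustrate the distinction with a conventional engine, here is a minimal
sketch using Python's standard `re` module (not the CLDR transform syntax).
The helper name `matches_at` is mine, purely for illustration, and the Greek
part of the set is left out because `re` has no script properties or nested
classes.

```python
import re

# Inside a character class, `$` is an ordinary character in Python's `re`.
literal_class = re.compile(r"[a$b]")

# The end-of-string condition has to be written as an alternative
# outside the class, because it is an assertion, not a character.
class_or_end = re.compile(r"(?:[ab]|$)")


def matches_at(pattern, text, pos):
    """Return True if `pattern` matches starting exactly at index `pos`."""
    return pattern.match(text, pos) is not None


if __name__ == "__main__":
    print(matches_at(literal_class, "x$", 1))  # the character '$' at index 1
    print(matches_at(literal_class, "x", 1))   # index 1 is the end of "x"
    print(matches_at(class_or_end, "x", 1))    # end of string via the anchor
```

In this engine the class only ever consumes a character, while the anchor
consumes nothing, which is why the two cannot be mixed inside the brackets.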

So yes, your extension is very confusing (in addition to being incoherent and
not general enough to handle various boundary conditions).

TL;DR: it was another proposal, making BETTER use of the $ for something
else more productive, and about how regexps can be embedded into a special
syntax that allows defining any custom boundary conditions, including end of
string or other boundaries (and not limited to those defined
with properties in the UCD). It is a generalisation of the concept, which
would be used wherever Unicode properties are not sufficient, without
necessarily needing the addition of new properties to handle specific locales
(for example, these boundaries could be used in CLDR data instead of the
UCD, or in specific locales not supported by CLDR).


2014-12-31 10:27 GMT+01:00 Mark Davis ☕️ <mark at macchiato.com>:

>
> On Wed, Dec 31, 2014 at 1:40 AM, Philippe Verdy <verdy_p at wanadoo.fr>
> wrote:
>
>> Your example with "[[a$b][:script=greek:]]" does not make any sense if
>> that $ means an "end of string" and where it is embedded in a character
>> class itself in another embedding character-class.
>>
>
> That is incorrect. The way the transform works, any reference to a
> character position outside the bounds of a string matches $. So what I
> wrote matches the start or end of a string, or a, or b, or any greek-script
> character.
>
> However, if you look at the transform data files, you'll see real cases
> where $ is used, rather than the artificial one I used.
>
> As to the rest of your post, tl;dr.
>
> Mark <https://google.com/+MarkDavis>
>
> *— Il meglio è l’inimico del bene —*
>
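
For comparison, the semantics described in the quoted reply (a position
outside the bounds of the string matches $) could be modelled roughly as
below. This is only a sketch under my own assumptions, not the actual
transform implementation: `set_matches` and `is_greek` are hypothetical
names, and `is_greek` merely checks the Unicode character name as a crude
stand-in for [:script=greek:].

```python
import unicodedata


def is_greek(ch):
    """Crude approximation of [:script=greek:] based on the character name."""
    return unicodedata.name(ch, "").startswith("GREEK")


def set_matches(text, pos, literals, predicates=(), includes_boundary=False):
    """Test one position against a set that may contain `$`.

    `includes_boundary` plays the role of `$` in the set: it only matters
    when `pos` lies outside the string (before the start or past the end).
    """
    if pos < 0 or pos >= len(text):
        return includes_boundary
    ch = text[pos]
    return ch in literals or any(pred(ch) for pred in predicates)


if __name__ == "__main__":
    text = "a\u03b2"  # 'a' followed by GREEK SMALL LETTER BETA
    # Model of [[a$b][:script=greek:]] tested at every position, including
    # the two positions just outside the string.
    for pos in range(-1, len(text) + 1):
        print(pos, set_matches(text, pos, "ab", (is_greek,), True))
```

Note that in this model a literal '$' character inside the string is not
matched at all, which is the ambiguity I was pointing at.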


More information about the CLDR-Users mailing list