Transform Rule Syntax

Cameron Dutro cameron at lumoslabs.com
Thu Dec 17 13:19:18 CST 2015


Ah, wonderful, thanks Philippe. That's something about regular expressions I
didn't know, but I was able to verify it in several programming languages.
Happy holidays!

-Cameron
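The rule Philippe describes in the quoted message below is easy to check. Here is a minimal sketch in Python using the standard `re` module. Python is an arbitrary choice, since the thread names no language, and CLDR transform rules are parsed by ICU, not by Python. The names `POSITIVE`, `NEGATIVE`, `RANGE`, `LEADING_HYPHEN` and `in_class` are illustrative only.

```python
import re

# A hyphen placed first in a class (or first after "^") is a literal, not a range.
POSITIVE = re.compile(r"[-\ ]")    # matches only a hyphen or a space
NEGATIVE = re.compile(r"[^-\ ]")   # matches anything except a hyphen or a space

# Contrast: a hyphen between two characters denotes a range...
RANGE = re.compile(r"[a-z]")       # any lowercase ASCII letter
# ...but the same characters with the hyphen moved to the front are three literals.
LEADING_HYPHEN = re.compile(r"[-az]")  # only "-", "a" or "z"


def in_class(pattern: re.Pattern, ch: str) -> bool:
    """Return True if the single character ch matches the character class."""
    return pattern.fullmatch(ch) is not None


if __name__ == "__main__":
    for ch in ["-", " ", "a", "m"]:
        print(repr(ch),
              in_class(POSITIVE, ch),
              in_class(NEGATIVE, ch),
              in_class(RANGE, ch),
              in_class(LEADING_HYPHEN, ch))
```

Python's own documentation states the same rule: a hyphen placed first or last in a class matches a literal "-".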

On Wed, Dec 16, 2015 at 4:54 PM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> When a dash-hyphen "-" appears as the first character within an inclusive
> (or negative) character class, just after "[" (or after "[^" in a negative
> class), it does not denote a range separator; it stands for itself,
> literally, as part of the inclusive character class (or as excluded from
> the negative class).
> This is how most regexp engines treat it, and you don't need to escape it
> (with a "\").
>
> So "[-\ ]" is the character class containing only the dash-hyphen and the
> space (which needs to be escaped in CLDR rules because unescaped whitespace
> is ignored there, as you noted), and it has NO range.
>
> 2015-12-17 1:25 GMT+01:00 Cameron Dutro <cameron at lumoslabs.com>:
>
>> Hey cldr-users,
>>
>> I'm working with the CLDR transform rules and finding myself flummoxed.
>> Specifically I'm looking at this rule
>> <http://unicode.org/cldr/trac/browser/tags/release-28-d05/common/transforms/es-es_FONIPA.xml#L138>
>> in the es-es_FONIPA transform rule set. In this rule, we see what appears
>> to be a Unicode set or character class from a regular expression: [-\ ]
>> Either way, this does not appear to be valid syntax. Hyphens are used in
>> character classes to denote ranges of characters, for example [a-z].
>> Literal hyphens must be escaped. The hyphen in question is neither part of
>> a range nor escaped. Why is this? Finally, it appears the character class
>> contains an escaped space character. Space characters are not required to
>> be escaped in character classes.
>>
>> My suspicion is that this syntax is to be treated in a special way since
>> it is used in the context of transformation rules. Please let me know if
>> this is the case. I have been unable to find any documentation regarding
>> the special treatment of hyphens in UTS #35 or other documents.
>>
>> Thanks!
>>
>> -Cameron
>>
>> _______________________________________________
>> CLDR-Users mailing list
>> CLDR-Users at unicode.org
>> http://unicode.org/mailman/listinfo/cldr-users
>>
>>
>
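Philippe's second point, that the space must be escaped because CLDR rules ignore unescaped whitespace, can be illustrated with a simplified model. The helper `strip_unescaped_whitespace` below is hypothetical and exists only for illustration: it drops any whitespace not preceded by a backslash and ignores any other quoting mechanisms the real rule syntax may have, so it is not how ICU actually parses transform rules.

```python
import re


def strip_unescaped_whitespace(rule: str) -> str:
    """Hypothetical model: remove whitespace unless it is backslash-escaped."""
    out = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "\\" and i + 1 < len(rule):
            # Keep an escape sequence (e.g. "\ ") intact, including its character.
            out.append(rule[i:i + 2])
            i += 2
        elif ch.isspace():
            # Unescaped whitespace carries no meaning in this model; drop it.
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for rule in ["[- ]", r"[-\ ]"]:
        cleaned = strip_unescaped_whitespace(rule)
        # Does the cleaned class still match a space?
        print(repr(rule), "->", repr(cleaned), re.fullmatch(cleaned, " ") is not None)
```

With the space escaped, the class still holds two characters after whitespace is stripped; without the escape it collapses to a class containing only the hyphen.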