Transform rule syntax clarifications

Martin J. Dürst via CLDR-Users cldr-users at unicode.org
Sun Nov 17 22:41:47 CST 2019


On 2019/11/17 11:37, Richard Wordingham via CLDR-Users wrote:
> On Sat, 16 Nov 2019 13:18:00 -0800
> Cameron Dutro via CLDR-Users <cldr-users at unicode.org> wrote:
> 
>> The other bits of syntax you've mentioned are from the Unicode Set
>> specification, which you can find in UTS #35
>> <https://unicode.org/reports/tr35/#Unicode_Sets>. Unicode Sets are
>> like regex character classes, but as you've noticed, there are a
>> couple of special operations they support that regexes don't.
>> Specifically, the "-" operator is the symmetric difference
>> <https://en.wikipedia.org/wiki/Symmetric_difference> between the two
>> operands (UTS 35 says "asymmetric difference," but I don't think
>> that's a thing - I can't find any definition of it online).
> 
> It very much is a thing!

Well, yes, except that it's usually just called "set difference" without 
an explicit adjective. (I'd strongly suggest that UTS 35 put the word 
'asymmetric' in parentheses.) Also, one wouldn't use the symbol '-' for 
symmetric difference.

Regards, Martin.

> In this particular case,
> 
> $accent_minus = [[$accent]-[$iotasub$macron]];
> 
> is probably the same as the symmetric difference, because from
> the names I think everything in the second set is in the first set, but
> this doesn't always apply.  [abcd] - [abef] is [cd], not the symmetric
> difference [cdef].
> 
> Richard.
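
The following is a small illustrative sketch of the distinction. It uses Python's built-in set type as a stand-in for UnicodeSets. It is not ICU's UnicodeSet API. Python happens to use '-' for ordinary (asymmetric) difference and '^' for symmetric difference. The characters assigned to accent and iotasub_macron are hypothetical stand-ins chosen for illustration. They are not the actual definitions of $accent, $iotasub and $macron in the transform rules.

```python
# Plain Python sets standing in for UnicodeSets (illustration only,
# not ICU's UnicodeSet API).
a = set("abcd")
b = set("abef")

difference = a - b   # ordinary (asymmetric) difference: in a but not in b
symmetric = a ^ b    # symmetric difference: in exactly one of a, b

print(sorted(difference))  # ['c', 'd']
print(sorted(symmetric))   # ['c', 'd', 'e', 'f']

# Hypothetical stand-ins for $accent and [$iotasub$macron]
# (NOT the real rule definitions):
accent = {"\u0300", "\u0301", "\u0342", "\u0345", "\u0304"}
iotasub_macron = {"\u0345", "\u0304"}  # ypogegrammeni, combining macron

# When the second operand is a subset of the first, the two operations
# give the same result, as Richard suggests for $accent_minus.
assert iotasub_macron <= accent
assert accent - iotasub_macron == accent ^ iotasub_macron
```

In the last assertion, the equality holds only because every element of the second set is also in the first. This is why Richard's caveat applies in general.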
