Transform rule syntax clarifications
Martin J. Dürst via CLDR-Users
cldr-users at unicode.org
Sun Nov 17 22:41:47 CST 2019
On 2019/11/17 11:37, Richard Wordingham via CLDR-Users wrote:
> On Sat, 16 Nov 2019 13:18:00 -0800
> Cameron Dutro via CLDR-Users <cldr-users at unicode.org> wrote:
>
>> The other bits of syntax you've mentioned are from the Unicode Set
>> specification, which you can find in UTS #35
>> <https://unicode.org/reports/tr35/#Unicode_Sets>. Unicode Sets are
>> like regex character classes, but as you've noticed, there are a
>> couple of special operations they support that regexes don't.
>> Specifically, the "-" operator is the symmetric difference
>> <https://en.wikipedia.org/wiki/Symmetric_difference> between the two
>> operands (UTS 35 says "asymmetric difference," but I don't think
>> that's a thing - I can't find any definition of it online).
>
> It very much is a thing!
Well, yes, except that it's usually just called "set difference" without
an explicit adjective. (I'd strongly suggest that UTS 35 put the word
'asymmetric' in parentheses.) Also, one wouldn't use the symbol '-' for
symmetric difference.
Regards, Martin.
> In this particular case,
>
> $accent_minus = [[$accent]-[$iotasub$macron]];
>
> is probably the same as the symmetric difference, because from
> the names I think everything in the second set is in the first set, but
> this doesn't always apply. [abcd] - [abef] is [cd], not the symmetric
> difference [cdef].
>
> Richard.
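
For concreteness, the [abcd] - [abef] example above can be reproduced with
Python's built-in sets. This is only a sketch: Python sets stand in for
Unicode Sets here as an assumption for illustration, not as a description
of how ICU or CLDR evaluate them, and the variable names are hypothetical.

```python
# Plain Python sets used as a stand-in for Unicode Sets (an assumption
# made for illustration only).
first = set("abcd")
second = set("abef")

# Set difference ("asymmetric" difference): members of first not in second.
difference = first - second
print(sorted(difference))  # ['c', 'd']

# Symmetric difference: members of exactly one of the two sets.
symmetric = first ^ second
print(sorted(symmetric))   # ['c', 'd', 'e', 'f']

# When the second set is a subset of the first, the two results coincide,
# which is the situation suspected for the $accent_minus rule.
subset = set("ab")
coincide = (first - subset) == (first ^ subset)
print(coincide)            # True
```

The two operations differ only in what happens to members of the second
set that are missing from the first: set difference ignores them, while
symmetric difference adds them to the result.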