Transform rule syntax clarifications

Richard Wordingham via CLDR-Users cldr-users at unicode.org
Sat Nov 16 20:37:24 CST 2019


On Sat, 16 Nov 2019 13:18:00 -0800
Cameron Dutro via CLDR-Users <cldr-users at unicode.org> wrote:

> The other bits of syntax you've mentioned are from the Unicode Set
> specification, which you can find in UTS #35
> <https://unicode.org/reports/tr35/#Unicode_Sets>. Unicode Sets are
> like regex character classes, but as you've noticed, there are a
> couple of special operations they support that regexes don't.
> Specifically, the "-" operator is the symmetric difference
> <https://en.wikipedia.org/wiki/Symmetric_difference> between the two
> operands (UTS 35 says "asymmetric difference," but I don't think
> that's a thing - I can't find any definition of it online).

It very much is a thing!  In this particular case,

$accent_minus = [[$accent]-[$iotasub$macron]];

is probably the same as the symmetric difference because, judging by
the names, I think everything in the second set is also in the first
set.  That does not hold in general, though: [abcd] - [abef] is [cd],
not the symmetric difference [cdef].
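
To make the distinction concrete, here is a minimal sketch using plain
Python sets as a stand-in for Unicode Sets (an assumption made for
illustration; this is not the ICU or CLDR UnicodeSet API, and the
variable names are hypothetical).  The asymmetric difference A - B keeps
only the members of A that are not in B, while the symmetric difference
keeps whatever is in exactly one of the two sets.

```python
# Plain Python sets used as an illustrative stand-in for Unicode Sets.
first = set("abcd")
second = set("abef")

# Asymmetric (ordinary) set difference: members of `first` not in `second`.
asymmetric = first - second

# Symmetric difference: members found in exactly one of the two sets.
symmetric = first ^ second

print(sorted(asymmetric))  # ['c', 'd']
print(sorted(symmetric))   # ['c', 'd', 'e', 'f']

# When the second set is a subset of the first, the two operations agree,
# which is the situation assumed for [[$accent]-[$iotasub$macron]].
subset = set("ab")
print((first - subset) == (first ^ subset))  # True
```

The last check shows why the two readings cannot be told apart when the
right-hand operand is entirely contained in the left-hand one.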

Richard.

