String Ranges in Unicode Sets

Mark Davis ☕️ mark at macchiato.com
Tue Sep 8 06:46:48 CDT 2015


On Tue, Sep 8, 2015 at 9:53 AM, Asmus Freytag (t) <asmus-inc at ix.netcom.com>
wrote:

> it is implied the String Range formulation is a compact form.
>
> Can you prove that it doesn't create any set of strings that can't be
> specified in other ways (other than full enumeration of the strings?).
>

It is simply a compact string representation, and is defined semantically
by what it expands to, just like character ranges such as [a-z]. Of course,
the underlying implementation *could* differ, but that doesn't affect the
semantics.


> What about set operations on sets with string ranges?
>

Again, the range notation is just a formatting issue. Anything you can do
with [{ax}-{bz}] you can also do with [{ax}{ay}{az}{bx}{by}{bz}], and
vice versa, since the former is defined to be equivalent to the latter.
These are just string representations of the same *logical* underlying
implementation.
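
To make the equivalence concrete, here is a minimal Python sketch of one way such an expansion could work. It assumes both endpoints have the same length and that each position is expanded as an ordinary character range, which matches the [{ax}-{bz}] example above. The function name expand_string_range is hypothetical and not part of any UnicodeSet API.

```python
from itertools import product


def expand_string_range(first: str, last: str) -> list[str]:
    """Expand a string range such as {ax}-{bz} into its member strings.

    Assumption (not taken from any specification text): both endpoints
    have the same length, and position i ranges over the code points
    from first[i] to last[i] inclusive.
    """
    if len(first) != len(last):
        raise ValueError("this sketch only handles endpoints of equal length")

    per_position = []
    for lo, hi in zip(first, last):
        if ord(lo) > ord(hi):
            raise ValueError(f"range {lo!r}-{hi!r} is reversed")
        # All code points from lo to hi, inclusive, for this position.
        per_position.append([chr(cp) for cp in range(ord(lo), ord(hi) + 1)])

    # The range denotes the cross product of the per-position ranges.
    return ["".join(chars) for chars in product(*per_position)]


compact = set(expand_string_range("ax", "bz"))
enumerated = {"ax", "ay", "az", "bx", "by", "bz"}
same_set = compact == enumerated
```

Once both notations are expanded they are the same set of strings, so any set operation (union, intersection, difference) gives the same result whichever notation was used to write the set.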


> Can they be expressed (other than working them out and writing down the
> full enumeration of the resulting set)?
>

I'm not quite sure what you mean. That's like asking, "Can [a-z] be
expressed, other than by writing out the full enumeration [a b c d e ...
z]?". Well, yes. You could represent [a-z] in many ways:
[\p{ASCII}&\p{Ll}], for example. Or [\u0061 \u0062 ...]. Or....
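
As a quick check that a property-based description can pick out the same set as [a-z], the following Python sketch intersects the ASCII range with the lowercase-letter general category (Ll) and compares the result with the explicit range. It uses only the standard unicodedata module. The variable names are illustrative, and this is plain Python set arithmetic, not UnicodeSet syntax.

```python
import unicodedata

# All ASCII characters, U+0000 through U+007F.
ascii_chars = {chr(cp) for cp in range(0x80)}

# ASCII characters whose general category is Ll (lowercase letter).
ascii_lowercase = {ch for ch in ascii_chars if unicodedata.category(ch) == "Ll"}

# The explicit range a-z, written out as an enumeration.
a_to_z = {chr(cp) for cp in range(ord("a"), ord("z") + 1)}

descriptions_agree = ascii_lowercase == a_to_z
```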

But I'm probably misunderstanding what you are trying to say.

Mark <https://google.com/+MarkDavis>

*— Il meglio è l’inimico del bene —*