Unicode Sets in 'Unicode Regular Expressions'

Phillips, Addison addison at lab126.com
Tue May 27 17:36:04 CDT 2014


A "Unicode set" in this context means "a set of code points". This is discussed in section 1.2:

--
This is done by providing syntax for sets of characters based on the Unicode character properties, and allowing them to be mixed with lists and ranges of individual code points.
--

More generally, there is no term "Unicode set" defined, although it is referred to in places such as RL1.3 as a shorthand. It merely means "the set of all code points selected" (by whatever selection, subtraction, intersection, or differencing has been applied, beginning from the Universal Character Set as a whole). Or at least this is how I have always read it.
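
To make that reading concrete, here is a minimal Python sketch. It is not taken from UTS #18, and the helper name code_points_with_category is made up for illustration. It treats a "Unicode set" as a plain set of integer code points, selects by a character property, and then applies intersection and subtraction in the sense of RL1.3.

```python
import sys
import unicodedata


def code_points_with_category(*categories):
    """Hypothetical helper: every code point whose General_Category
    is one of the given two-letter values (e.g. "Lu", "Ll")."""
    return {
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) in categories
    }


# Selection by property: all letters, roughly what \p{L} denotes.
letters = code_points_with_category("Lu", "Ll", "Lt", "Lm", "Lo")

# Selection by range: the ASCII block, as in [\u0000-\u007F].
ascii_range = set(range(0x00, 0x80))

# Intersection: letters that are also ASCII.
ascii_letters = letters & ascii_range

# Subtraction: letters with the ASCII ones removed.
non_ascii_letters = letters - ascii_range
```

Under this reading every member is a single code point, so the result never contains a multi-character string. Which code points count as letters depends on the Unicode version bundled with the Python build.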

Addison

> -----Original Message-----
> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Richard
> Wordingham
> Sent: Tuesday, May 27, 2014 3:18 PM
> To: unicode at unicode.org
> Subject: Unicode Sets in 'Unicode Regular Expressions'
> 
> UTS#18 'Unicode Regular Expressions' Version 17 Requirement RL1.3
> 'Subtraction and Intersection' talks of Unicode sets.  What is the relevant
> definition of a 'Unicode set'? Is it a finite set of non-empty strings?  Other
> possibilities that occur to me, depending on context, include sets of codepoints
> and sets of indecomposable codepoints.
> 
> Richard.
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode
