Parsers for the UnicodeSet notation?

Steven R. Loomis srl at
Wed Jul 23 18:18:20 CDT 2014

On 07/23/2014 03:28 PM, Roozbeh Pournader wrote:
> On Wed, Jul 23, 2014 at 3:23 PM, Eric Muller <emuller at> wrote:
>     I would like to work with the exemplarCharacters data in the CLDR.
>     That uses the UnicodeSet notation. Is there somewhere a parser for
>     that notation, that would return me just the list of characters in
>     the set? 
> Note that it's a set of strings, not characters.
>     I suspect that the exemplarCharacters use a restricted form of the
>     UnicodeSet notation (e.g. do not use property values). Is that
>     correct, and if so, what's the subset?
> I have an Apache-licensed parser in Python here:
Nice, you should get those CLDR folks to add a link!  I'm cross posting
this to cldr-users, which may be more appropriate.

Eric, to answer your second question: the TR35 spec does not say that
exemplars are a restricted set, as per
- in practice, a restricted set is used and ranges are expanded, but
the spec does not guarantee this.
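
To make "restricted set" concrete, here is a rough sketch in Python of
what a reader for such a subset might look like. It is not the parser
linked above and not what CLDR tooling does. It assumes the input has
only single characters, a-z style ranges, {multi-character strings},
\uXXXX / \UXXXXXXXX escapes, and ignorable whitespace. The function name
parse_simple_unicode_set is made up for illustration. Nested sets,
properties, and set operators are rejected rather than handled, so this
covers only part of the full notation.

```python
import re

# Hypothetical helper: handles only a restricted subset of UnicodeSet notation.
_ESCAPE = re.compile(r"u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})")


def parse_simple_unicode_set(text):
    """Return the set of strings in a simple set such as '[a-c {ch}]'."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a bracketed set")
    body = text[1:-1]

    def read_char(i):
        """Return (character, next index), decoding a backslash escape."""
        if body[i] != "\\":
            return body[i], i + 1
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        if body[i + 1] in "pPN":
            raise ValueError("properties and names are not supported")
        m = _ESCAPE.match(body, i + 1)
        if m:
            return chr(int(m.group(1) or m.group(2), 16)), m.end()
        return body[i + 1], i + 2  # backslash-quoted literal

    result = set()
    prev = None            # last single character, a possible range start
    pending_range = False  # True after 'x-' while waiting for the range end
    i = 0
    while i < len(body):
        ch = body[i]
        if ch.isspace():   # unescaped whitespace is ignored
            i += 1
            continue
        if ch == "[":
            raise ValueError("nested sets and properties are not supported")
        if ch == "{":      # multi-character string, e.g. {ch}
            end = body.index("}", i)
            chars, j = [], i + 1
            while j < end:
                if body[j].isspace():
                    j += 1
                    continue
                c, j = read_char(j)
                chars.append(c)
            result.add("".join(chars))
            prev, i = None, end + 1
            continue
        if ch == "-" and prev is not None and not pending_range:
            pending_range = True
            i += 1
            continue
        c, i = read_char(i)
        if pending_range:  # expand the range prev..c (prev is already added)
            if ord(c) < ord(prev):
                raise ValueError("range end precedes range start")
            result.update(chr(cp) for cp in range(ord(prev) + 1, ord(c) + 1))
            prev, pending_range = None, False
        else:
            result.add(c)
            prev = c
    if pending_range:      # a trailing '-' is taken as a literal hyphen
        result.add("-")
    return result


print(sorted(parse_simple_unicode_set("[a-c {ch} \\u00E9]")))
```

Note that the result is a set of strings, per Roozbeh's point: "ch" comes
back as one element alongside the single characters.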



IBMer but all opinions are mine. // fingerprint @
