Parsers for the UnicodeSet notation?
    Eric Muller 
    emuller at adobe.com
       
    Wed Jul 23 17:23:46 CDT 2014
    
    
  
I would like to work with the exemplarCharacters data in the CLDR, which 
uses the UnicodeSet notation. Is there a parser for that notation 
somewhere that would return just the list of characters in the set? 
Something a bit like the UnicodeSet utility at 
<http://unicode.org/cldr/utility/list-unicodeset.jsp>, but usable from 
applications or the shell.

I suspect that the exemplarCharacters use a restricted form of the 
UnicodeSet notation (e.g. they do not use property values). Is that 
correct, and if so, what is the subset?
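
To make the question concrete, here is a minimal sketch of the kind of 
tool meant, under an assumed subset: one outer pair of brackets, literal 
characters, a-z style ranges, {...} multi-character strings, and the 
\x, \uXXXX and \UXXXXXXXX escapes, with unescaped whitespace ignored. The 
function name parse_simple_unicodeset is hypothetical, and nothing here 
is confirmed to match what the CLDR data actually uses; it rejects 
properties, negation and nested sets rather than guessing.

```python
def parse_simple_unicodeset(pattern):
    """Expand a restricted UnicodeSet pattern into a sorted list of strings.

    Assumed subset only (not the full UnicodeSet grammar).
    """
    s = pattern.strip()
    if len(s) < 2 or not (s.startswith("[") and s.endswith("]")):
        raise ValueError("pattern must be enclosed in [ ]")
    s = s[1:-1]

    def read_char(i):
        """Return (character, next index), decoding a backslash escape."""
        if s[i] != "\\":
            return s[i], i + 1
        if i + 1 >= len(s):
            raise ValueError("dangling backslash")
        e = s[i + 1]
        if e in "uU":
            n = 4 if e == "u" else 8
            digits = s[i + 2:i + 2 + n]
            if len(digits) != n:
                raise ValueError("truncated \\%s escape" % e)
            return chr(int(digits, 16)), i + 2 + n
        if e.isalnum():
            # \p{...}, \N{...}, \n, \x.. and similar are outside the sketch.
            raise ValueError("unsupported escape: \\" + e)
        return e, i + 2  # escaped punctuation such as \- or \[

    items = set()
    prev = None            # last single character, usable as a range start
    pending_range = False  # True right after an unescaped '-' following prev
    i = 0
    while i < len(s):
        c = s[i]
        if c.isspace():
            i += 1
            continue
        if c == "{":
            # Multi-character string; assumes no escaped '}' inside.
            j = s.index("}", i)
            chars = []
            k = i + 1
            while k < j:
                if s[k].isspace():
                    k += 1
                    continue
                ch, k = read_char(k)
                chars.append(ch)
            items.add("".join(chars))
            prev = None
            i = j + 1
            continue
        if c == "-" and prev is not None and not pending_range:
            pending_range = True
            i += 1
            continue
        if c in "[]^&$":
            raise ValueError("unsupported syntax character: " + c)
        ch, i = read_char(i)
        if pending_range:
            if ord(ch) < ord(prev):
                raise ValueError("range end precedes range start")
            items.update(chr(cp) for cp in range(ord(prev), ord(ch) + 1))
            pending_range = False
            prev = None
        else:
            items.add(ch)
            prev = ch
    if pending_range:
        items.add("-")  # assumption: a trailing '-' is a literal hyphen
    return sorted(items)
```

With that sketch, parse_simple_unicodeset(r"[a-c {ch} \- \u05BE]") would 
yield the hyphen, a, b, c, the string "ch", and U+05BE.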

Incidentally, I copied and pasted the punctuation exemplar characters 
from he.xml into the utility, and it reported that the set contains 
8,130 code points, including the ASCII letters. That seems incorrect. 
What did I do wrong?

Thanks,
Eric.