UCD in XML or in CSV?

Richard Wordingham via Unicode unicode at unicode.org
Sat Sep 1 06:35:32 CDT 2018

On Fri, 31 Aug 2018 10:36:45 +0200
Manuel Strehl via Unicode <unicode at unicode.org> wrote:

> For me it's currently much easier to have all the data in a single
> place, e.g. a large XML file, than spread over a multitude of files
> _with different ad-hoc syntaxes_.
> The situation would possibly be different, though, if the UCD data
> would be split in several files of the same format. (Be it JSON, CSV,
> YAML, XML, TOML, whatever. Just be consistent.)

Most properties are stored in pretty much the same format in the UCD
files. UnicodeData.txt is the major exception; it seems to date from
when the set of properties was expected to be stable.
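To make the contrast concrete, here is a minimal sketch. It assumes the layout shared by most UCD files: a code point or "XXXX..YYYY" range, then ';'-separated values, with '#' starting a comment. It also assumes the fixed positional layout of UnicodeData.txt: one code point per line, 15 fields, no comments. The helper names parse_range, parse_common_line and parse_unicodedata_line are hypothetical and come from no Unicode tool.

```python
def parse_range(field: str) -> range:
    """Turn '0041' or '0009..000D' into a range of code points."""
    if ".." in field:
        first, last = field.split("..")
        return range(int(first, 16), int(last, 16) + 1)
    cp = int(field, 16)
    return range(cp, cp + 1)

def parse_common_line(line: str):
    """Common UCD layout: 'code point(s) ; value(s) # comment'.
    Returns None for blank or comment-only lines (hypothetical helper)."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    return parse_range(fields[0]), fields[1:]

def parse_unicodedata_line(line: str):
    """UnicodeData.txt layout: exactly 15 positional fields, a single code
    point per line (large blocks use '<..., First>'/'<..., Last>' name pairs
    instead of range syntax), and no '#' comments (hypothetical helper)."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return int(fields[0], 16), fields[1:]

# A line in the DerivedAge.txt layout, and one in the UnicodeData.txt layout.
print(parse_common_line("0000..001F    ; 1.1 #  [32] <control-0000>..<control-001F>"))
# (range(0, 32), ['1.1'])
cp, fields = parse_unicodedata_line("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
print(hex(cp), fields[0], fields[1])
# 0x41 LATIN CAPITAL LETTER A Lu
```

One generic function covers every file in the common layout. UnicodeData.txt needs its own parser because meaning depends on field position rather than on a property name.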

The other big exception is set-valued properties. PropList.txt can be viewed
as having an odd syntax for storing the set of miscellaneous Boolean
properties for which the code point has the value 'true'.
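Read that way, PropList.txt can be folded into one set-valued property per code point. The sketch below assumes the usual 'code point(s) ; Property_Name # comment' lines. The sample lines are written in that layout for illustration only and are not a verbatim excerpt. The function name proplist_to_sets is hypothetical.

```python
from collections import defaultdict

def parse_range(field: str) -> range:
    """Turn '0041' or '0009..000D' into a range of code points."""
    if ".." in field:
        first, last = field.split("..")
        return range(int(first, 16), int(last, 16) + 1)
    cp = int(field, 16)
    return range(cp, cp + 1)

def proplist_to_sets(lines):
    """Map each code point to the set of Boolean properties that are true
    for it; a property absent from the set is false (hypothetical helper)."""
    props = defaultdict(set)
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cp_field, name = (f.strip() for f in data.split(";"))
        for cp in parse_range(cp_field):
            props[cp].add(name)
    return props

# Illustrative lines in the PropList.txt layout (abbreviated).
SAMPLE = """\
# Sample lines in the PropList.txt layout
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
002D          ; Dash # Pd       HYPHEN-MINUS
002D          ; Hyphen # Pd       HYPHEN-MINUS
""".splitlines()

props = proplist_to_sets(SAMPLE)
print(sorted(props[0x2D]))               # ['Dash', 'Hyphen']
print(sorted(props.get(0x41, set())))    # []
```

A code point listed under several properties, like U+002D here, simply accumulates several names. This is the set-valued reading of the file.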

