UCD in XML or in CSV? (was: Re: Unicode Digest, Vol 56, Issue 20)

Marcel Schneider via Unicode unicode at unicode.org
Sat Sep 1 01:00:02 CDT 2018

On 31/08/18 08:25 Marius Spix via Unicode wrote:
> A good compromise between human readability, machine processability and
> filesize would be using YAML.
> Unlike JSON, YAML supports comments, anchors and references, multiple
> documents in a file and several other features.

Thanks for the advice. I already use YAML syntax highlighting to display
XCompose files, which also use the colon as a separator.

Did you work out how YAML would fit UCD data? YAML appears to rely heavily
on line breaks, which may get lost as data moves between environments.
XML indentation is only a readability aid and is irrelevant to the content.
The structure does not depend on invisible characters, and it stays stable
as long as the graphic characters are not corrupted (although that can
happen too). Line breaks are troublesome because they are inconsistent
across OSes, since Unicode was denied the right to impose a single standard
in that matter. The result is mashed-up files, and I fear YAML might not
hold up.
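
To illustrate the point about XML, here is a minimal sketch using only the
Python standard library. The two records and the attribute names (cp, na,
gc) are a simplified stand-in for UCD data chosen for illustration, not the
full UCD XML format. The sketch parses the same document with LF line
breaks, with CRLF line breaks, and with all line breaks and indentation
removed, and checks that the parsed records are identical in all three cases.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative stand-in for UCD records (not the full schema).
pretty = """<ucd>
  <char cp="0041" na="LATIN CAPITAL LETTER A" gc="Lu"/>
  <char cp="0061" na="LATIN SMALL LETTER A" gc="Ll"/>
</ucd>"""


def records(xml_text):
    """Return the attributes of every <char> element as a list of dicts."""
    root = ET.fromstring(xml_text)
    return [dict(el.attrib) for el in root.iter("char")]


# The same document as it might look after passing through other environments.
variants = {
    "lf": pretty,
    "crlf": pretty.replace("\n", "\r\n"),
    # All line breaks and indentation stripped: one long line.
    "flattened": "".join(line.strip() for line in pretty.splitlines()),
}

results = {name: records(text) for name, text in variants.items()}
all_equal = all(r == results["lf"] for r in results.values())
print(all_equal)
```

A block-style YAML file that lost its line breaks and indentation in the
same way could no longer be parsed into the same records, because there the
line structure carries the nesting.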

Like XML, YAML needs to repeat attribute names in every instance. That
is precisely what CSV avoids, at the expense of readability in
plain text. Personally I could use YAML as I use XML, for lookup in
the text editor, but I’m afraid it offers no advantage over CSV with
respect to file size.
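
A rough way to check the file-size argument is to serialize the same
records three ways and compare byte counts. The sketch below is only an
illustration under stated assumptions: the three sample records and the
field names (cp, na, gc) are made up for the example, the CSV uses a
semicolon separator with one header line, and the YAML is written by hand
as one block mapping per record with named keys, so no YAML library is
needed.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; field names are illustrative only.
FIELDS = ["cp", "na", "gc"]
ROWS = [
    ("0041", "LATIN CAPITAL LETTER A", "Lu"),
    ("0061", "LATIN SMALL LETTER A", "Ll"),
    ("00E9", "LATIN SMALL LETTER E WITH ACUTE", "Ll"),
]


def as_csv(rows):
    """Field names appear once, in the header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


def as_xml(rows):
    """Field names appear as attribute names on every element."""
    root = ET.Element("ucd")
    for row in rows:
        ET.SubElement(root, "char", dict(zip(FIELDS, row)))
    return ET.tostring(root, encoding="unicode")


def as_yaml(rows):
    """Field names appear as keys in every record (block mapping style).

    Values are quoted so that a code point such as 0041 stays a string.
    """
    lines = []
    for row in rows:
        pairs = [f'{key}: "{value}"' for key, value in zip(FIELDS, row)]
        lines.append("- " + pairs[0])
        lines.extend("  " + pair for pair in pairs[1:])
    return "\n".join(lines) + "\n"


sizes = {
    "csv": len(as_csv(ROWS).encode("utf-8")),
    "xml": len(as_xml(ROWS).encode("utf-8")),
    "yaml": len(as_yaml(ROWS).encode("utf-8")),
}
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name}: {size} bytes")
```

With named keys repeated per record, the CSV output comes out smallest
here, and the gap grows with the number of records and fields. The exact
figures depend on the layout chosen, so this says nothing definitive about
real UCD files.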


> Regards,
> Marius Spix
> On Fri, 31 Aug 2018 06:58:37 +0200 (CEST) Marcel Schneider via Unicode
> wrote:
