UCD in XML or in CSV? (was: Re: Unicode Digest, Vol 56, Issue 20)

Marius Spix via Unicode unicode at unicode.org
Sat Sep 1 02:12:12 CDT 2018


Hello Marcel,

YAML supports anchors and aliases (references), so one character’s entry
can refer to another character’s properties instead of repeating them.

Example:

repertoire: 
 char:
  -
   name_alias: 
    - [NUL,abbreviation]
    - ["NULL",control]
   cp: "0000"
   na1: "NULL"
   props: &0000
     age: "1.1"
     na: ""
     JSN: ""
     gc: Cc
     ccc: 0
     dt: none
     dm: "#"
     nt: None
     nv: NaN
     bc: BN
     bpt: n
     bpb: "#"
     Bidi_M: N
     bmg: ""
     suc: "#"
     slc: "#"
     stc: "#"
     uc: "#"
     lc: "#"
     tc: "#"
     scf: "#"
     cf: "#"
     jt: U
     jg: No_Joining_Group
     ea: N
     lb: CM
     sc: Zyyy
     scx: Zyyy
     Dash: N
     WSpace: N
     Hyphen: N
     QMark: N
     Radical: N
     Ideo: N
     UIdeo: N
     IDSB: N
     IDST: N
     hst: NA
     DI: N
     ODI: N
     Alpha: N
     OAlpha: N
     Upper: N
     OUpper: N
     Lower: N
     OLower: N
     Math: N
     OMath: N
     Hex: N
     AHex: N
     NChar: N
     VS: N
     Bidi_C: N
     Join_C: N
     Gr_Base: N
     Gr_Ext: N
     OGr_Ext: N
     Gr_Link: N
     STerm: N
     Ext: N
     Term: N
     Dia: N
     Dep: N
     IDS: N
     OIDS: N
     XIDS: N
     IDC: N
     OIDC: N
     XIDC: N
     SD: N
     LOE: N
     Pat_WS: N
     Pat_Syn: N
     GCB: CN
     WB: XX
     SB: XX
     CE: N
     Comp_Ex: N
     NFC_QC: Y
     NFD_QC: Y
     NFKC_QC: Y
     NFKD_QC: Y
     XO_NFC: N
     XO_NFD: N
     XO_NFKC: N
     XO_NFKD: N
     FC_NFKC: "#"
     CI: N
     Cased: N
     CWCF: N
     CWCM: N
     CWKCF: N
     CWL: N
     CWT: N
     CWU: N
     NFKC_CF: "#"
     InSC: Other
     InPC: NA
     PCM: N
     blk: ASCII
     isc: ""

  -
   cp: "0001"
   na1: "START OF HEADING"
   name_alias: 
    - [SOH,abbreviation]
    - [START OF HEADING,control]
   props: *0000
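
If a character differs from the anchored one in only a few properties, a
merge key might cover that case as well. This is only a minimal sketch: it
assumes a parser that implements the YAML 1.1 merge key ("<<"), which is
not part of the YAML 1.2 core schema, and the shortened property list and
the U+0009 entry are chosen just for illustration.

```yaml
# Sketch only: relies on the YAML 1.1 merge key ("<<"), which not every
# parser supports. The property list is shortened for illustration.
repertoire:
 char:
  -
   cp: "0000"
   props: &0000        # anchor the full property set once
     gc: Cc
     bc: BN
     lb: CM
     sc: Zyyy
  -
   cp: "0009"
   props:
     <<: *0000         # start from the properties anchored at U+0000
     bc: S             # then list only the values that differ
     lb: BA
```

With such a parser, the second entry would end up with gc and sc taken
from the anchor and with its own bc and lb, so only the differing
properties need to be written out.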





Regards,

Marius Spix


On Sat, 1 Sep 2018 08:00:02 +0200 (CEST)
Marcel Schneider wrote:

> On 31/08/18 08:25 Marius Spix via Unicode wrote:
> > 
> > A good compromise between human readability, machine processability
> > and filesize would be using YAML.
> > 
> > Unlike JSON, YAML supports comments, anchors and references,
> > multiple documents in a file and several other features.
> 
> Thanks for the advice. I already use YAML syntax highlighting to
> display XCompose files, which also use the colon as a separator.
> 
> Did you figure out how YAML would fit UCD data? It appears to rely
> heavily on line breaks, which may get lost as data moves between
> environments. XML indentation is only a readability feature and
> irrelevant to content. The structure is independent of invisible
> characters and is stable as long as graphic characters are not
> corrupted (though that may happen). Line breaks are odd in that they
> are inconsistent across OSes, because Unicode was denied the right to
> impose a single standard in that matter. The result is mashed-up
> files, and I fear YAML might not hold up.
> 
> Like XML, YAML needs to repeat attribute names in every instance.
> That is precisely what CSV avoids, at the expense of
> readability in plain text. Personally I could use YAML as I use
> XML, for lookup in the text editor, but I’m afraid that there is no
> advantage over CSV with respect to file size.
> 
> Regards,
> 
> Marcel
> > 
> > Regards,
> > 
> > Marius Spix
> > 
> > 
> > On Fri, 31 Aug 2018 06:58:37 +0200 (CEST) Marcel Schneider via
> > Unicode wrote:
> > 
> […]
