Minifying CLDR sources (also: Re: Hard-to-use "annotations" files in LDML)

Marcel Schneider via CLDR-Users cldr-users at unicode.org
Sat Dec 1 07:59:58 CST 2018


VS Code is a great text editor. Thanks for sharing the hint.
My only issue is that, while it takes all my key remappings into account, it does not do so for BKSP,
so the Backspace key acted as Ctrl+Backspace. I just fixed it by editing keybindings.json.

> Would you mind adding these comments to a copy of the following two files:

We may identify the emoji using their short names already present in the files next to the keywords.

I now understand that leaving out the code points is a way of minifying the files.
E.g., the element for the flag of England in annotationsDerived/fr.xml would be:

<annotation cp="1F3F4 E0067 E0062 E0065 E006E E0067 E007F" char="🏴󠁧󠁢󠁥󠁮󠁧󠁿" type="tts">drapeau : Angleterre</annotation>

But indeed, for the survey we don’t need that information. Sorry for my request.
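
For anyone who does want the code points back, they can be recomputed from the characters themselves, which is why leaving them out loses nothing. A minimal sketch in Python (the function name code_points is only illustrative, not part of any CLDR tool):

```python
def code_points(chars: str) -> str:
    """Return the space-separated uppercase hex code points of a string."""
    return " ".join(f"{ord(c):04X}" for c in chars)


# The flag of England sequence, written with escapes because the tag
# characters after U+1F3F4 are invisible in most editors.
england = "\U0001F3F4\U000E0067\U000E0062\U000E0065\U000E006E\U000E0067\U000E007F"
print(code_points(england))  # 1F3F4 E0067 E0062 E0065 E006E E0067 E007F
```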

Given that minifying the files is an interesting issue, one might wish to go a step further by merging
the keywords element and the short name element into a single element.
Taking again the first emoji (a modifier) in annotations/fr.xml:

Now:
<annotation cp="🏻">peau | peau claire</annotation>
<annotation cp="🏻" type="tts">peau claire</annotation>

After collapsing:
<annotation cp="🏻" sna="peau claire">peau | peau claire</annotation>

That would reduce these files to almost half their current size without any loss of data, given
that extracting the short name from an attribute value rather than from element content is only a
matter of processing XML/LDML.
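
To illustrate that the collapsing is mechanical, here is a rough sketch in Python using the standard xml.etree.ElementTree module. It assumes the attribute name "sna" suggested above, which is only a proposal and not part of LDML; the function name collapse_annotations is likewise made up for this example, and details such as the DOCTYPE and comments of the real files are not handled.

```python
import xml.etree.ElementTree as ET


def collapse_annotations(xml_text: str) -> str:
    """Fold each type="tts" element into the keywords element with the same cp.

    The short name is stored in a hypothetical "sna" attribute.
    """
    root = ET.fromstring(xml_text)
    # Snapshot the tree first, since elements are removed while walking it.
    for parent in list(root.iter()):
        # Short-name elements of this parent, indexed by their cp value.
        tts = {el.get("cp"): el
               for el in parent.findall("annotation[@type='tts']")}
        for el in parent.findall("annotation"):
            if el.get("type") is not None:
                continue  # not a keywords element
            short = tts.get(el.get("cp"))
            if short is not None:
                el.set("sna", short.text or "")
                parent.remove(short)
    return ET.tostring(root, encoding="unicode")


sample = (
    "<annotations>"
    '<annotation cp="\U0001F3FB">peau | peau claire</annotation>'
    '<annotation cp="\U0001F3FB" type="tts">peau claire</annotation>'
    "</annotations>"
)
print(collapse_annotations(sample))
```

A short name that has no matching keywords element is left untouched, so nothing is lost in that case either. Reading the short name back is then a single attribute lookup, e.g. element.get("sna").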

Best regards,
Marcel

