Feedback on CLDR JSON and encoding crucial data only in keys

Ben Hamilton beng at fb.com
Tue Nov 3 11:12:32 CST 2015


Filed http://unicode.org/cldr/trac/ticket/9061 and 
http://unicode.org/cldr/trac/ticket/9062.

Ben

> Ben Hamilton <mailto:beng at fb.com>
> November 3, 2015 at 8:36 AM
> Will do, thanks! I'll file two separate issues, since they're unrelated.
>
> I suggest that you file this as a bug, and we can discuss in the meeting.
>
> For #1, the knottiest issue is
> "dayperiod": {
> "displayName": "AM/PM",
> "displayName-alt-variant": "am/pm"
> },
>
> We've wrestled with this. As I recall, we considered fleshing it out 
> to be something like:
>
> "dayperiod": {
>   "displayName": {
>     "plain": "AM/PM",
>     "variant": "am/pm"
>   }
> },
> But because 'alt' could potentially go on every leaf node, that would 
> require adding a level (and "plain") for essentially every leaf node. 
> (And where alt can go on non-leaf nodes, we'd have to work that in 
> as well.) But we could explore some ideas.
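
To make the cost of that extra level concrete, here is a minimal Python sketch (not from the thread) that rewrites flat "-alt-" keys into the nested shape shown above. The function name nest_alt_variants is hypothetical, and the sketch assumes 'alt' only appears on leaf (string) values; alt on non-leaf nodes is left untouched.

```python
def nest_alt_variants(node: dict) -> dict:
    """Turn flat 'name-alt-x' leaf keys into {'name': {'plain': ..., 'x': ...}}."""
    nested: dict = {}
    for key, value in node.items():
        if isinstance(value, dict):
            # Non-leaf node: recurse, keep the key unchanged (assumption:
            # no alt handling for non-leaf nodes in this sketch).
            nested[key] = nest_alt_variants(value)
            continue
        # Leaf node: split off an optional "-alt-<variant>" suffix.
        base, sep, alt = key.partition("-alt-")
        nested.setdefault(base, {})[alt if sep else "plain"] = value
    return nested


flat = {
    "dayperiod": {
        "displayName": "AM/PM",
        "displayName-alt-variant": "am/pm",
    }
}
nested = nest_alt_variants(flat)
```

Note that every leaf gains a wrapper object holding a "plain" entry, whether or not it has an alt form, which is the overhead described above.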
>
> For #2, we could probably go to a simpler format for JSON. We could 
> look at space-delimited strings, maybe with a special sequence for 
> ranges, that would be easy to parse.
>
>
>
> Mark
>
>
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>
> Ben Hamilton <mailto:beng at fb.com>
> November 2, 2015 at 10:25 AM
> Hi folks,
>
> I'm working on a server to allow arbitrary queries of slices of CLDR 
> data using the GraphQL protocol (https://facebook.github.io/graphql/).
>
> While working with the fully resolved CLDR JSON data, I noticed a few 
> design decisions that complicate building a structured object model 
> (required by GraphQL) to represent it:
>
> 1) Crucial LDML data is often encoded only in JSON keys, requiring 
> clients to parse the keys to extract it
>
> For example, number formats (e.g. from main/root/numbers.json) require 
> parsing the keys to know the range of values to which the format 
> should be applied:
>
> "decimalFormat": {
> "1000-count-other": "0K",
> "10000-count-other": "00K",
> "100000-count-other": "000K",
> "1000000-count-other": "0M",
> (snip)
> }
>
> If I wanted to build an object model to represent this, I'd need to 
> know that the keys of this dictionary include three pieces of data 
> separated by "-" and write a parser which understands the meaning of 
> each section.
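
For illustration, the key parser described above might look roughly like the following Python sketch. The type and function names (CompactFormatKey, parse_compact_key) are hypothetical, and the reading of the three parts as magnitude, attribute name, and plural category is an assumption based on the sample keys.

```python
from typing import NamedTuple


class CompactFormatKey(NamedTuple):
    magnitude: int         # e.g. 1000: smallest value the pattern applies to
    attribute: str         # e.g. "count": names what the third part means
    plural_category: str   # e.g. "other"


def parse_compact_key(key: str) -> CompactFormatKey:
    """Split a key such as '1000-count-other' into its three parts."""
    magnitude, attribute, category = key.split("-", 2)
    return CompactFormatKey(int(magnitude), attribute, category)


decimal_format = {
    "1000-count-other": "0K",
    "10000-count-other": "00K",
    "100000-count-other": "000K",
    "1000000-count-other": "0M",
}
# Re-key the patterns by their parsed form.
parsed = {parse_compact_key(k): v for k, v in decimal_format.items()}
```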
>
> This becomes much more complicated when dealing with dateFields.json, 
> which includes keys with particularly complex encodings. From 
> main/root/dateFields.json:
>
> "sat-narrow": {
> "relative-type--1": "last Sa",
> "relative-type-0": "this Sa",
> "relative-type-1": "next Sa"
> },
> "dayperiod": {
> "displayName": "AM/PM",
> "displayName-alt-variant": "am/pm"
> },
>
> For this, I need to know that the "-" separators have multiple 
> meanings, and might be present (or not), and could act either as a 
> field separator, or as a negation operation in front of a number.
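
A rough Python sketch of what handling both meanings of "-" involves, using only the key shapes quoted above. The function name and the returned field names ("kind", "offset", "name", "alt") are hypothetical, and real dateFields.json data may contain other key shapes that this does not cover.

```python
import re

# "relative-type-" followed by a possibly negative integer offset.
RELATIVE_KEY = re.compile(r"^relative-type-(-?\d+)$")
# "<name>-alt-<variant>" marks an alternate form of <name>.
ALT_KEY = re.compile(r"^(?P<name>.+)-alt-(?P<alt>.+)$")


def parse_date_field_key(key: str) -> dict:
    """Classify a dateFields key and extract the data encoded in it."""
    match = RELATIVE_KEY.match(key)
    if match:
        # The second "-" in "relative-type--1" is a minus sign, not a separator.
        return {"kind": "relative", "offset": int(match.group(1))}
    match = ALT_KEY.match(key)
    if match:
        return {"kind": "plain", "name": match.group("name"), "alt": match.group("alt")}
    return {"kind": "plain", "name": key, "alt": None}
```

Each new key shape needs its own rule, which is the maintenance burden being pointed out.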
>
> I think we can keep the keys as-is as opaque unique identifiers, but 
> the values should be more structured. A map with separate fields for 
> the meanings of each item in the key (plus the original value) would 
> be great. The original XML format does this pretty well; I think we 
> can do that in the JSON without too much trouble.
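
One possible shape for such structured values, sketched in Python for the decimalFormat example. The field names "type" and "count" are borrowed from the attributes LDML XML uses on compact-format pattern elements; "pattern" and the function name are assumptions for illustration, not a proposed schema.

```python
def expand_decimal_formats(flat: dict) -> dict:
    """Keep each key as an opaque identifier; expand each value into a map."""
    structured = {}
    for key, pattern in flat.items():
        # The middle part of the key ("count") only names the last field.
        type_, _attribute, count = key.split("-", 2)
        structured[key] = {"type": type_, "count": count, "pattern": pattern}
    return structured


structured = expand_decimal_formats({
    "1000-count-other": "0K",
    "1000000-count-other": "0M",
})
```

With values like these, a client can read the fields directly and never needs to parse the key.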
>
> 2) Much of the LDML data is represented as serialized UTS #35 
> UnicodeSet objects, which requires deserializing them to understand 
> the underlying meaning
>
> For example, main/root/characters.json includes:
>
> "characters": {
> "exemplarCharacters": "[]",
> "auxiliary": "[]",
> "punctuation": "[\\\\- , ; \\\\: ! ? . ( ) \\\\[ \\\\] \\\\{ \\\\}]",
> (snip)
> }
>
> This means every program which wants to interact with this data needs 
> to include a UTS #35 UnicodeSet deserializer (or forward the raw 
> patterns on to the client with the assumption that it will include a 
> UnicodeSet deserializer).
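
To give a sense of the work involved, here is a minimal Python sketch of a deserializer for a small subset of the UnicodeSet syntax: space-separated literals, backslash-escaped punctuation, simple a-z ranges, and {multi-character strings}. It assumes the pattern has already been JSON-decoded so that each escape is a single backslash, and it rejects everything else (properties, nested sets, negation, \u escapes). The function name is hypothetical and this is not a conformant UTS #35 implementation.

```python
from typing import List, Set


def parse_simple_unicode_set(pattern: str) -> Set[str]:
    """Expand a simple '[...]' pattern into the set of strings it contains."""
    if len(pattern) < 2 or pattern[0] != "[" or pattern[-1] != "]":
        raise ValueError("expected a pattern of the form [...]")
    body = pattern[1:-1]
    items: List[str] = []
    in_range = False  # True after seeing "x-" and before the range end
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == " ":                      # unescaped whitespace is ignored
            i += 1
            continue
        if ch == "-" and items and not in_range:
            in_range = True                # range operator between two items
            i += 1
            continue
        if ch == "\\":
            # Only escaped punctuation is supported; \u, \p, \x etc. are not.
            if i + 1 >= len(body) or body[i + 1].isalnum():
                raise ValueError("unsupported escape")
            item, i = body[i + 1], i + 2
        elif ch == "{":                    # {abc} is one multi-character string
            end = body.index("}", i)
            item, i = body[i + 1:end], end + 1
        elif ch in "[]^&$":
            raise ValueError("unsupported syntax: " + ch)
        else:
            item, i = ch, i + 1
        if in_range:
            start = items.pop()
            if len(start) != 1 or len(item) != 1 or ord(start) > ord(item):
                raise ValueError("invalid range")
            items.extend(chr(c) for c in range(ord(start), ord(item) + 1))
            in_range = False
        else:
            items.append(item)
    if in_range:
        raise ValueError("range is missing its end")
    return set(items)


punctuation = parse_simple_unicode_set(r"[\- , ; \: ! ? . ( ) \[ \] \{ \}]")
```

Even this subset takes a few dozen lines, and the full syntax is considerably larger.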
>
> For many languages including JavaScript / ECMAScript, I don't think 
> there exists such a deserializer today—please let me know if I'm wrong!
>
> Ben
>
>