Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

Daniel Bünzli daniel.buenzli at erratique.ch
Fri May 8 07:32:51 CDT 2015


On Friday, 8 May 2015 at 13:48, Philippe Verdy wrote:
> JSON came initially from Javascript, and it is used extensively with Javascript.  

But it has not been used *only* with JavaScript for a long time now.
  
> The RFC is deviating from the currently running implementations.

Well, did you test them all? There's quite a long list at http://www.json.org. Taking a random one mentioned on that page leads me to http://golang.org/pkg/encoding/json/, whose documentation says that invalid UTF-16 surrogate pairs are replaced by U+FFFD. This is not very surprising: Go's strings, when treated as text, are apparently UTF-8 encoded, so when you need to produce your results as UTF-8 you don't have many options: error and/or U+FFFD.
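To make the Go behaviour concrete, here is a minimal sketch (mine, not taken from the Go documentation) that decodes a JSON string containing a lone high surrogate escape and a JSON string containing a valid surrogate pair. The helper name decodeJSONString is made up for illustration; json.Unmarshal is the real encoding/json entry point.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString is a hypothetical helper: it decodes a JSON text that
// is expected to hold a single JSON string and returns the Go string.
func decodeJSONString(raw string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	// \ud800 is a high surrogate with no low surrogate following it.
	// encoding/json does not report an error; it substitutes U+FFFD.
	lone, err := decodeJSONString(`"a\ud800b"`)
	fmt.Println(lone == "a\uFFFDb", err)

	// \ud83d\ude00 is a well-formed surrogate pair and decodes to the
	// single code point U+1F600, stored as UTF-8 in the Go string.
	pair, err := decodeJSONString(`"\ud83d\ude00"`)
	fmt.Println(pair == "\U0001F600", err)
}
```

Note that with this decoder the caller cannot tell, after the fact, a replaced lone surrogate from a U+FFFD that was literally present in the input; detecting the former would require inspecting the \uXXXX escapes before decoding.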

In any case, deviating or not, that's for the better, since it would be insane to impose JavaScript's string as a data structure on an interchange format that intends to be universal and *textual*.
  
Best,

Daniel


