Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

Philippe Verdy verdy_p at wanadoo.fr
Fri May 8 06:48:38 CDT 2015


JSON originally came from Javascript, and it is used extensively with
Javascript. My tests with the Javascript JSON parser show that any string
that is valid for Javascript is also valid in JSON (no exception raised, no
replaced characters, no deleted characters, even if there are unpaired
surrogates or non-characters like '\uFFFF').
The RFC deviates from the implementations that are currently running.
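
For anyone who wants to reproduce this, here is a minimal sketch of the kind
of test I mean, assuming a standard ECMAScript JSON.parse. The helper
hasLoneSurrogate is a hypothetical name introduced only for illustration; it
is not part of any library. It shows one possible way for a receiver to
detect that a \uXXXX escape did not form part of a valid surrogate pair.

```javascript
// JSON text whose string value contains escapes that are not
// Unicode scalar values (the backslash is doubled for the JS literal).
const loneSurrogate = JSON.parse('"\\uD800"'); // unpaired high surrogate
const nonCharacter = JSON.parse('"\\uFFFF"');  // non-character U+FFFF

// Hypothetical helper (an assumption, not a standard API): returns true
// if the string contains a surrogate code unit that is not part of a
// well-formed high/low pair.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xD800 && c <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low half of a valid pair
        continue;
      }
      return true;
    }
    if (c >= 0xDC00 && c <= 0xDFFF) {
      return true; // low surrogate with no preceding high surrogate
    }
  }
  return false;
}

console.log(loneSurrogate.length, hasLoneSurrogate(loneSurrogate));
console.log(nonCharacter.length, hasLoneSurrogate(nonCharacter));
```

Both calls to JSON.parse return a one-unit string without raising an
exception; only the first is flagged by the helper, since U+FFFF is a
non-character but not a surrogate.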


2015-05-08 13:04 GMT+02:00 Daniel Bünzli <daniel.buenzli at erratique.ch>:

> On Friday, 8 May 2015 at 05:08, Philippe Verdy wrote:
> > The RFC is just informative, not normative,
>
> RFC 7159 is not informational, it is a proposed standard.
>
> > Try by yourself, you can perfectly send JSON text containing '\uFFFF'
> > (non-character) or '\uF800' (unpaired surrogate) and I've not seen any JSON
> > implementation complaining about one or the other,
>
> Well, now you have (mine). The RFC is very clear that we are dealing with
> *text-based* data, not *binary* data. Maybe programming languages that
> represent their Unicode strings as possibly invalid UTF-16 sequences will
> happily input this, but as section 8.2 mentions, that may not be the case
> everywhere: software receiving these values "might return different values
> for the length of a string value or even suffer fatal runtime exceptions".
>
> Best,
>
> Daniel
>
>
>