Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?
verdy_p at wanadoo.fr
Thu May 7 22:08:21 CDT 2015
The RFC is just informative, not normative, and effective usage and
implementations just support JSON as plain 16-bit streams, even if the
transport syntax requires encoding it in plain text (using some UTF, not
necessarily UTF-8, even if this is the default).
Try it yourself: you can perfectly well send JSON text containing '\uFFFF'
(noncharacter) or '\uD800' (unpaired surrogate), and I've not seen any JSON
implementation complaining about one or the other when receiving the JSON:
no replaced code units and no exception either.
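
For anyone who wants to try this, here is a minimal sketch in Python. It
assumes the standard library json module, which (as far as I know) accepts
both a noncharacter escape and a lone surrogate escape such as \uD800
without raising. The helper name suspicious_code_points is hypothetical,
made up for this illustration. It scans the decoded string for surrogate
code points and noncharacters, which is one way to detect a \uXXXX that
does not correspond to a Unicode character.

```python
import json


def suspicious_code_points(text):
    """Return (index, code point, kind) for surrogates and noncharacters.

    Hypothetical helper for illustration, not part of any library.
    """
    found = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Python's json module combines valid surrogate pairs into one
            # code point, so a surrogate left in the result is unpaired.
            found.append((i, cp, "lone surrogate"))
        elif 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            # U+FDD0..U+FDEF plus the last two code points of every plane.
            found.append((i, cp, "noncharacter"))
    return found


# The parser accepts both escapes without raising an exception.
decoded = json.loads('"a\\uFFFF\\uD800b"')
report = suspicious_code_points(decoded)

for index, cp, kind in report:
    print(f"index {index}: U+{cp:04X} ({kind})")
```

The check runs after parsing, so it says nothing about how other JSON
implementations behave; it only shows that the detection has to be done
by the application if the parser stays silent.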
2015-05-08 3:22 GMT+02:00 Daniel Bünzli <daniel.buenzli at erratique.ch>:
> Le vendredi, 8 mai 2015 à 02:16, Philippe Verdy a écrit :
> > It would be more exact to say that JSON strings, just like strings in
> of 16-bit code units.
> I suggest you have a careful read of RFC 7159 as it specifically implies
> that this is not the model it supports (albeit using broken or let's say
> ambiguous/imprecise Unicode terminology).
> > Then the JSON processor will decode this text and will remap it to an
> internal UTF-16 encoding (for characters that are not escaped) and the
> "\uXXXX" will be decoded as plain 16-bit code units. The result will be a
> stream of 16-bit code units, which can then externally be output and
> encoded or stored in any convenient encoding that preserves this stream,
> EVEN if this is not valid UTF-16.
> I don't know where you get this from but you won't find any mention of
> this in the standard. We are dealing with text, Unicode scalar values, not
> encodings. At the risk of repeating myself, read section 8.2 of RFC 7159.
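
The difference between the two models discussed above (16-bit code units
versus Unicode scalar values) can be sketched in Python, again assuming the
standard library json module. A valid escaped surrogate pair decodes to a
single scalar value, while a lone surrogate survives parsing but cannot be
encoded as UTF-8. The helper name is_scalar_text is hypothetical and relies
on the assumption that Python's strict UTF-8 encoder rejects surrogate code
points.

```python
import json


def is_scalar_text(text):
    """Return True if text contains only Unicode scalar values.

    Hypothetical helper: relies on the strict UTF-8 encoder refusing
    surrogate code points.
    """
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Two \uXXXX escapes forming a valid pair become one code point.
pair = json.loads('"\\uD83D\\uDE00"')

# A single unpaired escape is kept as a surrogate code point.
lone = json.loads('"\\uD800"')

print(len(pair), hex(ord(pair)), is_scalar_text(pair))
print(len(lone), hex(ord(lone)), is_scalar_text(lone))
```

So in this implementation the result is not a raw stream of 16-bit code
units, but it is not guaranteed to be a sequence of scalar values either,
which is the kind of unpaired-surrogate case that section 8.2 of RFC 7159
warns about.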