Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

Markus Scherer markus.icu at gmail.com
Thu May 7 14:59:54 CDT 2015

I assume that the JSON spec deliberately allows anything that Java and
JavaScript allow. In particular, there is no requirement for a Java String
or JavaScript string to contain "text", or well-formed UTF-16, or only
assigned characters. Some code stores binary data (sequence of arbitrary
16-bit unsigned integers) in a "string", just because it is easy and fairly
efficient to transport.

You should "validate" *text* only when you are certain that it is indeed
text. And when you do validate, you might want to be narrower than
"assigned character"; for example, you might require Unicode identifiers or
XML NMTOKENS or whatever. Also remember that "assigned" and "identifier"
and such depend on the version of Unicode your library currently implements.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20150507/64062c47/attachment.html>

More information about the Unicode mailing list