Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?
markus.icu at gmail.com
Thu May 7 14:59:54 CDT 2015
I assume that the JSON spec deliberately allows anything that Java and
JavaScript strings allow, rather than only
assigned characters. Some code stores binary data (sequence of arbitrary
16-bit unsigned integers) in a "string", just because it is easy and fairly
efficient to transport.
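
To illustrate the binary-data case, here is a minimal Python sketch, not
taken from any particular library. The helper names pack_units and
unpack_units are hypothetical. It assumes the behavior of Python's
standard json module, which accepts unpaired surrogates and unassigned
code points in \uXXXX escapes, and which joins a valid surrogate pair into
one code point (hence the re-splitting on the way out).

```python
import json

def pack_units(units):
    """Encode 16-bit unsigned integers as a JSON string of \\uXXXX escapes."""
    for u in units:
        if not 0 <= u <= 0xFFFF:
            raise ValueError("not a 16-bit unsigned integer: %r" % (u,))
    return '"' + "".join("\\u%04x" % u for u in units) + '"'

def unpack_units(json_text):
    """Parse the JSON string and recover the original 16-bit values."""
    units = []
    for ch in json.loads(json_text):
        cp = ord(ch)
        if cp > 0xFFFF:
            # The parser joined a surrogate pair; split it back into two units.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return units

# NUL, a letter, a reserved code point, surrogates, and a noncharacter.
data = [0x0000, 0x0041, 0x0378, 0xD800, 0xD83D, 0xDE00, 0xFFFF]
assert unpack_units(pack_units(data)) == data
```

Nothing in this round trip is "text", so checking each value against the
set of assigned characters would reject perfectly good data.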
You should "validate" *text* only when you are certain that it is indeed
text. And when you do validate, you might want to be narrower than
"assigned character"; for example, you might require Unicode identifiers or
XML NMTOKENS or whatever. Also remember that "assigned" and "identifier"
and such depend on the version of Unicode your library currently implements.
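
If you do decide to validate, a rough Python sketch of classifying a
single 16-bit value might look like the following. The function name
classify_unit and its labels are my own, hypothetical choices. The sketch
relies on the standard unicodedata module, whose answers depend on the
Unicode version it was built with (reported by
unicodedata.unidata_version).

```python
import unicodedata

def classify_unit(unit):
    """Classify one 16-bit value from a \\uXXXX escape (hypothetical labels)."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("not a 16-bit unsigned integer: %r" % (unit,))
    if 0xD800 <= unit <= 0xDFFF:
        return "surrogate"      # only meaningful as half of a pair
    if 0xFDD0 <= unit <= 0xFDEF or unit in (0xFFFE, 0xFFFF):
        return "noncharacter"   # permanently reserved, never a character
    category = unicodedata.category(chr(unit))
    if category == "Cn":
        return "unassigned"     # may become assigned in a later version
    if category == "Co":
        return "private-use"
    return "assigned"

# The Unicode version this interpreter's tables implement.
print(unicodedata.unidata_version)
for unit in (0x0041, 0x0378, 0xD800, 0xE000, 0xFFFF):
    print("%04X" % unit, classify_unit(unit))
```

A stricter check (identifiers, NMTOKENs, and so on) would replace the last
branch with whatever rule your application actually needs.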