Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

Shawn Steele via Unicode unicode at unicode.org
Mon Dec 10 14:12:50 CST 2018


IMO, trying to do security checks on an encoded string that will be decoded later is pretty much guaranteed to miss cases.  That is particularly true of ISO-2022-JP, where different software, libraries, and OSes vary widely in how they decode it and how they treat invalid input and edge cases.

I typically encourage security checks on encoded data to be done after the translation to Unicode, but that only works if the Unicode stream itself is what is being checked.  E.g., a firewall may not decode the data the same way as the end recipient does.  I guess that is the point of the encoding project, but nobody can guarantee that an endpoint conforms to any "standard".  So from a security perspective the recommended guidance is pretty much moot: secure applications have to consider non-conforming behavior of endpoints as well.
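
As a rough sketch of the difference between checking before and after decoding, assuming Python's built-in "iso2022_jp" codec stands in for the recipient's decoder (other decoders may treat the same bytes differently, which is exactly the problem).  The function names and the "<script>" token are hypothetical, chosen only for illustration:

```python
# Illustrative only: the function names are hypothetical, and Python's
# "iso2022_jp" codec is just one decoder among many.
ESC_ASCII = b"\x1b(B"  # ISO-2022-JP escape sequence that designates ASCII


def check_bytes(raw: bytes) -> bool:
    """Look for the forbidden token in the still-encoded bytes."""
    return b"<script>" in raw


def check_decoded(raw: bytes) -> bool:
    """Decode to Unicode first, then look for the token in the text."""
    text = raw.decode("iso2022_jp", errors="replace")
    return "<script>" in text


plain = b"<script>"
# Same token, with a redundant "switch to ASCII" escape in the middle.
split = b"<scr" + ESC_ASCII + b"ipt>"

for raw in (plain, split):
    print(raw, check_bytes(raw), check_decoded(raw))
```

With this particular codec the byte-level check misses the second input, while the check on the decoded text flags both.  That only helps if the checker and the recipient decode the bytes the same way.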

Providing a "best practice" or suggestions in a standard is nice, but in practice systems are going to have differing interpretations and behaviors.  Applications can't "depend" on any consistency.  Even if all the standard documents agreed, there'd still be legacy implementations that people didn't update for whatever reason, and other implementations would miss some of the subtleties (or less subtle differences) of the standards.

IMO, all of the "state shifting" encodings should be treated with care by software.  There are a lot of ways to encode the same or similar strings, and you never know what kind of validation happened "on the other end".  It's pretty much a given that ISO-2022-JP, particularly its edge cases, is going to be interpreted differently by different applications.
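
To make the "many ways to encode the same string" point concrete, here is a small sketch, again assuming Python's "iso2022_jp" codec.  The variants are made-up inputs that differ only in redundant ASCII-designating escape sequences:

```python
# Illustrative only: hypothetical inputs, decoded with Python's codec.
ESC_ASCII = b"\x1b(B"  # ISO-2022-JP escape sequence that designates ASCII

variants = [
    b"abc",
    ESC_ASCII + b"abc",
    b"a" + ESC_ASCII + b"b" + ESC_ASCII + b"c",
    b"abc" + ESC_ASCII,
]

decoded = [v.decode("iso2022_jp") for v in variants]

distinct_bytes = len(set(variants))  # number of different byte strings
distinct_text = len(set(decoded))    # number of different decoded strings

print(distinct_bytes, distinct_text)
```

Under this codec the distinct byte strings collapse to a single decoded string, so comparisons, blocklists, or deduplication done on the raw bytes would treat them as unrelated.  A decoder that handles redundant escapes differently could give a different result.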

-Shawn


