Why Work at Encoding Level?
Mark Davis ☕️
mark at macchiato.com
Tue Oct 20 20:23:17 CDT 2015
> there is never any excuse for software to create unpaired surrogates, or
> any other sort of invalid code unit sequences
First off, it depends on when one is encountered. Unpaired surrogates are
invalid in UTF-16, but are permitted in a Unicode 16-bit string.
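To make that distinction concrete, here is a minimal sketch that treats a
Unicode 16-bit string as a plain list of code units and checks whether it is
also well-formed UTF-16. The function name is hypothetical and not taken from
any library.

```python
def has_unpaired_surrogate(units):
    """Return True if a sequence of 16-bit code units contains a lone surrogate.

    Any sequence of 16-bit values is a legal Unicode 16-bit string; it is
    well-formed UTF-16 only if this function returns False.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be immediately followed by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return True
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate that was not consumed as part of a pair.
            return True
        i += 1
    return False
```

For instance, the units 0x0041 0xD83D 0xDE00 form a valid pair after "A",
while 0x0041 0xD83D alone is a legal 16-bit string that is not valid UTF-16.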
More fundamentally, there may not be "excuses" for such software, but
it happens anyway. Pretending it doesn't makes for unhappy customers. For
example, you don't want to throw an exception when one is encountered
if that could cause an app to fail.
So the point is to handle the situation as gracefully, consistently, and
safely as possible. And 'safely' is key. Pretending that it doesn't exist
is logically equivalent to deletion, and can cause security problems. (see
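One possible way to handle this without throwing and without deleting is to
substitute U+FFFD REPLACEMENT CHARACTER for each unpaired surrogate. The
sketch below assumes the text is available as a list of 16-bit code units;
the function name is hypothetical, and this is only one reasonable policy,
not the only one.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def repair_code_units(units):
    """Return a copy of units with each unpaired surrogate replaced by U+FFFD.

    Valid surrogate pairs and all other code units are copied unchanged, so
    the output has the same length as the input and nothing is deleted.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Well-formed pair: keep both code units.
            out.append(u)
            out.append(units[i + 1])
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Lone high or low surrogate: replace it, never drop it.
            out.append(REPLACEMENT)
            i += 1
        else:
            out.append(u)
            i += 1
    return out
```

Because the bad code unit is replaced rather than removed, text on either
side of it is never joined together, which is the property that matters when
the input has already been through a filter.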
On Mon, Oct 19, 2015 at 10:07 AM, Doug Ewell <doug at ewellic.org> wrote:
> This discussion was originally about how to handle unpaired surrogates,
> as if that were a normal use case.
> Regardless of what encoding model is used to handle characters under the
> hood, and regardless of how the Delete key should work with actual
> characters or clusters, there is never any excuse for software to create
> unpaired surrogates, or any other sort of invalid code unit sequences.
> That is like having an image editor that deletes every 128th byte from a
> JPEG file, and then worrying about how to display the file.
> Doug Ewell | http://ewellic.org | Thornton, CO