Best practices for replacing UTF-8 overlongs

Mon Dec 19 23:31:43 CST 2016

Yes, I just don't see how the # of emitted replacement characters changes the flowchart on what to do when it's bad :)

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Martin J. Dürst
Sent: Monday, December 19, 2016 7:20 PM
To: 'Unicode Mailing List' <unicode at unicode.org>
Subject: Re: Best practices for replacing UTF-8 overlongs

On 2016/12/20 11:35, Tex Texin wrote:
> Shawn,
>
> Ok, but that begs the questions of what to do...
> "All bets are off" is not instructive.

Well, it may be instructive in that it's difficult to get software to decide what happened. A human may be in a better position to analyze the error and its cause(s), and to fix them.

> How software behaves in the face of invalid bytes, what it does with them, what it does about them, and how it continues (or not) still needs to be determined.

Yes, but that will depend on circumstances. In a safety-critical application, you'll want to do something different than if you are sending the text to a printer just to have a look at it.

Regards,   Martin.
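
To make the "depends on circumstances" point concrete, here is a minimal sketch in Python of two possible policies for the same invalid input. It is not taken from the thread: the function names and the sample bytes are hypothetical, and it assumes Python's built-in UTF-8 codec with its standard "strict" and "replace" error handlers. The lenient path does not depend on how many U+FFFD characters the decoder emits, only on whether any appear.

```python
# Hypothetical sample: 0xC0 0xAF is an overlong (invalid) two-byte form of "/" (U+002F).
OVERLONG_SLASH = b"\xc0\xaf"


def decode_strict(data: bytes) -> str:
    """Policy for a safety-critical caller: refuse the whole input.

    Raises UnicodeDecodeError on the first invalid byte.
    """
    return data.decode("utf-8", errors="strict")


def decode_lenient(data: bytes) -> str:
    """Policy for 'just have a look at it': substitute U+FFFD and continue."""
    return data.decode("utf-8", errors="replace")


def has_decoding_damage(text: str) -> bool:
    """Heuristic check: True if the text contains at least one U+FFFD.

    This cannot distinguish a substituted U+FFFD from one that was
    legitimately present in the input.
    """
    return "\ufffd" in text


if __name__ == "__main__":
    data = b"a" + OVERLONG_SLASH + b"b"

    text = decode_lenient(data)
    print(repr(text), "damaged:", has_decoding_damage(text))

    try:
        decode_strict(data)
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
```

Either way, the overlong bytes are never decoded as "/"; the two policies differ only in whether processing continues afterwards.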