Corrigendum #9

David Starner prosfilaes at
Tue Jun 3 02:41:18 CDT 2014

On Mon, Jun 2, 2014 at 11:55 PM, Mark Davis ☕️ <mark at> wrote:
> Thinking that a utility would never encounter them in input text
> was a pipe-dream.

Thinking that a utility would never mangle them if encountered in
input text was a pipe-dream.

> If a utility or library is so fragile that it breaks on
> input of any valid UTF sequence, then it is a "lamebrained" utility.

And?  The world is filled with lamebrained utilities, and being
cautious about what you take in can prevent one of those lamebrained
utilities from turning into an exploit.

> A good
> unit test for any production chain would be to check there is no crash on
> any input scalar value (and for that matter, any ill-formed UTF text).

Right; and if you filter out stuff at the frontend, like ill-formed
UTF text and noncharacters, you don't have to worry about what the
middle end will do with them.
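
A minimal sketch of such a frontend filter, in Python. The function names
and the choice to substitute U+FFFD (rather than reject the input outright)
are assumptions made for illustration, not something specified in this
thread; the noncharacter set is the 66 code points U+FDD0..U+FDEF plus the
last two code points of each of the 17 planes.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF, or any code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def sanitize_utf8(raw: bytes, replacement: str = "\uFFFD") -> str:
    """Decode raw as UTF-8 and neutralize input the middle end may mishandle.

    Hypothetical helper: ill-formed byte sequences are replaced by the
    decoder with U+FFFD, and noncharacters are then swapped for
    `replacement`. A stricter frontend could raise an error instead.
    """
    text = raw.decode("utf-8", errors="replace")
    return "".join(
        replacement if is_noncharacter(ord(ch)) else ch for ch in text
    )
```

Whether to replace, strip, or reject is a policy decision; the point is only
that the decision is made once, at the boundary, before anything downstream
sees the text.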

I don't get what the goal of these changes was. It seems you've taken
these characters away from programmers, who used them inside programs,
and given them to CLDR and anyone else willing to make their "plain text
files" skirt the limits.

Where there is life, there is hope.
