Corrigendum #9

Tue Jun 3 01:55:09 CDT 2014

On Mon, Jun 2, 2014 at 10:32 PM, David Starner <prosfilaes at gmail.com> wrote:

> Why? It seems you're changing the rules
> ...
>
>
This isn't "are changing"; it is "has changed". The Corrigendum was issued
at the start of 2013, about 16 months ago, and applies to all relevant
earlier versions. It was the result of fairly extensive debate inside the
UTC; there hasn't been a single issue on this thread that wasn't considered
during the discussions there. And as far back as 2001, the UTC made it
clear that noncharacters *are* scalar values, and are to be converted by
UTF converters. E.g., see
http://www.unicode.org/mail-arch/unicode-ml/y2001-m09/0149.html (by chance,
one day before 9/11).

> probably trigger serious bugs in some lamebrained utility.

There were already plenty of programs that passed the noncharacters
through; very few filtered them (some deleted them, which is
horrible for security). Thinking that a utility would never encounter them
in input text was a pipe dream. If a utility or library is so fragile that
it *breaks* on input of any valid UTF sequence, then it *is* a "lamebrained"
utility. A good unit test for any production chain would be to check that
there is no crash on any input scalar value (and, for that matter, on any
ill-formed UTF text).
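
As a rough illustration of that kind of unit test, here is a minimal sketch.
It assumes Python 3 and uses its built-in codecs as a stand-in for whatever
converter or utility is actually under test; the function names are
hypothetical and not taken from any library. It walks the scalar values
(including the 66 noncharacters), checks that each one survives an
encode/decode round trip, and feeds a few ill-formed byte sequences to a
lenient decoder to confirm that nothing crashes.

```python
# Sketch only: Python's own codecs stand in for the component under test.

def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def scalar_values():
    """Yield every Unicode scalar value (all code points except surrogates)."""
    for cp in range(0x110000):
        if not 0xD800 <= cp <= 0xDFFF:
            yield cp

def find_failures(encoding, code_points):
    """Return the code points that raise or fail to round-trip in `encoding`."""
    failures = []
    for cp in code_points:
        s = chr(cp)
        try:
            if s.encode(encoding).decode(encoding) != s:
                failures.append(cp)
        except UnicodeError:
            failures.append(cp)
    return failures

def survives_ill_formed(data):
    """Decode ill-formed UTF-8 leniently; True if no exception is raised."""
    try:
        data.decode("utf-8", errors="replace")
        return True
    except Exception:
        return False

# A few ill-formed UTF-8 samples: stray byte, encoded surrogate,
# overlong form, truncated sequence.
ILL_FORMED = [b"\xff", b"\xed\xa0\x80", b"\xc0\xaf", b"\xe2\x82"]

if __name__ == "__main__":
    # Fixed-endian encodings avoid byte-order-mark handling in the round trip.
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        bad = find_failures(enc, scalar_values())
        print(enc, "failures:", len(bad))
    print("ill-formed handled:", all(survives_ill_formed(b) for b in ILL_FORMED))
```

In a real production chain, the encode/decode calls would be replaced by the
actual pipeline stage being exercised; the point of the sweep is only that
noncharacters get no special treatment and that malformed input is handled
rather than fatal.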