Corrigendum #9

Richard Wordingham richard.wordingham at ntlworld.com
Wed Jul 2 17:59:55 CDT 2014


On Wed, 2 Jul 2014 21:19:16 +0200
Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> 2014-07-02 20:19 GMT+02:00 David Starner <prosfilaes at gmail.com>:
> 
> > I might argue 11111111b for 0x00 in UTF-8 would be technically
> > legal
 
> But the same C libraries are also using -1 as end-of-stream values
> and if they are converted to bytes, they will be indistinguishable
> from the NUL character that could be stored anywhere in the stream.

A 0xFF byte in a narrow character stream is returned as 0x00FF (int is
at least 16 bits wide) by the character input interfaces, while the
narrow character end-of-stream value EOF is required to be negative, so
the two cannot be confused as long as the result is kept in an int.
Unfortunately, the wide character end-of-stream marker WEOF is not
required to be negative, but it is not allowed to be a representable
character.  C appears to prohibit U+FFFF as well as supplementary
characters if wchar_t is only 16 bits wide.
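
To make the narrow-character case concrete, here is a minimal sketch of
the conversion described above.  The helper names byte_as_getc_result,
looks_like_eof and check_ff_is_not_eof are made up for illustration and
are not library functions; the only assumptions are that fgetc() hands
back the byte as an unsigned char converted to int and that int is wider
than a byte, as on ordinary hosted implementations.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a library function): mimics the conversion
 * that fgetc()/getc() apply to a byte read from a narrow stream, i.e.
 * the byte is taken as an unsigned char and then converted to int. */
static int byte_as_getc_result(unsigned char byte)
{
    return (int)byte;
}

/* Hypothetical helper: returns 1 if the value equals the end-of-stream
 * marker EOF, 0 otherwise. */
static int looks_like_eof(int getc_result)
{
    return getc_result == EOF;
}

/* Self-check that can be called from main(): a 0xFF byte comes back as
 * the non-negative value 0x00FF, while EOF is negative, so the two stay
 * distinct as long as the result is held in an int. */
static void check_ff_is_not_eof(void)
{
    assert(EOF < 0);
    assert(byte_as_getc_result(0xFF) == 0x00FF);
    assert(!looks_like_eof(byte_as_getc_result(0xFF)));
}
```

The distinction is lost only if the result is narrowed before it is
tested, for example by storing it in a plain char: where char is signed,
the 0xFF byte typically becomes -1 and then compares equal to the usual
EOF value.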

Richard.

