Corrigendum #9
Richard Wordingham
richard.wordingham at ntlworld.com
Wed Jul 2 17:59:55 CDT 2014
On Wed, 2 Jul 2014 21:19:16 +0200
Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> 2014-07-02 20:19 GMT+02:00 David Starner <prosfilaes at gmail.com>:
>
> > I might argue 11111111b for 0x00 in UTF-8 would be technically
> > legal
> But the same C libraries are also using -1 as end-of-stream values
> and if they are converted to bytes, they will be indistinguishable
> from the NULL character that could be stored everywhere in the stream.
In the narrow character interfaces (getc() and friends), a 0xFF byte is
returned as an unsigned char converted to int, i.e. as 0x00FF (int is
at least 16 bits wide), while the narrow character end-of-stream value
EOF is required to be negative. The two stay distinct as long as the
result is kept in an int. Unfortunately, the wide character
end-of-stream marker WEOF is not required to be negative, but it must
not correspond to any member of the extended character set. C appears
to prohibit U+FFFF as well as supplementary characters if wchar_t is
only 16 bits wide.
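
To illustrate the narrow character case, here is a minimal sketch. The
helper names getc_result_for, is_end_of_stream and check_ff_is_not_eof
are made up for this example, and it assumes 8-bit bytes. It only
models the conversion that getc() is specified to apply; it does no
actual I/O.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: models the value getc() hands back for a stored
   byte, namely the byte as an unsigned char converted to int. */
int getc_result_for(char stored)
{
    return (int)(unsigned char)stored;
}

/* Hypothetical helper: the end-of-stream test, done on the int value. */
int is_end_of_stream(int c)
{
    return c == EOF;
}

/* Checks that a 0xFF byte and EOF cannot collide (assumes CHAR_BIT == 8). */
void check_ff_is_not_eof(void)
{
    int c = getc_result_for((char)0xFF);

    assert(EOF < 0);              /* required by the C standard */
    assert(c == 0xFF);            /* non-negative, so never equal to EOF */
    assert(!is_end_of_stream(c));
}
```

The distinction is lost only if the result is narrowed to char before
the comparison, which is the usual way a 0xFF byte gets mistaken for
end-of-stream.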
Richard.