EBCDIC control characters
Richard Wordingham
richard.wordingham at ntlworld.com
Sat Jun 20 10:53:26 CDT 2020
On Sat, 20 Jun 2020 07:45:45 -0700
Ken Whistler via Unicode <unicode at unicode.org> wrote:
> Richard is making the purist point that U+0000 is a Unicode
> character, and therefore should be transmissible as part of any
> Unicode plain text stream.
That point was prompted by the pain of handling Unicode test files with
embedded nulls and even embedded end-of-file markers.
I could never work out why isolated UTF-16 code units should be
handled, but there was no need to handle isolated UTF-8 code units.
> 7-bit ASCII: One cannot represent NULL (0x00) as part of the content
> of a C string. Resort to char arrays.
Actually, you can. As the size of char is at least 8 bits, you have
128 spare codes. :-)
Richard.
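
The "spare codes" remark can be sketched in C roughly as follows. This is
only an illustration under stated assumptions: the content is strictly 7-bit
ASCII (0x00 to 0x7F), and the stand-in value 0x80 and the helper names
encode7 and decode7 are hypothetical, chosen here for the example rather
than taken from any standard or from the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NUL. 7-bit ASCII only uses 0x00..0x7F,
   so any value in 0x80..0xFF is free; 0x80 is an arbitrary choice. */
#define NUL_STANDIN 0x80u

/* Copy n bytes of 7-bit ASCII (which may contain 0x00) into dst as a
   NUL-terminated C string, replacing each 0x00 with NUL_STANDIN.
   dst must have room for n + 1 bytes. */
static void encode7(char *dst, const unsigned char *src, size_t n)
{
    unsigned char *out = (unsigned char *)dst;
    for (size_t i = 0; i < n; i++)
        out[i] = (src[i] == 0x00) ? (unsigned char)NUL_STANDIN : src[i];
    out[n] = 0x00; /* the real terminator */
}

/* Reverse the mapping. dst must have room for strlen(src) bytes.
   Returns the number of content bytes written (no terminator added). */
static size_t decode7(unsigned char *dst, const char *src)
{
    size_t n = strlen(src);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)src[i];
        dst[i] = (c == NUL_STANDIN) ? 0x00 : c;
    }
    return n;
}
```

With this mapping, the three content bytes 'A', 0x00, 'B' become a C string
for which strlen reports 3, and decode7 restores the original bytes. The
trick only works while the content really is 7-bit; any 8-bit encoding
would need the upper codes for itself.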