EBCDIC control characters

Richard Wordingham richard.wordingham at ntlworld.com
Sat Jun 20 10:53:26 CDT 2020


On Sat, 20 Jun 2020 07:45:45 -0700
Ken Whistler via Unicode <unicode at unicode.org> wrote:

> Richard is making the purist point that U+0000 is a Unicode
> character, and therefore should be transmissible as part of any
> Unicode plain text stream.

That point was prompted by the pain of Unicode test files with embedded
nulls and even embedded end-of-file characters.

I could never work out why isolated UTF-16 code units had to be
handled while isolated UTF-8 code units did not.

> 7-bit ASCII: One cannot represent NULL (0x00) as part of the content
> of a C string. Resort to char arrays.

Actually, you can.  As the size of char is at least 8 bits, you have
128 spare codes. :-)
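For illustration, here is a minimal C sketch of that trick. It assumes the payload is strictly 7-bit and picks 0x80 as the stand-in for NUL. That choice is arbitrary and not specified above; any of the 128 spare codes would do. The names encode_7bit, decode_7bit and NUL_SUBSTITUTE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical choice: 0x80 lies outside 7-bit ASCII, so it can stand in for NUL. */
#define NUL_SUBSTITUTE 0x80u

/* Copy n bytes of 7-bit data into dst as a NUL-terminated C string,
   replacing each 0x00 byte with NUL_SUBSTITUTE. dst must hold n + 1 bytes. */
void encode_7bit(const unsigned char *src, size_t n, char *dst)
{
    for (size_t i = 0; i < n; i++) {
        assert(src[i] < 0x80);                 /* input must be 7-bit ASCII */
        dst[i] = (char)(src[i] == 0 ? NUL_SUBSTITUTE : src[i]);
    }
    dst[n] = '\0';                             /* the only real NUL is the terminator */
}

/* Reverse the mapping. Writes strlen(s) bytes to dst and returns that count. */
size_t decode_7bit(const char *s, unsigned char *dst)
{
    size_t n = strlen(s);                      /* safe: no embedded NULs by construction */
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        dst[i] = (c == NUL_SUBSTITUTE) ? 0 : c;
    }
    return n;
}
```

The same idea underlies Java's "modified UTF-8", which writes U+0000 as the overlong pair 0xC0 0x80 so that encoded strings never contain a zero byte.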

Richard.