EBCDIC control characters

Ken Whistler kenwhistler at sonic.net
Thu Jun 18 19:24:35 CDT 2020


On 6/18/2020 4:55 PM, Asmus Freytag via Unicode wrote:
> The problem with the C/C++ compilers in this regard has always been 
> that they attempted to implement the character-set insensitive model, 
> which doesn't play well with Unicode, so if you want to compile a 
> program where string literals are in Unicode (and not just any 16-bit 
> character set) then you can't simply zero-extend. (And if you are 
> trying to create a UTF-8 literal, then all bets are off unless you 
> have a real conversion).

As I said, daft. ;-)

Anybody who depends on zero extension for embedding Unicode 
character literals in an 8859-1 (or any other 8-bit character set) 
program text ought to have their head examined. Just because you *can* 
do it, and the compilers will cheerily do what the spec says they should 
in such cases, doesn't mean that anybody *should* use it. (There is lots 
of stuff in C++ that no sane programmer should use.)
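
To make the zero-extension point concrete, here is a rough sketch in 
C++. The helper names zero_extend and from_iso8859_7 are made up for 
illustration and are not from any library, and the 8859-7 table is 
deliberately partial (ASCII plus capital Alpha through Rho). Zero 
extension happens to give the right answer for 8859-1 only because that 
set coincides with U+0000..U+00FF; for any other 8-bit set you need a 
real mapping.

```cpp
// Hypothetical helpers, for illustration only.

// Zero extension: the byte value is reused as the code point.
// Correct only if the source bytes are ISO 8859-1.
constexpr char32_t zero_extend(unsigned char byte) {
    return static_cast<char32_t>(byte);
}

// A real (but partial) conversion from ISO 8859-7 (Greek).
// Covers ASCII and 0xC1..0xD1 (capital Alpha..Rho); anything else
// is reported as U+FFFD in this sketch rather than mapped.
constexpr char32_t from_iso8859_7(unsigned char byte) {
    return byte < 0x80
               ? static_cast<char32_t>(byte)
           : (byte >= 0xC1 && byte <= 0xD1)
               ? static_cast<char32_t>(0x0391 + (byte - 0xC1))
               : static_cast<char32_t>(0xFFFD);
}

// Byte 0xC1 is capital Alpha in 8859-7, but zero extension yields
// U+00C1 (Latin capital A with acute), which is the 8859-1 reading.
static_assert(zero_extend(0xC1) == 0x00C1, "zero extension is the Latin-1 reading");
static_assert(from_iso8859_7(0xC1) == 0x0391, "real conversion gives Greek Alpha");
```

ASCII bytes come out the same either way, which is exactly why the 
shortcut looks like it works until somebody feeds it non-ASCII text.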

