EBCDIC control characters
Asmus Freytag (c)
asmusf at ix.netcom.com
Thu Jun 18 20:16:05 CDT 2020
On 6/18/2020 5:24 PM, Ken Whistler wrote:
> On 6/18/2020 4:55 PM, Asmus Freytag via Unicode wrote:
>> The problem with the C/C++ compilers in this regard has always been
>> that they attempted to implement the character-set insensitive model,
>> which doesn't play well with Unicode, so if you want to compile a
>> program where string literals are in Unicode (and not just any 16-bit
>> character set) then you can't simply zero-extend. (And if you are
>> trying to create a UTF-8 literal, then all bets are off unless you
>> have a real conversion).
> As I said, daft. ;-)
An argument can certainly be made that trying to be "character set
independent" is daft - and back in the '90s I walked away from a job
interview at a place that told me that they had "figured it all out" and
were going to use "character set independence" as their i18n strategy
and "only" needed someone to implement it. Easiest decision on my part.
(They got creamed by their Unicode-based competitor in short order).
My experience with C/C++ is perhaps colored a bit by the fact that I've
always used compilers that were targeting Unicode-based systems and had
special extensions; not sure where things stand right now, for a purely
> Anybody who depends on zero-sign extension for embedding Unicode
> character literals in an 8859-1 (or any other 8-bit character set)
> program text ought to have their head examined. Just because you *can*
> do it, and the compilers will cheerily do what the spec says they
> should in such cases doesn't mean that anybody *should* use it. (There
> is lots of stuff in C++ that no sane programmer should use. )