EBCDIC control characters

Asmus Freytag (c) asmusf at ix.netcom.com
Thu Jun 18 20:16:05 CDT 2020


On 6/18/2020 5:24 PM, Ken Whistler wrote:
> Asmus,
>
> On 6/18/2020 4:55 PM, Asmus Freytag via Unicode wrote:
>> The problem with the C/C++ compilers in this regard has always been 
>> that they attempted to implement the character-set insensitive model, 
>> which doesn't play well with Unicode, so if you want to compile a 
>> program where string literals are in Unicode (and not just any 16-bit 
>> character set) then you can't simply zero-extend. (And if you are 
>> trying to create a UTF-8 literal, then all bets are off unless you 
>> have a real conversion).
>
> As I said, daft. ;-)

Ken,

An argument can certainly be made that trying to be "character set 
independent" is daft - and back in the '90s I walked away from a job 
interview at a place that told me they had "figured it all out" and 
were going to use "character set independence" as their i18n strategy, 
and "only" needed someone to implement it. Easiest decision on my part. 
(They got creamed by their Unicode-based competitor in short order.)

My experience with C/C++ is perhaps colored a bit by the fact that I've 
always used compilers that were targeting Unicode-based systems and had 
special extensions; I'm not sure where things stand right now for a 
purely generic implementation.
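
To make the zero-extension point concrete, here is a minimal sketch (my 
own illustration, with made-up helper names, not anything a compiler 
actually emits): widening by zero extension is only correct because the 
256 code points of ISO 8859-1 happen to coincide with U+0000..U+00FF, 
whereas a UTF-8 literal needs a genuine conversion, since everything 
above U+007F becomes a multi-byte sequence.

    // Sketch only: Latin-1 -> UTF-16 by zero extension, and Latin-1 -> UTF-8
    // by an actual conversion. Helper names are invented for illustration.
    #include <string>

    std::u16string widen_latin1(const std::string& latin1) {
        std::u16string out;
        for (unsigned char b : latin1)
            out.push_back(static_cast<char16_t>(b));   // 0xE9 -> U+00E9 'é'
        return out;
    }

    std::string latin1_to_utf8(const std::string& latin1) {
        std::string out;
        for (unsigned char b : latin1) {
            if (b < 0x80) {
                out.push_back(static_cast<char>(b));    // ASCII passes through
            } else {                                    // two-byte UTF-8 sequence
                out.push_back(static_cast<char>(0xC0 | (b >> 6)));
                out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
            }
        }
        return out;
    }

The first loop is exactly what "just zero-extend" amounts to; for any 
8-bit character set other than Latin-1 it silently produces the wrong 
characters.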

A./

>
> Anybody who depends on zero extension for embedding Unicode 
> character literals in an 8859-1 (or any other 8-bit character set) 
> program text ought to have their head examined. Just because you *can* 
> do it, and the compilers will cheerily do what the spec says they 
> should in such cases, doesn't mean that anybody *should* use it. (There 
> is lots of stuff in C++ that no sane programmer should use.)
>
> --Ken
>
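
To illustrate that point with a concrete (made-up) example: if the 
program text is actually Windows-1252 rather than 8859-1, byte 0x80 is 
the euro sign (U+20AC), but zero extension turns it into U+0080, a C1 
control character - the literal compiles, and it is simply the wrong 
character.

    // Sketch only: zero extension of a non-Latin-1 source byte.
    #include <cassert>

    int main() {
        unsigned char src = 0x80;       // '€' in Windows-1252 program text
        char16_t widened = src;         // zero extension yields U+0080 ...
        assert(widened == u'\u0080');   // ... a C1 control character,
        assert(widened != u'\u20AC');   // not the euro sign that was typed
    }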
