Basic Latin digits, not everything else (was: RE: How the C programming language bridges the man-machine gap)
haberg-1 at telia.com
Mon Apr 18 14:36:54 CDT 2022
> On 18 Apr 2022, at 21:10, Jens Maurer <Jens.Maurer at gmx.net> wrote:
> I sense some confusion here, but it's a bit hard for me
> to pinpoint it. I've been participating in the standardization
> of C++ for more than 20 years; C++ has a similar provision.
> The C standard (ISO 9899) says in section 5.2.1 paragraph 3:
> "In both the source and execution basic character sets,
> the value of each character after 0 in the above list
> of decimal digits shall be one greater than the value
> of the previous."
> Note the use of the term "basic character set".
> That is defined directly above based on the Latin
> alphabet and "the 10 decimal digits" 0...9. This
> is all understood to be subsets of the ASCII
> repertoire; any superscript or non-Western
> representation of digits is not in view here.
So in your interpretation, a C or C++ compiler cannot use EBCDIC? C++ used to have trigraphs to allow for that encoding.
The question is not what makes a useful C compiler, but what the C standard accepts. The main intent, as I see it, is to allow C programs to be written in a fairly portable way. So if one ensures that the digit characters chosen are consecutive, one can write a portable C program relying on that property by keeping track of the character translation.
Another requirement is that these values must fit into a C byte, so one must also keep track of what a C byte is on the implementation at hand.
One might compare with Unicode: it does not define a binary representation for code points. One only gets a representation by applying an encoding form such as UTF-8, and one is not obliged to use those standard encodings.