Basic Latin digits, not everything else (was: RE: How the C programming language bridges the man-machine gap)

Hans Åberg haberg-1 at telia.com
Mon Apr 18 14:36:54 CDT 2022


> On 18 Apr 2022, at 21:10, Jens Maurer <Jens.Maurer at gmx.net> wrote:
> 
> I sense some confusion here, but it's a bit hard for me
> to pinpoint it.  I've been participating in the standardization
> of C++ for more than 20 years; C++ has a similar provision.
> 
> The C standard (ISO 9899) says in section 5.2.1 paragraph 3:
> 
> "In both the source and execution basic character sets,
> the value of each character after 0 in the above list
> of decimal digits shall be one greater than the value
> of the previous."
> 
> Note the use of the term "basic character set".
> 
> That is defined directly above based on the Latin
> alphabet and "the 10 decimal digits" 0...9.  This
> is all understood to be subsets of the ASCII
> repertoire; any superscript or non-Western
> representation of digits is not in view here.

So in your interpretation, a C or a C++ compiler cannot use EBCDIC? C++ used to have trigraphs to allow for that encoding.

The question is not what makes a useful C compiler, but what is acceptable under the C standard. The main intent, as I see it, is to allow C programs to be written in a fairly portable way. So if one ensures that the digits chosen are consecutive, one can write a portable C program using that feature by keeping track of the character translation.
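
For example, that consecutiveness guarantee is what makes the classic idiom c - '0' portable. A minimal sketch (my own illustration, not from the standard text):

    #include <stdio.h>

    /* The standard guarantees that '0'..'9' have consecutive,
       increasing values in the execution character set, so
       subtracting '0' yields the digit's numeric value on any
       conforming implementation, ASCII- or EBCDIC-based. */
    static int digit_value(char c)
    {
        if (c >= '0' && c <= '9')
            return c - '0';
        return -1;  /* not a decimal digit */
    }

    int main(void)
    {
        printf("%d\n", digit_value('7'));  /* prints 7 everywhere */
        return 0;
    }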

Another requirement is that the values must also fit into a C byte. So one must keep track of what a C byte is.
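
For instance, one can query the byte width via CHAR_BIT from <limits.h>; the standard only requires that it be at least 8, and it may be larger on some implementations. A small sketch:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* CHAR_BIT is the number of bits in a C byte. */
        printf("A byte here is %d bits wide\n", CHAR_BIT);
        return 0;
    }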

One might compare this with Unicode: it does not define what binary representation the code points should have; one only gets that by applying an encoding such as UTF-8, and one does not have to use those standard encodings.
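
To make that concrete, here is a rough sketch of how a code point only acquires a byte sequence once an encoding such as UTF-8 is applied (my own illustration; a real encoder would also reject surrogate code points):

    #include <stdio.h>

    /* Encode one Unicode code point as UTF-8.  Returns the number of
       bytes written to out (1..4), or 0 if cp is out of range. */
    static int utf8_encode(unsigned long cp, unsigned char out[4])
    {
        if (cp <= 0x7F) {
            out[0] = (unsigned char)cp;
            return 1;
        } else if (cp <= 0x7FF) {
            out[0] = (unsigned char)(0xC0 | (cp >> 6));
            out[1] = (unsigned char)(0x80 | (cp & 0x3F));
            return 2;
        } else if (cp <= 0xFFFF) {
            out[0] = (unsigned char)(0xE0 | (cp >> 12));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (cp & 0x3F));
            return 3;
        } else if (cp <= 0x10FFFF) {
            out[0] = (unsigned char)(0xF0 | (cp >> 18));
            out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[3] = (unsigned char)(0x80 | (cp & 0x3F));
            return 4;
        }
        return 0;
    }

    int main(void)
    {
        unsigned char buf[4];
        int n = utf8_encode(0xC5, buf);  /* U+00C5, A with ring above */
        for (int i = 0; i < n; i++)
            printf("0x%02X ", buf[i]);   /* prints 0xC3 0x85 */
        printf("\n");
        return 0;
    }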




