Basic Latin digits, not everything else (was: RE: How the C programming language bridges the man-machine gap)

Jens Maurer Jens.Maurer at gmx.net
Mon Apr 18 14:51:12 CDT 2022


On 18/04/2022 21.36, Hans Åberg wrote:
>
>> On 18 Apr 2022, at 21:10, Jens Maurer <Jens.Maurer at gmx.net> wrote:
>>
>> I sense some confusion here, but it's a bit hard for me
>> to pinpoint it.  I've been participating in the standardization
>> of C++ for more than 20 years; C++ has a similar provision.
>>
>> The C standard (ISO 9899) says in section 5.2.1 paragraph 3:
>>
>> "In both the source and execution basic character sets,
>> the value of each character after 0 in the above list
>> of decimal digits shall be one greater than the value
>> of the previous."
>>
>> Note the use of the term "basic character set".
>>
>> That is defined directly above based on the Latin
>> alphabet and "the 10 decimal digits" 0...9.  This
>> is all understood to be subsets of the ASCII
>> repertoire; any superscript or non-Western
>> representation of digits is not in view here.
>
> So in your interpretation, a C or a C++ compiler cannot use EBCDIC? C++ used to have trigraphs to allow for that encoding.

Sure, a compiler can use EBCDIC, and existing compilers do.
I said "ASCII repertoire", not "ASCII encoding".

According to https://en.wikipedia.org/wiki/EBCDIC
EBCDIC does have contiguous digits.
(However, letters are not contiguous, but that is not
what we're talking about here.)
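
As a rough illustration, here is a small C sketch that checks both claims against hard-coded EBCDIC code values. The values are the ones commonly listed for code page 037 (digits 0xF0..0xF9, lowercase letters in three separate runs); treat them as an assumption of this example. The names ebcdic_digits, ebcdic_lower and is_contiguous are made up for the sketch.

```c
#include <stddef.h>

/* Assumed EBCDIC (code page 037) code values for '0'..'9'. */
static const unsigned char ebcdic_digits[10] = {
    0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7, 0xF8, 0xF9
};

/* Assumed EBCDIC (code page 037) code values for 'a'..'z':
   three runs, a-i, j-r and s-z, with gaps in between. */
static const unsigned char ebcdic_lower[26] = {
    0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89,
    0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99,
    0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7, 0xA8, 0xA9
};

/* Returns 1 if every code is exactly one greater than the
   previous one, 0 otherwise. */
int is_contiguous(const unsigned char *codes, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++) {
        if (codes[i] != codes[i - 1] + 1)
            return 0;
    }
    return 1;
}
```

With these tables, is_contiguous(ebcdic_digits, 10) yields 1 and is_contiguous(ebcdic_lower, 26) yields 0, because 'i' (0x89) is followed by 'j' (0x91).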

> The question is not what is a useful version of a C compiler, but what is acceptable under the C standard. The main intent, as I see it, is to allow C programs to be written in a fairly portable way. So if one ensures the digits chosen are consecutive, one can write a portable C program using that feature by keeping track of the character translation.

The whole point of a programming language standard is to permit
writing portable programs --- portable across compilers and
hardware/operating system environments.

The requirement in the C and C++ standards about contiguous
digits ensures that programs relying on that property are
portable to all conforming compilers.
(In contrast, programs relying on contiguous Latin letters
are not so portable.)
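
To make the contrast concrete, here is a minimal sketch. The function names digit_value and lower_letter_index are hypothetical, not from either standard. The first function relies only on the contiguous-digits guarantee quoted above; the second avoids assuming contiguous letters by searching a string literal instead of subtracting 'a'.

```c
#include <string.h>

/* Returns 0..9 for '0'..'9', or -1 for any other character.
   Portable: the basic character set guarantees that each digit
   after '0' has a value one greater than the previous digit. */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Returns 0..25 for 'a'..'z', or -1 for any other character.
   Writing c - 'a' would give wrong results where letters are not
   contiguous (for example EBCDIC), so look the letter up instead. */
int lower_letter_index(char c)
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    const char *p;

    if (c == '\0')
        return -1;  /* strchr would otherwise match the terminator */
    p = strchr(letters, c);
    return p ? (int)(p - letters) : -1;
}
```

On any conforming implementation digit_value('7') is 7, and lower_letter_index('j') is 9 whether or not the execution character set has a gap between 'i' and 'j'.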

> Another requirement is that the values must also fit into a C byte. So one must keep track of what a C byte is.

I don't know what you mean by "one must keep track..."

> One might compare with Unicode: it does not define what binary representation the code points should have. One only gets that by applying encodings like UTF-8 etc., but one does not have to use those standard encodings.

Right, but it seems a programming language standard is at liberty
to impose restrictions on the generality of Unicode when deemed
practical.

Jens


