Basic Latin digits, not everything else (was: RE: How the C programming language bridges the man-machine gap)

Tom Honermann tom at honermann.net
Mon Apr 18 14:54:15 CDT 2022


On 4/18/22 3:36 PM, Hans Åberg via Unicode wrote:
>> On 18 Apr 2022, at 21:10, Jens Maurer <Jens.Maurer at gmx.net> wrote:
>>
>> I sense some confusion here, but it's a bit hard for me
>> to pinpoint it.  I've been participating in the standardization
>> of C++ for more than 20 years; C++ has a similar provision.
>>
>> The C standard (ISO 9899) says in section 5.2.1 paragraph 3:
>>
>> "In both the source and execution basic character sets,
>> the value of each character after 0 in the above list
>> of decimal digits shall be one greater than the value
>> of the previous."
>>
>> Note the use of the term "basic character set".
>>
>> That is defined directly above based on the Latin
>> alphabet and "the 10 decimal digits" 0...9.  This
>> is all understood to be subsets of the ASCII
>> repertoire; any superscript or non-Western
>> representation of digits is not in view here.
> So in your interpretation, a C or a C++ compiler cannot use EBCDIC? C++ used to have trigraphs to allow for that encoding.

Strictly conforming C and C++ compilers can use EBCDIC so long as the 
EBCDIC code pages used for character and string literals (at 
compile-time) and the locale encodings used for execution character sets 
(at run-time) are constrained to EBCDIC code pages that encode the 
decimal digits in sequence. Other code pages can be supported as 
extensions. The necessary property is probably satisfied by all EBCDIC 
code pages (but I don't know that for sure) since the decimal digits are 
encoded in sequence in the invariant subset of EBCDIC.
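
To make the guarantee concrete, here is a minimal sketch of the kind of 
code that relies on it. The helper names digit_value and parse_decimal 
are made up for illustration and come from neither standard; the sketch 
assumes a C11 compiler for static_assert. It should behave the same 
whether the execution character set is ASCII-based or one of the EBCDIC 
code pages discussed above.

```c
#include <assert.h>   /* static_assert (C11) */

/* Compile-time check of the property required by 5.2.1p3: each digit
   after '0' is one greater than the previous one. On a conforming
   implementation this can never fire; it only documents the assumption. */
static_assert('1' - '0' == 1 && '2' - '0' == 2 && '3' - '0' == 3 &&
              '4' - '0' == 4 && '5' - '0' == 5 && '6' - '0' == 6 &&
              '7' - '0' == 7 && '8' - '0' == 8 && '9' - '0' == 9,
              "decimal digits must be encoded consecutively");

/* Hypothetical helper: returns 0..9 for a decimal digit, -1 otherwise. */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Hypothetical helper: converts the leading run of decimal digits in s.
   Overflow is not handled; this is an illustration only. */
unsigned parse_decimal(const char *s)
{
    unsigned n = 0;
    while (digit_value(*s) >= 0) {
        n = n * 10u + (unsigned)digit_value(*s);
        ++s;
    }
    return n;
}
```

The same trick is not portable for letters: the standard makes no such 
promise for them, and in EBCDIC the Latin letters are not contiguous, so 
an expression like c - 'a' cannot be relied on there.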

Tom.

>
> The question is not what is a useful version of a C compiler, but what is acceptable by the C standard. The main intent, as I see it, is to allow C programs to be defined in a fairly portable way. So if one ensures the digits chosen are consecutive, one can write a portable C program using that feature by keeping track of the character translation.
>
> Another requirement is that the values must also fit into a C byte. So one must keep track of what a C byte is.
>
> One might compare with Unicode: it does not define what binary representation the code points should have. One only gets that by applying encodings like UTF-8, but one does not have to use those standard encodings.
>
>
>

