How the C programming language bridges the man-machine gap

Hans Åberg haberg-1 at telia.com
Sat Apr 16 08:17:07 CDT 2022


> On 15 Apr 2022, at 20:02, Doug Ewell via Unicode <unicode at corp.unicode.org> wrote:
> 
> Marius Spix wrote:
> 
>> char literals are not reliable for arithmetic expressions.
>> '1' - '0' = 1 may be true for Windows-1252 or EBCDIC systems, but you
>> cannot expect that this works in all character sets.
> 
> Modern C language specifications (at least C99) ensure that you can:
> 
>> 5.2.1 Character sets
>> 
>> Both the basic source and basic execution character sets shall have
>> the following members: the 26 uppercase letters of the Latin alphabet
>> 
>>   A   B   C   D   E   F   G   H   I   J   K   L   M
>>   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
>> 
>> the 26 lowercase letters of the Latin alphabet
>> 
>>   a   b   c   d   e   f   g   h   i   j   k   l   m
>>   n   o   p   q   r   s   t   u   v   w   x   y   z
>> 
>> the 10 decimal digits
>> 
>>   0   1   2   3   4   5   6   7   8   9
>> 
>> the following 29 graphic characters
>> 
>>   !   "   #   %   &   '   (   )   *   +   ,   -   .   /   :
>>   ;   <   =   >   ?   [   \   ]   ^   _   {   |   }   ~
>> 
>> [...]
>> 
>> In both the source and execution basic character sets, the value of
>> each character after 0 in the above list of decimal digits shall be
>> one greater than the value of the previous.
> 
> I'm not aware of any character set that meets the repertoire requirement but not the digit-sequencing requirement.
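
The digit-sequencing rule quoted above is what makes the familiar `c - '0'` idiom portable across execution character sets such as ASCII and EBCDIC. Below is a minimal sketch. The function names `digit_value` and `parse_decimal` are hypothetical and used only for illustration.

```c
#include <assert.h>

/* Numeric value of a basic decimal digit character.
   Portable: C guarantees that '0'..'9' have consecutive values
   in every execution character set (ASCII, EBCDIC, ...). */
static int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}

/* Parse the leading run of basic digits in s as an unsigned long.
   Sketch only: stops at the first non-digit and does not check overflow. */
static unsigned long parse_decimal(const char *s)
{
    unsigned long n = 0;
    while (*s >= '0' && *s <= '9') {
        n = n * 10 + (unsigned long)digit_value(*s);
        s++;
    }
    return n;
}
```

The range test `c >= '0' && c <= '9'` is itself portable for the same reason: the ten digits occupy one contiguous block of values.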

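One can't use, say, the Unicode superscript digits and their code points directly in C. They are not consecutive: ¹, ² and ³ are U+00B9, U+00B2 and U+00B3 in the Latin-1 range, while ⁰ and ⁴ through ⁹ are U+2070 and U+2074 through U+2079. So the c - '0' idiom does not carry over to them.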

One way to handle such superscript numbers is to first pick up the string, for example with a lexer such as Flex, and then convert it to ordinary digits, which can then be passed to C/C++ functions.
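
The conversion step might look like the sketch below. It assumes the matched text, such as the text a Flex rule matched, is UTF-8 and contains only superscript digits. The names `superscript_utf8` and `superscript_to_ascii` are hypothetical. Because the superscript code points are not contiguous, the sketch uses an explicit lookup table instead of subtracting a base code point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 encodings of the superscript digits, indexed by digit value.
   The code points are not contiguous:
   0 -> U+2070, 1 -> U+00B9, 2 -> U+00B2, 3 -> U+00B3, 4..9 -> U+2074..U+2079. */
static const char *const superscript_utf8[10] = {
    "\xE2\x81\xB0", "\xC2\xB9",     "\xC2\xB2",     "\xC2\xB3",     "\xE2\x81\xB4",
    "\xE2\x81\xB5", "\xE2\x81\xB6", "\xE2\x81\xB7", "\xE2\x81\xB8", "\xE2\x81\xB9"
};

/* Convert a UTF-8 string of superscript digits into ordinary digits
   '0'..'9' in out (NUL-terminated). Returns the number of digits written,
   or -1 on an unrecognized sequence or if out is too small. */
static int superscript_to_ascii(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    assert(in != NULL && out != NULL && outsize > 0);
    while (*in != '\0') {
        int d, found = -1;
        size_t len = 0;
        for (d = 0; d < 10; d++) {
            len = strlen(superscript_utf8[d]);
            if (strncmp(in, superscript_utf8[d], len) == 0) {
                found = d;
                break;
            }
        }
        if (found < 0 || n + 1 >= outsize)
            return -1;
        out[n++] = (char)('0' + found); /* portable: '0'..'9' are consecutive */
        in += len;
    }
    out[n] = '\0';
    return (int)n;
}
```

The resulting ordinary digit string can then be handed to standard functions such as strtol or strtoul.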
