Basic Latin digits, not everything else (was: RE: How the C programming language bridges the man-machine gap)

Jens Maurer Jens.Maurer at gmx.net
Mon Apr 18 14:10:58 CDT 2022


On 18/04/2022 20.47, Doug Ewell via Unicode wrote:
> Hans Åberg wrote:
>
>>> The superscript European digits are not the same characters as the
>>> regular, full-size European digits, by either Unicode's definition of
>>> "same" or that of any other character encoding standard. Thus the C
>>> standard is only talking about 0123456789, not ⁰¹²³⁴⁵⁶⁷⁸⁹.
>>
>> I suggest you check in some C language standard forum.
>
> Then this constraint (consecutive and strictly increasing) cannot be met using Unicode or any other character encoding that I have ever seen.

I sense some confusion here, but it's a bit hard for me
to pinpoint it.  I've been participating in the standardization
of C++ for more than 20 years; C++ has a similar provision.

The C standard (ISO 9899) says in section 5.2.1 paragraph 3:

"In both the source and execution basic character sets,
the value of each character after 0 in the above list
of decimal digits shall be one greater than the value
of the previous."

Note the use of the term "basic character set".

That term is defined directly above, in terms of the Latin
alphabet and "the 10 decimal digits" 0...9.  These
are all understood to be subsets of the ASCII
repertoire; superscript or non-Western
representations of digits are not in view here.
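
As a minimal sketch of what this guarantee buys in practice: because
the ten basic digits are contiguous and increasing, subtracting '0'
yields the digit's numeric value on any conforming implementation.
The helper names digit_value and check_basic_digits are made up for
illustration; they are not from the standard.

```c
#include <assert.h>

/* Hypothetical helper: numeric value of a basic decimal digit,
 * or -1 if c is not one of '0'..'9'. */
static int digit_value(char c)
{
    /* Relies on 5.2.1p3: each digit after '0' is one greater
     * than the previous, so the range test and the subtraction
     * are both portable. */
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}

/* Hypothetical self-check: restates the 5.2.1p3 requirement
 * as assertions over the basic digits only. */
static void check_basic_digits(void)
{
    const char digits[] = "0123456789";
    for (int i = 0; i < 10; i++)
        assert(digits[i] == '0' + i);
}
```

Anything outside the basic digits, superscripts included, is simply
rejected by digit_value; the provision makes no claim about such
characters.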

Jens


