Basic Latin digits, not everything else (was: RE: How the C programming language bridges the man-machine gap)
Doug Ewell
doug at ewellic.org
Mon Apr 18 00:51:05 CDT 2022
Hans Åberg wrote:
>>> One can't use say the Unicode superscript numbers and their code
>>> points directly for C.
>>
>> Was that part of the use case?
>
> 5.2.1 Character sets
> …
> In both the source and execution basic character sets, the value of
> each character after 0 in the above list of decimal digits shall be
> one greater than the value of the previous.
I think it's abundantly clear that the C standard, specifically "the above list of decimal digits," applies to the Basic Latin digits U+0030 through U+0039, and not to superscript digits, subscript digits, negative circled digits, mathematical sans-serif bold digits, or any other digits encoded in Unicode.
The digits shown in the PDF version of ISO/IEC 9899 are, visually, the size and alignment one would expect of Basic Latin digits. The accompanying text makes no mention of superscript digits. If superscript digits had been intended, one would think they would have been explicitly mentioned along with, or in opposition to, normal digits.
Although Roger Costello's original post in this thread was elementary for this list, it did clearly describe the normal process of converting integer values to and from strings of Basic Latin digits, not strings of other kinds of digits. THAT is what we are talking about here.
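For illustration only, that conversion can be sketched in C roughly as follows. The names parse_decimal and format_decimal are hypothetical, not taken from Roger's post or from any library. The sketch relies on nothing beyond the 5.2.1 guarantee quoted above, namely that '0' through '9' have consecutive, increasing values.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: parse a string of Basic Latin digits.
   Returns 1 and stores the value in *out on success; returns 0 on
   empty input, a non-digit character, or overflow. */
int parse_decimal(const char *s, unsigned long *out)
{
    unsigned long value = 0;

    assert(s != NULL && out != NULL);
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; ++s) {
        unsigned long digit;

        if (*s < '0' || *s > '9')
            return 0;
        /* Valid because the digits are contiguous and increasing. */
        digit = (unsigned long)(*s - '0');
        if (value > (ULONG_MAX - digit) / 10)
            return 0; /* value * 10 + digit would overflow */
        value = value * 10 + digit;
    }
    *out = value;
    return 1;
}

/* Hypothetical helper: write value as Basic Latin digits into buf.
   Returns the number of digits written, or 0 if cap is too small. */
size_t format_decimal(unsigned long value, char *buf, size_t cap)
{
    char tmp[3 * sizeof value + 1]; /* ample room, assuming 8-bit bytes */
    size_t n = 0;
    size_t i;

    assert(buf != NULL);
    do {
        /* The same guarantee, used in the other direction. */
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    if (n + 1 > cap)
        return 0;
    for (i = 0; i < n; ++i)
        buf[i] = tmp[n - 1 - i]; /* digits were produced in reverse */
    buf[n] = '\0';
    return n;
}
```

Both directions depend on the single expression c - '0' (or '0' + d), which is exactly what the quoted sentence of the standard makes portable.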
I'm not sure exactly what Murray Sargent meant by using superscript and subscript digits in C++ programs. If there are standard libraries for converting those to integer values and back, I wasn't aware of them.
But in any case, yes! It is true! Processing the superscript digits does require dealing with non-contiguous code point allocations and a code point order that is not strictly increasing, because three of them (U+00B9, U+00B2 and U+00B3, for 1, 2 and 3) were encoded in Latin-1 Supplement for direct convertibility with ISO 8859-1, while the rest were added to the Superscripts and Subscripts block. But I haven't got a clue what this has to do with handling of Basic Latin digits.
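To make the contrast concrete, here is a minimal sketch of what superscript digits would need instead of simple subtraction. The table and function names are hypothetical; I am not aware of any standard library that provides this. The code points are the ones assigned in Unicode: U+2070 for zero, U+00B9, U+00B2 and U+00B3 for one through three, and U+2074 through U+2079 for four through nine.

```c
#include <assert.h>

/* Superscript digit code points, indexed by digit value.
   Note that U+00B9 (one) sorts after U+00B2 and U+00B3 (two, three),
   and that zero sits in a different block from one through three. */
static const unsigned long superscript_digits[10] = {
    0x2070UL, 0x00B9UL, 0x00B2UL, 0x00B3UL, 0x2074UL,
    0x2075UL, 0x2076UL, 0x2077UL, 0x2078UL, 0x2079UL
};

/* Hypothetical helper: returns 0..9 for a superscript digit code
   point, or -1 for anything else. A table search replaces the
   cp - '0' arithmetic that works for Basic Latin digits. */
int superscript_digit_value(unsigned long cp)
{
    int i;

    for (i = 0; i < 10; ++i) {
        if (superscript_digits[i] == cp)
            return i;
    }
    return -1;
}

/* Hypothetical helper: returns the superscript code point for a
   digit value in the range 0..9. */
unsigned long superscript_digit_code_point(int value)
{
    assert(value >= 0 && value <= 9);
    return superscript_digits[value];
}
```

None of this touches the Basic Latin case, where the arithmetic shortcut remains valid.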
It seems we are having a hard time staying focused in this thread. I received another response, offline, pointing out that the C standard doesn't promise anything about the order and contiguity of hex digits (i.e. decimal digits plus the letters A through F), nor does it discuss how to handle non-positional number systems, such as Greek and Roman numerals. No, it doesn't, but how does any of that prove or disprove anything about the handling of '0' through '9'?
Perhaps I'm dense and simply need the responses to be clearer about whether they are disputing something stated in the C standard, or just demonstrating knowledge that there are other digits and number systems to which the stated rules don't apply.
Or perhaps I'm just being grumpy.
--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org