Basic Latin digits, not everything else (was: RE: How the C programming language bridges the man-machine gap)
Hans Åberg
haberg-1 at telia.com
Mon Apr 18 15:32:14 CDT 2022
> On 18 Apr 2022, at 21:51, Jens Maurer via Unicode <unicode at corp.unicode.org> wrote:
>
>>> That is defined directly above based on the Latin
>>> alphabet and "the 10 decimal digits" 0...9. This
>>> is all understood to be subsets of the ASCII
>>> repertoire; any superscript or non-Western
>>> representation of digits is not in view here.
>>
>> So in your interpretation, a C or a C++ compiler cannot use EBCDIC? C++ used to have trigraphs to allow for that encoding.
>
> Sure, a compiler can use EBCDIC, and existing compilers do.
> I said "ASCII repertoire", not "ASCII encoding".
The C standard does not refer to ASCII at all.
>> Another requirement is that the values must also fit into a C byte. So one must keep track of what a C byte is.
>
> I don't know what you mean by "one must keep track..."
If the values used do not fit into an octet, one must use a larger byte; such bytes have been used in the past, though not nowadays, I think. A byte large enough to carry all the Unicode values might also be a possibility. An expert on C might tune in.
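For concreteness, here is a minimal sketch (assuming a hosted C implementation) that prints what the implementation considers a byte; the standard only requires CHAR_BIT to be at least 8, so a byte may well be wider than an octet:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* CHAR_BIT is the number of bits in a C byte;
           UCHAR_MAX is the largest value a byte can hold. */
        printf("bits per byte (CHAR_BIT): %d\n", CHAR_BIT);
        printf("UCHAR_MAX: %u\n", (unsigned)UCHAR_MAX);
        return 0;
    }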
But this is drifting too far into tangents: I just wanted to illustrate one aspect of the C standard, the requirement that the digit values must be consecutive.
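As a small illustration of what that guarantee buys, a sketch of the usual idiom: because '0'...'9' are required to have consecutive values, a digit character can be converted to its numeric value by subtracting '0', on any conforming implementation, ASCII or EBCDIC alike.

    #include <stdio.h>

    /* Relies only on the guarantee that the values of '0'..'9'
       are consecutive in the execution character set. */
    static int digit_value(char c)
    {
        return c - '0';
    }

    int main(void)
    {
        printf("%d\n", digit_value('7'));   /* prints 7 */
        return 0;
    }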