Basic Latin digits, not everything else (was: RE: How the C programming language bridges the man-machine gap)

Hans Åberg haberg-1 at telia.com
Mon Apr 18 15:32:14 CDT 2022


> On 18 Apr 2022, at 21:51, Jens Maurer via Unicode <unicode at corp.unicode.org> wrote:
> 
>>> That is defined directly above based on the Latin
>>> alphabet and "the 10 decimal digits" 0...9. This
>>> is all understood to be subsets of the ASCII
>>> repertoire; any superscript or non-Western
>>> representation of digits is not in view here.
>> 
>> So in your interpretation, a C or a C++ compiler cannot use EBCDIC? C++ used to have trigraphs to allow for that encoding.
> 
> Sure, a compiler can use EBCDIC, and existing compilers do.
> I said "ASCII repertoire", not "ASCII encoding".

The C standard does not refer to ASCII at all.

>> Another requirement is that the values must also fit into a C byte. So one must keep track of what a C byte is.
> 
> I don't know what you mean by "one must keep track..."

If the values used do not fit into an octet, one must use a larger byte; such bytes have been used in the past, though not nowadays, I think. But a byte large enough to carry all the Unicode values might be a possibility. An expert on C might tune in.
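As a minimal sketch (assuming a hosted C implementation), the width of a byte is reported by CHAR_BIT from <limits.h>; the standard only requires it to be at least 8, so it may be larger on some platforms:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* CHAR_BIT is the number of bits in a byte; the C standard
           requires CHAR_BIT >= 8, but it may be larger (e.g. on some DSPs). */
        printf("bits per byte: %d\n", CHAR_BIT);
        return 0;
    }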

But this is carrying too far into tangents: I just wanted to illustrate one aspect of the C standard, namely the requirement that the digit values must be consecutive.
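A small illustrative example of what that consecutiveness guarantee buys: converting a digit character to its numeric value by subtracting '0' is only well defined because the standard requires the values of '0' through '9' to be consecutive and increasing (the same is not guaranteed for the letters).

    #include <stdio.h>

    /* Convert a decimal digit character to its numeric value.
       Valid because the C standard guarantees that '0'..'9' have
       consecutive, increasing values in the execution character set. */
    static int digit_value(char c)
    {
        return c - '0';
    }

    int main(void)
    {
        printf("%d\n", digit_value('7'));  /* prints 7 */
        return 0;
    }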




