Basic Latin digits, not everything else (was: RE: How the C programming language bridges the man-machine gap)

Marius Spix marius.spix at web.de
Mon Apr 18 17:24:59 CDT 2022


On Mon, 18 Apr 2022 21:10:58 +0200
Jens Maurer via Unicode wrote:

> On 18/04/2022 20.47, Doug Ewell via Unicode wrote:
> > Hans Åberg wrote:
> "In both the source and execution basic character sets,
> the value of each character after 0 in the above list
> of decimal digits shall be one greater than the value
> of the previous."
> 
> Note the use of the term "basic character set".

Also note that “SHALL be” does not mean “MUST be”. For example, the basic
character set SHALL include certain characters like “[”, “]”, “{” or
“}”, but where they do not exist in the current character set, C
allows them to be replaced by digraphs and trigraphs. C++ also adds
alternative tokens (like “and” or “xor” instead of “&&” or “^”).
Trigraphs are no longer supported as of C++17, which breaks
backward compatibility.

C also expects that the backslash (\, ASCII code point 0x5C) is used for
escape sequences in string literals, but some users of the Shift JIS
encoding use the Yen sign (¥), which shares the same code point 0x5C.

Regards,

Marius


