EBCDIC control characters
Kent Karlsson
kent.b.karlsson at bahnhof.se
Fri Jun 19 18:06:45 CDT 2020
> On 18 June 2020 at 20:00, Ken Whistler via Unicode <unicode at unicode.org> wrote:
> […]
> It isn't really a "character set" issue. Either ASCII graphic character sets or EBCDIC graphic character sets could be used, in principle, with different sets of control functions, mapped onto the control code positions in each overall scheme.
That does not seem to be a very good idea at all. Since we have no good way of telling which set of control codes is in use in such cases, it would be a particularly bad idea for Unicode encodings. It would be even worse than the situation that led up to the creation of Unicode.
So let's assume "normal" control code allocation in the C0 and C1 areas when using the U+nnnn notation (or \unnnn). (I am not saying anything here about the contents of C0/C1 in other encodings.)
I don't usually need to worry about EBCDIC-based encodings… But it seems that at least the earlier EBCDIC-based encodings (UTF-EBCDIC not so much) had some control codes that have no direct correspondence in the "normal" C0/C1. Several are listed on the Wikipedia page about EBCDIC.
Even though there is no direct correspondence for them, there is a way to represent them, provided one agrees on a mapping: ISO/IEC 6429/ECMA-48 comes to the rescue. There are very many unused, but syntactically correct, escape sequences and control sequences. A few of them are designated as private use. So for an (old?) EBCDIC control code that has no representation in the "normal" C0/C1: if it is parameterless, "allocate" an escape sequence (cf. how each C1 control code has an alternative representation as an escape sequence, e.g. HTJ as ESC I and NEL as ESC E); if it takes a parameter, "allocate" a control sequence (in the ECMA-48 sense) that takes a parameter (you will need a mapping of parameter values as well).
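To illustrate the C1 equivalence mentioned above, here is a minimal Python sketch. It relies only on the ECMA-48 rule that a C1 control at 0x80+n can alternatively be written as ESC followed by the byte 0x40+n; the function names are made up for this example.

```python
ESC = "\x1b"

def c1_to_escape(ch: str) -> str:
    """Return the two-character ESC form of a single C1 control character."""
    cp = ord(ch)
    if len(ch) != 1 or not 0x80 <= cp <= 0x9F:
        raise ValueError("not a C1 control character")
    # C1 code 0x80+n corresponds to ESC followed by 0x40+n.
    return ESC + chr(cp - 0x40)

def escape_to_c1(seq: str) -> str:
    """Return the C1 control character for a two-character ESC form."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= ord(seq[1]) <= 0x5F:
        raise ValueError("not an ESC form of a C1 control")
    return chr(ord(seq[1]) + 0x40)

nel = "\u0085"  # NEL
htj = "\u0089"  # HTJ
print(repr(c1_to_escape(nel)))  # ESC E
print(repr(c1_to_escape(htj)))  # ESC I
```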
I'm not saying that these (old?) control codes unique to EBCDIC are well designed, or that all of them are worthy of implementation and perpetual use. Not at all. But if you do need to keep some of them in some contexts (and otherwise ignore them), allocating escape sequences and control sequences is the way to go. No need to allocate new characters in Unicode… And no need to interpret the C0/C1 space in Unicode "strangely" in some contexts. That way you can represent "odd" control codes, e.g. from (old?) EBCDIC-based encodings, in Unicode as well, and in \unnnn notation. (OK, for some (old?) EBCDIC-based encodings one needs an extra conversion step to turn the escape/control sequence back into the (old?) control code if the string targets such an encoding.)
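As a rough sketch of what such an allocation and the extra conversion step could look like (in Python): everything in the two tables is a hypothetical agreement made up for this example, not an existing standard mapping. The EBCDIC code positions and names (0x06 RNL, 0x04 SEL) are only illustrative, as is the assumption that one of them takes a numeric parameter. The sketch assumes the private-use ranges of ISO/IEC 2022 and ECMA-48: escape sequences with a final byte in 0x30-0x3F, and control sequences with a final byte in 0x70-0x7E.

```python
from typing import Optional, Tuple

ESC = "\x1b"
CSI = ESC + "["

# Hypothetical agreed mapping, for illustration only.
# Parameterless EBCDIC control code -> private-use escape sequence.
PARAMETERLESS = {0x06: ESC + "0"}   # e.g. RNL -> ESC 0
# Parameterized EBCDIC control code -> private-use CSI final byte.
PARAMETERIZED = {0x04: "p"}         # e.g. SEL n -> CSI n p

def encode_control(code: int, param: Optional[int] = None) -> str:
    """Represent an EBCDIC-only control code as an escape/control sequence."""
    if code in PARAMETERLESS:
        if param is not None:
            raise ValueError("this control code takes no parameter")
        return PARAMETERLESS[code]
    if code in PARAMETERIZED:
        if param is None or param < 0:
            raise ValueError("this control code needs a non-negative parameter")
        return f"{CSI}{param}{PARAMETERIZED[code]}"
    raise KeyError(f"no sequence allocated for control code 0x{code:02X}")

def decode_control(seq: str) -> Tuple[int, Optional[int]]:
    """Extra conversion step: sequence back to (EBCDIC code, parameter)."""
    for code, esc in PARAMETERLESS.items():
        if seq == esc:
            return code, None
    if seq.startswith(CSI) and len(seq) > len(CSI) + 1:
        final, digits = seq[-1], seq[len(CSI):-1]
        for code, fin in PARAMETERIZED.items():
            if final == fin and digits.isascii() and digits.isdigit():
                return code, int(digits)
    raise ValueError("not an allocated sequence")

print(repr(encode_control(0x06)))
print(repr(encode_control(0x04, 3)))
print(decode_control(encode_control(0x04, 3)))
```

A converter targeting an encoding without these controls would simply drop (ignore) the allocated sequences instead of decoding them.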
Happy summer (northern hemisphere...) solstice
/Kent Karlsson