OverStrike control character

Harriet Riddle harjitmoe at outlook.com
Tue Jun 9 18:44:26 CDT 2020


> The programming language APL also heavily relied on the overstrike control character, so many systems in the 80s had the character including Lisp machines.

The current way of handling APL overstamping sequences is to include the entire sequences in the mapping file: https://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/APL-ISO-IR-68.TXT

The interpreter/compiler would presumably have a hardcoded list of sequences it recognises anyway…
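
As a rough sketch of what such a hardcoded list might look like, the Python below resolves "X BS Y" overstrikes against a small lookup table. The table is hand-picked for illustration and works on Unicode characters; it is not taken from the mapping file, which keys on ISO-IR-68 byte sequences. The names OVERSTRIKES and resolve_overstrikes are hypothetical.

```python
BS = "\x08"

# Illustrative table only (an assumption, not copied from the mapping file):
# each key is a pair of characters struck in the same position.
OVERSTRIKES = {
    ("\u25CB", "|"): "\u233D",  # WHITE CIRCLE + stile -> APL FUNCTIONAL SYMBOL CIRCLE STILE
    ("\u25CB", "*"): "\u235F",  # WHITE CIRCLE + star  -> APL FUNCTIONAL SYMBOL CIRCLE STAR
    ("\u2207", "~"): "\u236B",  # NABLA + tilde        -> APL FUNCTIONAL SYMBOL DEL TILDE
}


def resolve_overstrikes(text):
    """Replace recognised 'X BS Y' sequences with a single character.

    Unrecognised sequences are passed through untouched, BS included.
    """
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BS:
            pair = (text[i], text[i + 2])
            # Overstriking is order-independent on paper, so try both orders.
            combined = OVERSTRIKES.get(pair) or OVERSTRIKES.get(pair[::-1])
            if combined is not None:
                out.append(combined)
                i += 3
                continue
        out.append(text[i])
        i += 1
    return "".join(out)
```

A sequence that is not in the table, such as "a BS b", comes out unchanged, which is roughly what one would expect from an interpreter that only knows a fixed set of composites.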

> Unicode/ASCII currently has at ASCII 8 the character "BS" thats supposed to go back a character without deleting it, and "DEL" at ASCII 127 that does delete the character. But nowadays BS just deletes the previous character.

Unicode itself is fairly hands-off about how higher-level protocols interpret C0 and C1 control codes (general category Cc). Indeed, ISO 10646:2017 section 12.4 gives the designation sequences of the ISO 6429 (ECMA 48) controls as the default, but goes on (on the next page) to permit the use of ISO 2022 designations of other control code sets with UCS/Unicode. By contrast, ISO 2022 designations of graphical sets are not permitted inside UCS, and have no compatible semantics.

That being said, section 23.1 of The Unicode Standard (TUS) names a limited subset of them (HT, LF, VT, FF, CR, FS, GS, RS, US, NEL), so that they can be given custom behaviours for line breaking, bidirectional processing and classification as whitespace. BS is not amongst these.
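
The difference is visible in the character properties. The sketch below uses Python's standard unicodedata module to compare the named controls with BS; the NAMED dictionary and the describe helper are hypothetical names for illustration. Note that str.isspace() is Python's own notion of whitespace (it is derived partly from the bidirectional class), not necessarily the Unicode White_Space property.

```python
import unicodedata

# The controls listed above, keyed by their usual abbreviations.
NAMED = {
    "HT": "\x09", "LF": "\x0A", "VT": "\x0B", "FF": "\x0C", "CR": "\x0D",
    "FS": "\x1C", "GS": "\x1D", "RS": "\x1E", "US": "\x1F", "NEL": "\x85",
}
BS = "\x08"


def describe(ch):
    """Return (general category, bidirectional class, Python isspace())."""
    return (unicodedata.category(ch), unicodedata.bidirectional(ch), ch.isspace())


for name, ch in NAMED.items():
    print(name, describe(ch))
print("BS", describe(BS))
```

All of these are general category Cc, but the named controls carry a separator or whitespace bidirectional class (B, S or WS), whereas BS is BN (boundary neutral) like the remaining controls.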

In practice, BS is not supported at all (i.e. has neither behaviour) outside of terminal emulators in my experience.
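
For reference, the two behaviours being contrasted can be sketched as follows. This is a simplified single-line model under my own assumptions, not a description of any particular terminal emulator; both function names are hypothetical.

```python
BS = "\x08"


def render_terminal_line(text):
    """BS moves the cursor back one cell without erasing anything.

    A later character written at that cell replaces the earlier one,
    which is roughly how a character-cell display behaves.
    """
    cells = []
    cursor = 0
    for ch in text:
        if ch == BS:
            cursor = max(cursor - 1, 0)
        else:
            if cursor < len(cells):
                cells[cursor] = ch
            else:
                cells.append(ch)
            cursor += 1
    return "".join(cells)


def render_deleting(text):
    """BS removes the previous character outright."""
    out = []
    for ch in text:
        if ch == BS:
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)
```

The two only differ when nothing is written after the BS: given "ab" followed by BS, the first model still shows "ab" while the second shows "a". Neither model overstrikes in the typewriter sense.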

> In fact, it's prohibited in ISO/IEC 8859 for BS to not delete the previous character. 

ISO 8859 defines profiles of ISO 4873 (ECMA 43) Level 1. Both ISO 8859 and ISO 4873 stipulate fixed character repertoires, and so prohibit creating new characters by overstamping existing ones by any means (including using BS or CR to seek back over them). I don't read this as limiting how BS itself might be implemented, only as saying that a text is not valid ISO 8859 if it uses BS to stamp one character on top of another so as to create a character whose meaning differs from that of the two in sequence.

They do permit using the GCC (GRAPHIC CHARACTER COMBINATION) control sequence defined by ISO 6429 (ECMA 48) though, since it doesn't overstamp anything but merely renders the enclosed characters in one em-square, if that function is supported (and it usually isn't, so far as I can tell). The most extreme example I can think of is that the byte sequence 9B 31 20 5F D5 E4 E9 20 C7 E4 E4 E7 20 D9 E4 EA E7 20 E8 D3 E4 E5 9B 32 20 5F in ISO-8859-6 might be shown with a U+FDFA glyph.
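
To make that byte sequence easier to follow, here is a small Python sketch that strips the GCC start and end sequences (CSI 1 SP _ and CSI 2 SP _, with CSI as the single C1 byte 9B) and decodes what is left. It assumes Python's built-in iso8859_6 codec; split_gcc and the constant names are hypothetical.

```python
import unicodedata

RAW = bytes.fromhex(
    "9B 31 20 5F D5 E4 E9 20 C7 E4 E4 E7 20 D9 E4 EA E7 20 E8 D3 E4 E5 9B 32 20 5F"
)

GCC_START = b"\x9b1 _"  # CSI 1 SP _ : start of a string to be combined
GCC_END = b"\x9b2 _"    # CSI 2 SP _ : end of that string


def split_gcc(data):
    """Return the bytes enclosed by one GCC start/end pair.

    Only handles the simple case of a single pair wrapping the whole input.
    """
    if not (data.startswith(GCC_START) and data.endswith(GCC_END)):
        raise ValueError("input is not wrapped in a GCC start/end pair")
    return data[len(GCC_START):len(data) - len(GCC_END)]


inner = split_gcc(RAW).decode("iso8859_6")

# U+FDFA has a compatibility decomposition, so NFKD expands it to plain letters.
ligature_expansion = unicodedata.normalize("NFKD", "\uFDFA")

print([f"U+{ord(c):04X}" for c in inner])
print(inner == ligature_expansion)
```

The enclosed text is 18 characters long, and comparing it with the compatibility decomposition of U+FDFA is one way to check that a single-glyph rendering would be a reasonable interpretation of the sequence.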
