OverStrike control character

Sławomir Osipiuk sosipiuk at gmail.com
Tue Jun 9 19:01:32 CDT 2020


On Tue, Jun 9, 2020 at 6:57 PM abraham gross via Unicode
<unicode at unicode.org> wrote:
>
> What do yall think about adding an OverStrike control character?

I don't think it's a goer. There are two things that immediately stand out:

1. Unicode doesn't seem eager to define control characters at all. In
fact, aside from a handful of format effectors which were so universal
and obvious that it made no sense to exclude them, Unicode is very
passive on the topic of even the well-defined controls of ISO
6429/ECMA 48. An interesting exception to this is the pair of U+2028
and U+2029 (line and paragraph separators). Any control character is
going to be a "hard sell".

2. Overstriking arbitrary characters is a qualitatively different
process than using combining characters. In the latter case, the set
of characters is restricted, and certain algorithms can be applied to
make the presentation look sane (to varying degrees of success).
Overstriking implies the need for the rendering engine to be able to
combine any two characters, regardless of elements that interfere or
clash. It seems simple in principle to just render the characters
separately and overlay the pixels, but I'm very skeptical of what the
results would actually look like in real life, with users making
unpredictable font and formatting choices.
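
To make the "overlay the pixels" idea concrete, here is a toy sketch. It
assumes two glyphs have already been rasterized to bitmaps of the same size;
the 5x5 bitmaps and the overlay() helper are made up for illustration and do
not come from any real font or rendering engine.

```python
def overlay(a, b):
    """OR two equal-sized glyph bitmaps given as lists of strings ('#' = ink)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    return [
        "".join("#" if "#" in (p, q) else "." for p, q in zip(ra, rb))
        for ra, rb in zip(a, b)
    ]

# Hypothetical hand-drawn glyphs: a letter "O" and a slash.
O = [".###.", "#...#", "#...#", "#...#", ".###."]
SLASH = ["....#", "...#.", "..#..", ".#...", "#...."]

result = overlay(O, SLASH)
print("\n".join(result))
```

The naive version only works because both bitmaps share one size and one
baseline. With real fonts the two glyphs may differ in advance width, height,
and weight, which is where the doubts above come in.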

> Unicode/ASCII currently has at ASCII 8 the character "BS" thats supposed to go back a character without deleting it, and "DEL" at ASCII 127 that does delete the character. But nowadays BS just deletes the previous character. In fact, it's prohibited in ISO/IEC 8859 for BS to not delete the previous character.

Is it? I know that's the behaviour in all modern software, but I can't
find that prohibition. Can you point out the section?

Speaking of old standards, though, ISO 6429/ECMA 48 has the GCC
(GRAPHIC CHARACTER COMBINATION) control, which seems to be its
recommendation for overstriking (though it also waffles about how
combined characters may simply be made half-width and inserted into
the horizontal space of a single character, leaving the ultimate
decision of "relative sizes and placements" to the implementation).
GCC looks like a mess. Because it's built up from a CSI (control
sequence introducer) and takes a parameter, the way to combine two
characters is to precede the pair with the sequence [0x1B 0x5B 0x30
0x20 0x5F], and to combine more than two characters, enclose them
with an initial [0x1B 0x5B 0x31 0x20 0x5F] and a final [0x1B 0x5B 0x32
0x20 0x5F]. How fun.
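
For the curious, a rough sketch of building those byte sequences follows.
The function names gcc() and combine() are made up for illustration, the
7-bit form of CSI (ESC [) is assumed as in the bytes quoted above, and the
text is assumed to be plain ASCII. I don't know of a terminal that actually
honours GCC, so treat this as a demonstration of the encoding only.

```python
CSI = b"\x1b\x5b"        # ESC [ : 7-bit form of the control sequence introducer
GCC_TAIL = b"\x20\x5f"   # intermediate byte SP, then final byte "_"

def gcc(param: int) -> bytes:
    """Return the GCC control sequence for parameter 0, 1 or 2."""
    if param not in (0, 1, 2):
        raise ValueError("GCC parameter must be 0, 1 or 2")
    return CSI + str(param).encode("ascii") + GCC_TAIL

def combine(text: str) -> bytes:
    """Wrap ASCII text in GCC so its characters are meant to be combined."""
    if len(text) < 2:
        raise ValueError("need at least two characters to combine")
    data = text.encode("ascii")
    if len(text) == 2:
        # Parameter 0: the two following characters are combined.
        return gcc(0) + data
    # Parameter 1 opens, parameter 2 closes a longer run of characters.
    return gcc(1) + data + gcc(2)

print(combine("a/").hex(" "))
print(combine("abc").hex(" "))
```

Five bytes of overhead for a pair, ten for anything longer.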

Sławomir Osipiuk


