OverStrike control character

Harriet Riddle harjitmoe at outlook.com
Tue Jun 16 15:43:15 CDT 2020


>> Your equivalence calls for <l, OVERSTRIKE, m> and
>> <m, OVERSTRIKE, l> to have the same advance width.
>
> Right, exactly. Why's that a problem?

Because with how things usually work currently (in something like Roman or Greek, at any rate), the text renderer allocates space for the base character first, and then positions any following combining diacritics within that space. That is to say, the anchor point is a point inside the space allocated for the base character, and the diacritics are positioned so that their own anchors coincide with that point. The combining diacritics themselves have zero advance width, and no space is allocated for them; in the absence of anchors, they just poke over the previous character and (somewhat optimistically) hope for the best.

So if, say, <OVS>m and <OVS>l were treated just as postfix combining diacritics are today, the m in l<OVS>m would significantly poke out of both sides of the space allocated for the l. Whereas m<OVS>l would not do that (since the space is allocated for the m, which is the wider of the two), and hence they wouldn't display the same way.
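To put rough numbers on that, here is a minimal Python sketch of the usual model: the base glyph sets the advance, and each following mark gets zero advance and is centred on the base. The widths in WIDTHS and the names layout_cluster and overhang are made up for illustration; they don't come from any real font or layout engine.

```python
# Hypothetical advance widths in font units (1000 per em); not from a real font.
WIDTHS = {"l": 250, "m": 800}


def layout_cluster(base, marks):
    """Lay out a base glyph followed by zero-advance marks centred on it.

    Returns (advance, ink_left, ink_right), where x = 0 is the start of the
    space allocated for the base glyph.
    """
    advance = WIDTHS[base]          # only the base contributes advance width
    anchor = advance / 2            # assumed anchor: centre of the base's space
    ink_left, ink_right = 0.0, float(advance)
    for mark in marks:
        half = WIDTHS[mark] / 2
        ink_left = min(ink_left, anchor - half)
        ink_right = max(ink_right, anchor + half)
    return advance, ink_left, ink_right


def overhang(base, marks):
    """How far the ink pokes out of the allocated space on (left, right)."""
    advance, ink_left, ink_right = layout_cluster(base, marks)
    return 0.0 - ink_left, ink_right - advance


for base, mark in (("l", "m"), ("m", "l")):
    print(base, "+ OVS +", mark, "->", layout_cluster(base, [mark]), overhang(base, [mark]))
```

Under these assumed widths, the two orders get different advances (250 versus 800), and only the l-first order has ink spilling outside its allocated space, which is the asymmetry described above.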

As for the use of <BS> for this in e.g. 7-bit ASCII: that works only because the output device is using a fixed-width font such as Courier, where every character has the same advance width, and so doesn't have to worry about this sort of thing.
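For contrast, the fixed-width case can be sketched as a toy model of an overstriking printer, where every glyph occupies exactly one cell and <BS> just moves back one cell. The function name strike_cells is hypothetical, and this ignores everything a real terminal or printer does beyond that.

```python
def strike_cells(text):
    """Simulate a fixed-pitch device: each cell collects every glyph struck in it."""
    cells = []
    col = 0
    for ch in text:
        if ch == "\b":              # <BS>: step back one cell, strike nothing
            col = max(0, col - 1)
            continue
        while len(cells) <= col:    # grow the line as the print head advances
            cells.append(set())
        cells[col].add(ch)
        col += 1                    # every glyph advances by exactly one cell
    return cells


print(strike_cells("l\bm") == strike_cells("m\bl"))
print(len(strike_cells("l\bm")))
```

In this model both orders fill the same single cell with the same pair of glyphs, so the question of whose advance width to use never comes up.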

Obviously, there are some existing exceptions to this being how combining characters work (e.g. some Tamil vowel marks actually display in-line before the base character, and so shove it forward in the line despite being encoded after it). But these exceptions pose an implementation burden, requiring the layout engine to actively support these scripts.
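As a rough illustration of the kind of script-specific work involved, here is a deliberately oversimplified Python sketch that moves a Tamil left-side vowel sign in front of the character it follows. The name visual_order is hypothetical, and a real shaping engine does far more than this (two-part vowel signs, for a start, are not handled here).

```python
# Tamil vowel signs that are drawn to the left of the consonant they follow:
# U+0BC6 VOWEL SIGN E, U+0BC7 VOWEL SIGN EE, U+0BC8 VOWEL SIGN AI.
PREBASE_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def visual_order(logical):
    """Reorder code points from logical (encoded) order to display order.

    Simplifying assumption: the base is always the single character
    immediately before the vowel sign.
    """
    out = []
    for ch in logical:
        if ch in PREBASE_SIGNS and out:
            out.insert(len(out) - 1, ch)   # place the sign before its base
        else:
            out.append(ch)
    return out


# Encoded as KA (U+0B95) followed by VOWEL SIGN E (U+0BC6).
print([f"U+{ord(c):04X}" for c in visual_order("\u0B95\u0BC6")])
```

Even this toy version needs a per-script table and a reordering pass, which hints at why such exceptions are a burden on the layout engine.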

