OverStrike control character

Mark E. Shoulson mark at kli.org
Wed Jun 10 07:59:16 CDT 2020

On 6/9/20 10:50 PM, abrahamgross--- via Unicode wrote:
> It should just simply overlay the pixels of the two characters, with a thin character going in the center of a wider character.

What are these "pixels" to which you refer?  Fonts these days are 
defined in terms of outlines, not pixels.  And Richard Wordingham points 
out the flaw in your notion of how it would be rendered, namely your 
claim that x OS z would look the same as z OS x:

> Consider <l, OVERSTRIKE, m> and <m, OVERSTRIKE, l> in a proportional
> width font.  Are you expecting the rendering system to position the 'l'
> using the knowledge that it will be overstruck? Overstriking is
> designed for a teletype with fixed width characters.

Besides, even if it worked as you said, with the narrow character 
centered, how long would it take before you found some examples that 
didn't really quite work out right?  Like overlaying a HEBREW LETTER YOD 
on a LATIN CAPITAL LETTER L, but what you really wanted was the YOD 
centered in the negative space of the L and not between the 
side-bearings, so next you'll want to be able to add some control over 
the exact positioning.  And of course that won't work right in general, 
because it all depends on the font(s) involved.
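
To put rough numbers on Richard's example, here is a small sketch. The 
advance widths and the function name overstrike_offsets are made up for 
illustration (they come from no real font or API); it simply centers the 
narrower glyph within the wider one's advance.

```python
# Hypothetical advance widths in font units; real fonts will differ.
ADVANCE = {"l": 250, "m": 850}

def overstrike_offsets(first, second, advance=ADVANCE):
    """Return (x_first, x_second, cell_width) for <first, OVERSTRIKE, second>,
    centering the narrower glyph within the wider glyph's advance."""
    w1, w2 = advance[first], advance[second]
    cell = max(w1, w2)
    return ((cell - w1) / 2, (cell - w2) / 2, cell)

print(overstrike_offsets("l", "m"))  # (300.0, 0.0, 850)
print(overstrike_offsets("m", "l"))  # (0.0, 300.0, 850)
```

In the <l, OVERSTRIKE, m> case the 'l' cannot be placed until the 
renderer has already seen the 'm', which is exactly the lookahead a 
fixed-width teletype never needed.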

And when it comes to matching, you say of x OS z and z OS x,

> In a perfect world they would be identical for string matching, but 
> since its a new control character I would understand if ppl don't want 
> to put in the effort to adopt it properly.

But we're talking about making the rules here, the "perfect world."  
What should the *rule* be about string-matching?  You can't have an 
optional rule, under which a pair of strings matches on one system but 
not on another, and both are right.  Are we to understand that you 
think the rule 
should be that overstruck characters are considered to match in either 
order?  Your gracious forgiveness of laxity in the rules doesn't really 
enter into the picture.  And what about larger considerations?  Can I 
have "ab←←xy" (using ← for the overstrike) to overstrike a&x and b&y?  
What about "a←b←c←d←e←f←g←h"?  What about "abc←d←←fg"?  Are the f&b 
overstruck, and the c&d&g as well?  Is that combination of c←d 
overstruck with g different from c←d←g, or the same?  What about other 
combinations?  These are all things that need answers.  What about 
overstriking a LTR character with a RTL one, or vice-versa?  Which way 
does the text go after that?
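
One way to see how many decisions hide in those questions is to write 
down a single possible answer. The sketch below assumes a teletype-style 
model in which ← is a plain backspace on a fixed-width line, and assumes 
a matching rule that ignores order within a column. Both assumptions, 
and the names columns and match_key, are hypothetical; nothing here is 
part of any proposal or of Unicode.

```python
OS = "\u2190"  # "←" stands in for the hypothetical overstrike control

def columns(text):
    """Lay text out on a fixed-width line, treating OS as a backspace.
    Returns a list of columns, each a list of characters in typing order."""
    cols, pos = [], 0
    for ch in text:
        if ch == OS:
            pos = max(pos - 1, 0)  # assumption: backspacing at column 0 stays put
            continue
        if pos == len(cols):
            cols.append([])        # first character typed in a new column
        cols[pos].append(ch)
        pos += 1
    return cols

def match_key(text):
    """One possible matching rule: ignore typing order within a column."""
    return [frozenset(col) for col in columns(text)]

print(columns("ab←←xy"))     # [['a', 'x'], ['b', 'y']]
print(columns("abc←d←←fg"))  # [['a'], ['b', 'f'], ['c', 'd', 'g']]
print("x←z" == "z←x")                        # False
print(match_key("x←z") == match_key("z←x"))  # True
```

Note that this particular key also makes "x←x" match plain "x", and it 
says nothing at all about bidirectional text. Each such choice would 
have to be a rule, not an option.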

But I think what you're really missing is the crucial point that Garth 
Wallace pointed out:

> Display is not the only thing text is for.

You're focussing a lot on how characters *look*: can we get this letter 
to look a little different, can we layer characters to make other weird 
symbols (which will look radically different depending on the font)... 
You're looking at how to _draw_ stuff, how to make things look this way 
or that on paper or when rendered.  But that's not what Unicode 
encodes.  You need to think more about the distinction Unicode makes 
between characters and glyphs.  "Plain text" isn't about display; it's 
about representing what's in a document, the characters which encode (in 
a different sense) the spoken language (usually) that is being 
communicated.  All the things you're talking about are firmly in the 
realm of fonts and higher-level protocols.  You surely could work out 
this overstriking display with a sufficiently-advanced font (you could 
make zero-advance-width overlaying characters and ligatures that would 
replace X← with a zero-width equivalent of X, for example, in a 
monospace font), and you are welcome to do so, but that's where it belongs.

> It shouldnt do any fancy processing by default

Figuring out how much to backspace in order to center a glyph on another 
one, in a proportional-spaced font, is pretty fancy processing.

> Most systems have just about the same font so I wouldn't worry about the results of overstriking not coming out perfect.

What a bland world you live in, wherein most fonts are the same! It's 
not about working with the default font on your favorite system; we're 
dealing with _characters_ here, which could be represented in ANY font.

