OverStrike control character
Mark E. Shoulson
mark at kli.org
Wed Jun 10 07:59:16 CDT 2020
On 6/9/20 10:50 PM, abrahamgross--- via Unicode wrote:
> It should just simply overlay the pixels of the two characters, with a thin character going in the center of a wider character.
What are these "pixels" to which you refer? Fonts these days are
defined in terms of strokes, not pixels. And Richard Wordingham points
out the flaw in your notion of how it would be rendered, your claim that
x OS z would look the same as z OS x:
> Consider <l, OVERSTRIKE, m> and <m, OVERSTRIKE, l> in a proportional
> width font. Are you expecting the rendering system to position the 'l'
> using the knowledge that it will be overstruck? Overstriking is
> designed for a teletype with fixed width characters.
Besides, even if it worked as you said, with the narrow character
centered, how long would it take before you found some examples that
didn't really quite work out right? Like overlaying a HEBREW LETTER YOD
on a LATIN CAPITAL LETTER L, but what you really wanted was the YOD
centered in the negative space of the L and not between the
side-bearings, so next you'll want to be able to add some control over
the exact positioning. And of course that won't work right in general,
because it all depends on the font(s) involved.
And when it comes to matching, you say of x OS z and z OS x,
> In a perfect world they would be identical for string matching, but
> since its a new control character I would understand if ppl don't want
> to put in the effort to adopt it properly.
But we're talking about making the rules here, the "perfect world."
What should the *rule* be about string-matching? You can't have an
optional rule, where a pair of strings matches on one system and not on
another and both are right. Are we to understand that you think the rule
should be that overstruck characters are considered to match in either
order? Your gracious forgiveness of laxity in the rules doesn't really
enter into the picture. And what about larger considerations? Can I
have "ab←←xy" (using ← for the overstrike) to overstrike a&x and b&y?
What about "a←b←c←d←e←f←g←h"? What about "abc←d←←fg"? The f&b are
overstruck and so are the c&d&g? Is that combination of c←d overstruck
with g different from c←d←g or the same? What about other
combinations? These are all things that need answers. What about
overstriking a LTR character with a RTL one, or vice-versa? Which way
does the text go after that?
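
To make those questions concrete, here is a minimal sketch of just one
possible answer, assuming the overstrike behaves like a teletype
backspace that moves back exactly one column, and assuming characters
that land in the same column should match regardless of order. Neither
assumption comes from the proposal; the names OVERSTRIKE, cells and
match_key are hypothetical, and ← stands in for the control as above.
Bidi, proportional widths and everything else are ignored.

```python
OVERSTRIKE = "\u2190"  # '←' used as a stand-in for the hypothetical control


def cells(text):
    """Group characters by the column they land in (backspace model)."""
    columns = []  # columns[i] lists the characters struck at column i
    pos = 0
    for ch in text:
        if ch == OVERSTRIKE:
            pos = max(pos - 1, 0)  # step back one column, never past the start
            continue
        if pos == len(columns):
            columns.append([])
        columns[pos].append(ch)
        pos += 1
    return columns


def match_key(text):
    """Order-insensitive key: sort the characters within each column."""
    return tuple(tuple(sorted(col)) for col in cells(text))


print(cells("ab\u2190\u2190xy"))
print(cells("abc\u2190d\u2190\u2190fg"))
print(match_key("x\u2190z") == match_key("z\u2190x"))
```

Under these assumptions "ab←←xy" gives the columns a&x and b&y, and
"abc←d←←fg" gives a, b&f, c&d&g. A different rule (say, one that keeps
c←d as a unit distinct from c←d←g) would give different matches, which
is exactly why the rule has to be pinned down.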
But I think what you're really missing is the crucial point that Garth
Wallace pointed out:
> Display is not the only thing text is for.
You're focussing a lot on how characters *look*: can we get this letter
to look a little different, can we layer characters to make other weird
symbols (which will look radically different depending on the font)...
You're looking at how to _draw_ stuff, how to make things look this way
or that on paper or when rendered. But that's not what Unicode
encodes. You need to think more about the distinction Unicode makes
between characters and glyphs. "Plain text" isn't about display, it's
about representing what's in a document, the characters which encode (in
a different sense) the spoken language (usually) that is being
communicated. All the things you're talking about are firmly in the
realm of fonts and higher-level protocols. You surely could work out
this overstriking display with a sufficiently-advanced font (you could
make zero-advance-width overlaying characters and ligatures that would
replace X← with a zero-width equivalent of X, for example, in a
monospace font), and you are welcome to do so, but that's where it belongs.
> It shouldnt do any fancy processing by default
Figuring out how much to backspace in order to center a glyph on another
one, in a proportional-spaced font, is pretty fancy processing.
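
As a rough illustration of what that processing involves, here is a
small sketch that centers the narrower of two glyphs on the wider one.
The advance widths are made-up numbers in arbitrary font units, and
place_pair is a hypothetical helper, not part of any real rendering
API; actual values depend entirely on the font.

```python
# Hypothetical advance widths; real ones come from whatever font is in use.
ADVANCE = {"l": 250, "m": 800}


def place_pair(first, second, advance=ADVANCE):
    """Return (x_first, x_second, total_advance), centering the narrower glyph."""
    w1, w2 = advance[first], advance[second]
    total = max(w1, w2)  # the pair occupies the width of the wider glyph
    return ((total - w1) / 2, (total - w2) / 2, total)


print(place_pair("l", "m"))  # 'l' comes first but must be shifted right
print(place_pair("m", "l"))  # 'm' stays put, 'l' is shifted right
```

Note that in the <l, OVERSTRIKE, m> case the position of the 'l' depends
on the width of a character that comes after it, so the renderer has to
look ahead before it can place anything.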
> Most systems have just about the same font so I wouldn't worry about the results of overstriking not coming out perfect.
What a bland world you live in, wherein most fonts are the same! It's
not about working with the default font on your favorite system; we're
dealing with _characters_ here, which could be represented in ANY font.
~mark