OverStrike control character

Harriet Riddle harjitmoe at outlook.com
Wed Jun 10 12:27:07 CDT 2020


> From: Unicode <unicode-bounces at unicode.org> on behalf of Kent Karlsson via Unicode <unicode at unicode.org>
> Sent: Wednesday, June 10, 2020 5:12:04 PM
> […]
> Well, however the overtyping, or overlapping, is achieved (BS, GCC (is there any implementation of that at all? I would strongly recommend against it), pinching the glyph spacing (there is a control sequence for that in ECMA-48) too much, or simply via how the font’s glyphs are designed and spaced), there is no telling what the displayed result will be.
> […]

Looking at Annex C of ECMA-43 (ECMA's designation for ISO 4873, in turn referenced from ISO 8859), GCC is only permitted because it is not supposed to create an effectively new character, but rather to “juxtapose” the characters in one position (i.e. force a ligature, which if unsupported could just be shown as a sequence of individual characters).

Similarly, BS is prohibited precisely because it overstamps to create a new character that the target system cannot be expected to support properly. You rightly mention this, and the prohibition makes sense even for rendering alone, let alone filename handling, narrator software for the visually impaired, _et cetera_…

The example it gives is using GCC on the sequence Pts to represent a ligature form (i.e. U+20A7).

ECMA-48 (ISO 6429) defines GCC's coded representation and parameters as a CSI sequence, and gives as a mere example the simplest case of triggering display of two characters side-by-side in one kanji width, i.e. what the Japanese era name ligatures, the CJK Compatibility block unit symbols, _et cetera_ do.
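
To make the coded representation concrete, here is a rough Python sketch of wrapping a string in GCC and of the "just show the individual characters" fallback. It assumes GCC is coded as CSI Ps SP _ (bytes 02/00 05/15 after the parameter), with Ps=0 combining the next two graphic characters, Ps=1 opening a longer combined string and Ps=2 closing it. That is my reading of ECMA-48 and should be checked against the standard. The function names (gcc, combine, strip_gcc) are made up for illustration.

```python
import re

CSI = "\x1b["  # 7-bit representation of CSI (ESC followed by "[")


def gcc(ps: int) -> str:
    """Build a GCC control sequence: CSI Ps SP _ (assumed coding, see above)."""
    return f"{CSI}{ps} _"


def combine(text: str) -> str:
    """Mark `text` to be juxtaposed in a single character position."""
    if len(text) == 2:
        # Assumed meaning of Ps=0: combine exactly the next two characters.
        return gcc(0) + text
    # Assumed meaning of Ps=1 ... Ps=2: bracket a longer combined string.
    return gcc(1) + text + gcc(2)


# Matches any GCC sequence with parameter 0, 1, 2 or an omitted parameter.
GCC_RE = re.compile(r"\x1b\[[0-2]? _")


def strip_gcc(text: str) -> str:
    """Fallback for a system without GCC support: drop the control
    sequences and leave the plain character sequence behind."""
    return GCC_RE.sub("", text)


if __name__ == "__main__":
    marked = combine("Pts")
    print(repr(marked))
    print(strip_gcc(marked))
```

The point of the fallback is that nothing is lost when GCC is ignored: the underlying text is still the ordinary sequence Pts, which is exactly what cannot be said of a BS-based overstamp.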

So apparently, GCC was (from what I can tell from the standards themselves) an attempt at defining a general mechanism for coding arbitrary ligatures and arbitrary CJK squared forms. Not character overstamping.

As a final note, I should mention that the best existing way to create an overstamped character cluster in HTML5 is probably to use embedded SVG. But for the reasons mentioned, this would inherently not be very good for accessibility.
________________________________
From: Unicode <unicode-bounces at unicode.org> on behalf of Kent Karlsson via Unicode <unicode at unicode.org>
Sent: Wednesday, June 10, 2020 5:12:04 PM
To: Mark E. Shoulson <mark at kli.org>
Cc: unicode at unicode.org <unicode at unicode.org>
Subject: Re: OverStrike control character

(You (all) apparently mean ”overtype” rather than ”overstrike”…; at least I read the latter as the same as crossed-out or strike-through.)

Well, however the overtyping, or overlapping, is achieved (BS, GCC (is there any implementation of that at all? I would strongly recommend against it), pinching the glyph spacing (there is a control sequence for that in ECMA-48) too much, or simply via how the font’s glyphs are designed and spaced), there is no telling what the displayed result will be.

Doing such things really passes from being text display (styled or not) into the realm of graphics. And sure, you can do lots of things in graphic design, also with overlapping ”graphic elements” (including glyphs for letters/digits/...). But as (possibly styled) TEXT, the displayed/printed result of overlaps would be ”implementation defined”. Please use a graphics editing program for controlling how overlapping graphic elements look like; for overlapping, you may want to use different layers, graphics editing programs often support ”layers”, for the graphic elements that overlap even if there are no layers when converting to (say) PNG.

(And for graphics, sorting, searching, and other text operations do not apply…; in HTML for images/graphics, you can have an ”alt” text, which may or may not, indicate what is in the image/graphics.)

/Kent Karlsson

> 10 juni 2020 kl. 14:59 skrev Mark E. Shoulson via Unicode <unicode at unicode.org>:
>
> On 6/9/20 10:50 PM, abrahamgross--- via Unicode wrote:
>> It should just simply overlay the pixels of the two characters, with a thin character going in the center of a wider character.
>
> What are these "pixels" to which you refer?  Fonts these days are defined in terms of strokes, not pixels.  And Richard Wordingham points out the flaw in your notion of how it would be rendered, your claim that x OS z would look the same as z OS x:
>
>> Consider <l, OVERSTRIKE, m> and <m, OVERSTRIKE, l> in a proportional
>> width font.  Are you expecting the rendering system to position the 'l'
>> using the knowledge that it will be overstruck? Overstriking is
>> designed for a teletype with fixed width characters.
> Besides, even if it worked as you said, with the narrow character centered, how long would it take before you found some examples that didn't really quite work out right?  Like overlaying a HEBREW LETTER YOD on a LATIN CAPITAL LETTER L, but what you really wanted was the YOD centered in the negative space of the L and not between the side-bearings, so next you'll want to be able to add some control over the exact positioning.  And of course that won't work right in general, because it all depends on the font(s) involved.
>
> And when it comes to matching, you say of x OS z and z OS x,
>
>> In a perfect world they would be identical for string matching, but since its a new control character I would understand if ppl don't want to put in the effort to adopt it properly.
>
> But we're talking about making the rules here, the "perfect world."  What should the *rule* be about string-matching?  You can't have an optional rule, so a pair of strings will match on one system and not the other and both are right.  Are we to understand that you think the rule should be that overstruck characters are considered to match in either order?  Your gracious forgiveness of laxity in the rules doesn't really enter into the picture.  And what about larger considerations?  Can I have "ab←←xy" (using ← for the overstrike) to overstrike a&x and b&y?  What about "a←b←c←d←e←f←g←h"?  What about "abc←d←←fg"? The f&b are overstruck and so are the c&d&g?  Is that combination of c←d overstruck with g different from c←d←g or the same?  What about other combinations?  These are all things that need answers.  What about overstriking a LTR character with a RTL one, or vice-versa?  Which way does the text go after that?
>
> But I think what you're really missing is the crucial point that Garth Wallace pointed out:
>
>> Display is not the only thing text is for.
>
> You're focussing a lot on how characters *look*, can we get this letter to look a little different, can we layer characters to make other weird symbols (which will look radically different depending on the font)... You're looking at how to _draw_ stuff, how to make things look this way or that on paper or when rendered.  But that's not what Unicode encodes.  You need to think more about the distinction Unicode makes between characters and glyphs.  "Plain text" isn't about display,  it's about representing what's in a document, the characters which encode (in a different sense) the spoken language (usually) that is being communicated.  All the things you're talking about are firmly in the realm of fonts and higher-level protocols.  You surely could work out this overstriking display with a sufficiently-advanced font (you could make zero-advance-width overlaying characters and ligatures that would replace X← with a zero-width equivalent of X, for example, in a monospace font), and you are welcome to do so, but that's where it belongs.
>> It shouldnt do any fancy processing by default
> Figuring out how much to backspace in order to center a glyph on another one, in a proportional-spaced font, is pretty fancy processing.
>> Most systems have just about the same font so I wouldn't worry about the results of overstriking not coming out perfect.
>
> What a bland world you live in, wherein most fonts are the same! It's not about working with the default font on your favorite system; we're dealing with _characters_ here, which could be represented in ANY font.
>
> ~mark
>
>
>

