Encoding italic (was: A last missing link)
Kent Karlsson via Unicode
unicode at unicode.org
Sat Jan 19 19:30:37 CST 2019
(I have skipped some messages in this thread, so maybe the following
has been pointed out already. Apologies for this message if so.)
You will not like this... But...
There is already a standardised, "character level" (well, it is from
a character standard, though a more modern view would be that it is
a higher level protocol) way of specifying italics (and bold, and
underline, and more):
\u001b[3mbla bla bla\u001b[0m
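As a rough illustration (a sketch only, not taken from any particular terminal's documentation), the sequence above can be built up in Python. The helper names sgr and styled are made up for this example; the parameter numbers are the usual SGR ones (0 reset, 1 bold, 3 italic, 4 underline). Whether italic actually shows up depends on the terminal emulator.

```python
ESC = "\u001b"

# A small subset of SGR ("select graphic rendition") parameters.
SGR_CODES = {"reset": 0, "bold": 1, "italic": 3, "underline": 4}


def sgr(*names):
    """Build one escape sequence such as ESC[3m or ESC[1;4m (hypothetical helper)."""
    params = ";".join(str(SGR_CODES[name]) for name in names)
    return f"{ESC}[{params}m"


def styled(text, *names):
    """Wrap text in the requested styles and reset everything afterwards."""
    return sgr(*names) + text + sgr("reset")


if __name__ == "__main__":
    # Shows as italic only where the terminal emulator supports parameter 3.
    print(styled("bla bla bla", "italic"))
    print(styled("bla bla bla", "bold", "underline"))
```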
Terminal emulators implement some such escape sequences. The terminal
emulators I use support bold (1 after the [) but not italic (3). Every time
you use the "man" command in a Linux/Unix/similar terminal you "use" the
escape sequences for bold and underline... Other terminal based programs
often use bold as well as colour esc-sequences for emphasis as well as for
warning/error messages, and other "hints" of various kinds. For xterm,
So I don't see these esc-sequences becoming obsolete any time soon.
But I don't foresee them being supported outside of terminal emulators
either... (Though for style esc-sequences it would certainly be possible.
And a "smart" cut-and-paste operation could auto-insert an esc-sequence
that sets the style after the paste to the one before the paste...)
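A minimal sketch of what such a "smart" paste might look like, assuming only bold, italic and underline need to be tracked (colours and other parameters are ignored, and extended colour parameters are not parsed properly). The names active_styles and smart_paste are hypothetical, and the paste position is assumed not to fall inside an escape sequence.

```python
import re

# Matches SGR sequences such as ESC[3m or ESC[0;1m.
SGR_RE = re.compile(r"\x1b\[([0-9;]*)m")

TRACKED_ON = {1, 3, 4}                     # bold, italic, underline
TRACKED_OFF = {22: {1}, 23: {3}, 24: {4}}  # the matching "off" parameters


def active_styles(text):
    """Return the tracked SGR parameters still in effect at the end of text."""
    active = set()
    for match in SGR_RE.finditer(text):
        # An empty parameter (as in ESC[m) means 0, i.e. reset.
        codes = [int(p) if p else 0 for p in match.group(1).split(";")]
        for code in codes:
            if code == 0:
                active.clear()
            elif code in TRACKED_ON:
                active.add(code)
            elif code in TRACKED_OFF:
                active -= TRACKED_OFF[code]
    return active


def smart_paste(document, position, fragment):
    """Insert fragment at position, then restore the style in effect before it."""
    before, after = document[:position], document[position:]
    # Reset first, then switch the pre-paste styles back on.
    params = ";".join(str(c) for c in [0] + sorted(active_styles(before)))
    return before + fragment + f"\x1b[{params}m" + after
```

With this, pasting a fragment that starts italic but never ends it leaves the text after the paste in whatever style it had before, instead of turning everything after it italic.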
Had HTML (somehow, magically) been invented before terminals, maybe
terminals (terminal emulators) would have used some kind of "mini-HTML"
instead. But things are like they are on that point.
The cut-and-paste I used here converts the output (imperfectly: bold is lost
and a spurious ! is inserted) to HTML
(surely going through some internal attribute-based representation, the HTML
being generated when I press send):
man - format and display the on-line manual pages
man [-acdfFhkKtwW] [--path] [-m system] [-p string] [-C
[-M pathlist] [-P pager] [-B browser] [-H htmlpager] [-S
[section] name ...
On 2019-01-18 20:18, "Asmus Freytag via Unicode"
<unicode at unicode.org> wrote:
> I would fully agree and I think Mark puts it really well in the message below
> why some of the proposals brandished here are no longer plain text but
> "not-so-plain" text.
> I think we are better served with a solution that provides some form of
> "light" rich text, for basic emphasis in short messages. The proper way for
> this would be some form of MarkDown standard shared across vendors, and
> perhaps implemented in a way that users don't necessarily need to type
> anything special, but that, if exported to "true" plain text, it turns into
> the source format for the "light" rich text.
> This is an effort that's out of scope for Unicode to implement, or, I should
> say, if the Consortium were to take it on, it would be a separate technical
> standard from The Unicode Standard.
> PS: I really hate the creeping expansion of pseudo-encoding via VS characters.
> The only worse thing is adding novel control functions.
> On 1/18/2019 7:51 AM, Mark E. Shoulson via Unicode wrote:
>> On 1/16/19 6:23 AM, Victor Gaultney via Unicode wrote:
>>> Encoding 'begin italic' and 'end italic' would introduce difficulties when
>>> partial strings are moved, etc. But that's no different than with current
>>> punctuation. If you select the second half of a string that includes an end
>>> quote character you end up with a mismatched pair, with the same problems of
>>> interpretation as selecting the second half of a string including an 'end
>>> italic' character. Apps have to deal with it, and do, as in code editors.
>> It kinda IS different. If you paste in half a string, you get a mismatched
>> or unmatched paren or quote or something. A typo, but a transient one. It
>> looks bad where it is, but everything else is unaffected. It's no worse than
>> hitting an extra key by mistake. If you paste in a "begin italic" and miss
>> the "end italic", though, then *all* your text from that point on is
>> affected! (Or maybe "all until a newline" or some other stopgap ending, but
>> that's just damage-control, not damage-prevention.) Suddenly, letters and
>> symbols five words/lines/paragraphs/pages later look different, the pagination is
>> all altered (by far more than merely a single extra punctuation mark, since
>> italic fonts generally are narrower than roman). It's a disaster.
>> No. This kind of statefulness really is beyond what Unicode is designed to
>> cope with. Bidi controls are (almost?) the sole exception, and even they
>> cause their share of headaches. Encoding separate _text_ italics/bold is IMO
>> also a disastrous idea, but I'm not putting out reasons for that now. The
>> only really feasible suggestion I've heard is using a VS in some fashion.
>> (Maybe let it affect whole words instead of individual characters? Makes for
>> fewer noisy VSs, but introduces a whole other host of limitations (how to
>> italicize part of a word, how to italicize non-letters...) and is also just
>> damage-control, though stronger.)
>>> Apps (and font makers) can also choose how to deal with presenting strings
>>> of text that are marked as italic. They can choose to present visual symbols
>>> to indicate begin/end, such as /this/. Or they can present it using the
>>> italic variant of the font, if available.
>> At which point, you have invented markdown. Instead of making Unicode
>> declare it, just push for vendors everywhere to recognize /such notation/ as
>> italics (OK, I know, you want dedicated characters for it which can't be
>> confused for anything else.)
>>> - Those who develop plain text apps (social media in particular) don't have
>>> to build in a whole markup/markdown layer into their apps
>> With the complexity of writing a social media app, a markup layer is really
>> the least of the concerns when it comes to simplifying.
>>> - Misuse of math chars for pseudo-italic would likely disappear
>>> - The text runs between markers remain intact, so they need no special
>>> treatment in searching, selecting, etc.
>>> - It finally, and conclusively, would end the decades of the mess in HTML
>>> that surrounds <em> and <italic>.
>> Adding _another_ solution to something will *never* "conclusively end"
>> anything. On a good day, you can hope it will swamp the others, but they'll
>> remain at least in legacy. More likely, it will just add one more way to be
>> confused and another side to the mess. (People have pointed out here about
>> the difficulties of distinguishing or not-distinguishing between HTML-level
>> <i> and putative plain-text italics. And yes, that is an issue, and one that
>> already exists with styling that can change case and such. As with anything,
>> the question is not whether there are going to be problems, but how those
>> problems weigh against potential benefits. That's an open question.)
>>> My main point in suggesting that Unicode needs these characters is that
>>> italic has been used to indicate specific meaning - this text is somehow
>>> special - for over 400 years, and that content should be preserved in plain
>>> text.
>> There is something to this: people have been *emphasizing* text in some
>> fashion or another for ages. There is room to call this plain text.