Encoding italic (was: A last missing link)
Mark E. Shoulson via Unicode
unicode at unicode.org
Fri Jan 18 09:51:18 CST 2019
On 1/16/19 6:23 AM, Victor Gaultney via Unicode wrote:
>
> Encoding 'begin italic' and 'end italic' would introduce difficulties
> when partial strings are moved, etc. But that's no different than with
> current punctuation. If you select the second half of a string that
> includes an end quote character you end up with a mismatched pair,
> with the same problems of interpretation as selecting the second half
> of a string including an 'end italic' character. Apps have to deal
> with it, and do, as in code editors.
>
It kinda IS different. If you paste in half a string, you get a
mismatched or unmatched paren or quote or something. A typo, but a
transient one. It looks bad where it is, but everything else is
unaffected. It's no worse than hitting an extra key by mistake. If you
paste in a "begin italic" and miss the "end italic", though, then *all*
your text from that point on is affected! (Or maybe "all until a
newline" or some other stopgap ending, but that's just damage-control,
not damage-prevention.) Suddenly, letters and symbols five
words/lines/paragraphs/pages later look different, the pagination is all
altered (by far more than merely a single extra punctuation mark, since
italic fonts generally are narrower than roman). It's a disaster.
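
To make the difference concrete, here is a minimal sketch in Python. It assumes two hypothetical control characters (nothing like them exists in Unicode; Private Use Area code points stand in for them) and a toy renderer that simply tracks an on/off flag. The names BEGIN_ITALIC, END_ITALIC, italic_flags and count_italic are made up for illustration.

```python
# Hypothetical control characters: Unicode defines no such codes.
# Private Use Area code points are used purely as stand-ins.
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"

def italic_flags(text):
    """Return (char, is_italic) pairs for the visible characters of text."""
    flags = []
    italic = False  # renderer state carried from one character to the next
    for ch in text:
        if ch == BEGIN_ITALIC:
            italic = True
        elif ch == END_ITALIC:
            italic = False
        else:
            flags.append((ch, italic))
    return flags

def count_italic(text):
    """Count how many visible characters end up rendered in italics."""
    return sum(1 for _, is_italic in italic_flags(text) if is_italic)

doc = "The quick brown fox jumps over the lazy dog."
pasted_ok = BEGIN_ITALIC + "very" + END_ITALIC + " "
pasted_cut = BEGIN_ITALIC + "very" + " "  # the "end italic" got lost

print(count_italic(pasted_ok + doc))   # only the four letters of "very"
print(count_italic(pasted_cut + doc))  # "very", the space, and all of doc
```

A stray quote mark pasted the same way changes exactly one character; the stray "begin italic" changes every character after it.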
No. This kind of statefulness really is beyond what Unicode is designed
to cope with. Bidi controls are (almost?) the sole exception, and even
they cause their share of headaches. Encoding separate _text_
italics/bold is IMO also a disastrous idea, but I'm not putting out
reasons for that now. The only really feasible suggestion I've heard is
using a VS in some fashion. (Maybe let it affect whole words instead of
individual characters? Makes for fewer noisy VSs, but introduces a
whole other host of limitations (how to italicize part of a word, how to
italicize non-letters...) and is also just damage-control, though stronger.)
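
For comparison, a rough sketch of the per-character VS idea, again in Python and again with a made-up code point: no variation selector is defined for italics, so ITALIC_VS below is a Private Use Area stand-in, and italicize and italic_flags_vs are hypothetical helpers. The point is only that the marker affects the character right before it and nothing else.

```python
# ITALIC_VS is a hypothetical stand-in: no variation selector is defined
# for italics. A Private Use Area code point is used for illustration.
ITALIC_VS = "\uE002"

def italicize(s):
    """Mark every character of s as italic by appending the selector."""
    return "".join(ch + ITALIC_VS for ch in s)

def italic_flags_vs(text):
    """Return (char, is_italic) pairs. The selector only touches the
    character immediately before it, so no state is carried forward."""
    flags = []
    for ch in text:
        if ch == ITALIC_VS:
            if flags:  # an orphaned leading selector is simply ignored
                prev_ch, _ = flags[-1]
                flags[-1] = (prev_ch, True)
        else:
            flags.append((ch, False))
    return flags

text = "a " + italicize("very") + " lazy dog"
fragment = text[text.index("r"):]  # cut through the middle of "very"
italic_chars = [ch for ch, is_italic in italic_flags_vs(fragment) if is_italic]

print(italic_chars)            # ['r', 'y']
print(len(italicize("very")))  # 8
```

Cutting the string anywhere leaves the damage local, at the price of doubling the length of every italicized run, which is the noise I mean.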
> Apps (and font makers) can also choose how to deal with presenting
> strings of text that are marked as italic. They can choose to present
> visual symbols to indicate begin/end, such as /this/. Or they can
> present it using the italic variant of the font, if available.
>
At which point, you have invented markdown. Instead of making Unicode
declare it, just push for vendors everywhere to recognize /such
notation/ as italics. (OK, I know, you want dedicated characters for it
which can't be confused for anything else.)
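
As a rough illustration of what "recognizing the notation" could look like, here is a small Python sketch. The pattern and the render_slash_italics helper are my own assumptions, not any vendor's actual rule, and the <i> output is just one possible presentation.

```python
import re

# Assumed convention: /text/ is italic when the slashes are not glued to
# word characters on the outside and the text does not start or end with
# whitespace or a slash. This is an illustrative rule, not a standard.
SLASH_ITALIC = re.compile(r"(?<!\w)/([^\s/](?:[^/\n]*[^\s/])?)/(?!\w)")

def render_slash_italics(text):
    """Replace /such notation/ with <i>such notation</i>."""
    return SLASH_ITALIC.sub(r"<i>\1</i>", text)

print(render_slash_italics("recognize /such notation/ as italics"))
print(render_slash_italics("see /usr/bin/python"))  # left alone
print(render_slash_italics("cd /tmp/ now"))         # false positive
```

The last line shows the confusion problem: a directory name gets italicized because ordinary slashes are doing double duty.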
> - Those who develop plain text apps (social media in particular) don't
> have to build in a whole markup/markdown layer into their apps
>
With the complexity of writing a social media app, a markup layer is
really the least of the concerns when it comes to simplifying.
>
> - Misuse of math chars for pseudo-italic would likely disappear
>
> - The text runs between markers remain intact, so they need no special
> treatment in searching, selecting, etc.
>
> - It finally, and conclusively, would end the decades of the mess in
> HTML that surrounds <em> and <italic>.
>
Adding _another_ solution to something will *never* "conclusively end"
anything. On a good day, you can hope it will swamp the others, but
they'll remain at least in legacy. More likely, it will just add one
more way to be confused and another side to the mess. (People here have
pointed out the difficulties of distinguishing or
not-distinguishing between HTML-level <i> and putative plain-text
italics. And yes, that is an issue, and one that already exists with
styling that can change case and such. As with anything, the question
is not whether there are going to be problems, but how those problems
weigh against potential benefits. That's an open question.)
> My main point in suggesting that Unicode needs these characters is
> that italic has been used to indicate specific meaning - this text is
> somehow special - for over 400 years, and that content should be
> preserved in plain text.
>
There is something to this: people have been *emphasizing* text in some
fashion or another for ages. There is room to call this plain text.
~mark