Encoding italic (was: A last missing link)

Mark E. Shoulson via Unicode unicode at unicode.org
Fri Jan 18 09:51:18 CST 2019


On 1/16/19 6:23 AM, Victor Gaultney via Unicode wrote:
>
> Encoding 'begin italic' and 'end italic' would introduce difficulties 
> when partial strings are moved, etc. But that's no different than with 
> current punctuation. If you select the second half of a string that 
> includes an end quote character you end up with a mismatched pair, 
> with the same problems of interpretation as selecting the second half 
> of a string including an 'end italic' character. Apps have to deal 
> with it, and do, as in code editors.
>
It kinda IS different.  If you paste in half a string, you get a 
mismatched or unmatched paren or quote or something.  A typo, but a 
transient one.  It looks bad where it is, but everything else is 
unaffected.  It's no worse than hitting an extra key by mistake. If you 
paste in a "begin italic" and miss the "end italic", though, then *all* 
your text from that point on is affected!  (Or maybe "all until a 
newline" or some other stopgap ending, but that's just damage-control, 
not damage-prevention.)  Suddenly, letters and symbols five 
words/lines/paragraphs/pages later look different, the pagination is all 
altered (by far more than merely a single extra punctuation mark, since 
italic fonts generally are narrower than roman).  It's a disaster.
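
To make the difference concrete, here is a minimal Python sketch.  The 
"begin italic" and "end italic" characters are hypothetical (no such 
characters exist, so two private-use code points stand in for them), and 
italic_flags/count_italic are names made up for this illustration.  It 
only shows how far the damage from one lost end marker can reach; it is 
not a claim about how any real renderer works.

```python
# Hypothetical stand-ins: Unicode has no "begin italic"/"end italic"
# characters. Private-use code points are used purely for illustration.
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"


def italic_flags(text):
    """Return (char, is_italic) pairs for the visible characters of text."""
    italic = False
    flags = []
    for ch in text:
        if ch == BEGIN_ITALIC:
            italic = True
        elif ch == END_ITALIC:
            italic = False
        else:
            flags.append((ch, italic))
    return flags


def count_italic(text):
    """Count the visible characters that would be rendered italic."""
    return sum(1 for _, is_italic in italic_flags(text) if is_italic)


source = "He said " + BEGIN_ITALIC + "no" + END_ITALIC + " and left."
# Copy everything up to, but not including, the end marker.
fragment = source[:source.index(END_ITALIC)]
tail = " Pages and pages of text that used to be roman."
pasted = "Intro. " + fragment + tail

print(count_italic(source))   # only "no" is italic
print(count_italic(pasted))   # "no" plus every character of tail
```

An unmatched quotation mark in the same position would add one stray 
character and leave the rest of the tail alone.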

No.  This kind of statefulness really is beyond what Unicode is designed 
to cope with.  Bidi controls are (almost?) the sole exception, and even 
they cause their share of headaches.  Encoding separate _text_ 
italics/bold is IMO also a disastrous idea, but I'm not laying out my 
reasons for that now.  The only really feasible suggestion I've heard is 
using a variation selector (VS) in some fashion. (Maybe let it affect whole words instead of 
individual characters?  Makes for fewer noisy VSs, but introduces a 
whole other host of limitations (how to italicize part of a word, how to 
italicize non-letters...) and is also just damage-control, though stronger.)
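
For contrast, a sketch of the per-character VS idea, under a loud 
assumption: VS1 (U+FE00) stands in for an "italic" selector, which 
Unicode does not define.  The helper names are again invented for the 
example.  The point is only that each mark applies to the one character 
before it, so a cut-and-paste accident cannot leak into later text.

```python
# Assumption: VS1 (U+FE00) stands in for an "italic" variation selector.
# Unicode defines no such variation sequence; this is illustration only.
ITALIC_VS = "\uFE00"


def italicize(s):
    """Follow every character of s with the stand-in selector."""
    return "".join(ch + ITALIC_VS for ch in s)


def italic_flags(text):
    """Return (char, is_italic) pairs; a selector marks the previous char."""
    flags = []
    for ch in text:
        if ch == ITALIC_VS:
            if flags:
                base, _ = flags[-1]
                flags[-1] = (base, True)
        else:
            flags.append((ch, False))
    return flags


def count_italic(text):
    """Count the visible characters that would be rendered italic."""
    return sum(1 for _, is_italic in italic_flags(text) if is_italic)


source = "He said " + italicize("no") + " and left."
# Cut in the middle of the italic word: keep "n" and its selector only.
fragment = source[:source.index("o")]
tail = " Pages and pages of text that used to be roman."
pasted = "Intro. " + fragment + tail

print(count_italic(source))    # the two letters of "no"
print(count_italic(pasted))    # just the "n"; tail is untouched
print(len(italicize("no")))    # the noise: one selector per character
```

The price is visible in the last line: the marked-up string is twice as 
long as the text it styles.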

> Apps (and font makers) can also choose how to deal with presenting 
> strings of text that are marked as italic. They can choose to present 
> visual symbols to indicate begin/end, such as /this/. Or they can 
> present it using the italic variant of the font, if available.
>
At which point, you have invented markdown.  Instead of making Unicode 
declare it, just push for vendors everywhere to recognize /such 
notation/ as italics.  (OK, I know, you want dedicated characters for it 
which can't be confused for anything else.)
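
A rough sketch of what recognizing /such notation/ might look like, in 
Python.  The matching rule here is my own assumption, not any standard: 
a pair of slashes counts as emphasis only when they hug the text and are 
not glued to neighboring word characters or other slashes.  The to_html 
name and the <i> output are likewise just for illustration.

```python
import re

# Assumed convention (not a standard): /text/ is emphasis when the slashes
# hug the text and do not touch word characters or other slashes outside.
SLASH_ITALIC = re.compile(r"(?<![\w/])/([^\s/](?:[^/\n]*[^\s/])?)/(?![\w/])")


def to_html(text):
    """Rewrite /emphasized/ spans as <i>...</i>, leaving other slashes alone."""
    return SLASH_ITALIC.sub(r"<i>\1</i>", text)


print(to_html("recognize /such notation/ as italics"))
print(to_html("install it under /usr/bin/ or use either/or"))  # left alone
print(to_html("look in /etc/ first"))  # misread as emphasis
```

The last case is exactly the confusion that dedicated characters would 
avoid: a short path between spaces is indistinguishable from emphasis 
under this rule.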


> - Those who develop plain text apps (social media in particular) don't 
> have to build in a whole markup/markdown layer into their apps
>
With the complexity of writing a social media app, a markup layer is 
really the least of the concerns when it comes to simplifying.
>
> - Misuse of math chars for pseudo-italic would likely disappear
>
> - The text runs between markers remain intact, so they need no special 
> treatment in searching, selecting, etc.
>
> - It finally, and conclusively, would end the decades of the mess in 
> HTML that surrounds <em> and <italic>.
>
Adding _another_ solution to something will *never* "conclusively end" 
anything.  On a good day, you can hope it will swamp the others, but 
they'll remain at least in legacy.  More likely, it will just add one 
more way to be confused and another side to the mess.  (People here 
have pointed out the difficulties of distinguishing or 
not-distinguishing between HTML-level <i> and putative plain-text 
italics.  And yes, that is an issue, and one that already exists with 
styling that can change case and such.  As with anything, the question 
is not whether there are going to be problems, but how those problems 
weigh against potential benefits.  That's an open question.)

> My main point in suggesting that Unicode needs these characters is 
> that italic has been used to indicate specific meaning - this text is 
> somehow special - for over 400 years, and that content should be 
> preserved in plain text.
>
There is something to this: people have been *emphasizing* text in some 
fashion or another for ages.  There is room to call this plain text.

~mark


