Encoding italic (was: A last missing link)

James Kass via Unicode unicode at unicode.org
Wed Jan 16 21:38:46 CST 2019


Victor Gaultney wrote,

 > Treating italic like punctuation is a win for a lot of people:

Italic Unicode encoding is a win for a lot of people regardless of 
approach.  Each of the listed wins remains essentially true whether 
treated as punctuation, encoded atomically, or selected with VS.

 > My main point in suggesting that Unicode needs these characters is that
 > italic has been used to indicate specific meaning - this text is somehow
 > special - for over 400 years, and that content should be preserved in
 > plain text.

( http://www.unicode.org/versions/Unicode11.0.0/ch02.pdf )

"Plain text must contain enough information to permit the text to be 
rendered legibly, and nothing more."

The argument is that text stripped of its italic information can still 
be read.  A persuasive argument for encoding would need to negate that; 
it would have to show that removing italic information results in a 
loss of meaning.

The decision makers at Unicode are familiar with italic use conventions 
such as those shown in "The Chicago Manual of Style" (first published in 
1906).  The question of plain-text italics has arisen before on this 
list and has been quickly dismissed.

Unicode began with the idea of standardizing existing code pages for the 
exchange of computer text using a unique double-byte encoding rather 
than relying on code page switching.  Latin was "grandfathered" into the 
standard.  Nobody ever submitted a formal proposal for Basic Latin.  
There was no outreach to establish contact with the user community -- 
the actual people who used the script as opposed to the "computer nerds" 
who grew up with ANSI limitations and subsequent ISO code pages.  
Because that's how Unicode rolled back then.  Unicode did what it was 
supposed to do WRT Basic Latin.

When someone points out that italics are used for disambiguation as well 
as stress, the replies are consistent.

"That's not what plain-text is for."  "That's not how plain-text 
works."  "That's just styling and so should be done in rich-text." 
"Since we do that in rich-text already, there's no reason to provide for 
it in plain-text."  "You can already hack it in plain-text by enclosing 
the string with slashes."  And so it goes.
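
For illustration only, here is a minimal Python sketch of what the 
slash hack asks of any software that wants to recover the marked span. 
The convention itself is informal, so the pattern below (slashes hugging 
the text, not adjacent to word characters) is an assumption, and the 
names SLASH_SPAN, find_slash_spans and strip_slash_markup are 
hypothetical.

```python
import re

# Assumed convention: /text/ marks an italic span when the slashes touch
# the text and are not glued to a word character or another slash.
SLASH_SPAN = re.compile(r"(?<![\w/])/(?=\S)([^/\n]+?)(?<=\S)/(?![\w/])")


def find_slash_spans(text):
    """Return the substrings that look like /italic/ spans."""
    return SLASH_SPAN.findall(text)


def strip_slash_markup(text):
    """Drop the slashes, keeping only the enclosed text."""
    return SLASH_SPAN.sub(r"\1", text)


if __name__ == "__main__":
    for sample in (
        "I watched /Jackie Brown/ last night.",  # intended use
        "Choose either and/or both.",            # slash inside a word
        "Look in /etc/ for the file.",           # a path, not emphasis
    ):
        print(sample, "->", find_slash_spans(sample))
```

The third sample is reported as a span even though it is a directory 
name, which is the weakness of any in-band convention: the receiving 
side has to guess.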

But if variant letter form information is stripped from a string like 
"Jackie Brown", the primary indication that the string represents either 
a person's name or a Tarantino flick title is also stripped.  "Thorstein 
Veblen" is either a dead economist or the name of a fictional yacht in 
the Travis McGee series.  And so forth.
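
The loss can be sketched in Python.  Unicode has no italic Latin 
letters for general text, so this example borrows the Mathematical 
Italic letters (U+1D434 onward) purely as a stand-in; that substitution 
is an assumption for illustration, not a recommended practice, and 
to_math_italic is a hypothetical helper.  NFKC normalization is used as 
one real-world process that folds letter-form variants back to Basic 
Latin.

```python
import unicodedata

ITALIC_CAPITAL_A = 0x1D434  # MATHEMATICAL ITALIC CAPITAL A
ITALIC_SMALL_A = 0x1D44E    # MATHEMATICAL ITALIC SMALL A
PLANCK_CONSTANT = "\u210E"  # italic small h lives here; U+1D455 is reserved


def to_math_italic(text):
    """Map Basic Latin letters to Mathematical Italic (stand-in only)."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(ITALIC_CAPITAL_A + ord(ch) - ord("A")))
        elif ch == "h":
            out.append(PLANCK_CONSTANT)
        elif "a" <= ch <= "z":
            out.append(chr(ITALIC_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


person = "Jackie Brown"                 # the person
film = to_math_italic("Jackie Brown")   # the title, marked by letter form

# Before stripping, the two strings are distinct.
print(person == film)

# After compatibility normalization the distinction is gone.
stripped = unicodedata.normalize("NFKC", film)
print(person == stripped)
```

Once the variant forms are folded away, nothing in the remaining string 
says which of the two readings was intended.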

Computer text tradition aside, nobody seems to offer any legitimate 
reason why such information isn't worthy of being preservable in 
plain-text.  Perhaps there isn't one.

I'm not qualified to assess the impact of italic Unicode inclusion on 
the rich-text world as mentioned by David Starner.  Maybe another list 
member will offer additional insight or a second opinion.


