Encoding italic (was: A last missing link)
James Kass via Unicode
unicode at unicode.org
Wed Jan 16 21:38:46 CST 2019
Victor Gaultney wrote,
> Treating italic like punctuation is a win for a lot of people:
Encoding italic in Unicode is a win for a lot of people regardless of
approach. Each of the listed wins remains essentially true whether
italic is treated as punctuation, encoded atomically, or selected with
variation selectors (VS).
> My main point in suggesting that Unicode needs these characters is that
> italic has been used to indicate specific meaning - this text is somehow
> special - for over 400 years, and that content should be preserved in
> plain text.
( http://www.unicode.org/versions/Unicode11.0.0/ch02.pdf )
"Plain text must contain enough information to permit the text to be
rendered legibly, and nothing more."
The argument is that italic information can be stripped and the text
can still be read. A persuasive argument for encoding would need to
negate that; it would have to show that removing italic information
results in a loss of meaning.
The decision makers at Unicode are familiar with italic use conventions
such as those shown in "The Chicago Manual of Style" (first published in
1906). The question of plain-text italics has arisen before on this
list and has been quickly dismissed.
Unicode began with the idea of standardizing existing code pages for the
exchange of computer text using a unique double-byte encoding rather
than relying on code page switching. Latin was "grandfathered" into the
standard. Nobody ever submitted a formal proposal for Basic Latin.
There was no outreach to establish contact with the user community --
the actual people who used the script as opposed to the "computer nerds"
who grew up with ANSI limitations and subsequent ISO code pages.
Because that's how Unicode rolled back then. Unicode did what it was
supposed to do WRT Basic Latin.
When someone points out that italics are used for disambiguation as well
as stress, the replies are consistent.
"That's not what plain-text is for." "That's not how plain-text
works." "That's just styling and so should be done in rich-text."
"Since we do that in rich-text already, there's no reason to provide for
it in plain-text." "You can already hack it in plain-text by enclosing
the string with slashes." And so it goes.
But if variant letter form information is stripped from a string like
"Jackie Brown", the primary indication that the string represents either
a person's name or a Tarantino flick title is also stripped. "Thorstein
Veblen" is either a dead economist or the name of a fictional yacht in
the Travis McGee series. And so forth.
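
As a rough illustration of that loss, here is a minimal Python sketch.
It assumes the slash convention quoted above is the only carrier of
italic information; the regex, the function name, and the two sample
sentences are made up for the example and are not part of any standard.

```python
import re

# Assumption: /.../ marks an italic span in plain text (the "slash hack").
ITALIC_SPAN = re.compile(r"/([^/]+)/")

def strip_italic_markers(text: str) -> str:
    """Remove the slash markers, keeping the enclosed words."""
    return ITALIC_SPAN.sub(r"\1", text)

# Hypothetical sentences: one refers to the film, one to a person.
film = "I saw /Jackie Brown/ last night."
person = "I saw Jackie Brown last night."

# With the markers, the two sentences are distinct strings.
print(film == person)

# With the markers stripped, nothing in the text tells them apart.
print(strip_italic_markers(film) == strip_italic_markers(person))
```

The text stays legible after stripping, but the only signal that
separated the title from the name is gone.
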
Computer text tradition aside, nobody seems to offer any legitimate
reason why such information isn't worthy of being preservable in
plain-text. Perhaps there isn't one.
I'm not qualified to assess the impact of italic Unicode inclusion on
the rich-text world as mentioned by David Starner. Maybe another list
member will offer additional insight or a second opinion.