Encoding italic

Adam Borowski via Unicode unicode at unicode.org
Tue Jan 22 00:40:52 CST 2019

On Mon, Jan 21, 2019 at 12:29:42AM -0800, David Starner via Unicode wrote:
> On Sun, Jan 20, 2019 at 11:53 PM James Kass via Unicode
> <unicode at unicode.org> wrote:
> >  Even though /we/ know how to do
> > it and have software installed to help us do it.
> You're emailing from Gmail, which has support for italics in email.

... and how exactly can they send italics in an e-mail?  All they can do is
to bundle a web page as an attachment, which some clients display instead of
the main text.

The e-mail's body text supports anything Unicode does, including ������������ and
even ������ ������������, but, remarkably, not italic umlauted characters, Thai nor

> > Splendidly!  (smile)  Social platforms, plain-text editors, and other
> > applications do enhance their interfaces based on user demand from time
> > to time.  User demand, at least on Twitter, seems established.
> Then it would take six months, tops, for Twitter to produce and
> release a rich-text interface for Twitter. Far less time than waiting
> for Unicode to get around to it.

Similar to many mail clients, Twitter does have a rich-text interface.
It will present that rich text as a link -- it even has specific
support for shortening the full URL to conserve the character count.

But the primary interface is plain text, which unlike anything "rich" is
interoperable with pretty much anything.

> > Copy/pasting from a web page into a plain-text editor removes any
> > italics content, which is currently expected behavior.  Opinions differ
> > as to whether that represents mere format removal or a loss of meaning.
> > Those who consider it as a loss of meaning would perceive a problem with
> > interoperability.
> Copy/pasting from a web page into a plain-text editor removes any
> pictures and destructures tables, which definitely loses meaning.
> It also removes strike-out markup, which can have an even more
> dramatic effect on meaning than removing italics. As you pointed out
> below, it removes superscripts and subscripts; unless you wish to
> press for automatic conversion of those to Unicode, that's going to
> continue happening. It drops bold and font changes, and any number of
> other things that can carry meaning.

I.e., any non-standard additions.  There's a common base that's supposed to be
interoperable, developed by a certain consortium -- and that base is pretty
much guaranteed to work everywhere.  Even if a specific display engine can't
display some fancier elements, at least the underlying transport will
transfer the text unmolested.  There are still some issues here and there
(e.g. people rejecting the UCS-2/UTF-16 that Microsoft insisted on for
Windows, so UTF-8 as a system encoding is a new thing there and AFAIK not
even the default yet) -- but we're pretty much there.  The last holdouts of
ancient encodings are dying fast.
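
As a rough illustration of the "transport will transfer the text unmolested"
point, here is a minimal Python sketch.  The helper name to_math_italic is
made up for this example; it maps plain ASCII letters onto the Mathematical
Italic block, and it assumes nothing about how any particular mail client or
Twitter treats such text.  It only shows that the styled string survives a
UTF-8 round trip, and that a letter such as "ä" has no counterpart in that
block to map to.

```python
import unicodedata


def to_math_italic(text: str) -> str:
    """Map ASCII letters to Mathematical Italic; leave other characters alone.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif ch == "h":
            # U+1D455 is reserved; italic h is U+210E PLANCK CONSTANT.
            out.append("\u210E")
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            # No Mathematical Italic code point exists for e.g. "ä".
            out.append(ch)
    return "".join(out)


styled = to_math_italic("italic")

# Any UTF-8 clean transport carries the styled text unchanged.
wire = styled.encode("utf-8")
assert wire.decode("utf-8") == styled

# Compatibility normalization folds the styling back to plain letters.
assert unicodedata.normalize("NFKC", styled) == "italic"

# An umlauted letter passes through unstyled.
assert to_math_italic("ä") == "ä"
```

Note that NFKC normalization discards the styling, so any pipeline that
normalizes this way sees only the plain letters.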

There's a need to agree on a boundary between "this is what all means of
interchange are supposed to support" and "fancy client-specific markup",
and Unicode has served admirably at defining the former.

⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
