Encoding italic (was: A last missing link)
Victor Gaultney via Unicode
unicode at unicode.org
Thu Jan 17 04:51:35 CST 2019
( I appreciate that UTC meetings are going on - I too will be traveling
a bit over the next couple of weeks, so may not respond quickly. )
Support for marking 'italic' in plain text - however it's done - would
certainly require changes in text processing. That would also be the
case for some of the other span-like issues others have mentioned.
However, a clear model for how to handle that could solve all the issues
at once. Italic would only be one application of that model, and only
applicable to certain scripts. Other scripts might have parallel issues.
BTW - I'm speaking only about span-like things that encode content, not
the additional level of rich-text presentation.
If, however, we say that this "does not adequately consider the harm done
to the text-processing model that underlies Unicode", then that exposes
a weakness in that model. That may be a weakness that we have to accept
for a variety of reasons (technical difficulty, burden on developers, UI
impact, cost, maturity).
We then have to honestly admit that the current model cannot always
unambiguously encode text content in English and many other languages.
It is impossible to express Crystal's distinction between 'red slippers'
and '/red/ slippers' in plain text without using other characters in
non-standardized ways. Here I am using my favourite technique for this -
slashes around the italic span.
There are other uses of italic that indicate difference in actual
meaning, many that go back centuries, and for which other span-like
punctuation like quotes aren't used. Examples:
- Titles of books, films, compositions, works of art: 'Daredevil' - the
Marvel comics character vs. '/Daredevil/' - the Netflix series.
- Internal voice, such as a character's private thoughts within a
narrative: 'She pulled out a knife. /What are you doing? How did you
- Change of author/speaker, as in editorial comments: '/The following
should be considered.../'
- Heavy stress in speech, which is different from Crystal's distinction:
'Come here /this instant/'
- Examples: 'The phrase /I could care less/...' (quotes are sometimes
used for this one)
Is it important to preserve these distinctions in plain text? The text
seems 'readable' without them, but that requires some knowledge of
context. And without some sort of other marking, as I've done, some of
the meaning is lost. This is why italics within text have always been
considered an editorial decision, not a typesetting one.
In a similar way, we really don't need to include diacritics when
encoding French. In all but a few rare cases, French is perfectly
'readable' without accents - the content can usually be inferred from
context. Yet we would never consider unaccented French to be correct.
More evidence for italics as an important element within encoded text
comes from current use. A couple of years ago I collected every tweet
that referred to italics for a month. People frequently complained that
they were not able to express themselves fully without italics, and
resorted to 40 different techniques to try to mark words and phrases as
italic.
In the current model, plain text cannot fully preserve important
distinctions in content. Maybe we just need to admit and accept that.
But maybe an enhancement to the text processing model would enable more
complete encoding of content, both for italics in Latin script and for
other features in other scripts.
As for how the UIs of the world would need to change: Until there is a
way to encode italic in plain text there's no motivation for people to
even experiment and innovate.