Encoding italic (was: A last missing link)

Thu Jan 17 04:51:35 CST 2019

(I appreciate that UTC meetings are going on - I too will be traveling 
a bit over the next couple of weeks, so may not respond quickly.)

Support for marking 'italic' in plain text - however it's done - would 
certainly require changes in text processing. That would also be the 
case for some of the other span-like issues others have mentioned. 
However, a clear model for how to handle that could solve all the issues 
at once. Italic would only be one application of that model, and only 
applicable to certain scripts. Other scripts might have parallel issues. 
BTW - I'm speaking only about span-like things that encode content, not 
the additional level of rich-text presentation.

If, however, we say that this "does not adequately consider the harm done 
to the text-processing model that underlies Unicode", then that exposes 
a weakness in that model. That may be a weakness that we have to accept 
for a variety of reasons (technical difficulty, burden on developers, UI 
impact, cost, maturity).

We then have to honestly admit that the current model cannot always 
unambiguously encode text content in English and many other languages. 
It is impossible to express Crystal's distinction between 'red slippers' 
and '/red/ slippers' in plain text without using other characters in 
non-standardized ways. Here I am using my favourite technique for this - 
/slashes/.
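
To make the cost of such non-standardized marking concrete, here is a 
minimal Python sketch of how software might try to recover /slash/ spans 
from plain text. The function name find_slash_spans and the regular 
expression are assumptions made up for this illustration, not any 
standard or existing convention. The point is that the reader's software 
has to guess, because '/' already has other jobs in plain text.

```python
import re

# Hypothetical heuristic (an assumption, not a standard): a span opens
# with '/' that does not follow a word character or another '/', and
# closes with '/' that is not followed by a word character or '/'.
SLASH_SPAN = re.compile(r"(?<![\w/])/(?=\S)(.+?)(?<=\S)/(?![\w/])")


def find_slash_spans(text):
    """Return the substrings that look like /italic/ spans."""
    return [m.group(1) for m in SLASH_SPAN.finditer(text)]


# Intended use: the marked word is found.
print(find_slash_spans("'red slippers' and '/red/ slippers'"))

# An ordinary slash between words is correctly left alone.
print(find_slash_spans("and/or"))

# A directory name is wrongly reported as an italic span.
print(find_slash_spans("see /etc/ for config"))
```

The last call reports 'etc' as if it were italic. A heuristic can be 
tuned, but it cannot be made unambiguous, which is the weakness of 
borrowing existing characters for this purpose.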

There are other uses of italic that indicate a difference in actual 
meaning, many of which go back centuries, and for which other span-like 
punctuation such as quotes isn't used. Examples:

- Titles of books, films, compositions, works of art: 'Daredevil' - the 
Marvel comics character vs. '/Daredevil/' - the Netflix series.

- Internal voice, such as a character's private thoughts within a 
narrative: 'She pulled out a knife. /What are you doing? How did you 
find out.../'

- Change of author/speaker, as in editorial comments: '/The following 
should be considered.../'

- Heavy stress in speech, which is different from Crystal's distinction: 
'Come here /this instant/'

- Words or phrases cited as examples: 'The phrase /I could care 
less/...' (quotes are sometimes used for this one)

Is it important to preserve these distinctions in plain text? The text 
seems 'readable' without them, but that requires some knowledge of 
context. And without some other sort of marking, such as the slashes 
I've used, some of 
the meaning is lost. This is why italics within text have always been 
considered an editorial decision, not a typesetting one.

In a similar way, we really don't need to include diacritics when 
encoding French. In all but a few rare cases, French is perfectly 
'readable' without accents - the content can usually be inferred from 
context. Yet we would never consider unaccented French to be correct.

More evidence for italics as an important element within encoded text 
comes from current use. A couple of years ago I spent a month collecting 
every tweet that referred to italics. People frequently complained that 
they were not able to express themselves fully without italics, and 
resorted to 40 different techniques to try to mark words and phrases as 
'italic'.

In the current model, plain text cannot fully preserve important 
distinctions in content. Maybe we just need to admit and accept that. 
But maybe an enhancement to the text processing model would enable more 
complete encoding of content, both for italics in Latin script and for 
other features in other scripts.

As for how the UIs of the world would need to change: Until there is a 
way to encode italic in plain text there's no motivation for people to 
even experiment and innovate.
