Encoding italic (was: A last missing link)

Victor Gaultney via Unicode unicode at unicode.org
Wed Jan 16 05:23:59 CST 2019


James Kass wrote:
> Concerns about statefulness in plain-text exist.  Treating "italic" as 
> an opening/closing "punctuation" may help get around such concerns. 
> IIRC, it was proposed that the Egyptian cartouche be handled that way.

I do appreciate the technical issues surrounding statefulness and what 
users expect when they select, copy, and paste. However, that has always 
been an issue. The Latin script (and many others) already has 'states', 
and that is reflected in the encoding of the markers that indicate the 
beginning and end of those states (parens, quotes, etc.). In the Latin 
script those markers are visually represented as separate glyphs, 
although sometimes enterprising font makers will use OpenType or 
Graphite to adjust those glyphs in context.

Encoding 'begin italic' and 'end italic' would introduce difficulties 
when partial strings are moved, etc. But that's no different than with 
current punctuation. If you select the second half of a string that 
includes an end quote character you end up with a mismatched pair, with 
the same problems of interpretation as selecting the second half of a 
string including an 'end italic' character. Apps have to deal with it, 
and do, as in code editors.
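
To make the comparison concrete, here is a rough sketch. Unicode has no 'begin italic' or 'end italic' characters today, so the two Private Use Area code points below are purely my stand-ins, and the unmatched() helper is a hypothetical name. It selects the second half of a quoted string and of an italic-marked string and counts what is left unbalanced.

```python
# Hypothetical markers: Unicode has no 'begin italic' / 'end italic'
# characters, so two Private Use Area code points stand in for them.
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"


def unmatched(text, opener, closer):
    """Return (unclosed openers, stray closers) for one marker pair."""
    depth = 0
    stray = 0
    for ch in text:
        if ch == opener:
            depth += 1
        elif ch == closer:
            if depth > 0:
                depth -= 1
            else:
                stray += 1
    return depth, stray


quoted = "He said \u201cplain text matters\u201d today."
italic = "He said " + BEGIN_ITALIC + "plain text matters" + END_ITALIC + " today."

# Select the second half of each string, as a user might with a mouse.
quoted_half = quoted[len(quoted) // 2:]
italic_half = italic[len(italic) // 2:]

# Both halves contain a closer with no opener before it.
quote_result = unmatched(quoted_half, "\u201c", "\u201d")
italic_result = unmatched(italic_half, BEGIN_ITALIC, END_ITALIC)
print(quote_result, italic_result)
```

Both selections come out with one stray closer and nothing else, which is the point: whatever an app already does about a dangling end quote, it could do about a dangling 'end italic'.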

Apps (and font makers) can also choose how to deal with presenting 
strings of text that are marked as italic. They can choose to present 
visual symbols to indicate begin/end, such as /this/. Or they can 
present it using the italic variant of the font, if available. Yes, that 
brings up the issue of what to do if no italic counterpart is there. But 
that's already an issue with people using math characters for 
pseudo-italic. I'd guess that far, far more fonts in the world have 
italic counterparts than contain math chars, and the trend toward always 
having roman/italic matched pairs is something I've established in my 
research interviews.
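
As an illustration of those two presentation choices, a minimal sketch follows. The marker code points are again hypothetical Private Use Area stand-ins, the function names are mine, and the HTML output is only one way an app might ask for the italic font. It assumes the markers in the input are balanced.

```python
import html

# Hypothetical stand-ins; no such characters are encoded in Unicode.
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"


def render_visible(text):
    """Fallback: show the markers as visible slashes, as in /this/."""
    return text.replace(BEGIN_ITALIC, "/").replace(END_ITALIC, "/")


def render_html(text):
    """Rich display: escape the text, then map the markers to <i>...</i>."""
    escaped = html.escape(text)
    return escaped.replace(BEGIN_ITALIC, "<i>").replace(END_ITALIC, "</i>")


sample = "They can show " + BEGIN_ITALIC + "this" + END_ITALIC + " either way."

print(render_visible(sample))
print(render_html(sample))
```

An app with no italic counterpart for its font could stay with the first function; one that has the italic available could use something like the second.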

Treating italic like punctuation is a win for a lot of people:

- Users get their italic content preserved in plain text

- Those who develop plain text apps (social media in particular) don't 
have to build a whole markup/markdown layer into their apps

- Misuse of math chars for pseudo-italic would likely disappear

- The text runs between markers remain intact, so they need no special 
treatment in searching, selecting, etc.

- It would finally, and conclusively, end the decades-long mess in 
HTML that surrounds <em> and <i>.
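
The point about searching can be sketched as follows. The marker code points are hypothetical Private Use Area stand-ins and pseudo_italic() is my own helper name; the Mathematical Italic code points it maps to are real ones.

```python
# Hypothetical stand-ins; no such characters are encoded in Unicode.
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"


def pseudo_italic(word):
    """Map ASCII lowercase letters to Mathematical Italic letters."""
    out = []
    for ch in word:
        if ch == "h":
            # The italic small h slot is unassigned in the math block;
            # U+210E PLANCK CONSTANT fills that role.
            out.append("\u210E")
        elif "a" <= ch <= "z":
            # U+1D44E is MATHEMATICAL ITALIC SMALL A.
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


with_math = "this is " + pseudo_italic("really") + " important"
with_markers = "this is " + BEGIN_ITALIC + "really" + END_ITALIC + " important"

# A plain substring search misses the math-character version but finds
# the run between the markers, because its letters are unchanged.
found_in_math = "really" in with_math
found_in_markers = "really" in with_markers
print(found_in_math, found_in_markers)
```

A search that spans a marker boundary (say, "really important") would still need the app to skip over the marker, just as it would for a quote mark in the same position.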

My main point in suggesting that Unicode needs these characters is that 
italic has been used to indicate specific meaning - this text is somehow 
special - for over 400 years, and that content should be preserved in 
plain text.

