Italics get used to express important semantic meaning, so unicode should support them

Sławomir Osipiuk sosipiuk at gmail.com
Mon Dec 14 22:36:03 CST 2020


On Mon, Dec 14, 2020 at 8:05 PM Mark E. Shoulson via Unicode
<unicode at unicode.org> wrote:
>
> All TAG symbols placed between a U+E003C TAG LESS-THAN SIGN and a U+E003E TAG GREATER-THAN SIGN, inclusive, are to be treated as if they were their corresponding ASCII characters, and run that through an HTML renderer.  I guess if you wanted you could stipulate some reduced or restricted subset of HTML.

I've been informed off-list that BabelPad uses this as a formatting
option. So, it's been done.
This solution technically constitutes a higher-level protocol anyway.
It's a markup language, just using unusual characters, but it's not in
any fundamental way a Unicode feature, official or not.
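
To make the quoted scheme concrete, here is a rough sketch in Python. It
assumes only what is stated above plus the fact that each TAG character sits
at U+E0000 plus its ASCII code. The helper names (to_tag, tags_to_html) are
mine, purely for illustration; this is not how BabelPad or anything else
actually does it, and it does not escape ordinary "<" or "&" in the
surrounding text before handing the result to an HTML renderer.

```python
import re

TAG_BASE = 0xE0000                      # TAG characters mirror ASCII at this offset
TAG_LT = chr(TAG_BASE + ord("<"))       # U+E003C TAG LESS-THAN SIGN
TAG_GT = chr(TAG_BASE + ord(">"))       # U+E003E TAG GREATER-THAN SIGN

# A run of TAG characters that opens with TAG "<" and ends at the first TAG ">".
TAG_SPAN = re.compile(TAG_LT + "[\U000E0020-\U000E007E]*?" + TAG_GT)


def to_tag(ascii_text: str) -> str:
    """Hypothetical helper: encode ASCII markup as invisible TAG characters."""
    return "".join(chr(TAG_BASE + ord(c)) for c in ascii_text)


def tags_to_html(text: str) -> str:
    """Replace each TAG-encoded span with its ASCII equivalent."""
    def ascii_of(match: re.Match) -> str:
        return "".join(chr(ord(c) - TAG_BASE) for c in match.group(0))
    return TAG_SPAN.sub(ascii_of, text)


sample = "I said " + to_tag("<i>") + "no" + to_tag("</i>") + "."
html = tags_to_html(sample)   # ready to hand to an HTML renderer
```

Everything outside a TAG "<" ... TAG ">" span passes through untouched, which
is the point: the text is still plain Unicode, and the markup layer rides on
top of it.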

> If this sounds disturbing and wrong to you,

Disturbing? No. Wrong? I'd say "not my first choice". There are plenty
of things already approved that actually disturb me, but I won't go on
that tangent now.

> then other pseudo-markup ideas probably should as well.

Pseudo-markup already exists in Unicode, in multiple, inconsistent
ways. It exists because it was, at some point, by some people, deemed
useful enough and compatible enough with the aims of Unicode to be
included. I'm boggled by how annotations got in.

I'm well aware of scope creep and I'm not at all in favour of making
Unicode a Turing-complete programming language. That's why I proposed
something that fits into a mechanism Unicode has already defined. It
even includes a bit of syntactic salt: the way format nesting must be
done drives implementers to other protocols for anything beyond
rudimentary effects.

My guiding example is, "record fully the story text of a paperback
novel". There are things that are irrelevant for this purpose, such as
choice of font, or drop caps ("fancy first letters"), or page numbers,
or sizing of chapter titles, etc., etc. Even something like
monospaced text is almost always used purely stylistically (to
indicate in-story things like signage, computer output, telegrams)
and can be substituted with imagination by engaged readers. But
italics or underlines are often a meaningful part of text and
something is lost when that formatting is lost. Necessitating a
higher-level protocol for something so simple, when it can be easily
accommodated through an existing Unicode framework, is needlessly
conservative.

The thread-starter, Christian Kleineidam, gave a different use case
but I think it's a valid one as well. I think this would be an easy
win with not a whole lot of downside.

Reading the room here, not many agree. C'est la vie.

Cheers,
Sławomir Osipiuk


