Italics get used to express important semantic meaning, so unicode should support them

Martin J. Dürst duerst at it.aoyama.ac.jp
Mon Dec 21 03:08:08 CST 2020


Hello David, others,

On 20/12/2020 16:23, David Starner via Unicode wrote:
> On Sat, Dec 19, 2020 at 4:49 AM Otto Stolz via Unicode
> <unicode at unicode.org> wrote:
>> A notorious German example:
>>     Er hat in Moskau liebe Genossen. (= He’s got dear comrades in Moscow)
>>     Er hat in Moskau Liebe genossen. (= He has enjoyed love in Moscow)
>>     (And I assure you, the prosody varies accordingly, hence the
>>     difference is quite clear in speech, and must be preserved
>>     in writing.)
> 
> She _loves_ him !?! (= I can't believe her emotion towards him is love.)
> She loves _him_ !?! (= I can't believe that he is the one she loves,
> and not someone else.)
> 
> And the prosody varies accordingly, and any accurate preservation in
> writing would need to record the difference.

I think the above "and must be preserved in writing" is easy to 
misunderstand, as it is a bit too strong. It wouldn't have been 
preserved on very early computers (or earlier, in telegrams) that only 
used upper case. But there was a very strong expectation that it would 
be preserved on things as simple as a typewriter, and definitely also in 
handwriting.

On the other hand, there is no such expectation for your example. If 
prosody has to be reconstructed, that might happen from context 
(e.g. in a playscript), or the sentences might have been rewritten for 
clarity in the first place.

I don't think there is a single writing system that is able to denote 
every aspect of spoken language. When compared with spoken language, 
most writing systems leave something out. (Some may also add something, 
e.g. distinction of some homonyms.)


>> As only the author (and no other stage, be it human or automatic) can
>> know the intended meaning, Unicode is quite right when encoding the case
>> distinction.
> 
> Meh. I could come up with similar examples, though probably a bit more
> contrived, for just about every bit of markup. Italics/emphasis has a
> bunch of pretty clear meaning changes, like the example above,
> possibly more than casing in English. Fraktur/Antiqua mixing allows
> for any number of examples; "<fraktur>Er was</fraktur> clever." is
> different from "<fraktur>Er was clever</fraktur>".* Casing certainly
> had more of an argument to be encoded in the character set than
> italics, historically,

Exactly.


> but I can imagine an alternate history, maybe
> one where the leaders in computing history used a non-casing script, where
> casing was relegated to markup, and a lot of issues would be
> easier--no more problems with case-insensitive matching, and the
> Turkish i would be a font difference under markup.

An alternate history indeed. The history we followed gave us italics 
relegated to markup, and avoided the problems with italic-insensitive 
matching. And please note that your alternate history does NOT lead to 
technology that encodes italics separately. [And that I was perfectly 
able to put stress on a word in the previous sentence without italics, 
even if the main purpose of that was just to make a point.] Also, it's 
not clear that encoders starting with a non-casing script would have 
decided to relegate casing to markup. It's pretty annoying to mark up 
single letters, and to change the markup when a word moves to the start 
of a sentence, and these are the main uses for upper case.


> * Italics marking in English could serve the same role in making a
> bunch of examples; e.g. "The French man said to stop at the coin" and
> "The French man said to stop at the <i>coin</i>." mean different
> things.

The important thing here is "could". Unicode doesn't invent writing 
systems. And I have to admit that I don't understand the difference 
between these two sentences even with your italic markup. But that may 
be only me.

Regards,   Martin.

