<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 12/21/2020 1:08 AM, Martin J. Dürst
via Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:e68d936e-8158-0d47-fbb2-c713b3264563@it.aoyama.ac.jp">Hello
David, others,
<br>
<br>
On 20/12/2020 16:23, David Starner via Unicode wrote:
<br>
<blockquote type="cite">On Sat, Dec 19, 2020 at 4:49 AM Otto Stolz
via Unicode
<a class="moz-txt-link-rfc2396E" href="mailto:unicode@unicode.org">&lt;unicode@unicode.org&gt;</a> wrote:
<br>
<blockquote type="cite">A notorious German example:
<br>
Er hat in Moskau liebe Genossen. (= He’s got dear comrades in
Moscow.)
<br>
Er hat in Moskau Liebe genossen. (= He has enjoyed love in
Moscow.)
<br>
(And I assure you, the prosody varies accordingly, hence the
difference is quite clear in speech, and must be preserved
in writing.)
<br>
</blockquote>
<br>
She _loves_ him !?! (= I can't believe her emotion towards him
is love.)
<br>
She loves _him_ !?! (= I can't believe that he is the one she
loves, and not someone else.)
<br>
<br>
And the prosody varies accordingly, and any accurate
preservation in writing would need to record the difference.
<br>
</blockquote>
<br>
I think the above "and must be preserved in writing" is easy to
misunderstand, as it is a bit too strong. It wouldn't have been
preserved on very early computers (or earlier, in telegrams) that
only used upper case. But there was a very strong expectation that
it would be preserved on things as simple as a typewriter, and
definitely also in handwriting.
<br>
<br>
On the other hand, there is no such expectation for your example.
If prosody has to be reconstructed, that might happen, for example,
from context (e.g. in a playscript), or the sentences might have been
rewritten for clarity in the first place.
<br>
<br>
I don't think there is a single writing system that is able to
denote every aspect of spoken language. When compared with spoken
language, most writing systems leave something out. (Some may also
add something, e.g. distinction of some homonyms.)
<br>
</blockquote>
<p><br>
</p>
<p>The difference in the examples is rather fundamental. In the
first, the two meanings are utterly unrelated. In the second, they
differ less in the fundamental statement than in the speaker's
presumed attitude. Stressing a word can sometimes disambiguate how
a sentence is to be interpreted, but not always. There are many
more situations where we rely on context to know how to "read" a
statement and where there is no simple way to annotate that.</p>
<p>German can, of course, be written in all lowercase (if you care
to), and yes, you will come across statements where you don't know
what is being said, let alone "how" it is being said (or
intended). There are periodic discussions that have a slightly
circular quality to them: because of the way German orthography
uses case, you can write things in a certain way (and readers
expect the cues and are used to the style of written language it
allows).<br>
<br>
If all-lowercase use were enforced, you would see people
avoid ambiguous wording; the written language would change, at
least to some degree. It's unclear, short of actually carrying out
the experiment, how intrusive these changes would turn out to be.
Outright pairs of sentences that differ only in case are not
common, but German has a rather flexible word order, so visible
cues about which words are nouns may contribute more
significantly to readability than they would in languages with a
stricter word order.</p>
<p>As things stand, case is necessarily part of the orthography and,
without deep semantic analysis, cannot be "computed" as someone had
suggested. Technically, it could be captured as an attribute (and
if all implementations supported that seamlessly, users wouldn't
care). However, it can be argued that the use of uppercase letters
is just as much part of "spelling" as the choice of letters, and in
some applications users intuitively treat the uppercase letters as
extensions of the set of letters rather than as a style. <br>
</p>
<p>Disambiguating prosody in some exceptional cases to settle the
desired interpretation of a statement is not in the same (basic
orthographic) category; relegating it to a layer (rich text)
that is designed to handle many similar tasks is, and remains, the
proper approach.<br>
</p>
<blockquote type="cite"
cite="mid:e68d936e-8158-0d47-fbb2-c713b3264563@it.aoyama.ac.jp">
<br>
<br>
<blockquote type="cite">
<blockquote type="cite">As only the author (and no other stage,
be it human or automatic) can know the intended meaning, Unicode
is quite right in encoding the case distinction.
<br>
</blockquote>
<br>
Meh. I could come up with similar examples, though probably a
bit more contrived, for just about every bit of markup.
Italics/emphasis has a bunch of pretty clear meaning changes, like
the example above, possibly more than casing in English.
Fraktur/Antiqua mixing allows for any number of examples;
"&lt;fraktur&gt;Er was&lt;/fraktur&gt; clever." is different from
"&lt;fraktur&gt;Er was clever&lt;/fraktur&gt;".* Casing certainly
had more of an argument to be encoded in the character set than
italics, historically,
<br>
</blockquote>
<br>
Exactly.
<br>
<br>
<br>
<blockquote type="cite">but I can imagine an alternate history,
maybe one where the leaders in computing history used a non-casing
script, where casing was relegated to markup, and a lot of issues
would be easier: no more problems with case-insensitive matching,
and the Turkish i would be a font difference under markup.
<br>
</blockquote>
<br>
An alternate history indeed. The history we followed gave us
italics relegated to markup, and avoided the problems with
italic-insensitive matching. And please note that your alternate
history does NOT lead to technology that encodes italics
separately. [And that I was perfectly able to put stress on a word
in the previous sentence without italics, even if the main purpose
of that was just to make a point.] Also, it's not clear that
encoders starting with a non-casing script would have decided to
relegate casing to markup. It's pretty annoying to mark up single
letters, and to change the markup when a word moves to the start
of a sentence, and these are the main uses for upper case.
<br>
</blockquote>
<br>
<p>While users on some level don't care how something is
implemented, only about how it is exposed to them, there are
side effects of any underlying implementation choice that tend to
"leak". It is at those points that a technical solution that obeys
the principle of "least astonishment" will ultimately be superior,
all things being equal. <br>
</p>
<p>A./<br>
</p>
<blockquote type="cite"
cite="mid:e68d936e-8158-0d47-fbb2-c713b3264563@it.aoyama.ac.jp">
<br>
<br>
<blockquote type="cite">* Italics marking in English could serve
the same role in making a bunch of examples; e.g. "The French man
said to stop at the coin" and "The French man said to stop at the
<i>coin</i>." mean different things.
<br>
</blockquote>
<br>
The important thing here is "could". Unicode doesn't invent
writing systems. And I have to admit that I don't understand the
difference between these two sentences even with your italic
markup. But that may be only me.
<br>
<br>
Regards, Martin.
<br>
</blockquote>
<p><br>
</p>
</body>
</html>