<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 10/21/2021 3:40 PM, Mark E. Shoulson
via Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:247f65e7-d6c2-5fa5-1f26-5c713e261cf5@shoulson.com">If I
recall correctly, someone has proved that "fully automatic
high-quality translation" is AI-hard. Meaning that it's basically
the same as making a fully aware, human-intelligence AI. Now,
that probably depends a lot on the details of "high-quality."
There are probably sentences and texts one could cook up that a
would-be translator would need arbitrarily good understanding of
the context, situation, shared cultural memories and references,
etc etc for, and I guess that would be what the "proof" was
about. </blockquote>
<p>Sentences that require some understanding of the meaning for a
successful translation, even if you only consider factual
accuracy, are not hard to come by: they crop up regularly.</p>
<p>An easily understood example is a sentence that uses two distinct
but closely related terms in contrast. If both terms have strong yet
distinct parallels in the target language, even current AI may get
the translation right. But think about the case where the target
language doesn't have multiple terms. <br>
</p>
<p>What kind of AI do you need to create text in the
target language that gets the point across without relying on a
simple parallel?</p>
<p>Lest you think that this example comes from literary fiction, let me
assure you that the most recent instance I came across was
in a discussion of WAVE DASH in GitHub comments. It appears that
many languages cannot easily express the distinction between
hyphen and dash with just a pair of contrasting words.</p>
<p>If I tried to translate the text manually, I would have to
understand what aspect of the distinction was important to the
author, so I could pick a pair of descriptive phrases (or, if the
target language were German, compound nouns, potentially novel
ones) to get the point across.<br>
</p>
<p>I posit that instances like these, while they do not occur in
every text, are nevertheless far more common than you might
imagine. I have come across several examples like the one above in
recent memory, without looking for them.<br>
<br>
And this is just one class of difficult statements. I'm sure, if
you did a survey, you could come up with other types of examples
that are equally generic and occur regularly enough that failing
on them would disqualify a translator as "high quality", no matter
how you define that term.<br>
</p>
<blockquote type="cite"
cite="mid:247f65e7-d6c2-5fa5-1f26-5c713e261cf5@shoulson.com">Obviously,
machine translation has improved in ways nobody(?) would have
expected it to when the field was in its infancy, and has done it
by a completely different method. Instead of making more and more
sophisticated programs to understand and parse the grammars of
various languages and build networks of subjects and predicates,
modern translation, afaik, depends greatly on throwing _vast_
amounts of known text into the mix and doing some heavy-duty
number- and memory-crunching to almost "guess" at what's probably
the best translation, without necessarily actually "understanding"
what it means. (BTW, am I totally wrong about this?) <br>
</blockquote>
<p>No, that's how it's been described to me as well, by several
people familiar with or active in the field.</p>
<p>Pattern matching has some rather interesting limitations;
curiously, one of them seems to be the inability to maintain a
consistent gender for the subject across a text. It's a charming
device when used for generic subjects to imitate gender
neutrality, but fails badly when applied to a narrative about a
single person.</p>
<p>That's for cases where both source and target language encode
gender. If the source does not encode gender the same way, pattern
matching won't be able to infer the correct gender unless it can
reconstitute the implied subject of the sentence.</p>
<p>Here's a simpler case where plain text analysis does help with the
"guessing". Take the verbs "put" and "place" in English. If you
translate into German, you need to pick from a set of verbs that are
specific to the context: that context includes the shape and
orientation of the object and the configuration of the container
or location it is put in.</p>
<p>Now, for common containers or locations, and common types of
objects, statistical analysis will generate a good guess, but if
both are (novel) product names without descriptions, as you might
find in a user's manual, you're sunk. You would have to know
something about them, such as their dimensions and aspect ratios.
<br>
</p>
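<p>To make that concrete, here is a deliberately naive sketch (hypothetical
Python, not anything a real engine uses) of what such context-based verb
selection amounts to; the German verbs are real, but the cue nouns and the
lookup are illustrative assumptions only:</p>
<pre>
# Deliberately naive sketch (hypothetical, not any real MT system's code):
# choosing a German verb for English "put" from crude cues about the object.
# The cue nouns below are illustrative assumptions only.

CUES = {
    "stellen": {"bottle", "vase", "lamp"},    # placed upright, standing
    "legen":   {"book", "pen", "blanket"},    # placed lying flat
    "setzen":  {"child", "cat", "pawn"},      # placed "seated"
    "stecken": {"key", "plug", "letter"},     # inserted into something
}

def guess_german_put(object_noun: str) -> str:
    """Guess a German verb for 'put X somewhere' from the object noun alone."""
    for verb, known_objects in CUES.items():
        if object_noun in known_objects:
            return verb
    # A novel product name with no description gives us nothing to go on.
    return "???"

print(guess_german_put("bottle"))    # stellen
print(guess_german_put("Frobnitz"))  # ??? -- the failure described above
</pre>
<p>A statistical engine replaces the hand-written table with co-occurrence
counts over a large corpus, but the underlying problem is the same: given an
object it has never seen described, it has nothing to condition on.</p>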
<p>What level of AI you would need to infer these qualities from the
remainder of the text is an interesting question.<br>
</p>
<blockquote type="cite"
cite="mid:247f65e7-d6c2-5fa5-1f26-5c713e261cf5@shoulson.com"> It
seems to me that that does have farther to take us, and we'll
probably see a lot more improvement, but it can only take us so
far. Then again, "so far" might be far enough. If you have a
translator whose results are semantically satisfactory, say, 97%
of the time, and sound only a little awkwardnessful to a native
speaker in the target language... well, customers' standards may
be willing to duck a little.
<br>
</blockquote>
<br>
<p>There's a level of "quality" that equates to "a human looking at
the translation can guess what might have been in the original".</p>
<p>A lot of what current engines produce falls into that category.
You may easily spot something that is wrong, and infer what the
original might have meant. That only works if the result either
violates the context or is clearly less than "native" phrasing.</p>
<p>The more naturally phrased something is, the harder it may be to
spot a deviation from the original, by the way. So it will be
interesting to watch the progress.</p>
<p>The current state includes such howlers as translating "virus" as
"worm" (an FB translation from Burmese). Those are easy for a human
reader to correct for.</p>
<p>I came across an interesting example where social media had
substituted an English translation for a post in a language I'm fluent in.
The translation was just what you would have expected to see if a
native speaker of that language had tried to write in the target
language (English) without being fluent, making the same mistakes
humans would make, such as using a "false friend" translation. <br>
</p>
<p>A./</p>
<p>PS: I don't work in machine translation, or in real translation,
but I have proofread a number of translations of works of literary
fiction. (And I'm exposed to machine translations in the usual
contexts.) The former has taught me that translation is
effectively impossible, and the latter only confirms that :).
However, I'm actually impressed by how useful even a crude machine
translation can be -- especially in contexts where actual
translations would never be affordable or available.<br>
</p>
<blockquote type="cite"
cite="mid:247f65e7-d6c2-5fa5-1f26-5c713e261cf5@shoulson.com">~mark
<br>
<br>
On 10/21/21 17:11, James Kass via Unicode wrote:
<br>
<blockquote type="cite">
<br>
<br>
On 2021-10-21 9:41 AM, Dreiheller, Albrecht via Unicode wrote:
<br>
<blockquote type="cite">Without understanding the context, Live
Translate won't have a good chance to find the right
translation.
<br>
Machine translation often only pretends to know the meaning
but in fact it fails.
<br>
I'm not worried about human translators.
<br>
</blockquote>
They may be safe in the short-term. Machine translation is
much, much better than it was at the onset. I recall
translating a German web page about the Phaistos disk into
English and the page title was translated as "The Discotheques
of Phaistos". (That was a foreshadowing of what to expect in
the article, which was amusing to read through.) Even before
machine translation, translations could be humorous. A French
speaking friend once told me that the French title of a certain
Steinbeck novel could translate back into English as "The
Raisins of Anger".
<br>
<br>
The web page of Google Translate offers an option for the end
user to contribute suggestions for improving the specific
translation. This would be expected to make the machine
translations better over time. If this option is also offered
in Live Translate, which travels around town in pockets and
purses and accepts source material from more than plain-text,
wouldn't that expedite machine translation improvement?
<br>
<br>
And what will happen when AI is added to the mix?
<br>
</blockquote>
</blockquote>
<p><br>
</p>
</body>
</html>