<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">On 10/21/2021 3:40 PM, Mark E. Shoulson

      via Unicode wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:247f65e7-d6c2-5fa5-1f26-5c713e261cf5@shoulson.com">If I

      recall correctly, someone has proved that "fully automatic

      high-quality translation" is AI-hard.  Meaning that it's basically

      the same as making a fully aware, human-intelligence AI.  Now,

      that probably depends a lot on the details of "high-quality."

      There are probably sentences and texts one could cook up that a

      would-be translator would need arbitrarily good understanding of

      the context, situation, shared cultural memories and references,

      etc etc for, and I guess that would be what the "proof" was

      about.  </blockquote>

    <p>Sentences that require some understanding of the meaning for a

      successful translation, even if you only consider factual

      accuracy, are not hard to come by: they do crop up regularly.</p>

    <p>An easily understood example is a sentence that uses two distinct,

      but close terms in contrast. If both have strongly equivalent, but

      distinct parallels in the target language, even current AI may get

      the translation right. But think about the case where the target

      language doesn't have multiple terms. <br>

    </p>

    <p>What kind of AI do you need to be able to create text in the

      target language that brings across the point without using a

      simple parallel?</p>

    <p>Lest you think that this example is from literary fiction, let me

      assure you that the most recent example I came across was

      in a discussion of WAVE DASH in GitHub comments. It appears that

      many languages cannot easily express the distinction between

      hyphen and dash with just a pair of alternating words.</p>

    <p>If I tried to translate the text manually, I would have to

      understand what aspect of the distinction was important to the

      author, so I could pick a pair of descriptive phrases (or, if the

      target language was German, compound nouns, potentially novel

      ones) to get the point across.<br>

    </p>

    <p>I posit that instances like these, while they do not occur in

      every text, are nevertheless far more common than you might

      imagine. I have come across several examples like the one above in

      recent memory, without looking for them.<br>

      <br>

      And this is just one class of difficult statements. I'm sure, if

      you did a survey, you could come up with other types of examples

      that are equally generic and occur regularly enough that failing

      on them would disqualify a translator as "high quality" no matter

      what your definition is for that.<br>

    </p>

    <blockquote type="cite"

      cite="mid:247f65e7-d6c2-5fa5-1f26-5c713e261cf5@shoulson.com">Obviously,

      machine translation has improved in ways nobody(?) would have

      expected it to when the field was in its infancy, and has done it

      by a completely different method. Instead of making more and more

      sophisticated programs to understand and parse the grammars of

      various languages and build networks of subjects and predicates,

      modern translation, afaik, depends greatly on throwing _vast_

      amounts of known text into the mix and doing some heavy-duty

      number- and memory-crunching to almost "guess" at what's probably

      the best translation, without necessarily actually "understanding"

      what it means.  (BTW, am I totally wrong about this?) <br>

    </blockquote>

    <p>No, that's how it's been described to me as well, by several

      people familiar with or active in the field.</p>

    <p>Pattern matching has some rather interesting limitations;

      curiously one of them seems to be the inability to maintain a

      consistent gender for the subject across a text. It's a charming

      device when used for generic subjects to imitate gender

      neutrality, but fails badly when applied to a narrative about a

      single person.</p>

    <p>That's for cases where both source and target language encode

      gender. If the source does not encode gender the same way, pattern

      matching won't be able to infer it correctly unless it can

      reconstitute the implied subject of the sentence.</p>

    <p>Here's a simpler case where simple text analysis does help

      "guessing". Take the verbs "put" or "place" in English. If you

      translate to German, you need to pick from a set of verbs that are

      specific to the context: that context includes the shape and

      orientation of the object and the configuration of the container

      or location it is put in.</p>

    <p>Now, for common containers or locations, and common types of

      objects, statistical analysis will generate a good guess, but if

      both are (novel) product names without descriptions, as you might

      find in a user's manual, you're sunk. You would have to know

      something about them, such as their dimensions and aspect ratios.

      <br>

    </p>

    <p>What level of AI you need to infer these qualities from the

      remainder of the text is an interesting question.<br>

    </p>

    <blockquote type="cite"

      cite="mid:247f65e7-d6c2-5fa5-1f26-5c713e261cf5@shoulson.com"> It

      seems to me that that does have farther to take us, and we'll

      probably see a lot more improvement, but it can only take us so

      far.  Then again, "so far" might be far enough.  If you have a

      translator whose results are semantically satisfactory, say, 97%

      of the time, and sound only a little awkwardnessful to a native

      speaker in the target language... well, customers' standards may

      be willing to duck a little.

      <br>

    </blockquote>

    <br>

    <p>There's a level of "quality" that equates to "a human looking at

      the translation can guess what might have been in the original".</p>

    <p>A lot of what current engines produce falls into that category.

      You may easily spot something that is wrong, and infer what the

      original might have meant. That only works if the result either

      violates context or is clearly a less than "native" phrasing.</p>

    <p>The more naturally phrased something is, the harder it may be to

      spot deviation from the original, by the way. So it will be

      interesting to watch the progress.</p>

    <p>The current state includes such howlers as translating "virus" as

      "worm" (FB translation from Burmese). Those are easy for a human

      reader to correct for.</p>

    <p>I came across an interesting example where social media had

      substituted an English translation for a language I'm fluent in.

      The translation was just what you would have expected to see if a

      native speaker of that language had tried to write in the target

      language (English) without being fluent, making the same mistakes

      humans would make, such as using a "false friend" translation. <br>

    </p>

    <p>A./</p>

    <p>PS: I don't work in machine translation, or in real translation,

      but I have proofread a number of translations of works of literary

      fiction. (And I'm exposed to machine translations in the usual

      contexts.) The former has taught me that translation is

      effectively impossible, and the latter only confirms that :).

      However, I'm actually impressed by how useful even a crude machine

      translation can be -- especially in contexts where actual

      translations would never be affordable or available.<br>

    </p>

    <blockquote type="cite"

      cite="mid:247f65e7-d6c2-5fa5-1f26-5c713e261cf5@shoulson.com">~mark

      <br>

      <br>

      On 10/21/21 17:11, James Kass via Unicode wrote:

      <br>

      <blockquote type="cite">

        <br>

        <br>

        On 2021-10-21 9:41 AM, Dreiheller, Albrecht via Unicode wrote:

        <br>

        <blockquote type="cite">Without understanding the context, Live

          Translate won't have a good chance to find the right

          translation.

          <br>

          Machine translation often only pretends to know the meaning

          but in fact it fails.

          <br>

          I'm not worried about human translators.

          <br>

        </blockquote>

        They may be safe in the short-term.  Machine translation is

        much, much better than it was at the onset.  I recall

        translating a German web page about the Phaistos disk into

        English and the page title was translated as "The Discotheques

        of Phaistos".  (That was a foreshadowing of what to expect in

        the article, which was amusing to read through.)  Even before

        machine translation, translations could be humorous.  A French

        speaking friend once told me that the French title of a certain

        Steinbeck novel could translate back into English as "The

        Raisins of Anger".

        <br>

        <br>

        The web page of Google Translate offers an option for the end

        user to contribute suggestions for improving the specific

        translation. This would be expected to make the machine

        translations better over time.  If this option is also offered

        in Live Translate, which travels around town in pockets and

        purses and accepts source material from more than plain-text,

        wouldn't that expedite machine translation improvement?

        <br>

        <br>

        And what will happen when AI is added to the mix?

        <br>

      </blockquote>

    </blockquote>

    <p><br>

    </p>

  </body>

</html>