Breaking barriers

David Starner prosfilaes at gmail.com
Fri Oct 22 16:04:41 CDT 2021


On Thu, Oct 21, 2021 at 12:46 AM James Kass via Unicode
<unicode at corp.unicode.org> wrote:
> This would mean that if an image of text can be scanned from a computer
> monitor, it could be translated.  The underlying source encoding
> wouldn't matter, it could be some obscure code page, Unicode PUA, or
> even a specialty custom ASCII font as long as the source display is
> correctly enabled and the translation software handles the source
> language(s).  Since the resulting data would likely be stored in
> Unicode, both pre- and post-translation -- the barrier between
> conflicting older encodings which Unicode has practically removed would
> then be completely demolished.

"as long as the source display is correctly enabled and the
translation software handles the source language(s)." So in no
interesting cases. Project Gutenberg had a Swedish bible translation
in an unknown encoding (a variant of the DOS encoding that doesn't
seem to have corresponded to anything documented); getting it to
display correctly was basically the same challenge as translating it
to Unicode, which was eventually done by figuring out what the unknown
codepoints (obviously quotes) must have been. The set of languages
that are encoded in the PUA and also have reliable transcription and
translation is going to be virtually empty, and if you care about
correctness and you have the font, you can convert the encoding
directly.
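
For what it's worth, a direct conversion of that kind can be sketched
in a few lines of Python. This is only an illustration, not what was
actually done for the Project Gutenberg text: the base code page
(cp437) and the two override bytes (0xF0 and 0xF1 standing in for the
quotes) are assumptions made up for the example, and decode_variant
is a hypothetical helper name.

```python
# Sketch: decode bytes from an undocumented variant of a DOS code page.
# ASSUMPTIONS (not from the original text): the base code page is cp437,
# and bytes 0xF0 / 0xF1 are the "unknown" codepoints that turned out to
# be opening and closing quotation marks.
HYPOTHETICAL_OVERRIDES = {
    0xF0: "\u201c",  # assumed: left double quotation mark
    0xF1: "\u201d",  # assumed: right double quotation mark
}


def decode_variant(data, base="cp437", overrides=HYPOTHETICAL_OVERRIDES):
    """Decode byte by byte, preferring the override table over the base."""
    chars = []
    for byte in data:
        if byte in overrides:
            chars.append(overrides[byte])
        else:
            # Fall back to the documented base code page for this byte.
            chars.append(bytes([byte]).decode(base))
    return "".join(chars)


if __name__ == "__main__":
    sample = b"\xf0Sk\x94n\xf1"
    print(decode_variant(sample))
```

The hard part is, of course, working out what goes in the override
table; once that is known, the conversion itself is mechanical, which
is the point above.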

> P.S. - Too bad about human translators, though.  Being a translator used
> to be a lucrative field with skilled translators in high demand.  Newer
> technology, as it breaks down the communication barrier between
> languages, will probably have an effect on translator employment, if it
> hasn't already.

Haven't you seen photos of billboards saying "Translation server is
down" or the like? It has certainly already affected translator
employment. I recall an older story, from the 1970s, where a tobacco
firm was keeping track of a Brazilian anti-smoking group via a hired
translator; said translator eventually proceeded to give a copy of all
translated works to the Brazilian group (discreetly, or so he
thought), at which point the company never called him again.
Translation programs don't tend to do stuff like that.

On the other hand, someone called translation AI-hard; it's not, it's
impossible, in the same league as the halting problem. One example is
Harry Potter and the Half-Blood Prince, which has a character
mentioned as R.A.B. This is a preexisting character in the series, but
which? Translators had to ask Rowling in order to translate the
initials correctly. Now, a reference to the seventh book will answer
the question, moving it to AI-hard, but such deliberate or accidental
ambiguity is part of the reason translators are traitors.
(“traduttore, traditore”.)

-- 
The standard is written in English . If you have trouble understanding
a particular section, read it again and again and again . . . Sit up
straight. Eat your vegetables. Do not mumble. -- _Pascal_, ISO 7185
(1991)


