‘La pomme a mangé le garçon’ is a bizarre sentence, but an easily comprehensible one (if you speak French). It means, ‘The apple ate the boy’. What does Google Translate make of it?
It renders it as ‘The boy ate the apple’. Bob Berwick at Faculty of Language has an explanation of why this is. In a nutshell: GT works by bombarding problems with corpus statistics, while paying little attention to things like grammatical structure or thematic role. Since ‘the boy ate the apple’ is a statistically much more ‘likely’ sentence than ‘the apple ate the boy’, while both sentences contain English translations of all and only the words in the French source sentence, the former wins out. Berwick’s take-home message relates to the dangers of overusing statistics (Bayes’ Theorem in particular) in place of doing serious linguistics.
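To see how raw corpus statistics can produce this effect, here is a deliberately crude sketch (in no way Google Translate's actual system): a bigram counter over a tiny invented corpus. The corpus sentences, and the idea of scoring a candidate translation by summing its bigram frequencies, are assumptions made purely for illustration.

```python
from collections import Counter

# A tiny hypothetical corpus, invented for this illustration.
# Real systems train on billions of words, but the principle is similar:
# frequent word sequences score higher than rare ones.
corpus = [
    "the boy ate the apple",
    "the boy ate the bread",
    "the girl ate the apple",
    "the dog ate the bone",
]

def bigrams(sentence):
    """Return the list of adjacent word pairs in a sentence."""
    words = sentence.split()
    return list(zip(words, words[1:]))

# Count how often each bigram occurs anywhere in the corpus.
counts = Counter(bg for s in corpus for bg in bigrams(s))

def score(sentence):
    """Sum of corpus bigram counts: a crude proxy for statistical 'likelihood'."""
    return sum(counts[bg] for bg in bigrams(sentence))

# Both candidates use exactly the same words, but the statistics
# favour the one that looks like sentences seen before.
print(score("the boy ate the apple"))   # → 10
print(score("the apple ate the boy"))   # → 8
```

Because ‘apple ate’ never occurs in the corpus while ‘boy ate’ does, the grammatically faithful translation loses to the statistically familiar one — with no appeal to grammatical structure or thematic role at any point.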
Notwithstanding mishaps like this, however, Google Translate is remarkably successful in general. Furthermore, overall it is significantly more successful than previous attempts at automated machine translation that paid much more attention to notions that are central in our best linguistic theories: things like grammatical structure (e.g. clause composition) and thematic role (e.g. verb subject/object).
It is possible to draw many morals from this scenario. At the very least, we can say the following: it is possible to write a computer program that mimics a human cognitive activity rather well while operating in a way that is nothing like the way human cognition works. This is something to bear in mind amid the multifarious claims made on behalf of artificial intelligence.
N.B.: Of course, we really didn’t need this example to see that human cognition works nothing like Google Translate. Of course native speakers aren’t carrying n-grams around in their heads. Of course native speakers’ linguistic knowledge doesn’t amount to knowing statistical distributions of collocations of words … right?