We're not ready for flawless machine translation, but it's coming


OPINION

It has become a common trick: You read an article about artificial intelligence and then you realize, after a full paragraph, that what you just read was written by an AI. Recently, the best machine translation algorithms have passed a similar milestone: now you can read an interesting article translated from (let’s say) Spanish without any indication that the text was not originally written in English. (Yes, you guessed it, this paragraph was automatically translated from Spanish).*

Until recently, machine translation was a bit like having a conversation with a four-year-old: you’re very impressed that it’s possible at all, but you have to put in effort to understand what they’re trying to get at. Yet the four-year-old inevitably grows up, and somehow becomes as fluent as the rest of us. The consumer versions of machine translation available today, such as the ubiquitous Google Translate, are not quite there yet, but the paid-for enterprise versions are shockingly close, and each new improvement in the enterprise version will inevitably be rolled out more widely in the future.

This will have some wonderful consequences, certainly: outstanding publications in smaller languages will be able to reach (literally) billions more readers with their insights and perspectives. It will also have some challenging consequences: for a start, publishers in every language will have to compete with those in every other, exacerbating the winners-and-losers dynamic that online publishers already experience. Finally, it will have some outright dangerous consequences: genuinely fake news will be easier to spread, because there will no longer be a fluency barrier to being taken seriously.

But first, the good news. Take, for example, Gazeta Wyborcza, a quality daily out of Warsaw. I don’t read any Polish, but that is no longer an issue: the articles can be magically converted into fairly flawless English. At first I just browsed Gazeta to get a Polish view on Poland; if things ended there this would not matter much to English-language publishers, who largely aren’t covering Polish issues in the first place. What I didn’t expect was to be drawn into reading, and enjoying, Gazeta’s thoughtful, distinctive perspective on the global news topics you might usually read about in the BBC, Der Spiegel or the Washington Post.

Of course, those publications have far greater budgets, and smaller publications in smaller languages are unlikely to match the absolute quality of the world’s foremost publishers. The key insight, though, is that it isn’t necessary for them to do so: if you’re going to look at two newspapers every morning, it’s far more interesting to pair the Post with Gazeta than to pair the Post with the Guardian, because the difference in perspective is larger and therefore so is the pleasure and discovery for the reader. (On a similar note, niche topical publications in smaller languages could easily be world-class within their specific topic areas.)

The implication for publishers in smaller languages is clear: invest as soon as possible in machine-translated editions for every relevant language. (Perhaps, where appropriate, a native-speaking sub-editor could be employed to go over the output and remove any remaining infelicities, but of course this would be much less work than having human translations in the first place.) The maths is truly dizzying; publications that currently have a maximum potential readership in the tens of millions can suddenly appeal to literal billions. Of course, the Los Angeles Times could equally decide to launch a Polish-language edition tomorrow, but machine translation feels like a technology with asymmetric upside; its value is greatest to publishers in smaller languages.

At this point, it’s important to note that the technology is stronger with some language-pairs than others, making the issue more immediately relevant to some publishers than to others. Anecdotally — in each case, in terms of translating into English — Spanish and German are extraordinary; French, Dutch, and Italian a little below; Portuguese, Russian, and Chinese a rung below that; then Arabic and Turkish another rung again. Of course these limitations will also change over time, and different publishers will see differing value in publishing to the various languages.

The second impact is the challenge to all publishers from new competition. For English-language publishers especially, it’s vital to consider the arrival of new machine-translated competitors when planning for the years ahead. Of course, online publishers are more than used to facing plentiful competition, so the difference will be quantitative rather than qualitative. Still, every publisher should be asking themselves: what would I do if my readers had twice as many good alternatives tomorrow as they do today? With machine translation, that very well may be the reality.

Finally, the dangerous impact of the trend is this: perfect machine translation will be just as available for bad actors as for quality publishers. Soon, a person’s ability to churn out spam, chum and falsehoods in fluent English will be completely untethered from their ability to write fluent English in the first place. The issues we’ve seen so far with fake news in the most literal sense — not “news I don’t like” but “news that is literally made up” — are nothing compared to what will happen when that ability is fully developed, when fake stories can be written in one language and presented in another.

On this front, the world is not ready for the implications of flawless machine translation, and it’s not clear what any individual publisher can do in response. If the parable of Babel describes human conflict following the loss of mutual intelligibility, we are about to discover that flawless mutual intelligibility brings strong challenges of its own.

Uri Bram
CEO, The Browser

About: Uri Bram is the Publisher of The Browser, the world’s favorite curation newsletter, and the author of The Business of Big Data: How to Create Lasting Value in the Age of AI, with University of Oxford Business School professor Martin Schmalz.

* “Se ha convertido en un truco común: lees un artículo sobre inteligencia artificial y luego te das cuenta, después de un párrafo completo, que lo que acabas de leer fue escrito por una IA. Recientemente, los mejores algoritmos de traducción automática han superado un hito similar: ahora puedes leer un artículo interesante traducido del (digamos) español sin ninguna indicación de que el texto no fue escrito originalmente en inglés. (Sí, lo adivinaste, este párrafo fue traducido automáticamente del español)”. Translated with Google Translate here. With thanks to Milena at Proyecto Mundo Latino.