On reading an AI-written version of her article, Guardian columnist asks, “Does anyone have any job openings?”

OpenAI, a non-profit artificial intelligence research company whose founders include Elon Musk and Sam Altman, recently announced that it had developed one of the most advanced language models to date, called GPT-2.

According to the company’s post, “The model is chameleon-like — it adapts to the style and content of the conditioning text. This allows the user to generate realistic and coherent continuations about a topic of their choosing.”

“Rich in context and nuance”

GPT-2 excels at language modeling, the task of predicting the next word given all the words that came before it. It requires just a line or two of input to generate several paragraphs of plausible content.
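
For a sense of what that looks like in practice, here is a minimal sketch of prompt-conditioned generation. It assumes the open-source Hugging Face transformers library and the much smaller GPT-2 model OpenAI has released (see below); neither is OpenAI’s own demo tooling, and the prompt is purely illustrative.

```python
# A minimal sketch of conditional text generation with the small,
# publicly released GPT-2 model, via the open-source Hugging Face
# "transformers" library (an assumption; not OpenAI's own tooling).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# A line or two of conditioning text is enough to seed generation.
prompt = "Recycling is bad for the world."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation; top-k sampling keeps the output coherent
# while still allowing variety between runs.
output = model.generate(
    input_ids,
    max_length=200,
    do_sample=True,
    top_k=40,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```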

It can write across different subjects and even mimic specific styles and tones to produce text that is rich in context and nuance. Given just a headline, GPT-2 can generate an entire article, complete with fake quotes and data. It can even write coherent fiction with a reasonable amount of detail.

When prompted by OpenAI’s researchers to write on a topic they disagreed with, namely “recycling is bad for the world,” GPT-2 produced a “really competent, really well-reasoned essay,” David Luan, OpenAI’s Vice President of Engineering, told The Verge. “This was something you could have submitted to the US SAT and get a good score on,” he added.

In the announcement, OpenAI’s researchers say that GPT-2 works that well only about half the time. But the examples the team presented were as well written as anything a human might have produced.

So far, the tool has been trained on 8 million web pages, filtered to links that earned at least 3 karma on Reddit. As it ingests more content and is trained for specific tasks, it is likely to get better.

It can also perform other writing-related tasks, like translating text from one language to another, summarizing long articles, and answering trivia questions.
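
OpenAI’s researchers reported eliciting these behaviors with nothing more than cleverly phrased prompts; for summarization, appending “TL;DR:” to an article nudges the model to emit a summary. A minimal sketch of that trick follows, again assuming the small public checkpoint and the Hugging Face library rather than OpenAI’s own setup:

```python
# A sketch of the "TL;DR:" prompting trick reported for GPT-2:
# appending the marker to an article steers the model toward a
# summary. Assumes the small public model, not OpenAI's full one.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

article = (
    "OpenAI has announced GPT-2, a large language model trained on "
    "8 million web pages. The company says the model can generate "
    "coherent paragraphs of text, and it is withholding the full "
    "version over concerns about misuse."
)
prompt = article + "\nTL;DR:"

# Low top-k sampling keeps the "summary" close to the source text.
result = generator(prompt, max_new_tokens=60, do_sample=True, top_k=2)
print(result[0]["generated_text"][len(prompt):])
```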

“Pretty darn real”

GPT-2 could become a valuable tool for publishers aiming to scale up high-quality automated content production. Several of them, including Reuters, AP, the Washington Post, and Forbes, already use tools that automate content production. Guardian Australia recently published its first article written by a text generator called ReporterMate.

At present, automated tools are mainly used to produce articles driven by facts and figures. Something like GPT-2 could take that to an entirely new level.

Hannah Jane Parkinson, a Guardian columnist whose article was fed into GPT-2, wrote in her column for the publisher, “Seeing GPT-2 ‘write’ one of ‘my’ articles was a stomach-dropping moment: a) it turns out I am not the unique genius we all assumed me to be; an actual machine can replicate my tone to a T; b) does anyone have any job openings?”

Jokes aside, GPT-2 has shown immense promise, so much so that its creators are worried about its potential for misuse.

We are living in an age where fake news is a serious problem. Bots have already been used to distribute inflammatory and partisan content on social media. And according to a September 2018 Knight Foundation and Gallup poll, publishers’ credibility has taken a hit because of the distrust sown by the prevalence of such content.

Although GPT-2 has been designed for positive applications, like most other powerful tools it can be used for nefarious purposes as well. These include automated production of abusive or faked content, online impersonation, and automated generation of spam and phishing content.

The content produced by GPT-2 “looks pretty darn real. It could be that someone who has malicious intent would be able to generate high-quality fake news,” Luan told Wired.

Hence, OpenAI has broken with tradition and refrained from releasing the full GPT-2 publicly. Instead, it has released a much smaller model for researchers to experiment with, and has given certain media outlets limited access.

We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems.

OpenAI Team

Publishers test GPT-2

It didn’t take much to get the program to produce mischief-inducing fake content. According to the Guardian, when the first few paragraphs of a Brexit story were fed into the program, it generated “plausible newspaper prose, replete with ‘quotes’ from Jeremy Corbyn, mentions of the Irish border, and answers from the prime minister’s spokesman.”

Video: The Guardian tests GPT-2’s capabilities for generating news and fiction.

Similarly, Wired tested GPT-2 with the prompt “Hillary Clinton and George Soros,” and the algorithm produced a well-composed write-up filled with conspiracy theories.

The Verge’s team gave it the prompt “Jews control the media,” and GPT-2 wrote: “They control the universities. They control the world economy. How is this done? Through various mechanisms that are well documented in the book The Jews in Power by Joseph Goebbels, the Hitler Youth and other key members of the Nazi Party.”

A version of the program trained on Amazon product reviews demonstrated how a little extra training could tailor it to a specific task, for better or worse. Prompted by Wired to write a 1-star book review with the summary “I hate Tom Simonite’s book,” the program trashed the book mercilessly and with apparent authority.
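
For a sense of what that “little extra training” involves, here is a minimal, hypothetical fine-tuning sketch on the small public model. The two reviews are invented placeholders standing in for a large corpus such as Amazon’s; this is an illustration of the general technique, not Wired’s or OpenAI’s actual setup.

```python
# A minimal fine-tuning sketch: continue training the small public
# GPT-2 on domain text (here, invented placeholder reviews) so its
# output takes on that domain's style.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()

reviews = [
    "1 star. I hate this book. The plot is thin and the prose is worse.",
    "5 stars. A gripping read from the first page to the last.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
    for text in reviews:
        input_ids = tokenizer.encode(text, return_tensors="pt")
        # For language modeling, the labels are the inputs themselves:
        # the model learns to predict each next token in the review.
        loss = model(input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```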

The thing I see is that eventually someone is going to use synthetic video, image, audio, or text to break an information state. They’re going to poison discourse on the internet by filling it with coherent nonsense. They’ll make it so there’s enough weird information that outweighs the good information that it damages the ability of real people to have real conversations.

Jack Clark, OpenAI’s Policy Director to The Verge

Potential for transforming publishing

Then again, all these doom-and-gloom scenarios are being entertained only because of the tool’s great potential in the first place. Until now, automated content tools have largely been used to produce data-driven articles from templates. GPT-2 can take that to the next level.

If it can be trained to write convincing reviews on Amazon or produce a well-reasoned essay on recycling, it can be trained to write high-quality pieces on other topics when fed the right prompts by journalists. That could substantially reduce the time it takes to produce quality content.

Imagine a reporter feeding in the facts of a news story along with their thoughts on the matter; given the right training, GPT-2 or a future iteration could come up with a well-argued opinion piece that the reporter need only tweak a bit to make it ready for publishing.

Forbes has already been experimenting with a tool that prepares drafts of stories which reporters just need to polish to get them ready for publication. GPT-2 could be the next step in that journey.

And no, it does not have to take away reporters’ jobs. It can be a powerful assistant for journalists and publishers, helping them produce a lot of quality content that is in sync with their tone and style.