CAPTCHAs have been protecting websites against bots for about two decades, but the world’s most popular website security system may soon become obsolete.
Researchers at Lancaster University (UK), Northwest University, and Peking University (both in China) have developed an automated CAPTCHA solver that can defeat the majority of captcha systems in use today.
The new algorithm is based on machine learning and deep learning methods that can crack sophisticated CAPTCHA codes. The solver is highly efficient: it can solve a captcha within 0.05 seconds on a desktop PC.
The algorithm is not just very good at its job, with “near-human capability”; it also requires minimal human effort or oversight to work. Because the new solver requires so little human involvement, it can easily be rebuilt to target new or modified captcha schemes.
“The first security defence of websites is no longer reliable”
“We show for the first time that an adversary can quickly launch an attack on a new text-based captcha scheme with very low effort. This is scary because it means that this first security defence of many websites is no longer reliable. This means captcha opens up a huge security vulnerability which can be exploited by an attack in many ways,” said Dr Zheng Wang, Senior Lecturer at Lancaster University’s School of Computing and Communications and co-author of the research.
“We think our research probably has pronounced a death sentence for text CAPTCHA.”
Dr. Zheng Wang
“[The software] allows an adversary to launch an attack on services, such as Denial of Service attacks or sending spam or phishing messages, to steal personal data or even forge user identities,” said Guixin Ye, lead student author of the work.
“Given the high success rate of our approach for most of the text captcha schemes, websites should be abandoning captchas.”
Guixin Ye
By using a machine-learned automatic captcha generator, the researchers were able to significantly reduce the effort, and time, needed to find and manually tag captchas to train their software. It only required 500 genuine captchas, instead of the millions that would normally be needed, to effectively train an attack programme.
The researchers tackled the task of beating text-based CAPTCHAs by using a recent development called the Generative Adversarial Network (GAN)—a type of neural network—which learns from examples.
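To make the idea of adversarial training concrete, here is a deliberately tiny sketch. It is not the researchers' solver and it does not touch images: a one-parameter-pair "generator" learns to imitate numbers drawn around 4.0 while a logistic "discriminator" tries to tell real numbers from generated ones. All names (`train_toy_gan`, `discriminator`, the target value 4.0, the learning rate) are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def discriminator(x, w, c):
    """Estimated probability that x is a real sample (one-feature logistic model)."""
    return sigmoid(w * x + c)


def train_toy_gan(steps=1000, lr=0.02, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: G(z) = a * z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)  # stand-in for a genuine example
        z = rng.gauss(0.0, 1.0)       # random noise fed to the generator
        x_fake = a * z + b

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        d_real = discriminator(x_real, w, c)
        d_fake = discriminator(x_fake, w, c)
        w -= lr * ((d_real - 1.0) * x_real + d_fake * x_fake)
        c -= lr * ((d_real - 1.0) + d_fake)

        # Generator step: minimise -log D(fake), i.e. try to fool the discriminator.
        d_fake = discriminator(x_fake, w, c)
        grad_x = (d_fake - 1.0) * w  # gradient of the loss w.r.t. the fake sample
        a -= lr * grad_x * z
        b -= lr * grad_x
    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train_toy_gan()
    print(f"generator offset b = {b:.2f} (real data is centred on 4.0)")
```

The same tug-of-war, scaled up to image-generating neural networks, is what lets a GAN produce synthetic examples that resemble genuine ones.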
The programme was tested on 33 captcha schemes, of which 11 are used by many of the world’s most popular websites, including eBay, Wikipedia and Microsoft.
Google’s new CAPTCHA technology (or reCAPTCHA) was the most difficult to beat. A captcha is considered ineffective once a solver’s success rate exceeds 1%, and even reCAPTCHA was beaten at about three times that rate.
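A quick back-of-the-envelope sketch shows why even a low success rate matters. It assumes the 1% threshold quoted above, reads "three times" as a success rate of roughly 3%, and reuses the 0.05 seconds per attempt mentioned earlier; the function names are made up for illustration, and real-world rate limiting and network latency are ignored.

```python
def is_considered_broken(solver_success_rate, threshold=0.01):
    """A scheme counts as ineffective once solvers succeed more often than the threshold."""
    return solver_success_rate > threshold


def expected_solves_per_hour(solver_success_rate, seconds_per_attempt=0.05):
    """Expected successful solves in one hour of back-to-back attempts."""
    attempts_per_hour = 3600 / seconds_per_attempt
    return attempts_per_hour * solver_success_rate


if __name__ == "__main__":
    rate = 0.03  # assumed: about three times the 1% threshold
    print(is_considered_broken(rate))
    print(round(expected_solves_per_hour(rate)))
```

Under these assumptions, a single desktop PC making uninterrupted attempts would get through about 2,160 times an hour, which is why a 3% success rate is far from harmless.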
Previous captcha solvers used to be specific to one particular captcha variation, and the machine-learning attack systems were labor-intensive to build, requiring a lot of manual tagging of captchas to train the systems. They could also be easily rendered obsolete by small changes in the security features used within captchas.
The new solver delivers significantly higher accuracy than previous captcha attack systems, and is able to successfully crack versions of CAPTCHAs where earlier systems have failed.
The researchers believe websites should consider alternative measures that use multiple layers of security, such as a user’s usage patterns, device location or even biometric information.
This research is published in the paper ‘Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach’ and was presented at the ACM Conference on Computer and Communications Security (CCS) 2018 in Toronto.