Text-to-speech technology has become more accessible than you might think, and you should include an audio version of your articles. There are no excuses anymore.
In late 2016, the Danish online magazine Zetland decided to move into audio at its readers’ request. In the summer of 2017, Zetland began publishing all articles as audio, and things started to change dramatically.
Within two months, 40% of consumption was audio; in less than six months it was 50%; and within a year it was 70%. The move improved both retention and member satisfaction. The journalists read their own stories.
Zetland exceeded 28,000 members (Danish population: 5.8M) in 2021, and its operation has been financially sustainable since 2019.
Of course, having articles read aloud by their authors is a very good brand-building exercise, as both Zetland and The New York Times have shown. However, some publishers are increasingly turning to text-to-speech technology, using artificial, neural voices to read their articles aloud.
Use text-to-speech apps to create audio versions of your stories
During the pandemic, The Wall Street Journal hit an all-time high in digital subscribers and also set an overall traffic record. With that in mind, it ran a number of experiments aimed at getting new and less-engaged members (those who visit WSJ fewer than 10 days per month) to come back more often.
One of the most successful experiments was the “Listen to this article” feature, an automatically generated, text-to-speech audio version of every story on the website. The Journal said it proved to be more habit-forming than its popular crossword puzzles. Best of all, it was welcomed by younger and older readers alike.
WSJ built its own text-to-speech (TTS) player, which connects to one of several cloud-based machine learning services offered by the big tech companies. You can use Google’s TTS API, go for the Amazon Polly API (as The Washington Post has), or pick another cloud provider of such services.
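To make this concrete, here is a minimal sketch of what such a pipeline could look like with Amazon Polly’s `synthesize_speech` call. The chunking helper and its ~3,000-character limit are my own working assumptions about Polly’s per-request cap, and the voice name is illustrative; this is a sketch, not anyone’s production player.

```python
def chunk_text(text, limit=2900):
    """Split an article into pieces under an assumed ~3,000-character
    per-request limit, breaking on sentence boundaries."""
    chunks, current = [], ""
    for sentence in text.replace("\n", " ").split(". "):
        sentence = sentence.strip()
        if not sentence:
            continue
        piece = sentence if sentence.endswith(".") else sentence + "."
        if current and len(current) + len(piece) + 1 > limit:
            chunks.append(current)
            current = piece
        else:
            current = (current + " " + piece).strip()
    if current:
        chunks.append(current)
    return chunks


def synthesize_article(text, voice_id="Joanna", engine="neural"):
    """Send each chunk to Amazon Polly and concatenate the MP3 bytes.
    Requires AWS credentials; voice and engine are illustrative."""
    import boto3  # deferred so chunk_text works without the AWS SDK installed
    polly = boto3.client("polly")
    audio = b""
    for chunk in chunk_text(text):
        response = polly.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice_id, Engine=engine
        )
        audio += response["AudioStream"].read()
    return audio
```

The resulting MP3 bytes can be written to a file and served through an ordinary HTML audio player alongside the article.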
Of course, if you’re not WSJ or The Washington Post and have limited resources, this can seem like a far-reaching goal. Well, not anymore.
As almost always happens with technology, you just have to wait a while for intermediary services to spring up and offer ready-to-use solutions for a fee that’s far more reasonable than tasking a whole team of developers with building the feature from the ground up.
After doing some research, I compiled the five services below, which you can start using immediately, along with examples of websites that use them.
Now, all of them are in English, but don’t let that deter you. Here is a list of the languages the Google API offers (and most of the services listed use it): Afrikaans, Arabic, Bengali, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, Filipino, Finnish, French, German, Greek, Gujarati, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Malay, Malayalam, Mandarin Chinese, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese.
BeyondWords is probably my favourite of these services, honestly. It is used, for example, by Journalism.co.uk. It offers AI voices, including the latest text-to-speech voices from Amazon, Microsoft, Google, and Yandex – 700+ voices across 64 languages.
BeyondWords also offers voice cloning technology to create custom AI voices. You can use your own voice or create a synthetic copy of your voice and use that. The service offers a WordPress integration.
Play.ht also promises realistic text-to-speech audio, generated with its online AI Voice Generator and the best synthetic voices from Google, Amazon, IBM and Microsoft. Here’s a list of supported languages. It doesn’t offer a free tier, but has a nice Medium integration and a WordPress integration.
Play.ht can, similarly to BeyondWords, turn your audio feed into a podcast feed.
Speechify promises an easy integration with your website with only 5 lines of code. It is used, for example, by Medium to automatically create an audio version for every post on the website. This is one of my older Medium blogs and I never added an audio version, but now it’s there.
Speechify’s voices can read text in more than 20 languages.
Remixd is used by the US-based tech online publication The Verge (example) to produce audio versions of its articles. Unfortunately, the website doesn’t provide much more information.
Start small and build up from there
I really think content websites no longer have any excuse not to provide audio versions of their articles.
Sure, truly high-quality output is possible only for the select languages for which Google, Amazon, IBM and Microsoft also offer a neural voice, which sounds far more natural than the typical Google Translate voice most of us are used to hearing.
Here you can hear the difference between a standard voice and a neural one, which synthesises speech with more human-like emphasis and inflection on syllables, phonemes, and words.
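In practice, switching between a standard and a neural voice is often just one request parameter. The sketch below assumes Amazon Polly’s request format, where the `Engine` field selects the rendering; other cloud providers expose a similar option.

```python
def tts_request(text, voice_id="Joanna", neural=True):
    """Build Polly-style request parameters; only the Engine field
    changes between the standard and the more natural neural voice."""
    return {
        "Text": text,
        "OutputFormat": "mp3",
        "VoiceId": voice_id,
        "Engine": "neural" if neural else "standard",
    }
```

Note that neural rendering is available only for a subset of voices and languages, so a real integration would check voice support before flipping this switch.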
I ran a test with my family: I played them a clip of a human reading a text and a clip of a neural AI voice reading a text. They couldn’t tell which one wasn’t human. Yes, scary.
But on the other hand, it gives publishers a good option for turning a text-only website into a much richer experience, with a proven effect on time spent per visit and reader return rates.
Of course, having professionals or your own authors read the articles still delivers the best experience, since you can hear it is done by a human, especially with longer texts.
However, using a service like those mentioned above or Veritone Audio can extend what you are able to do.
Sounds Profitable, the ad-tech weekly newsletter from Podnews, was able to clone the voice of its host, Bryan Barletta, and then use it to speak a language he doesn’t speak.
Barletta used Veritone Voice to build a voice model, a clone of his voice, that can speak Spanish. He wrote at length about the whole process in an earlier edition of his newsletter. The result: thanks to voice cloning technology, he can reach new audiences in their native language without his audio delivery sounding off-putting.
I think this is a really smart way of using technology – to extend and build on something a human has created.
This piece was originally published in The Fix and is re-published with permission.