Turning the volume up on audio deepfakes

Deepfake tech is here to stay, says Pierre Carnet

The term ‘deepfake’ has been on the rise since it was first picked up by mainstream media in 2019. It typically refers to advanced machine learning that analyses a piece of media and replaces the original content with another person’s likeness.

Whilst conversation around deepfakes has mostly revolved around their visual applications, audio deepfake technology has been discreetly developing on the sidelines. And whilst imitating and emulating voices opens up a new world of creative possibilities — especially when looking through a brand-building lens — the tech doesn’t come risk-free.

A tale of brands and voices

Audio deepfakes allow you to harness the power of voice within a digital system. This technology empowers brands to sample anyone’s voice, analyse it, and create an emulated version to use as they wish. Yes, this is an exciting prospect. But its ability to distort reality can do more harm than good. And although it sounds like an early draft of a Black Mirror episode, we’re closer to owning this voice cloning technology than you may think.

Superstar DJ David Guetta recently used AI to replicate Eminem’s vocal style in an unreleased track, saying that ‘the future of music is in AI’. The most advanced applications on the market, such as Descript and Resemble AI, can analyse a snippet of speech to model and replicate any voice. Furthermore, these virtual voices can then be licensed for brands to use in their content.
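
To make that concrete, here is roughly what the speech-sample-to-cloned-voice step can look like. This is a minimal sketch assuming the open-source Coqui TTS library (XTTS v2) rather than the commercial tools named above, with purely illustrative file paths and copy; a real brand deployment would sit behind proper consent, licensing and security controls.

```python
# Minimal voice-cloning sketch using the open-source Coqui TTS library (XTTS v2).
# Assumptions: "brand_voice_sample.wav" is a short, consented reference recording
# of the cast voice talent; the text is an illustrative line of brand copy.
from TTS.api import TTS

# Load a pre-trained multilingual voice-cloning model
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Synthesise a new line of copy in the cloned voice, conditioned on a few
# seconds of reference audio from the original speaker
tts.tts_to_file(
    text="Welcome back. Here is what is new this season.",
    speaker_wav="brand_voice_sample.wav",  # reference recording of the brand voice
    language="en",
    file_path="brand_voice_line.wav",      # output audio in the cloned voice
)
```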

We live in a saturated media landscape

Consumers are bombarded with visual content on a daily basis. Brands face an uphill battle cutting through the noise — and this is where sound can be a powerful asset.

Voice has experienced a spectacular comeback in recent years: we have developed a global interest in podcasts, smart assistants such as Alexa and Google Nest fill our homes, and audio-based social media platforms like Clubhouse have seen peaks in popularity. Slowly but surely, we have been surrounding ourselves with new voices and learning to listen — and brands have started paying attention.

But until now, creating a cohesive brand voice has been complicated. The applications of brand voices are extensive and complex, requiring hours on end of recording sessions. Then there are the voice talents themselves, who can be expensive and difficult to manage — not to mention the fact that they age and change, and may want to explore other projects.

Challenges around the costs and logistics of voice have been mounting, whilst demand for it keeps increasing. At the same time, audio deepfakes are becoming ripe for use as a cost-efficient way to access malleable, virtual voices.

Is this the future of brand voices?

Striking a chord

Audio deepfakes may appear daunting, but let me paint you a picture. You cast a human voice that is in sync with your brand identity. You record a few takes focussed on your brand’s key messaging and pay a fixed fee to cover both the recording and a voice rights buyout.

From here, the world’s your oyster. Machine learning allows you to analyse, model and tweak the voice as you see fit. This is now a tool, wholly owned by the brand, that can be distributed across the business. Your marketing and content teams have immediate, unlimited access to this unique brand voice.

This is your sonic typeface — an unmistakable voice that encapsulates your brand. It can be used across multiple mediums, be it television commercials, customer service helplines or product launch films. And it is this consistency, leveraged across every audio-enabled touchpoint, that is the catalyst to building brand value.

This tool’s practicality is unrivalled. It doesn’t change over time unless it’s willingly tweaked. Additionally, it doesn’t negate the need for human creatives on the ground. Your team can still experiment with artificially adjusting the tone of voice to fit demographics, new brand iterations and campaigns — all whilst retaining a consistent branded base.

Beware of the dark side

Despite their potential, not everything in the world of audio deepfakes is fun and games. As the ability to digitally clone and use any voice becomes democratised, anyone with access to the right software can impersonate a boss or a client and extract sensitive information from your company.

Advanced social engineering attacks — which see your brand’s voice cloned — can open the door to scammers and malicious competitors taking control. And once the public can’t differentiate between what’s real and what’s not, the authenticity of your content becomes irrelevant. Research shows that 78% of consumers believe misinformation damages brand reputation.

But whilst it may seem we’re only just beginning to understand the dangers of audio deepfakes, there are proactive steps you can take to help protect your employees and brand equity:

  • Employee awareness
  • Communication channels
  • Response strategy
  • Legislation and research
  • Brand development

Your employees are your first line of defence. Awareness training around the associated risks of deepfake technology and how to spot imitations empowers them to effectively safeguard your business. The same can be said for minimising the channels you use for internal and external communications. Ensuring employees have access to secure platforms and processes helps protect sensitive information and aids in identifying malicious content.

This, however, doesn’t mean that things won’t slip through the cracks. Defamation and extortion attacks will continue to come — and this is where investing in a comprehensive response strategy pays dividends. You should be equipped to treat these as both security and PR incidents.

You’re not alone in this venture

Some governments have started banning specific applications of deepfakes to protect elections, and DARPA has already invested a reported $68 million in deepfake detection technology. Work with the relevant authorities to push through protective legislation and contribute to the development of authentication guidelines and procedures.

Be careful in your implementation of voice-based brand services, ensuring the right protections are put in place as voice grows in importance for your brand. The best way to do this is to make sure you have the right expert partners to help you grow your brand voice. Deepfake technology is here to stay — and the tangible opportunities it presents are not only real, but already being exploited. The question brands face is simple: will they take pre-emptive action and leverage it to their advantage, or let themselves become subject to its hazards?

Put more simply, are you a leader or a follower?

Featured image: Pump Up The Volume (1990)

Pierre Carnet, MD at MassiveMusic Dubai

A fan of varied genres and a globe-trotter since childhood, Pierre developed an appetite for discovery and creativity without borders. After a period as head of music within French agency FF Creative Community, he started working as a freelance music supervisor for luxury and fashion brands in Paris, including Chanel, Cartier, YSL, and Remy Martin. Pierre joined the MassiveMusic team in 2020, focusing his attention on developing direct-to-brand music solutions across EMEA and APAC, before becoming Managing Director of MassiveMusic Dubai, the latest addition to the global office roster, in late 2022. Key moments of his day-to-day life at Massive include jumping on calls with a variety of different clients – Dubai-based advertising agencies, Middle Eastern film producers, or iconic regional brands – to tailor creative and strategic solutions for their content.

First record: ‘American Idiot’, Green Day
Favourite album: ‘Labcabincalifornia’, The Pharcyde
Favourite noise: A big rumbling 808 sub bass
Skills: Supreme crepes
