How AI-generated audio and video deepfakes are fueling BEC attacks

Pavithra Murugan
Last Updated : June 19, 2026

The email arrives on a Friday afternoon. A senior finance executive is asked to process an urgent wire transfer. The message mentions a confidential acquisition, sensitive enough that it has to stay off the usual procurement channels. The email looks clean: it’s correctly authenticated, written in the right tone, and free of suspicious links or attachments. The executive hesitates.

Then the phone rings. The CFO is on the other end, the voice unmistakably his, confirming the request. The transfer goes through.

This is where deepfakes change the business email compromise (BEC) equation. They don’t make the email harder to detect. They eliminate the moment at which detection matters most. The verification step, the one behavior nearly every BEC training program drills into employees, has been turned into the attack’s closing move.

How deepfakes slot into the BEC attack chain

BEC has always been a trust exploit, not a technical one. Deepfakes add a second channel that reinforces the first.

In most cases, the pattern works like this: A phishing email establishes context and urgency, usually around a wire transfer, an NDA, or a new vendor onboarding. On its own, the email might raise flags. The domain is close but not exact, the number is unfamiliar, or something just feels slightly off.

In traditional BEC, this is where attacks fail. With the right training, the employee pauses, calls the executive’s known number, and discovers that the request is fraudulent.

In deepfake-augmented BEC, that pause is anticipated. Before the employee reaches for the phone, the attacker calls first using a spoofed number and cloned voice to confirm the request. Or a voice note arrives, in the executive’s voice, reiterating the urgency. Or a meeting is already on the calendar with what looks like a live video of the CFO. At that point the psychological override is complete. The attack doesn’t defeat the email filter. It defeats the person who would otherwise have caught it.

Audio-only attacks are currently more common than video. They’re faster to build, easier to send through consumer apps, and harder to scrutinize in real time. A phone call offers no face to pause and study.

Voice clones can be generated from as little as 2 to 10 seconds of reference audio. OpenAI’s Voice Engine, for example, was demonstrated working from a single 15-second sample, and most commercial tools need somewhere between 10 and 30 seconds of clean audio to produce a usable clone. For any executive who has appeared on an earnings call, a podcast, a keynote, or a company YouTube video, that material is already out there and indexed.

Video deepfakes are more resource-intensive and easier to spot on closer inspection, but they’re increasingly used for high-value targets where the stakes justify the effort. The more dangerous variant isn’t a recorded clip sent over email. It’s a real-time session where the executive appears live on a video call, apparently responding naturally to the conversation, while the attacker runs the deepfake in the background.

Why this breaks standard BEC defenses

The common detection signals that work against text-based BEC, such as linguistic pattern deviation, domain anomalies, thread injection patterns, and request-context mismatches, operate at the email layer. Deepfake-augmented BEC moves the decisive moment somewhere else entirely.

The email may pass behavioral analysis without issue. The actual trust-breaking happens in a phone call, an instant message, or a video meeting, none of which passes through an email security platform. There’s no payload to sandbox, no signature to match, no sending domain to check against SPF records. The attack vector is a human voice on an app that security tools don’t see.

This creates a problem with the standard advice that most BEC training programs still give: If you receive a suspicious payment request, verify it by phone using a number you already have. That was sound guidance when the only attack vector was text. It falls apart when the attacker can place the confirming call themselves before the employee picks up the phone to check.

The “call to verify” instruction has become a social engineering opening. Attackers know employees have been trained to want voice confirmation. So they provide it preemptively, as part of the attack.

Real-world examples of deepfake attacks

WPP, May 2024

Fraudsters set up a WhatsApp account using a publicly available photo of WPP CEO Mark Read, then used it to schedule a Microsoft Teams meeting with a senior agency leader at the company. During the meeting, they ran a voice clone of Read alongside YouTube footage of him while also impersonating him in the meeting’s chat window. The target was asked to set up a new business, a cover story for extracting money and personal details. It didn’t work.

In an email to leadership afterward, Read warned his colleagues: “Just because the account has my photo doesn’t mean it’s me.” What makes this case worth studying is the layering: static image, cloned voice, real video footage, and live text impersonation all running at the same time in a single meeting, each one lending credibility to the others.

Ferrari, July 2024

A Ferrari executive started receiving WhatsApp messages, apparently from CEO Benedetto Vigna, about a confidential acquisition: an NDA needed to be signed immediately, and the matter had to be kept quiet. According to Bloomberg, the messages came from an unknown number but included a real photo of Vigna.

A follow-up call used a voice clone accurate enough to replicate his southern Italian accent. The executive grew suspicious during the call and tested the caller by asking about a book Vigna had recommended days earlier. The caller couldn’t answer, and the call ended abruptly. Ferrari opened an internal investigation. The entire attack ran through a consumer messaging app, using nothing but publicly available photos and a voice model built from media appearances.

LastPass, April 2024

A LastPass employee received a string of calls, texts, and at least one WhatsApp voicemail, all impersonating CEO Karim Toubba. According to LastPass’s own account, the deepfake audio was probably trained on publicly available recordings such as conference talks, interviews, and video content. The employee did catch it, but not because the voice sounded wrong.

They caught it because the channel was wrong: Their CEO doesn’t send urgent requests via WhatsApp outside business hours. The employee reported the attempt to LastPass’s internal security team, and it had no impact. LastPass published the details to put other companies on notice. According to CrowdStrike’s 2025 Global Threat Report, voice phishing attacks increased 442% between the first and second halves of 2024. The LastPass incident was at the front edge of that surge.

What makes an organization a high-risk target

The raw material for voice cloning is public, abundant, and free. Earnings calls, conference keynotes, media interviews, podcast appearances, and company YouTube videos are all potential training data. Most commercial tools need only 10 to 30 seconds of clean audio for a usable clone, and executives at publicly traded or growth-stage companies typically have far more than that available online.

Unfortunately, the more visible an executive is, the easier they are to impersonate. Someone who speaks regularly at industry events, appears on earnings calls, or gives press interviews provides varied, high-quality training material that produces a more convincing clone across different registers and contexts.

But mid-market companies aren’t insulated by keeping a lower profile. A CFO who appeared once on a regional business podcast, or a VP of Finance with a short LinkedIn video, has left enough for a targeted attack. The threat isn’t limited to companies with household-name leadership.

It’s worth doing a quick audit of what’s publicly indexed for your senior executives. The point isn’t to stop communicating publicly but to understand what the exposure actually looks like and to build verification procedures that account for it.
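The audit above can start as a simple inventory. Below is a minimal Python sketch, assuming someone on the security team has already listed each public recording and estimated how many seconds of clean, solo speech it contains. The `PublicRecording` type, the `exposure_report` function, and the 10-second cutoff (the low end of the 10 to 30 second range cited earlier for commercial tools) are hypothetical choices for illustration, not a standard tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical inventory record for one publicly indexed recording.
@dataclass
class PublicRecording:
    executive: str
    source: str                  # e.g. "earnings call", "podcast"
    clean_speech_seconds: float  # estimated seconds of usable solo speech

# Assumed cutoff: the low end of the 10-30 second range commercial tools need.
CLONE_THRESHOLD_SECONDS = 10.0

def exposure_report(recordings: List[PublicRecording]) -> Dict[str, Tuple[float, bool]]:
    """Total clean speech per executive, flagged if it meets the cutoff."""
    totals: Dict[str, float] = {}
    for rec in recordings:
        totals[rec.executive] = totals.get(rec.executive, 0.0) + rec.clean_speech_seconds
    return {
        name: (seconds, seconds >= CLONE_THRESHOLD_SECONDS)
        for name, seconds in sorted(totals.items())
    }

inventory = [
    PublicRecording("CFO", "regional business podcast", 240.0),
    PublicRecording("VP Finance", "LinkedIn video", 8.0),
    PublicRecording("VP Finance", "conference panel clip", 5.0),
]
for name, (seconds, exposed) in exposure_report(inventory).items():
    print(f"{name}: {seconds:.0f}s of clean speech, above cutoff: {exposed}")
```

Summing across sources is the design choice worth noting: clips that are each too short on their own can add up to enough material for a clone.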

Detection and countermeasures

As threat actors get more creative with deepfakes, countering them matters more than ever. Here are some ways these attacks can be detected and thwarted.

Fix the verification protocol before attackers use it

“Call to verify” needs an update. For any out-of-cycle financial request or sensitive action, the verification callback should use a number sourced from your internal directory, never one taken from the original message or from an incoming call.

Layer on at least one of the following: a pre-agreed code word established in advance between executives and finance teams; dual authorization from a second person through a separate channel; or confirmation through an authenticated internal system rather than a consumer messaging app.
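One way to make this protocol hard to skip is to encode it as a check that a payment workflow runs before releasing funds. The Python sketch below assumes a hypothetical internal directory (`INTERNAL_DIRECTORY`) and a record of how the employee verified the request; the field names and rules are illustrative, not any specific product’s API. It rejects any callback whose number came from the message or an incoming call, and it requires at least one additional layer from the list above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for the internal directory (the system of record for numbers).
INTERNAL_DIRECTORY = {"cfo@example.com": "+1-555-0100"}

@dataclass
class VerificationRecord:
    requester: str                         # identity the request claims to come from
    callback_number: str                   # number the employee actually dialed
    callback_source: str                   # "directory", "message", or "incoming_call"
    code_word_confirmed: bool = False      # pre-agreed phrase given correctly
    second_approver: Optional[str] = None  # approved via a separate channel
    confirmed_in_internal_system: bool = False

def verify(rec: VerificationRecord) -> Tuple[bool, str]:
    """Return (approved, reason) for an out-of-cycle request."""
    expected = INTERNAL_DIRECTORY.get(rec.requester)
    if expected is None:
        return False, "requester not in internal directory"
    # The callback number must be looked up in the directory, never taken
    # from the original message or from an inbound call.
    if rec.callback_source != "directory" or rec.callback_number != expected:
        return False, "callback number not sourced from internal directory"
    # A directory callback alone is not enough: require at least one more layer.
    has_second_layer = (
        rec.code_word_confirmed
        or (rec.second_approver is not None and rec.second_approver != rec.requester)
        or rec.confirmed_in_internal_system
    )
    if not has_second_layer:
        return False, "callback alone is insufficient; add a second layer"
    return True, "verified"

# An attacker-supplied number fails even if the voice on the line sounded right.
print(verify(VerificationRecord("cfo@example.com", "+1-555-0199", "incoming_call")))
# A directory callback plus a confirmed code word passes.
print(verify(VerificationRecord("cfo@example.com", "+1-555-0100", "directory",
                                code_word_confirmed=True)))
```

Checking where the number came from, not just its value, is deliberate: the habit being enforced is looking the number up, so even a correct-looking number copied from the message fails.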

The code word approach costs nothing and is harder to defeat than any technical control. The Ferrari executive did this instinctively, asking a question only the real Vigna could answer; a formalized verification phrase makes that check routine. No voice clone can defeat it without insider knowledge of what the phrase is.
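If the phrase is checked through an internal tool rather than from memory, it can be stored the way passwords are. The Python sketch below assumes a finance team member types in what the caller said; it keeps only a salted PBKDF2 hash (via the standard library’s `hashlib.pbkdf2_hmac`) and compares with `hmac.compare_digest`. The `enroll` and `check` names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune for your hardware

def _normalize(phrase: str) -> bytes:
    # Spoken phrases get typed in with varying case and spacing,
    # so hash a normalized form.
    return " ".join(phrase.casefold().split()).encode("utf-8")

def enroll(phrase: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the plaintext phrase."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", _normalize(phrase), salt, ITERATIONS)
    return salt, digest

def check(phrase: str, salt: bytes, digest: bytes) -> bool:
    """True if the phrase heard on the call matches the enrolled one."""
    candidate = hashlib.pbkdf2_hmac("sha256", _normalize(phrase), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, digest)

salt, digest = enroll("Blue heron at dawn")
print(check("blue  heron AT dawn", salt, digest))  # True: same phrase, different case/spacing
print(check("blue heron at dusk", salt, digest))   # False
```

A static phrase only helps until it leaks. Once it has been spoken on a call that an attacker might have recorded, rotating it is a reasonable policy choice.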

Know the limits of deepfake detection tools

Detection tools are real and improving. They flag acoustic artifacts that current voice clones still tend to produce, such as irregular cadence, flat emotional register, unnatural transitions on certain consonants, or an absence of variation in background noise. In controlled environments, they’re useful. In the wild, on a live call to a personal device or in a WhatsApp conversation that exists outside the corporate network, they don’t help. Treat them as one layer in a stack, not a solution.

Retrain employees on audio tells, not visual ones

Most deepfake awareness training focuses on video “tells” such as hairline artifacts, unnatural eye movement, or facial boundary issues. That’s largely irrelevant for audio-only attacks, which are currently more prevalent.

Employees should know what to listen for: unnaturally even pacing without filler words or natural pauses, a slight mechanical quality on hard consonants, no ambient sound when there should be some (a person calling from an office or car has background noise), and a flatness in emotional tone that doesn’t match the urgency of what’s being said.

No single tell is definitive, but knowing these exist means that employees are listening critically rather than just hearing a familiar voice and accepting it.

Treat any out-of-cycle request as elevated risk

A payment instruction, a change to banking details, an NDA request, a credential update—if it arrives outside the normal workflow, it should trigger a formal verification process regardless of whether someone followed up by phone to confirm it.

A confirming call should raise scrutiny when the original request was already anomalous, not lower it. Attackers engineer time pressure and invoke executive authority specifically because those conditions cause people to skip verification. Building a security-first culture where checking is routine rather than an implicit accusation removes that lever.
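The triage rule in the two paragraphs above can be written down so it doesn’t depend on anyone’s judgment under pressure. The Python sketch below is a hypothetical scoring scheme, not an industry standard: any of the four request types named above arriving outside the normal workflow requires formal verification, and urgency, invoked executive authority, and a confirming call each raise the score rather than lower it. The type identifiers and the 0 to 4 scale are illustrative assumptions.

```python
from dataclasses import dataclass

# The request types named above; the identifiers are illustrative.
SENSITIVE_KINDS = {"payment", "bank_detail_change", "nda", "credential_update"}

@dataclass
class Request:
    kind: str
    in_normal_workflow: bool
    urgent: bool = False                       # engineered time pressure
    invokes_executive: bool = False            # borrowed authority
    followed_by_confirming_call: bool = False  # a call "confirming" the request

def requires_formal_verification(req: Request) -> bool:
    # Out-of-cycle sensitive requests always go to formal verification,
    # whether or not someone phoned to confirm them.
    return req.kind in SENSITIVE_KINDS and not req.in_normal_workflow

def scrutiny_score(req: Request) -> int:
    """Hypothetical 0-4 score. Every pressure signal adds; nothing subtracts."""
    if not requires_formal_verification(req):
        return 0
    return 1 + sum([req.urgent, req.invokes_executive, req.followed_by_confirming_call])

quiet = Request("bank_detail_change", in_normal_workflow=False)
pressured = Request("bank_detail_change", in_normal_workflow=False, urgent=True,
                    invokes_executive=True, followed_by_confirming_call=True)
print(scrutiny_score(quiet), scrutiny_score(pressured))  # 1 4
```

The point of the design is the direction of the call signal: in this sketch there is no input that can lower the score once a request is out of cycle.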

Be deliberate about what content is publicly indexed

This isn’t an argument for executives going dark. It’s a narrower point: Internal town halls and all-hands calls don’t need to be public on YouTube. Remove what serves no external purpose. For what stays public, understand that it’s training data and weight the verification procedures accordingly.

Wrapping up

Deepfakes don’t make BEC conceptually harder to understand. They make it harder to intercept in real time. Organizations that have trained employees to verify suspicious emails by phone have inadvertently trained them to complete a step attackers have learned to provide.

Better email filters don’t solve this. What solves it is a verification approach that treats every channel with the same skepticism as the original suspicious email. The confirming phone call or video message shouldn’t be the end of the verification process.

eProtect is a cloud-based email security and archiving solution that provides an additional layer of security for email accounts. The solution offers advanced threat detection mechanisms that can secure on-premises and cloud email accounts from evolving email threats. eProtect is the security solution that powers Zoho Mail, a platform that millions of users trust.
