One of the stranger applications of deepfakes — AI technology used to manipulate audiovisual content — is the audio deepfake scam. Hackers use machine learning to clone someone’s voice and then combine that voice clone with social engineering techniques to convince people to move money where it shouldn’t be. Such scams have been successful in the past, but how good are the voice clones being used in these attacks? We’ve never actually heard the audio from a deepfake scam — until now.
Security consulting firm NISOS has released a report analyzing one such attempted fraud, and shared the audio with Motherboard. The clip below is part of a voicemail sent to an employee at an unnamed tech firm, in which a voice that sounds like the company’s CEO asks the employee for “immediate assistance to finalize an urgent business deal.”
The quality is certainly not great. Even under the cover of a bad phone signal, the voice is a little robotic. But it’s passable. And if you were a junior employee, worried after receiving a supposedly urgent message from your boss, you might not be thinking too hard about audio quality. “It definitely sounds human. They checked that box as far as: does it sound more robotic or more human? I would say more human,” Rob Volkert, a researcher at NISOS, told Motherboard. “But it doesn’t sound like the CEO enough.”
The attack was ultimately unsuccessful, as the employee who received the voicemail “immediately thought it suspicious” and flagged it to the firm’s legal department. But such attacks will be more common as deepfake tools become increasingly accessible.
All you need to create a voice clone is access to lots of recordings of your target. The more data you have and the better quality the audio, the better the resulting voice clone will be. And for many executives at large firms, such recordings can be easily collected from earnings calls, interviews, and speeches. With enough time and data, the highest-quality audio deepfakes are much more convincing than the example above.
The best known and first reported example of an audio deepfake scam took place in 2019, where the chief executive of a UK energy firm was tricked into sending €220,000 ($240,000) to a Hungarian supplier after receiving a phone call supposedly from the CEO of his company’s parent firm in Germany. The executive was told that the transfer was urgent and the funds had to be sent within the hour. He did so. The attackers were never caught.
Earlier this year, the FTC warned about the rise of such scams, but experts say there’s one easy way to beat them. As Patrick Traynor of the Herbert Wertheim College of Engineering told The Verge in January, all you need to do is hang up the phone and call the person back. In many scams, including the one reported by NISOS, the attackers are using a burner VOIP account to contact their targets.
“Hang up and call them back,” says Traynor. “Unless it’s a state actor who can reroute phone calls or a very, very sophisticated hacking group, chances are that’s the best way to figure out if you were talking to who you thought you were.”