AI is spawning a flood of fake Trump and Harris voices. Here’s how to tell what’s real.

October 16, 2024 at 6:05 a.m.

Artificial intelligence has made it extraordinarily simple to copy someone’s voice — allowing thousands of audio impersonations, known as “deepfakes,” to flood the internet since early last year.

With a razor-thin margin in the presidential race between Vice President Kamala Harris and former president Donald Trump, experts are preparing to counter fabricated audio that could confuse voters in the hectic days leading up to the election. Already, Harris has been spoofed celebrating President Joe Biden’s decision to bow out of the 2024 campaign, and Trump’s voice has been cloned insulting the intelligence of Fox News viewers.

The Washington Post spoke with computer scientists, AI audio companies and linguistic experts to find out why AI audio fakes sound so realistic — and how to tell whether the speech is real.

The following audio clips are AI-generated.

To copy a voice, AI tools require a vast array of audio samples scraped from the internet. They use algorithms to recognize patterns in the speech and create clones that replicate them. Since politicians have a wealth of speeches and interviews available online, they are commonly included in these datasets, making them easy targets for high-quality deepfakes, AI experts said.

By analyzing very large datasets, AI models have gotten vastly better at mimicking speech, according to Sarah Barrington, an AI researcher at the University of California at Berkeley’s School of Information. Earlier models would produce robotic speech, but now “the cadence is pretty good,” she added. “It’s as though you’re speaking to a human.”

Which one is a Harris deepfake?

Audio 1 is a real clip of Harris speaking in her first presidential campaign ad. It was released in late July and incorporates Beyoncé’s song “Freedom.”

Audio 2 is an AI-generated clip of Harris, which Elon Musk shared on X in late July. It has been viewed more than 100 million times and was posted on X by a YouTuber named Mr. Reagan who called it a parody. (Mr. Reagan did not respond to a request for comment.)

The tools are so good, in fact, that researchers who study the audio frequencies of speech said they sometimes have trouble detecting an AI-generated voice.

Experts frequently have to rely on small tells. AI fakes often include background music that masks oddities and crowds the frequency bands where speech sits. In the AI-generated sample of Harris above, her voice stays flat in tone. Her real speech in the presidential ad is also fairly even in tone, but she emphasizes certain words and pauses in places that sound natural.

Morgan Finkelstein, a national security spokeswoman for the Harris campaign, said voters do not want “the fake, manipulated lies of Elon Musk and Donald Trump.” The Trump campaign did not respond to a request for comment.

Uncovering genuine speech

Here’s Harris in a recent interview with MSNBC. Let’s take a closer look at how experts would determine her speech is authentic:

In this clip, Harris stutters. This and other idiosyncrasies of how people talk are called “artifacts.” They are common in natural speech and hard for AI to mimic.

Harris emphasizes several words, including “attempting,” to make her point more powerfully. AI still struggles to replicate this varied emphasis, failing to capture the distinctive ways people speak.

You hear Harris pause and inhale. Real speech is peppered with various types of breaths and pauses. AI has a hard time copying this, which makes even stellar samples sound slightly robotic.

Imperfections, such as stutters, breaths, pauses and varying levels of voice inflection, allow experts to pinpoint real human speech.

“It’s more what’s happening around the voice that gives it away,” Barrington said. “It becomes very apparent,” she said, “looking at these periods of silence, where there’s pauses and where there’s mumblings and stuttering.”

But voice-cloning companies are getting better at reproducing these imperfections, which will make voice fakes harder to spot in the coming years, according to Hany Farid, a computer science professor at Berkeley.

Which one is a Trump deepfake?

Audio 1 is a real clip of Trump speaking in a 2018 CBS News interview.

Audio 2 is an AI-generated clip of the 45th president. It was posted on TikTok by the user @unclesmudge13 in early July and was shared more than 61,000 times before being taken down by the company. (The TikTok user did not respond to a request for comment.)

You aren’t alone if you are having a hard time hearing the difference: The artifacts are subtle in the clips. In the real interview, Trump stops abruptly, takes an audible breath and emphasizes the word “vicious.” In the fake audio, Trump’s cadence is regular, and his volume is artificially constant throughout.
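
The two tells described here, constant volume and a lack of natural pauses, can be roughly quantified. What follows is a minimal sketch in Python, not the method the experts quoted in this story use. It assumes the audio has already been decoded into a list of samples between -1 and 1, and the function names, the 400-sample frame size and the 0.02 silence threshold are all illustrative choices. The two test signals are synthetic tones, not real speech.

```python
import math


def frame_rms(samples, frame_size):
    """Return the RMS loudness of each non-overlapping frame."""
    if len(samples) < frame_size:
        raise ValueError("need at least one full frame of audio")
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [
        math.sqrt(sum(x * x for x in samples[s:s + frame_size]) / frame_size)
        for s in starts
    ]


def loudness_variation(samples, frame_size=400):
    """Coefficient of variation (std / mean) of frame loudness.

    Values near 0 mean the volume barely changes from frame to frame.
    """
    rms = frame_rms(samples, frame_size)
    mean = sum(rms) / len(rms)
    if mean == 0:
        return 0.0
    variance = sum((r - mean) ** 2 for r in rms) / len(rms)
    return math.sqrt(variance) / mean


def pause_fraction(samples, frame_size=400, threshold=0.02):
    """Share of frames quiet enough to count as a pause (assumed threshold)."""
    rms = frame_rms(samples, frame_size)
    return sum(1 for r in rms if r < threshold) / len(rms)


def tone(amplitude, n, freq=200, rate=16000):
    """Synthetic sine tone used as a stand-in for decoded audio."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


# A signal with perfectly even volume and no pauses.
steady = tone(0.5, 16000)
# A signal with a loud stretch, a silent gap, then a quieter stretch.
varied = tone(0.8, 4000) + [0.0] * 4000 + tone(0.3, 8000)

print("steady:", loudness_variation(steady), pause_fraction(steady))
print("varied:", loudness_variation(varied), pause_fraction(varied))
```

A low variation score combined with almost no pauses would be only a weak hint, not proof, that a clip is synthetic. Real recordings can be compressed to an even volume, and, as the researchers note, cloning tools are getting better at imitating breaths and pauses.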

AI tools can now produce voices speaking with different accents, mimicking how a New Yorker might say “caw-ffee.” The use of shoddy regional dialects has traditionally been an easy way to detect an AI fake, said Farid. But as tools get better at spoofing dialects, the fakes become more deceptive, he added.

Social media companies such as TikTok and Instagram struggle to identify and label fake speech, AI experts say. They often rely on deepfake detectors, an imperfect technology, to flag whether a voice sample is AI-generated. Even when AI-generated media is correctly flagged, the platforms often still surface the content, allowing it to spread. TikTok and Meta, Instagram’s parent company, declined to comment.

More than 30 state legislatures have introduced measures banning election-related deepfakes. But the rules require enforcers to be able to identify AI-generated content, a difficult task even for experts.

Farid fears the advanced technology will alter society’s vision of reality — preventing people from trusting what they hear.

“What is the average [person] left to do?” he said. “There’s no way of ever knowing.”

About this story

Editing by Karly Domb Sadof and Alexis Sobel Fitts. Design editing by Betty Chavarria and Matt Callahan. Copy editing by Emily Morman.

Source: washingtonpost.com
