ChatGPT Advanced Voice Mode impresses testers with sound effects, catching its breath

I Am the Very Model of a Modern Major-General —

AVM allows uncanny real-time voice conversations with ChatGPT that you can interrupt.

A stock photo of a robot whispering to a man.

On Tuesday, OpenAI began rolling out an alpha version of its new Advanced Voice Mode to a small group of ChatGPT Plus subscribers. This feature, which OpenAI previewed in May with the launch of GPT-4o, aims to make conversations with the AI more natural and responsive. In May, the feature triggered criticism of its simulated emotional expressiveness and prompted a public dispute with actress Scarlett Johansson over accusations that OpenAI copied her voice. Even so, early tests of the new feature shared by users on social media have been largely enthusiastic.

In early tests reported by users with access, Advanced Voice Mode allows them to have real-time conversations with ChatGPT, including the ability to interrupt the AI mid-sentence almost instantly. It can sense and respond to a user's emotional cues through vocal tone and delivery, and provide sound effects while telling stories.

But what initially caught many people off guard is how the voices simulate taking a breath while speaking.

"ChatGPT Advanced Voice Mode counting as fast as it can to 10, then to 50 (this blew my mind—it stopped to catch its breath like a human would)," wrote tech writer Cristiano Giardina on X.

Advanced Voice Mode simulates audible pauses for breath because it was trained on audio samples of humans speaking that included the same feature. The model has learned to simulate inhalations at seemingly appropriate times after being exposed to hundreds of thousands, if not millions, of examples of human speech. Large language models (LLMs) like GPT-4o are master imitators, and that skill has now extended to the audio domain.

Giardina shared his other impressions about Advanced Voice Mode on X, including observations about accents in other languages and sound effects.

"It’s very fast, there’s virtually no latency from when you stop speaking to when it responds," he wrote. "When you ask it to make noises it always has the voice “perform” the noises (with funny results). It can do accents, but when speaking other languages it always has an American accent. (In the video, ChatGPT is acting as a soccer match commentator)"

Speaking of sound effects, X user Kesku, who is a moderator of OpenAI's Discord server, shared an example of ChatGPT playing multiple parts with different voices and another of a voice recounting an audiobook-sounding sci-fi story from the prompt, "Tell me an exciting action story with sci-fi elements and create atmosphere by making appropriate noises of the things happening using onomatopoeia."

Kesku also ran a few example prompts for us, including a story about the Ars Technica mascot "Moonshark."

He also asked it to sing the "Major-General's Song" from Gilbert and Sullivan's 1879 comic opera The Pirates of Penzance.

Frequent AI advocate Manuel Sainsily posted a video of Advanced Voice Mode reacting to camera input, giving advice about how to care for a kitten. "It feels like face-timing a super knowledgeable friend, which in this case was super helpful—reassuring us with our new kitten," he wrote. "It can answer questions in real-time and use the camera as input too!"

Of course, being based on an LLM, it may occasionally confabulate incorrect responses on topics or in situations where its "knowledge" (which comes from GPT-4o's training data set) is lacking. But if you treat it as a tech demo or an AI-powered amusement and are aware of its limitations, Advanced Voice Mode seems to successfully execute many of the tasks shown in OpenAI's demo in May.

Safety

An OpenAI spokesperson told Ars Technica that the company worked with more than 100 external testers on the Advanced Voice Mode release, collectively speaking 45 different languages and representing 29 geographical areas. The system is reportedly designed to prevent impersonation of individuals or public figures by blocking outputs that differ from OpenAI's four chosen preset voices.

OpenAI has also added filters to recognize and block requests to generate music or other copyrighted audio, an issue that has gotten other AI companies in trouble. Giardina reported audio "leakage" in some audio outputs that have unintentional music in the background, showing that OpenAI trained the AVM voice model on a wide variety of audio sources, likely both from licensed material and audio scraped from online video platforms.

Availability

OpenAI plans to expand access to more ChatGPT Plus users in the coming weeks, with a full launch to all Plus subscribers expected this fall. A company spokesperson told Ars that users in the alpha test group will receive a notice in the ChatGPT app and an email with usage instructions.

Since the initial preview of GPT-4o voice in May, OpenAI claims to have enhanced the model's ability to support millions of simultaneous, real-time voice conversations while maintaining low latency and high quality. In other words, the company is gearing up for a rush that will take a lot of back-end computation to accommodate.

Source: arstechnica.com
