
OpenAI Rolls Out Three Audio Models for Real-Time Voice Agents

OpenAI launches GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, giving developers models for building conversational voice agents with live translation and transcription.

May 7, 2026, 18:45 UTC


OpenAI introduced three new audio models on Thursday, designed to power real-time voice agents.

The models move the company beyond simple transcription toward systems that can listen, translate, and act during live conversations.

GPT-Realtime-2 handles complex requests, maintains context across long sessions, and manages interruptions. GPT-Realtime-Translate provides live translation from more than 70 input languages into 13 output languages. GPT-Realtime-Whisper provides live speech-to-text for captions and meeting notes.

Early testers include Zillow, Priceline, and Deutsche Telekom.

Pricing starts at $32 per million audio input tokens for GPT-Realtime-2, $0.034 per minute for Translate, and $0.017 per minute for Whisper.
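The two pricing models differ: GPT-Realtime-2 is billed per audio input token, while Translate and Whisper are billed per minute. The sketch below turns the quoted starting prices into rough estimates. It is a back-of-the-envelope aid, not OpenAI's billing logic. The function names and the sample workload are hypothetical. The GPT-Realtime-2 figure covers audio input tokens only, since the article gives no output-token price or tokens-per-minute rate, and actual rates may vary by tier.

```python
# Starting prices in USD, as quoted in the article. Actual billing may differ.
REALTIME_2_PER_MILLION_INPUT_TOKENS = 32.00
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime_2_input_cost(audio_input_tokens: int) -> float:
    """Hypothetical helper: audio-input cost only (no output tokens or other charges)."""
    return audio_input_tokens / 1_000_000 * REALTIME_2_PER_MILLION_INPUT_TOKENS


def translate_cost(minutes: float) -> float:
    """Hypothetical helper: per-minute cost of GPT-Realtime-Translate."""
    return minutes * TRANSLATE_PER_MINUTE


def whisper_cost(minutes: float) -> float:
    """Hypothetical helper: per-minute cost of GPT-Realtime-Whisper."""
    return minutes * WHISPER_PER_MINUTE


if __name__ == "__main__":
    # Hypothetical workload: 500,000 audio input tokens and a 60-minute meeting.
    print(f"Realtime-2 input: ${realtime_2_input_cost(500_000):.2f}")  # $16.00
    print(f"Translate, 60 min: ${translate_cost(60):.2f}")  # $2.04
    print(f"Whisper, 60 min: ${whisper_cost(60):.2f}")  # $1.02
```

At these starting rates, an hour of live translation costs exactly twice as much as an hour of transcription.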
