OpenAI introduced three new audio models on Thursday, designed to power real-time voice agents.

The models move the company beyond simple transcription toward systems that can listen, translate, and act during live conversations.

GPT-Realtime-2 handles complex requests, maintains context across long sessions, and manages interruptions. GPT-Realtime-Translate supports live translation from more than 70 input languages into 13 output languages. GPT-Realtime-Whisper provides live speech-to-text for captions and meeting notes.

Early testers include Zillow, Priceline, and Deutsche Telekom.

Pricing starts at $32 per million audio input tokens for GPT-Realtime-2, $0.034 per minute for GPT-Realtime-Translate, and $0.017 per minute for GPT-Realtime-Whisper.
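
To put those rates in context, the arithmetic can be sketched in a few lines of Python. This is a rough estimate built only on the three prices quoted above. The function names are hypothetical, and the sketch leaves out anything the announcement summary does not state, such as output-token pricing or how many tokens a minute of audio consumes.

```python
from decimal import Decimal

# Rates quoted in the article, in US dollars.
REALTIME_2_PER_MILLION_INPUT_TOKENS = Decimal("32")
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")


def per_minute_cost(minutes, rate_per_minute):
    """Cost of a per-minute model for a given number of audio minutes."""
    # Convert through str so float inputs such as 1.5 stay exact.
    return Decimal(str(minutes)) * rate_per_minute


def input_token_cost(audio_input_tokens):
    """Cost of GPT-Realtime-2 audio input tokens only (output not included)."""
    return (
        Decimal(audio_input_tokens)
        * REALTIME_2_PER_MILLION_INPUT_TOKENS
        / Decimal(1_000_000)
    )


if __name__ == "__main__":
    # Hypothetical workload: a 60-minute meeting and 250,000 audio input tokens.
    print("Translate, 60 min:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    print("Whisper, 60 min:", per_minute_cost(60, WHISPER_PER_MINUTE))
    print("Realtime-2, 250k input tokens:", input_token_cost(250_000))
```

At the quoted rates, an hour of live translation comes to about $2.04 and an hour of transcription to about $1.02. The GPT-Realtime-2 figure depends on token volume, which varies with the conversation.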