OpenAI Unveils AI Voice Models That Can Talk, Translate, and Transcribe Live

May 8, 2026

OpenAI on Thursday introduced three new audio models for its API, expanding its push into real-time voice applications with tools for conversational AI, live translation, and streaming transcription.

The company launched GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper through its Realtime API, giving developers new ways to build voice-based products that can speak, transcribe, translate, and respond during live conversations.

GPT-Realtime-2 is OpenAI’s latest voice model and the first in the lineup built with what the company describes as “GPT-5-class reasoning.” OpenAI said the model is designed to handle more complex spoken interactions, maintain conversational context, recover from interruptions, and use external tools while continuing a live dialogue.

The company said the model supports parallel tool calls, controllable tone, and selectable reasoning levels ranging from minimal to “xhigh.” OpenAI also expanded the model’s context window from 32K to 128K tokens to support longer sessions and more complex workflows.

“Together, the models we are launching move realtime audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI said in its announcement.

OpenAI said GPT-Realtime-2 showed measurable gains over GPT-Realtime-1.5 in internal and external audio benchmarks. On Big Bench Audio, GPT-Realtime-2 scored 96.6% accuracy compared with 81.4% for the earlier model. On Audio MultiChallenge instruction-following tests, the company said GPT-Realtime-2 achieved a 48.5% pass rate versus 34.7% for GPT-Realtime-1.5.
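The two benchmark results can also be read as absolute and relative gains. The short Python sketch below does that arithmetic using only the scores OpenAI reported. It is an illustrative calculation, not part of OpenAI's evaluation, and the `gains` helper is a hypothetical name.

```python
# Scores as reported by OpenAI in the announcement (percent): (GPT-Realtime-1.5, GPT-Realtime-2).
SCORES = {
    "Big Bench Audio (accuracy)": (81.4, 96.6),
    "Audio MultiChallenge (pass rate)": (34.7, 48.5),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent), rounded to 0.1."""
    points = new - old
    relative = points / old * 100
    return round(points, 1), round(relative, 1)


for name, (old, new) in SCORES.items():
    points, relative = gains(old, new)
    print(f"{name}: +{points} points ({relative}% relative)")
```

By this arithmetic, the Big Bench Audio result is a 15.2-point gain (about 18.7% relative), while the Audio MultiChallenge result is a 13.8-point gain (about 39.8% relative), even though the newer model's pass rate on that test remains below 50%.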

The second release, GPT-Realtime-Translate, is aimed at multilingual voice conversations. The model supports more than 70 input languages and 13 output languages, allowing users to speak naturally while conversations are translated in real time.

OpenAI said the translation model is designed for customer support, education, events, media, and creator platforms that need live multilingual interactions. Deutsche Telekom and Vimeo are among the companies testing the system for real-time translated experiences.

“Building voice AI for India means handling diverse regional phonetics,” said Prateek Sachan, co-founder and CTO at BolnaAI. “In our evals across Hindi, Tamil, and Telugu, GPT-Realtime-Translate delivered 12.5% lower Word Error Rates than any other model we tested.”

The company also launched GPT-Realtime-Whisper, a streaming speech-to-text model that transcribes conversations as users speak. OpenAI said the model is intended for live captions, meeting notes, customer support systems, healthcare workflows, recruiting, and other real-time transcription use cases.

Several companies, including Zillow, Priceline, Intercom, Glean, and Deutsche Telekom, are already testing the new voice tools. OpenAI said Zillow used GPT-Realtime-2 to improve conversational reliability and tool-calling performance for voice-based customer interactions.

“What stood out about GPT-Realtime-2 was the intelligence and tool-calling reliability it brings to complex voice interactions,” said Josh Weisberg, Zillow’s SVP and head of AI. “The combination of agentic competence and guardrail strength is what makes it viable for production voice at Zillow.”

OpenAI said the Realtime API includes safeguards designed to detect harmful or abusive interactions. The company said some conversations can be halted automatically if they violate usage policies, and developers can add further safety layers through the Agents SDK.

The new models are available immediately through the Realtime API. GPT-Realtime-2 is priced at $32 per one million audio input tokens and $64 per one million audio output tokens. GPT-Realtime-Translate costs $0.034 per minute, while GPT-Realtime-Whisper is priced at $0.017 per minute.
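GPT-Realtime-2 is billed per audio token, while the Translate and Whisper models are billed per minute, so a rough budget needs two formulas. The Python sketch below applies the list prices quoted above. It is an illustrative estimate, not an OpenAI tool: the function names are hypothetical, and it covers only audio tokens, since the article gives no text-token or caching rates and does not say how many tokens a minute of audio consumes.

```python
from decimal import Decimal

# List prices as reported in the article (USD); actual pricing may change.
REALTIME_2_INPUT_PER_M = Decimal("32")   # GPT-Realtime-2, per 1M audio input tokens
REALTIME_2_OUTPUT_PER_M = Decimal("64")  # GPT-Realtime-2, per 1M audio output tokens
TRANSLATE_PER_MIN = Decimal("0.034")     # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = Decimal("0.017")       # GPT-Realtime-Whisper, per minute

MILLION = Decimal(1_000_000)


def realtime_2_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: audio-token cost of a GPT-Realtime-2 session in USD."""
    return (Decimal(input_tokens) * REALTIME_2_INPUT_PER_M
            + Decimal(output_tokens) * REALTIME_2_OUTPUT_PER_M) / MILLION


def per_minute_cost(minutes: float, rate: Decimal) -> Decimal:
    """Hypothetical helper: cost in USD for a minute-billed model."""
    # Convert via str() so float inputs do not introduce binary rounding error.
    return Decimal(str(minutes)) * rate


if __name__ == "__main__":
    print(realtime_2_cost(250_000, 100_000))
    print(per_minute_cost(60, TRANSLATE_PER_MIN))
    print(per_minute_cost(60, WHISPER_PER_MIN))
```

At these rates, a session with 250,000 audio input tokens and 100,000 audio output tokens would cost $14.40, while an hour of live translation or transcription would cost $2.04 or $1.02, respectively. `Decimal` is used so that per-minute rates such as $0.034 are multiplied exactly.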

This analysis is based on reporting from OpenAI.

Image courtesy of OpenAI.

This article was generated with AI assistance and reviewed for accuracy and quality.

Last updated: May 8, 2026

© 2026 ChatAI. All rights reserved.