OpenAI Unveils AI Voice Models That Can Talk, Translate, and Transcribe Live

May 8, 2026

OpenAI on Thursday introduced three new audio models for its API, expanding its push into real-time voice applications with tools for conversational AI, live translation, and streaming transcription.

The company launched GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper through its Realtime API, giving developers new ways to build voice-based products that can speak, transcribe, translate, and respond during live conversations.

GPT-Realtime-2 is OpenAI’s latest voice model and the first in the lineup built with what the company describes as “GPT-5-class reasoning.” OpenAI said the model is designed to handle more complex spoken interactions, maintain conversational context, recover from interruptions, and use external tools while continuing a live dialogue.

The company said the model supports parallel tool calls, adjustable tone, and selectable reasoning levels ranging from minimal to “xhigh.” OpenAI also expanded the context window from 32K to 128K tokens to support longer sessions and more complex workflows.

“Together, the models we are launching move realtime audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI said in its announcement.

OpenAI said GPT-Realtime-2 showed measurable gains over GPT-Realtime-1.5 in internal and external audio benchmarks. On Big Bench Audio, GPT-Realtime-2 scored 96.6% accuracy compared with 81.4% for the earlier model. On Audio MultiChallenge instruction-following tests, the company said GPT-Realtime-2 achieved a 48.5% pass rate versus 34.7% for GPT-Realtime-1.5.
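To put those figures in perspective, the short sketch below computes the absolute and relative gains from the scores quoted above. The `gain` helper and the `SCORES` table are illustrative names, not part of any OpenAI tooling, and the only inputs are the four percentages reported in this article.

```python
# Benchmark scores as quoted in the article: (GPT-Realtime-1.5, GPT-Realtime-2), in percent.
SCORES = {
    "Big Bench Audio": (81.4, 96.6),
    "Audio MultiChallenge": (34.7, 48.5),
}


def gain(old: float, new: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), each rounded to one decimal."""
    points = new - old
    relative = points / old * 100
    return round(points, 1), round(relative, 1)


if __name__ == "__main__":
    for name, (old, new) in SCORES.items():
        points, relative = gain(old, new)
        print(f"{name}: +{points} points ({relative}% relative improvement)")
```

By this arithmetic, the Big Bench Audio score rose 15.2 percentage points and the Audio MultiChallenge pass rate rose 13.8 points, which is the larger relative jump because it starts from a lower baseline.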

The second release, GPT-Realtime-Translate, is aimed at multilingual voice conversations. The model supports more than 70 input languages and 13 output languages, allowing users to speak naturally while conversations are translated in real time.

OpenAI said the translation model is designed for customer support, education, events, media, and creator platforms that need live multilingual interactions. Deutsche Telekom and Vimeo are among the companies testing the system for real-time translated experiences.

“Building voice AI for India means handling diverse regional phonetics,” said Prateek Sachan, co-founder and CTO at BolnaAI. “In our evals across Hindi, Tamil, and Telugu, GPT-Realtime-Translate delivered 12.5% lower Word Error Rates than any other model we tested.”

The company also launched GPT-Realtime-Whisper, a streaming speech-to-text model that transcribes conversations as users speak. OpenAI said the model is intended for live captions, meeting notes, customer support systems, healthcare workflows, recruiting, and other real-time transcription use cases.

Several companies, including Zillow, Priceline, Intercom, Glean, and Deutsche Telekom, are already testing the new voice tools. OpenAI said Zillow used GPT-Realtime-2 to improve conversational reliability and tool-calling performance for voice-based customer interactions.

“What stood out about GPT-Realtime-2 was the intelligence and tool-calling reliability it brings to complex voice interactions,” said Josh Weisberg, Zillow’s SVP and head of AI. “The combination of agentic competence and guardrail strength is what makes it viable for production voice at Zillow.”

OpenAI said the Realtime API includes safeguards designed to detect harmful or abusive interactions. The company said some conversations can be halted automatically if they violate usage policies, and developers can add additional safety layers through the Agents SDK.

The new models are available immediately through the Realtime API. GPT-Realtime-2 is priced at $32 per one million audio input tokens and $64 per one million audio output tokens. GPT-Realtime-Translate costs $0.034 per minute, while GPT-Realtime-Whisper is priced at $0.017 per minute.
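As a rough illustration of how those rates add up, the sketch below estimates costs from the prices quoted above. The function names and the sample token and minute counts are hypothetical. The article does not say how many audio tokens correspond to a minute of speech, so the token-based and per-minute prices are kept separate rather than converted into each other.

```python
# Prices as quoted in the article, in US dollars.
REALTIME2_INPUT_PER_MILLION = 32.0    # per one million audio input tokens
REALTIME2_OUTPUT_PER_MILLION = 64.0   # per one million audio output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate GPT-Realtime-2 audio cost for the given token counts."""
    total = (input_tokens * REALTIME2_INPUT_PER_MILLION
             + output_tokens * REALTIME2_OUTPUT_PER_MILLION)
    return total / 1_000_000


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate cost for a model billed by the minute."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical workload: 500K audio tokens in, 250K audio tokens out.
    print(f"GPT-Realtime-2: ${realtime2_cost(500_000, 250_000):.2f}")
    # Hypothetical one-hour session on each per-minute model.
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

At these rates, an hour of live translation comes to about $2.04 and an hour of streaming transcription to about $1.02. The estimate covers only the audio prices listed here and ignores any other charges a real deployment might incur.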

This analysis is based on reporting from OpenAI.

Image courtesy of OpenAI.

This article was generated with AI assistance and reviewed for accuracy and quality.

Last updated: May 8, 2026
