GPT-Realtime-2 is OpenAI’s latest voice model and the first in the lineup built with what the company describes as “GPT-5-class reasoning.” OpenAI said the model is designed to handle more complex spoken interactions, maintain conversational context, recover from interruptions, and use external tools while continuing a live dialogue.
The company said the model supports parallel tool calls, controllable tone adjustments, and selectable reasoning levels ranging from minimal to “xhigh.” OpenAI also increased the model’s context window from 32K to 128K tokens to support longer sessions and more complex workflows.
“Together, the models we are launching move realtime audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI said in its announcement.
OpenAI said GPT-Realtime-2 showed measurable gains over GPT-Realtime-1.5 in internal and external audio benchmarks. On Big Bench Audio, GPT-Realtime-2 scored 96.6% accuracy compared with 81.4% for the earlier model. On Audio MultiChallenge instruction-following tests, the company said GPT-Realtime-2 achieved a 48.5% pass rate versus 34.7% for GPT-Realtime-1.5.
The second release, GPT-Realtime-Translate, is aimed at multilingual voice conversations. The model supports more than 70 input languages and 13 output languages, allowing users to speak naturally while conversations are translated in real time.
OpenAI said the translation model is designed for customer support, education, events, media, and creator platforms that need live multilingual interactions. Deutsche Telekom and Vimeo are among the companies testing the system for real-time translated experiences.
“Building voice AI for India means handling diverse regional phonetics,” said Prateek Sachan, co-founder and CTO at BolnaAI. “In our evals across Hindi, Tamil, and Telugu, GPT-Realtime-Translate delivered 12.5% lower Word Error Rates than any other model we tested.”
The company also launched GPT-Realtime-Whisper, a streaming speech-to-text model that transcribes conversations as users speak. OpenAI said the model is intended for live captions, meeting notes, customer support systems, healthcare workflows, recruiting, and other real-time transcription use cases.
Several companies, including Zillow, Priceline, Intercom, Glean, and Deutsche Telekom, are already testing the new voice tools. OpenAI said Zillow used GPT-Realtime-2 to improve conversational reliability and tool-calling performance for voice-based customer interactions.
“What stood out about GPT-Realtime-2 was the intelligence and tool-calling reliability it brings to complex voice interactions,” said Josh Weisberg, Zillow’s SVP and head of AI. “The combination of agentic competence and guardrail strength is what makes it viable for production voice at Zillow.”
OpenAI said the Realtime API includes safeguards designed to detect harmful or abusive interactions. The company said some conversations can be halted automatically if they violate usage policies, and developers can add further safety layers through the Agents SDK.
The new models are available immediately through the Realtime API. GPT-Realtime-2 is priced at $32 per one million audio input tokens and $64 per one million audio output tokens. GPT-Realtime-Translate costs $0.034 per minute, while GPT-Realtime-Whisper is priced at $0.017 per minute.
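For teams budgeting against these rates, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not an official calculator: the function names and the sample token and minute counts are hypothetical, and the sketch covers only the audio token and per-minute prices quoted above. The announcement as reported here does not say how many tokens a minute of audio consumes, so token counts would have to come from actual usage data.

```python
from decimal import Decimal

# Prices as quoted in the article (USD).
REALTIME2_AUDIO_INPUT_PER_MILLION = Decimal("32")
REALTIME2_AUDIO_OUTPUT_PER_MILLION = Decimal("64")
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")

ONE_MILLION = Decimal(1_000_000)


def realtime2_audio_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate GPT-Realtime-2 audio cost from token counts (hypothetical helper)."""
    input_cost = Decimal(input_tokens) * REALTIME2_AUDIO_INPUT_PER_MILLION
    output_cost = Decimal(output_tokens) * REALTIME2_AUDIO_OUTPUT_PER_MILLION
    return (input_cost + output_cost) / ONE_MILLION


def per_minute_cost(minutes: int, rate_per_minute: Decimal) -> Decimal:
    """Estimate cost for a model billed per minute (hypothetical helper)."""
    return Decimal(minutes) * rate_per_minute


if __name__ == "__main__":
    # Sample figures are made up for illustration only.
    print("GPT-Realtime-2:", realtime2_audio_cost(50_000, 20_000))
    print("Translate, 30 min:", per_minute_cost(30, TRANSLATE_PER_MINUTE))
    print("Whisper, 30 min:", per_minute_cost(30, WHISPER_PER_MINUTE))
```

Decimal is used instead of floats so that small per-minute rates add up without rounding drift.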
This analysis is based on reporting from OpenAI.
This article was generated with AI assistance and reviewed for accuracy and quality.