Gemini 3.1 Flash Live builds on earlier versions with improved reasoning and task execution. On benchmarks measuring multi-step, audio-based function calling and instruction following, the model outperforms its predecessor, reflecting gains in reliability for developers building voice-driven applications. The system is also better at interpreting tone and speech patterns, allowing it to adjust responses based on cues like frustration or hesitation.
The update extends to enterprise use cases, where companies are using the model to power customer-facing agents. Google said the model is more effective at recognizing nuances in speech, such as pitch and pacing, and maintains accuracy even in noisy environments. Businesses including Verizon, LiveKit, and The Home Depot have tested the model in their workflows.
For consumers, Gemini Live now responds more quickly and can maintain longer conversations without losing context. Google said the model can track a conversation thread for roughly twice as long as before, improving continuity during extended interactions.
The release also supports a broader expansion of Search Live, enabling real-time, multilingual conversations in more than 200 countries and territories. The model is designed to handle multiple languages natively, allowing users to interact in their preferred language without switching systems.
Google said all audio generated by the model includes a SynthID watermark, intended to help identify AI-generated content and reduce misuse.
The rollout reflects Google’s effort to improve the usability of voice interfaces as competition among AI platforms shifts toward more natural, real-time interaction.
This analysis is based on reporting from Google.
This article was generated with AI assistance and reviewed for accuracy and quality.