Google is expanding Gemini’s generative AI capabilities into video with the launch of Gemini Omni, a new multimodal model family designed to create and edit videos using combinations of text, images, audio, and video inputs. The company said the first release, Gemini Omni Flash, is rolling out through the Gemini app, Google Flow, and YouTube Shorts.
The launch marks Google’s latest effort to push Gemini beyond text and image generation into more advanced media creation workflows. Omni is built to handle multiple input types at once, allowing users to combine reference images, voice audio, existing video clips, and written prompts to generate or modify video content through conversational commands.
Google says the system can make iterative edits while maintaining visual continuity across scenes. Users can adjust environments, camera angles, styles, or specific objects over multiple prompts without resetting the original composition.
The company is also emphasizing Gemini’s reasoning capabilities as part of the product’s core positioning. Google said Omni combines Gemini’s “real-world knowledge” with video generation, allowing the model to create scenes informed by concepts such as gravity, fluid motion, scientific topics, and cultural context. One demonstration showed a claymation-style explainer about protein folding, while another generated chain-reaction sequences involving rolling marbles.
Omni is designed to work with reference materials across formats. Users can upload sketches, photos, video clips, or audio files and use them to guide the visual style, pacing, or motion of generated content. At launch, audio support will focus on voice references, with broader audio input support planned later.
Google is also introducing avatar-based video generation tied to a user’s own voice and likeness. The company said users can create a digital version of themselves to generate videos that “look and sound like you,” though broader capabilities for editing speech and audio are still being tested.
To address authenticity concerns, Google said all videos generated with Omni will include SynthID, the company’s invisible digital watermarking system. Generated videos can also be verified through the Gemini app, Gemini in Chrome, and Google Search.
Gemini Omni Flash is rolling out globally to Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow. Google also said the model will be available at no cost in YouTube Shorts and the YouTube Create app starting this week, with API access for developers and enterprise customers planned in the coming weeks.
About this article: This article was generated with AI assistance and reviewed by our editorial team to ensure it follows our editorial standards for accuracy and independence. We maintain strict fact-checking protocols and cite all sources.