Kling AI has launched its new 3.0 model lineup, rolling out Video 3.0, Video 3.0 Omni, Image 3.0, and Image 3.0 Omni as it pushes deeper into AI-powered video and image creation. The models are now available in exclusive early access for Ultra subscribers, with a broader public release planned. Kling says the update brings stronger consistency, more precise narrative control, longer videos of up to 15 seconds, and native audio generation across multiple languages and accents.
The 3.0 release is built around a unified multimodal training framework that supports text, image, audio, and video inputs and outputs in a single workflow. Kling says this allows creators to move between text-to-video, image-to-video, reference-based generation, and in-video editing without switching tools, while maintaining tighter adherence to prompts and visual continuity across scenes.
Video 3.0 focuses on cinematic control and consistency. Creators can upload reference images or videos to keep characters, objects, and environments visually coherent from shot to shot. The model also supports multi-shot storytelling, adjusting camera angles and transitions to follow structured narrative instructions such as dialogue scenes, cross-cutting, or voice-over sequences.
A key addition is native audio generation, which allows Video 3.0 to produce speech in multiple languages — including English, Chinese, Japanese, Korean, and Spanish — as well as different accents. Kling says the system can handle multi-character dialogue, with creators specifying speaking order and delivery. The model also improves text preservation within imagery, such as signs or logos, which Kling highlights as useful for advertising and branded content.
The Video 3.0 Omni variant expands on these capabilities with more advanced reference control. By uploading a reference video, creators can have the model extract both visual traits and voice characteristics and carry them consistently across new scenes. Omni also introduces a storyboard-style workflow, letting users define shot length, framing, perspective, narrative content, and camera movement on a per-shot basis.
Alongside the video updates, Image 3.0 and Image 3.0 Omni add support for 2K and 4K output, targeting professional use cases such as virtual production, scene visualization, and high-resolution creative assets. Kling says the image models focus on realism, with improved handling of lighting, textures, and materials.
Kling AI says the 3.0 lineup builds on its earlier O1 and 2.6 series and reflects a shift toward what it calls “professional orchestration,” moving beyond basic generation toward more structured creative control. Since launching in June 2024, the company reports serving more than 60 million creators, generating over 600 million videos, and working with 30,000+ enterprise clients across industries including film, advertising, animation, and CGI.
With the 3.0 release, Kling is positioning its platform less as a novelty generator and more as a tool for end-to-end creative workflows, betting that improved control, consistency, and multimodal integration will matter as AI video creation becomes more widely adopted.
This analysis is based on reporting from PR Newswire.
Image courtesy of Kling AI.
This article was generated with AI assistance and reviewed for accuracy and quality.