Google DeepMind has raised the stakes in the AI video arms race. At Google I/O on May 19, the company introduced Gemini Omni, a new family of multimodal AI models designed for generating and editing video content.

The first model in the lineup, Gemini Omni Flash, accepts text, images, audio, and existing video as inputs, then outputs video clips of up to 10 seconds, complete with audio.

Gemini Omni Flash replaces Veo as Google’s default video generation model. Key improvements center on reasoning, physics simulation, continuity, and scene consistency. Natural-language editing is the headline feature: users describe changes in plain English, and the model executes them within the video’s existing context.

The model is already integrated into YouTube Shorts for Google AI Plus, Pro, and Ultra subscribers. Beyond YouTube, it is available to paid Gemini app users and through Google Flow, with API access planned for developers.