Google's Image and Video AI Models Receive Major Upgrades

Google has announced a major update to its image and video AI models, adding new features and fixing one of Gemini’s biggest limitations.

Google has been aggressively improving its AI models as it competes with OpenAI, Anthropic, Microsoft, and others. The company has been making major strides with Gemini, and has now drastically improved its Imagen image generator, Veo video generator, Flow filmmaking tool, and Lyria music generator.

Veo

Veo 3’s biggest upgrade is the ability to generate videos with audio.

Veo 3, our new state-of-the-art video generation model, not only improves on the quality of Veo 2, but for the first time, can also generate videos with audio — traffic noises in the background of a city street scene, birds singing in a park, even dialogue between characters.

Across the board, Veo 3 excels from text and image prompting to real-world physics and accurate lip syncing. It’s great at understanding; you can tell a short story in your prompt, and the model gives you back a clip that brings it to life.

Interestingly, Google is continuing to develop Veo 2, adding a number of new features, including reference-powered video, camera controls, outpainting, and the ability to add and remove objects from videos.

Flow

In addition to bringing reference-powered video and camera controls from Veo 2 to Flow, the company has given the filmmaking tool the ability to draw on Google DeepMind’s various models to create cinematic videos.

Built with and for creatives, Flow is an AI filmmaking tool that lets you seamlessly create cinematic clips, scenes and stories by bringing together Google DeepMind’s most advanced models: Veo, Imagen and Gemini. Use natural language to describe your shots to Flow, manage the ingredients for your story — cast, locations, objects and styles — in a single convenient place, and use Flow to weave your narrative into beautiful scenes.

Lyria

Google has brought a number of improvements to Lyria 2, including integration with YouTube Shorts.

In April, we expanded access to Music AI Sandbox, powered by Lyria 2. Music AI Sandbox offers musicians, producers and songwriters a set of experimental tools, which can spark new creative possibilities and help artists explore unique musical ideas. The expertise and valuable feedback from the music industry help us ensure our tools empower creators, while inviting creatives to realize the possibilities of AI in their art.

Lyria 2 brings powerful composition and endless exploration, and is now available for creators through YouTube Shorts and enterprises in Vertex AI. We’ve also made Lyria RealTime, our interactive music generation model which powers MusicFX DJ, available via an API and in AI Studio. Lyria RealTime allows anyone to interactively create, control, and perform generative music in real time.

Imagen

Imagen 4 has some of the most impressive upgrades, ones the vast majority of users will be able to benefit from.

Our latest Imagen model combines speed with precision to create stunning images. Imagen 4 has remarkable clarity in fine details like intricate fabrics, water droplets, and animal fur, and excels in both photorealistic and abstract styles. Imagen 4 can create images in a range of aspect ratios and up to 2k resolution – even better for printing or presentations. It is also significantly better at spelling and typography, making it easier to create your own greeting cards, posters and even comics.

The ability to create images in different aspect ratios is particularly welcome. Since Google added image generation to Gemini, its output has been limited to a 1:1 ratio. In the example below, Gemini was prompted to create an image of a cabin by the lake, and specifically told to use a 16:9 ratio.

In contrast, after the Imagen 4 upgrade, Gemini created the following image using the same prompt.

Google is clearly making major headway with its AI models, and its most recent upgrades represent one of its biggest leaps forward.
