Google updates Gemini with new image generation model, custom Gem chatbots

Google LLC is updating its Gemini artificial intelligence assistant with a set of new image generation and customization features.

The company first previewed the enhancements at its Google I/O product event in May. They are set to become available in both the consumer and business versions of Gemini.

Originally introduced last year as Bard, Gemini is a chatbot powered by an eponymous series of large language models. It can generate text, craft software code, solve math problems and perform related tasks. Gemini is available in a free version, a subscription-based consumer tier that offers additional features, and two other paid versions geared towards organizations.

As part of the update announced today, Google is equipping the chatbot with a new image generation model called Imagen 3. Compared with its predecessor, the model is better at generating photorealistic images and following long, complicated user instructions. If Imagen 3 nevertheless fails to generate an image in a manner that aligns with the provided instructions, users can ask it to make changes by entering a follow-up prompt.

Imagen 3 is a so-called latent diffusion model. It doesn’t process images in their raw form, but rather encodes them into a compact mathematical representation that resides in what is known as a latent space. Such representations retain only the most important information from an image and discard the rest. This arrangement effectively compresses the data that the AI processes, which allows it to operate using less hardware than would otherwise be needed and thereby lowers costs.
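The compression idea can be illustrated with a toy sketch. The code below is not how Imagen 3 works: production latent diffusion models rely on a learned neural encoder, whereas this example swaps in simple block averaging as a stand-in. The `encode` function, the 8-by-8 test image and the block size are all hypothetical choices made for illustration.

```python
def encode(image, block=2):
    """Toy encoder (an assumption, not a learned model):
    average each block x block patch into a single latent value."""
    height, width = len(image), len(image[0])
    latent = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            patch = [
                image[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            ]
            row.append(sum(patch) / len(patch))
        latent.append(row)
    return latent


# Hypothetical 8x8 grayscale image with a simple diagonal gradient.
image = [[x + y for x in range(8)] for y in range(8)]
latent = encode(image)

pixel_count = sum(len(row) for row in image)
latent_count = sum(len(row) for row in latent)
print(f"{pixel_count} pixel values compressed to {latent_count} latent values")
```

In this sketch the model would operate on a quarter as many values as the raw image contains, which is the kind of saving that reduces hardware requirements.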

In conjunction with the rollout of Imagen 3, Google plans to reactivate Gemini’s feature for generating images of people. The search giant disabled the capability in February after users discovered that it generated historically inaccurate images. At the time, Google pledged to “significantly” improve the feature before reactivating it.

Dave Citron, the senior director of product management for Gemini Experiences, detailed some of those improvements in a blog post today. He wrote that Gemini’s feature for generating images of people was evaluated using an improved version of Google’s AI reliability testing workflows. Additionally, the search giant has equipped Imagen 3 with guardrails designed to stop it from generating harmful content.

“We don’t support the generation of photorealistic, identifiable individuals, depictions of minors or excessively gory, violent or sexual scenes,” Citron wrote. “Of course, as with any generative AI tool, not every image Gemini creates will be perfect, but we’ll continue to listen to feedback from early users as we keep improving.”

Imagen 3 is rolling out alongside another new Gemini capability called Gems. The latter addition allows users to create customized versions of the chatbot optimized for a specific set of tasks.

The customization process involves providing Gemini with instructions that specify how it should respond to prompts. A user can, for example, instruct Gemini to output text in a particular style and then save the instruction as a so-called Gem. When activated, the Gem ensures that Gemini always generates output text in the requested style, which removes the need to manually repeat the request with every prompt.
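Conceptually, a Gem behaves like a saved set of instructions that is applied to every prompt. The sketch below shows that idea under stated assumptions: the `Gem` class and its `build_prompt` method are hypothetical names invented for illustration, and Google has not disclosed how Gems are implemented internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gem:
    """Hypothetical stand-in for a saved Gem: a name plus reusable instructions."""
    name: str
    instructions: str

    def build_prompt(self, user_prompt: str) -> str:
        # Prepend the saved instructions so the user never has to repeat them.
        return f"{self.instructions}\n\n{user_prompt}"


# Save a style instruction once, then reuse it across prompts.
style_gem = Gem(
    name="Plain language",
    instructions="Answer in short, plain-language sentences.",
)
first = style_gem.build_prompt("Explain what a latent space is.")
second = style_gem.build_prompt("Describe what a chatbot does.")
print(first)
```

Both prompts carry the same saved instruction, which mirrors how an activated Gem spares the user from restating a request.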

Google is releasing several premade Gems in conjunction with the feature’s release. They’re designed for tasks such as troubleshooting code and generating writing tips. Google has also built a more general-purpose Gem capable of explaining complicated topics.
