Stability AI releases next-gen open-source Stable Diffusion 3.5 text-to-image AI model family

Generative artificial intelligence startup Stability AI Ltd. today announced the release of Stable Diffusion 3.5, which includes three next-generation open-source text-to-image AI model variants.

“In June, we released Stable Diffusion 3 Medium, the first open release from this series. This release didn’t fully meet our standards or our communities’ expectations,” the company said in the announcement. “After listening to the valuable community feedback, instead of a quick fix, we took the time to further develop a version that advances our mission to transform visual media.”

The three models in this release include Stable Diffusion 3.5 Large, 3.5 Large Turbo and 3.5 Medium. Each variant has been developed to meet the needs of scientific researchers, hobbyists and enterprise customers with increased levels of customization and accessibility for local and cloud deployments.

Large is an 8 billion-parameter model designed for prompt adherence and high-quality image production. It’s based on the standard Stable Diffusion family. The company said it’s ideal for professional users looking for 1-megapixel resolution graphics. It’s suitable for producing vivid images and digital assets for marketing campaigns and other similar enterprise use cases.

Large Turbo is a distilled version of 3.5 Large that produces high-quality images with strong prompt adherence in only four steps, which makes it much faster than 3.5 Large itself. It's designed to produce images quickly without losing quality, making it a good fit for rapid-generation workflows. Stability AI said that Turbo offers some of the fastest image generation times in the industry for a model of its size and remains competitive on image quality and prompt adherence, even compared with non-distilled models of similar size.

The new Stable Diffusion 3.5 Medium weighs in at 2.5 billion parameters. Stability AI said it built the model with an improved architecture and training method to balance quality and customization. The model can efficiently produce images at resolutions between 0.25 and 2 megapixels and is optimized to run on standard consumer hardware without heavy demands.
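To give a sense of what the 0.25- to 2-megapixel range means in practice, here is a minimal Python sketch that converts image dimensions to megapixels and checks them against that range. The helper names are hypothetical, and two things are assumptions rather than anything Stability AI has stated: that a megapixel is counted as 1,000,000 pixels, and that the quoted bounds act as hard limits.

```python
# Range quoted for Stable Diffusion 3.5 Medium in the announcement.
# Treating these as hard limits is an assumption made for illustration.
MIN_MP = 0.25
MAX_MP = 2.0


def megapixels(width: int, height: int) -> float:
    """Return the pixel count in megapixels (assuming 1 MP = 1,000,000 pixels)."""
    return width * height / 1_000_000


def in_medium_range(width: int, height: int) -> bool:
    """Check whether the image size falls inside the quoted 0.25-2 MP range."""
    return MIN_MP <= megapixels(width, height) <= MAX_MP


if __name__ == "__main__":
    for w, h in [(256, 256), (512, 512), (1024, 1024), (1920, 1024), (2048, 2048)]:
        print(f"{w}x{h}: {megapixels(w, h):.2f} MP, in range: {in_medium_range(w, h)}")
```

Under these assumptions, a 512-by-512 image sits near the bottom of the range, a 1024-by-1024 image is roughly 1 megapixel, and a 2048-by-2048 image falls well outside it.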

In developing the models, the company said, it integrated Query-Key Normalization into the transformer blocks to prioritize customizability and simplify fine-tuning. As a result, developers should have an easier time customizing the models by tagging their inputs, and the models adhere more closely to specific natural-language prompts. At the same time, prompts lacking specific wording are more likely to produce a broader range of image outputs.
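For readers unfamiliar with the technique, the following is a minimal sketch of the general idea behind Query-Key Normalization, not Stability AI's implementation. It assumes a plain RMS normalization with no learned scale, a single attention head and ordinary Python lists in place of tensors; all function names are hypothetical. The point it illustrates is that normalizing queries and keys before the dot product keeps the attention logits in a bounded range, which is generally thought to make training and fine-tuning more stable.

```python
import math


def rms_normalize(vec, eps=1e-6):
    """Scale a vector so its root-mean-square is approximately 1."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, qk_norm=True):
    """Attention weights of one query over a list of keys (single head)."""
    if qk_norm:
        # Assumption: RMS normalization without a learned scale parameter.
        query = rms_normalize(query)
        keys = [rms_normalize(k) for k in keys]
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(logits)


if __name__ == "__main__":
    # Deliberately large activations to show the effect of normalization.
    query = [50.0, -20.0, 10.0, 5.0]
    keys = [
        [40.0, -10.0, 5.0, 0.0],
        [-30.0, 25.0, 0.0, 10.0],
        [1.0, 1.0, 1.0, 1.0],
    ]
    print("without QK norm:", attention_weights(query, keys, qk_norm=False))
    print("with QK norm:   ", attention_weights(query, keys, qk_norm=True))
```

Without normalization, the large activations push nearly all of the attention weight onto a single key. With normalization, every key keeps a nonzero share, because each logit is bounded in magnitude by the square root of the vector dimension.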

“To support this level of downstream flexibility, we had to make some trade-offs,” the company said. “Greater variation in outputs from the same prompt with different seeds may occur, which is intentional as it helps preserve a broader knowledge base and diverse styles in the base models. However, as a result, prompts lacking specificity might lead to increased uncertainty in the output, and the aesthetic level may vary.”

Stability AI said Stable Diffusion 3.5 Medium will be available on Oct. 29. All the models are open-source and available under the company’s community license, which is free for noncommercial use and for commercial use by organizations with up to $1 million in annual revenue. Companies above that threshold must inquire about an enterprise license.

The model weights will be available shortly on Hugging Face for self-hosting. They can also be accessed through the Stability AI application programming interface as well as Replicate, Fireworks and ComfyUI. In the next few days, ControlNets for the new models will also be released to provide more advanced control over image generation.

Images: Stability AI
