Inception Labs' Mercury 2, launched in February 2026, outperforms Google DeepMind's DiffusionGemma in maintaining sophisticated reasoning while generating text in parallel.
Mercury 2 achieves approximately 1,009 tokens per second on NVIDIA Blackwell GPUs, pricing at $0.25 per million input tokens and $0.75 per million output tokens, making it competitive against other budget-friendly models.
Google’s DiffusionGemma, released on June 10, 2026, is built on a 26B-parameter architecture but lacks the reasoning quality seen in Mercury 2, which retains superior performance in reasoning benchmarks while delivering parallel generation speed.
Founded in 2024, Inception Labs received $50 million in funding in 2025, while Google adopted an open-source approach for faster iteration.
The advancements of Mercury 2 offer potential implications for real-time applications and could reshape the AI inference infrastructure market, emphasizing the need for a reevaluation of valuable hardware configurations.