Inception Labs has launched Mercury 2, which it claims is the fastest reasoning language model. The company cites a throughput of approximately 1,000 tokens per second, ahead of competitors such as Anthropic's Claude Haiku 4.5 and OpenAI's GPT-5 Mini.
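
To put the throughput figure in perspective, here is a rough back-of-the-envelope sketch. It assumes a constant decode rate and ignores time to first token and network overhead. The 2,000-token response length and the 100 tokens-per-second comparison rate are illustrative assumptions, not figures reported by Inception Labs or any competitor.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant decode rate (simplified model)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 2_000

# ~1,000 tokens/s is the rate claimed for Mercury 2.
claimed_rate_time = generation_seconds(RESPONSE_TOKENS, 1_000)

# 100 tokens/s is an assumed comparison rate, not a measured competitor figure.
assumed_slower_time = generation_seconds(RESPONSE_TOKENS, 100)

print(f"At 1,000 tok/s: {claimed_rate_time:.1f} s")
print(f"At   100 tok/s: {assumed_slower_time:.1f} s")
```

Under these assumptions, the same response takes 2 seconds instead of 20, which is the kind of gap that matters when a model is called many times in a loop.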

Google's DiffusionGemma approaches similar speeds but trails Mercury 2 on the benchmarks cited. On AIME 2026, Mercury 2 scored 90%, whereas DiffusionGemma managed only 69.1%, a gap of about 21 percentage points. Even Google's standard Gemma 4 outscored DiffusionGemma, at 88.3%.

On the GPQA benchmark, Mercury 2 reached 77% accuracy against DiffusionGemma's 73.2%. In production use, Augment Code reported an 82% decrease in latency and a 90% reduction in costs after switching to Mercury 2, with no loss in output quality.
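
The reported percentage reductions can be restated as multiplicative factors, which are often easier to reason about. The sketch below is plain arithmetic on the two percentages quoted for Augment Code; the function name is made up for illustration, and no baseline latency or cost figures are assumed because none are given.

```python
def improvement_factor(reduction_pct: float) -> float:
    """Convert a percentage reduction into an 'N times lower' factor.

    An 82% reduction leaves 18% of the original value, i.e. 1 / 0.18 times lower.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    remaining_fraction = 1 - reduction_pct / 100
    return 1 / remaining_fraction


latency_factor = improvement_factor(82)  # reported latency decrease
cost_factor = improvement_factor(90)     # reported cost reduction

print(f"Latency: about {latency_factor:.1f}x lower")
print(f"Cost:    about {cost_factor:.1f}x lower")
```

In other words, the reported figures correspond to responses arriving roughly 5.6 times faster at about one tenth of the cost.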

Mercury 2 grew out of research by Stefano Ermon at Stanford. Inception has $50 million in funding, with backing from Nvidia's venture arm and notable investors such as Andrew Ng.

Mercury 2's main appeal is responsiveness: it targets latency-sensitive uses such as real-time coding and fast sub-agent systems, where high speed is needed without giving up much performance. However, it is not open-weight. It is available only as a paid API, and it may not yet outperform larger models on complex reasoning tasks.