Nvidia Launches Groq 3 Inference Chip for AI Agents

Nvidia launched the Groq 3 language processing unit at its GTC 2026 conference in San Jose, California. The chip is engineered specifically for AI inference in multi-agent systems, delivering ultra-low latency and massive throughput.

The Groq 3 LPU features 40 petabytes per second of bandwidth and integrates into dedicated LPX server racks with 256 processors. It’s designed to handle million-token contexts and support up to 1,500 tokens per second, a rate critical for machine-to-machine AI communication.

Nvidia acquired Groq Inc.'s technology in a $20 billion deal, bringing on founder Jonathan Ross and President Sunny Madra. The Groq 3 acts as a co-processor to Nvidia’s new Rubin GPUs, enhancing performance across trillion-parameter models.

The Vera Rubin NVL72 rack combines Rubin GPUs and new Vera CPUs, optimized for agentic AI. Paired with Groq 3, the system delivers 35 times higher throughput per megawatt and 10 times greater revenue potential, according to Nvidia's Ian Buck.

Five new server racks were announced, including dedicated CPU, storage (BlueField-4 STX), and networking (Spectrum-6 SPX) systems. Demand is surging as hyperscalers like AWS, Google, Microsoft, and Meta plan $650 billion in data center spending this year.

Nvidia’s data center revenue hit $193.5 billion in fiscal 2026, up from $116.2 billion the prior year. The Groq 3 positions Nvidia to dominate the next wave of AI infrastructure.