OpenAI and Broadcom have introduced Jalapeño, a custom chip built specifically for large language model inference. Unlike Nvidia's Rubin GPUs, which handle both training and inference, Jalapeño is designed solely for inference workloads. Early tests show it delivers significantly better performance per watt than existing state-of-the-art chips.

Jalapeño's architecture is designed to minimize data movement, a key bottleneck in inference systems. OpenAI will pair the chip with Broadcom's Tomahawk 6 switches, which support port speeds of up to 1.6 terabits per second and include network congestion management. The chip will be housed in custom servers developed with Toronto-based Celestica Inc.

The first Jalapeño-powered servers are expected to go online before the end of the year. OpenAI calls the chip the "first step in a multi-generation compute platform," signaling plans for further inference or training chips. The company may eventually sell Jalapeño-based appliances, similar to Nvidia's DGX systems, which would open a new hardware revenue stream. That could boost investor interest ahead of OpenAI's anticipated IPO and help differentiate it from rival Anthropic, which has also filed to go public.