Nvidia unveiled the BlueField-4 STX reference architecture at its GTC developer event. Designed for AI clusters, it is a storage infrastructure design for systems that process very large contexts and learn continuously.

The architecture centers on the BlueField-4 data processing unit (DPU), which offloads infrastructure tasks from CPUs to free GPU capacity. It integrates Spectrum-X Ethernet switches and ConnectX-9 SuperNICs and uses remote direct memory access (RDMA) to bypass CPU and OS bottlenecks.

Nvidia claims BlueField-4 STX delivers up to five times faster token processing and fourfold energy efficiency gains over prior storage stacks.

Its first rack-scale implementation is CMX, a high-speed flash storage system optimized for the key-value caches used by large language models. CMX skips redundant data protection algorithms, reducing hardware overhead without compromising AI workload integrity.

Early adopters include Oracle Corp., Mistral AI SAS, and CoreWeave Inc. Systems are expected to ship in H2 2026.