Trillion-Parameter AI Runs on Budget GPU with Legacy Memory

A Chinese AI enthusiast known as APFrisco demonstrated Moonshot AI's Kimi K2.5 model, a Mixture-of-Experts large language model with 1 trillion total parameters, running on a single Nvidia RTX 3060 GPU paired with 768 GB of Intel Optane Persistent Memory. The setup achieved roughly four tokens per second.

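Kimi K2.5 activates only 32 billion of its parameters per token, but every expert still has to be kept within reach. The full model weighs approximately 630 GB, and quantized versions come in at around 381 GB. Either footprint is far beyond what a consumer GPU can hold in VRAM, so the weights have to sit in system memory, which is why APFrisco installed 768 GB of Intel Optane Persistent Memory. Intel has discontinued its Optane line, so these modules are legacy hardware sourced from the second-hand market. They're slower than traditional DRAM but vastly cheaper per gigabyte.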
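The sizes quoted above can be sanity-checked with some back-of-the-envelope arithmetic. The Python sketch below is illustrative only: it assumes decimal gigabytes (1 GB = 1e9 bytes), treats the quoted figures as exact, and the function names are made up for this example. The bandwidth number at the end rests on the further assumption that every active weight is read from memory once per token, so it is a rough upper bound rather than a measurement of APFrisco's setup.

```python
# Figures quoted in the article; decimal units assumed (1 GB = 1e9 bytes).
TOTAL_PARAMS = 1e12     # 1 trillion total parameters
ACTIVE_PARAMS = 32e9    # parameters activated per token
FULL_GB = 630           # approximate full model size
QUANT_GB = 381          # approximate quantized size
OPTANE_GB = 768         # installed Optane Persistent Memory
VRAM_GB = 12            # RTX 3060 VRAM
TOKENS_PER_SECOND = 4   # reported throughput


def bits_per_param(size_gb, params=TOTAL_PARAMS):
    """Average storage cost per parameter, in bits."""
    return size_gb * 1e9 * 8 / params


def fits(size_gb, capacity_gb):
    """True if a model of size_gb fits in capacity_gb of memory."""
    return size_gb <= capacity_gb


# Share of the model that is touched for any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Assumption: each active weight is read once per token, all from memory.
gb_read_per_token = QUANT_GB * active_fraction
implied_bandwidth_gbs = gb_read_per_token * TOKENS_PER_SECOND

if __name__ == "__main__":
    print(f"full model:      {bits_per_param(FULL_GB):.2f} bits/param")
    print(f"quantized model: {bits_per_param(QUANT_GB):.2f} bits/param")
    print(f"active fraction: {active_fraction:.1%}")
    print(f"quantized fits in VRAM:   {fits(QUANT_GB, VRAM_GB)}")
    print(f"quantized fits in Optane: {fits(QUANT_GB, OPTANE_GB)}")
    print(f"full model fits in Optane: {fits(FULL_GB, OPTANE_GB)}")
    print(f"weights read per token (est.): {gb_read_per_token:.1f} GB")
    print(f"implied bandwidth (est.):      {implied_bandwidth_gbs:.1f} GB/s")
```

Under these assumptions the quantized model averages about 3 bits per parameter, and only around 3% of the parameters are active for any given token. That small active share is what keeps a memory-bound setup like this one usable at all.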

The RTX 3060 launched in early 2021 with 12 GB of VRAM and was designed for 1080p gaming, not frontier AI models. By contrast, high-performance inference for Kimi K2.5 typically targets configurations of up to eight high-end GPUs, which deliver anywhere from 10 to more than 300 tokens per second.
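To put those throughput figures in perspective, the short Python sketch below estimates how long one response would take at each rate. The 500-token response length is a hypothetical value chosen for illustration, and the estimate ignores prompt processing and any other startup latency.

```python
def seconds_for(tokens, tokens_per_second):
    """Time to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_second


RESPONSE_TOKENS = 500  # hypothetical response length, not from the article

# Rates quoted in the article: about 4 tok/s on the RTX 3060 setup,
# and 10 to 300-plus tok/s on multi-GPU configurations.
RATES = [
    ("RTX 3060 + Optane", 4),
    ("multi-GPU, low end", 10),
    ("multi-GPU, high end", 300),
]

if __name__ == "__main__":
    for label, rate in RATES:
        print(f"{label}: {seconds_for(RESPONSE_TOKENS, rate):.1f} s")
```

At four tokens per second a response of that length takes about two minutes, compared with under a minute at the low end of the multi-GPU range and a couple of seconds at the high end.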

Kimi K2.5 was released on January 27, 2026, by Moonshot AI. It features multimodal capabilities and was trained on roughly 15 trillion mixed visual and text tokens. Because it is an open-weight model, anyone can download the weights and attempt experiments like this one.