Two-year-old startup Mindbeam AI Inc. has released an open-source artificial intelligence inference framework, challenging the market’s heavy reliance on expensive graphics processing units.

The software library, called Litespark-Inference, enables ternary large language models to run efficiently on standard consumer processors from Apple, Intel, AMD, and Arm. The company claims throughput improvements ranging from 17- to 96-fold over conventional PyTorch implementations, while slashing memory requirements by more than 80%.

Mindbeam focuses on ternary neural networks, which constrain weights to -1, 0, and +1 so that the multiplications that dominate inference reduce to additions, subtractions, and skipped terms. “We think from a different perspective,” said founder and CEO Nii Osae. “Is there a way that we can do inference with ternary bit models?”
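To see why ternary weights remove multiplications, consider a matrix-vector product, the core operation in a model's linear layers. The following is a minimal pure-Python sketch of the general idea, not Mindbeam's implementation; the function names `ternary_matvec` and `dense_matvec` are hypothetical and exist only for illustration.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries in {-1, 0, +1}) by a vector.

    Illustrative sketch only: every weight is handled with an add,
    a subtract, or a skip, so no multiplication is performed.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            # 0 weight: contributes nothing, so it is skipped
        out.append(acc)
    return out


def dense_matvec(weights, x):
    """Reference implementation using ordinary multiply-accumulate."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.5]

result = ternary_matvec(W, x)
reference = dense_matvec(W, x)
```

Both functions return the same values for this input. A production kernel would pack the weights into a few bits each and process many of them per vector instruction, but the arithmetic it performs is the same add, subtract, or skip.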

As token inference costs rise, the firm argues that the CPUs sitting alongside GPUs in every system are an underutilized resource. Its technology places the CPU inside the inference stack as a complementary accelerator that helps GPUs process more tokens rather than replacing them.

The framework supports two deployment models: fully local inference on hardware without GPUs, and a disaggregated cloud architecture in which CPUs and GPUs work in tandem. In benchmarks, an Apple M5 processor reached nearly 40 tokens per second, up from just 2.3 in PyTorch. Systems using Intel’s AVX-512 Vector Neural Network Instructions reached nearly 34 tokens per second, a 96-fold improvement, while memory consumption fell from 4.6 gigabytes to under 800 megabytes.
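The reported figures can be cross-checked with simple arithmetic. The sketch below treats the rounded numbers in the article ("nearly 40", "nearly 34", "under 800 megabytes") as exact, which is an assumption, so the results are approximations rather than measured values; the helper names are hypothetical.

```python
def fold_speedup(new_tps, old_tps):
    """Return how many times faster the new throughput is."""
    return new_tps / old_tps


def memory_reduction_pct(before_gb, after_gb):
    """Return the percentage drop in memory use."""
    return 100.0 * (1.0 - after_gb / before_gb)


# Apple M5: roughly 40 tokens/s versus 2.3 tokens/s in PyTorch.
m5_speedup = fold_speedup(40.0, 2.3)

# Memory: 4.6 GB down to roughly 0.8 GB (800 MB taken as an upper bound).
mem_drop = memory_reduction_pct(4.6, 0.8)

# AVX-512 VNNI: roughly 34 tokens/s at a stated 96-fold gain implies
# this PyTorch baseline, which the article does not report directly.
implied_baseline_tps = 34.0 / 96.0

print(f"M5 speedup: about {m5_speedup:.1f}x")
print(f"Memory reduction: about {mem_drop:.1f}%")
print(f"Implied AVX-512 baseline: about {implied_baseline_tps:.2f} tokens/s")
```

Under these assumptions the M5 result works out to roughly 17-fold, consistent with the low end of the claimed 17- to 96-fold range, and the memory drop comes to a little over 82%, consistent with the "more than 80%" claim.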

Mindbeam released the source code on GitHub and is encouraging third-party benchmarks. The framework uses custom kernels that automatically detect the host processor’s single instruction, multiple data (SIMD) extensions and optimize for them. Initial support covers Apple Silicon, Intel, and AMD processors, with plans to target AWS Inferentia chips.
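Automatic kernel selection of this kind is commonly done by inspecting the processor's architecture and feature flags at startup and picking the most capable code path. The sketch below illustrates that general pattern only; it is not taken from Litespark-Inference, and the function `select_kernel` and the kernel names it returns are hypothetical. The flag strings `avx512_vnni` and `avx2` follow the names Linux reports in `/proc/cpuinfo`.

```python
def select_kernel(cpu_flags, machine):
    """Pick a kernel name from CPU feature flags and architecture.

    Hypothetical dispatch logic for illustration: prefer the widest
    vector extension available and fall back to a scalar path.
    """
    flags = {f.lower() for f in cpu_flags}
    arch = machine.lower()

    # 64-bit Arm chips, including Apple Silicon, always provide NEON.
    if arch in ("arm64", "aarch64"):
        return "neon"
    # x86: prefer AVX-512 VNNI, then AVX2.
    if "avx512_vnni" in flags:
        return "avx512_vnni"
    if "avx2" in flags:
        return "avx2"
    # Portable fallback when no known vector extension is found.
    return "scalar"


server_kernel = select_kernel(["sse4_2", "avx2", "avx512_vnni"], "x86_64")
desktop_kernel = select_kernel(["sse4_2", "avx2"], "x86_64")
laptop_kernel = select_kernel([], "arm64")
legacy_kernel = select_kernel(["sse2"], "x86_64")
```

In a real program the architecture string could come from Python's `platform.machine()`, and on Linux the flags could be read from `/proc/cpuinfo`; native libraries typically query the processor directly instead.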

Future development will focus on power-sensitive robotics and edge computing. The company intends to commercialize cloud-focused versions later this year.