Ollama, a platform for running large language models locally, now supports Apple’s open-source MLX framework. The update also improves caching performance and adds support for Nvidia’s NVFP4 format, a 4-bit floating-point representation that reduces the memory a model’s weights occupy.
Together, these changes make models run faster on Macs with Apple Silicon chips (M1 or later). The move comes as interest in running AI models locally grows beyond research circles.
Demand is rising as developers seek alternatives to costly cloud-based services like OpenAI’s Codex and Anthropic’s Claude Code. Ollama also recently expanded its Visual Studio Code integration.
The new capabilities are available in preview (Ollama 0.19) and currently support only one model: the 35-billion-parameter version of Alibaba’s Qwen3.5. Running it requires at least 32GB of RAM on an Apple Silicon Mac.
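A rough back-of-the-envelope calculation suggests why a 4-bit format matters for that 32GB requirement. The sketch below is an estimate under stated assumptions, not Ollama’s actual accounting: it assumes NVFP4 stores each weight in 4 bits plus one 8-bit scale shared by every block of 16 weights (about 4.5 bits per weight), and it ignores the KV cache, activations, and any layers kept at higher precision. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 35e9  # 35 billion parameters, the model size cited in the article

# Assumption: 4-bit values plus one 8-bit scale per 16-weight block.
NVFP4_BITS = 4 + 8 / 16  # 4.5 effective bits per weight

fp16_gb = weight_memory_gb(PARAMS, 16)
nvfp4_gb = weight_memory_gb(PARAMS, NVFP4_BITS)

print(f"FP16 weights:  {fp16_gb:.1f} GB")   # 70.0 GB
print(f"NVFP4 weights: {nvfp4_gb:.1f} GB")  # 19.7 GB
```

Under these assumptions the weights alone shrink from roughly 70 GB at 16-bit precision to just under 20 GB, which would fit in 32GB of unified memory with room left for the operating system and the model’s working state.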