Google Unveils Gemini 3.1 Flash Lite: Speed and Savings in AI

Google has launched Gemini 3.1 Flash Lite, the latest addition to its AI model family. This new model is engineered for speed and cost efficiency, targeting high-volume workloads where response time and operational expenses are paramount.

Gemini 3.1 Flash Lite is now available to developers via the Gemini API and to enterprise clients through Vertex AI. Google states it is the fastest and most economical model within the Gemini 3 series.

Pricing begins at $0.25 per million input tokens and $1.50 per million output tokens, making it a budget-friendly option among Google's AI offerings. Benchmarks indicate that time to first answer token is 2.5 times faster than with Gemini 2.5 Flash and that output speed is 45 percent higher, with quality maintained.
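To put those rates in concrete terms, here is a minimal sketch of the arithmetic, assuming the quoted starting prices apply uniformly to every token. The function name and the example workload are illustrative assumptions, not part of Google's announcement, and actual billing may differ by tier or input type.

```python
from decimal import Decimal

# Starting prices quoted in the article, in USD per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.25")
OUTPUT_PRICE_PER_MILLION = Decimal("1.50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of a workload at the quoted starting prices.

    Hypothetical helper: assumes flat per-token pricing with no tiers,
    discounts, or surcharges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    return (input_cost + output_cost) / TOKENS_PER_MILLION


# Illustrative workload: one million requests, each with
# 500 input tokens and 100 output tokens.
requests = 1_000_000
total = estimate_cost_usd(requests * 500, requests * 100)
print(f"Estimated cost: ${total:.2f}")  # Estimated cost: $275.00
```

In this example the 100 million output tokens cost $150 while the 500 million input tokens cost $125, so output accounts for more than half the bill despite being a fifth of the volume.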

Reported results include an Elo score of 1432 on the Arena AI leaderboard and high scores on reasoning and multimodal benchmarks. The model is designed for tasks such as translation, content moderation, and instruction following, as well as more complex applications such as interface generation and structured data processing.

The release also features adjustable thinking levels, which let developers set how much reasoning the model applies to a given task. This allows large-scale deployments to trade off cost, speed, and accuracy according to task complexity.
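One way a deployment might use this is to route each request to a thinking level according to the kind of task. The sketch below is purely illustrative: the level names ("low", "medium", "high"), the task groupings, and the function name are assumptions made for this example, and it does not call the Gemini API or reflect its actual parameters.

```python
# Hypothetical routing table. The task names come from the article;
# the grouping and the level names are assumptions for illustration.
SIMPLE_TASKS = {"translation", "content moderation", "instruction following"}
COMPLEX_TASKS = {"interface generation", "structured data processing"}


def pick_thinking_level(task: str) -> str:
    """Return an assumed thinking level for a task category.

    Simple, high-volume tasks get the cheapest and fastest setting,
    complex tasks get the most reasoning, and anything unrecognized
    falls back to a middle setting.
    """
    normalized = task.strip().lower()
    if normalized in SIMPLE_TASKS:
        return "low"
    if normalized in COMPLEX_TASKS:
        return "high"
    return "medium"


for task in ("Translation", "interface generation", "summarization"):
    print(f"{task}: {pick_thinking_level(task)}")
```

A real integration would pass the chosen level through whatever setting the Gemini API or Vertex AI exposes for this feature, which should be checked against the official documentation.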