Google is marketing Gemini 3.1 Flash Lite as the fastest and cheapest version of Gemini 3


Google today introduced Gemini 3.1 Flash Lite, a new artificial-intelligence model in the Gemini 3 family designed to deliver faster responses at lower operating cost.

The preview model is available to developers via the Gemini API in Google AI Studio and to enterprise customers via Vertex AI.

Google described Gemini 3.1 Flash Lite as the fastest and most cost-effective model in the Gemini 3 series, built specifically for high-volume workloads where latency and cost are critical.

Pricing for the model starts at $0.25 per million input tokens and $1.50 per million output tokens, making it one of the least expensive options among Google's current AI models.
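At those rates, per-request cost is straightforward to estimate. The helper below is a simple sketch that applies the quoted list prices; the traffic figures in the example are illustrative, not from Google:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate: float = 0.25, output_rate: float = 1.50) -> float:
    """Cost of one request at per-million-token rates.

    Defaults use the quoted Gemini 3.1 Flash Lite list pricing:
    $0.25 per 1M input tokens, $1.50 per 1M output tokens.
    """
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate

# Illustrative workload: 1M requests/day, each ~2,000 input and ~500 output tokens
daily_total = 1_000_000 * request_cost_usd(2_000, 500)
print(f"${daily_total:,.2f}")  # → $1,250.00
```

For high-volume workloads of the kind Google is targeting, output tokens dominate the bill at a 6:1 price ratio, which is why trimming response length matters more than trimming prompts.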

According to benchmarks cited by Google, Gemini 3.1 Flash Lite delivers initial responses 2.5 times faster than Gemini 2.5 Flash and overall performance 45 percent faster, while maintaining similar or better output quality.

These results also make the model competitive with other lightweight AI models. Gemini 3.1 Flash Lite achieved an Elo rating of 1432 on the Arena AI leaderboard, scored 86.9 percent on the GPQA Diamond benchmark, and recorded 76.8 percent on the MMMU Pro multimodal benchmark.

Google said the model is designed to handle high-frequency developer tasks such as translation, content moderation and large-scale instruction following, while still supporting more complex workloads such as interface generation, simulation creation and structured-data tasks.

The release also introduces configurable reasoning levels within AI Studio and Vertex AI, allowing developers to control how much reasoning the model performs based on the complexity of the task. This flexibility is designed to help teams balance cost, speed, and accuracy when deploying AI applications at scale.
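One way teams might apply this flexibility is to route each request to a reasoning level based on task type. The sketch below is purely illustrative: the level names and the task-to-level mapping are hypothetical assumptions, not the actual values exposed by AI Studio or Vertex AI:

```python
def reasoning_level(task: str) -> str:
    """Pick a reasoning level for a task type.

    Hypothetical mapping for illustration only; the real configurable
    reasoning levels in AI Studio / Vertex AI may use different names.
    """
    high_effort = {"interface_generation", "simulation", "structured_data"}
    low_effort = {"translation", "moderation", "classification"}
    if task in high_effort:
        return "high"    # more reasoning: better accuracy, higher cost and latency
    if task in low_effort:
        return "low"     # minimal reasoning: fastest and cheapest
    return "medium"      # default balance for everything else

print(reasoning_level("translation"))  # → low
```

The design intent is the trade-off Google describes: spend reasoning budget only where the task complexity warrants it, keeping high-volume simple tasks cheap and fast.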

Disclosure: This article was edited by Estefano Gómez. For more information on how we create and review content, see our Editorial Policy.
