Google DeepMind has launched Gemini 3.1 Flash TTS, a text-to-speech model built for fine-grained voice control. Unlike earlier versions, the model lets users direct vocal style, delivery, and pace through plain-text instructions.

Gemini 3.1 Flash TTS offers granular control over inflection and tone, with options such as "enthusiastic," "positive surprise," and "informative." Users can also choose from regional accents across major languages, including the American "Valley" and "Southern" accents and several British variants such as "Brixton" and "RP" (Received Pronunciation).

The model also features director-level controls for adjusting speaking style and pace, along with format templates for different applications such as podcast narration, audiobook reading, or news broadcasting.

According to Google, this "world-building context" helps characters stay consistent and interact naturally. Once tuned, these parameters can be exported as Gemini API code, so the same voice setup can be reused across projects. The goal is more natural-sounding speech in over 70 languages, including Japanese, Hindi, and German. All output is watermarked with SynthID so that AI-generated audio can be identified.

Gemini 3.1 Flash TTS ranked second on the Artificial Analysis TTS leaderboard, ahead of most other text-to-speech models. The model is available to developers through the Gemini API and Google AI Studio, and to enterprises through the Vertex AI platform.