Gemini 3.1 Flash TTS

Text-to-Speech • Google • Proxied

Usage

TypeScript

const response = await env.AI.run(
  'google/gemini-3.1-flash-tts',
  { text: 'Hello, welcome to Cloudflare AI Gateway!' },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "google/gemini-3.1-flash-tts",
  "input": {
    "text": "Hello, welcome to Cloudflare AI Gateway!"
  }
}'
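
For callers that cannot use the `env.AI` binding, the cURL request above could also be assembled for `fetch`. This is a minimal sketch: `buildTtsRequest` and `TtsInput` are hypothetical names introduced here, and the URL, headers, and body shape are copied from the cURL example rather than from a published client library.

```typescript
// Hypothetical input shape, limited to the two fields used in the examples.
interface TtsInput {
  text: string
  voice?: string
}

// Hypothetical helper: returns the URL and fetch options that mirror
// the cURL example (same endpoint, headers, and JSON body).
function buildTtsRequest(accountId: string, apiToken: string, input: TtsInput) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model: 'google/gemini-3.1-flash-tts', input }),
    },
  }
}
```

The result can be passed straight to `fetch(request.url, request.init)`; keeping the request construction separate from the network call makes it easy to inspect or log before sending.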

Output
Raw response

{
  "audio": "data:audio/l16;base64,...",
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}
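
The `audio` field in the response above is a base64 data URI, so it has to be decoded before the audio can be stored or played. The sketch below shows one way to do that. `decodeAudioDataUri` and `DecodedAudio` are hypothetical names, and the code assumes a runtime with a global `atob` (Workers, browsers, and recent Node.js versions provide one).

```typescript
interface DecodedAudio {
  mediaType: string // everything between "data:" and ";base64,"
  bytes: Uint8Array // the decoded audio payload
}

// Hypothetical helper: splits a base64 data URI into media type and raw bytes.
function decodeAudioDataUri(dataUri: string): DecodedAudio {
  const prefix = 'data:'
  const marker = ';base64,'
  const markerIndex = dataUri.indexOf(marker)
  if (dataUri.indexOf(prefix) !== 0 || markerIndex === -1) {
    throw new Error('Expected a base64 data URI')
  }
  const mediaType = dataUri.slice(prefix.length, markerIndex)
  // atob yields a binary string: one character per decoded byte.
  const binary = atob(dataUri.slice(markerIndex + marker.length))
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return { mediaType, bytes }
}
```

With the response from the usage example, `decodeAudioDataUri(response.audio)` would return the media type (`audio/l16` in the sample output) alongside the bytes.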

Examples

Custom Voice — Generate speech with a specific voice

TypeScript

const response = await env.AI.run(
  'google/gemini-3.1-flash-tts',
  { text: 'The quick brown fox jumps over the lazy dog.', voice: 'Puck' },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "google/gemini-3.1-flash-tts",
  "input": {
    "text": "The quick brown fox jumps over the lazy dog.",
    "voice": "Puck"
  }
}'

Output
Raw response

{
  "audio": "data:audio/l16;base64,...",
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}

Longer Text — Convert longer text to speech

TypeScript

const response = await env.AI.run(
  'google/gemini-3.1-flash-tts',
  {
    text: 'Artificial intelligence has transformed the way we interact with technology. From voice assistants to autonomous vehicles, AI is reshaping our daily lives and creating new possibilities for innovation.',
    voice: 'Charon',
  },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "google/gemini-3.1-flash-tts",
  "input": {
    "text": "Artificial intelligence has transformed the way we interact with technology. From voice assistants to autonomous vehicles, AI is reshaping our daily lives and creating new possibilities for innovation.",
    "voice": "Charon"
  }
}'

Output
Raw response

{
  "audio": "data:audio/l16;base64,...",
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}

Narrative Voice — Generate speech with a narrative voice style

TypeScript

const response = await env.AI.run(
  'google/gemini-3.1-flash-tts',
  {
    text: 'Once upon a time, in a kingdom far away, there lived a brave knight who sought to protect the realm from all dangers.',
    voice: 'Kore',
  },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "google/gemini-3.1-flash-tts",
  "input": {
    "text": "Once upon a time, in a kingdom far away, there lived a brave knight who sought to protect the realm from all dangers.",
    "voice": "Kore"
  }
}'

Output
Raw response

{
  "audio": "data:audio/l16;base64,...",
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}

maxOutputTokens

integer • exclusiveMinimum: 0 • maximum: 9007199254740991 • Maximum number of tokens to generate

stopSequences[]

array • Sequences where the model will stop generating further tokens

temperature

number • minimum: 0 • maximum: 2 • Controls randomness in generation (0-2)

text

string • required • maxLength: 10000 • The text to convert to speech. Maximum 10,000 characters.
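
Because `text` is capped at 10,000 characters, longer input has to be split across several requests. The sketch below is one possible approach, not part of the API: `splitForTts` is a hypothetical helper that prefers to break after sentence-ending punctuation and falls back to a hard cut when no sentence boundary fits.

```typescript
const MAX_TEXT_LENGTH = 10000 // maxLength of the `text` parameter

// Hypothetical helper: splits text into chunks of at most `limit` characters.
// Note: `length` counts UTF-16 code units, so a hard cut can split a surrogate pair.
function splitForTts(text: string, limit: number = MAX_TEXT_LENGTH): string[] {
  if (limit < 1) {
    throw new Error('limit must be at least 1')
  }
  const chunks: string[] = []
  let rest = text.trim()
  while (rest.length > limit) {
    const head = rest.slice(0, limit)
    // Position of the last sentence-ending punctuation followed by a space.
    const cut = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? '),
    )
    // Keep the punctuation with its sentence; hard cut if no boundary was found.
    const end = cut > 0 ? cut + 1 : limit
    chunks.push(rest.slice(0, end).trim())
    rest = rest.slice(end).trim()
  }
  if (rest.length > 0) {
    chunks.push(rest)
  }
  return chunks
}
```

Each chunk can then be sent as its own request, and the resulting audio segments joined in order on the caller's side.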

topK

integer • exclusiveMinimum: 0 • maximum: 9007199254740991 • Only sample from the top K tokens. Smaller K = more focused, larger K = more diverse

topP

number • minimum: 0 • maximum: 1 • Nucleus sampling threshold (0-1). Tokens with cumulative probability up to topP are considered
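
The numeric bounds on `maxOutputTokens`, `temperature`, `topK`, and `topP` can be checked on the client before a request is sent. The following is a minimal sketch of such a check; `SamplingOptions` and `validateSamplingOptions` are hypothetical names, and the limits are taken from the parameter descriptions on this page.

```typescript
interface SamplingOptions {
  maxOutputTokens?: number
  temperature?: number
  topK?: number
  topP?: number
}

const MAX_INTEGER = 9007199254740991 // documented maximum for integer parameters

// True for integers greater than 0 and no larger than the documented maximum.
function isPositiveInteger(value: number): boolean {
  return Math.floor(value) === value && value > 0 && value <= MAX_INTEGER
}

// Hypothetical helper: returns a list of problems, empty when all options are in range.
function validateSamplingOptions(options: SamplingOptions): string[] {
  const errors: string[] = []
  const { maxOutputTokens, temperature, topK, topP } = options
  if (maxOutputTokens !== undefined && !isPositiveInteger(maxOutputTokens)) {
    errors.push('maxOutputTokens must be an integer greater than 0')
  }
  if (temperature !== undefined && !(temperature >= 0 && temperature <= 2)) {
    errors.push('temperature must be between 0 and 2')
  }
  if (topK !== undefined && !isPositiveInteger(topK)) {
    errors.push('topK must be an integer greater than 0')
  }
  if (topP !== undefined && !(topP >= 0 && topP <= 1)) {
    errors.push('topP must be between 0 and 1')
  }
  return errors
}
```

Returning a list instead of throwing on the first problem lets a caller report every out-of-range option at once.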

voice

string • enum: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat • The voice to use for speech synthesis
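
Since `voice` only accepts the names listed above, a typed list can catch misspelled voices at compile time or when handling user input. This is an illustrative sketch: `VOICES`, `Voice`, and `isVoice` are hypothetical names, and the list is copied from the enum on this page.

```typescript
// Voice names copied from the `voice` enum.
const VOICES = [
  'Zephyr', 'Puck', 'Charon', 'Kore', 'Fenrir', 'Leda', 'Orus', 'Aoede',
  'Callirrhoe', 'Autonoe', 'Enceladus', 'Iapetus', 'Umbriel', 'Algieba',
  'Despina', 'Erinome', 'Algenib', 'Rasalgethi', 'Laomedeia', 'Achernar',
  'Alnilam', 'Schedar', 'Gacrux', 'Pulcherrima', 'Achird', 'Zubenelgenubi',
  'Vindemiatrix', 'Sadachbia', 'Sadaltager', 'Sulafat',
] as const

// Union of all listed voice names.
type Voice = (typeof VOICES)[number]

// Hypothetical type guard: narrows an arbitrary string to a listed voice.
// The comparison is case-sensitive.
function isVoice(value: string): value is Voice {
  return (VOICES as readonly string[]).indexOf(value) !== -1
}
```

A caller could use `isVoice(name)` to reject unknown names before sending the request.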

audio

string • Base64-encoded audio data (WAV format)
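
The description above mentions WAV, while the sample responses on this page show an `audio/l16` data URI, so a caller may want to inspect the decoded bytes before treating them as a WAV file. The sketch below checks for the standard RIFF/WAVE container header; `hasWavHeader` is a hypothetical helper, and it assumes the base64 payload has already been decoded into a `Uint8Array`.

```typescript
// Hypothetical helper: reports whether the bytes start with a RIFF/WAVE header.
// A WAV file begins with "RIFF", a 4-byte size field, then "WAVE".
function hasWavHeader(bytes: Uint8Array): boolean {
  if (bytes.length < 12) {
    return false
  }
  // Reads four bytes at the given offset as an ASCII tag.
  const tag = (offset: number): string =>
    String.fromCharCode(
      bytes[offset],
      bytes[offset + 1],
      bytes[offset + 2],
      bytes[offset + 3],
    )
  return tag(0) === 'RIFF' && tag(8) === 'WAVE'
}
```

If the check returns `false`, the payload is not in a WAV container and would need different handling before playback.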
