TTS 1.5 Max

Text-to-Speech • Inworld • Proxied

Highest-quality text-to-speech with under 200ms latency, emotion control, and 15-language support.

Model Info
Terms and License	link ↗
More information	link ↗

Usage

const response = await env.AI.run(
  'inworld/tts-1.5-max',
  {
    text: 'Hello! Welcome to Cloudflare AI Gateway. Let me show you what we can do.',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

Response 200

Examples

Slow Narration — Slower speech for narration

const response = await env.AI.run(
  'inworld/tts-1.5-max',
  {
    text: 'In the beginning, the universe was a singularity of infinite density. Then, in a fraction of a second, it expanded into everything we know today.',
    speaking_rate: 0.85,
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

Response 200

High Quality Audio — Higher sample rate for studio quality

const response = await env.AI.run(
  'inworld/tts-1.5-max',
  {
    text: 'This recording is generated at studio quality for the best possible listening experience.',
    sample_rate: 48000,
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

Response 200

With Text Normalization — Expand numbers and abbreviations before synthesis

const response = await env.AI.run(
  'inworld/tts-1.5-max',
  {
    text: 'The meeting is at 3:30 PM on Jan 15th, 2026. Please confirm by calling 555-0123.',
    apply_text_normalization: true,
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

Response 200

text

stringrequiredmaxLength: 2000The text to be synthesized into speech. Maximum input of 2,000 characters.

voice_id

stringdefault: Dennisenum: Loretta, Darlene, Marlene, Hank, Evelyn, Celeste, Pippa, Tessa, Liam, Callum, Hamish, Abby, Graham, Rupert, Mortimer, Snik, Anjali, Saanvi, Arjun, Claire, Oliver, Simon, Elliot, James, Serena, Gareth, Vinny, Lauren, Jessica, Ethan, Tyler, Jason, Chloe, Veronica, Victoria, Miranda, Sebastian, Victor, Malcolm, Nate, Brian, Amina, Kelsey, Derek, Evan, Kayla, Jake, Grant, Tristan, Nadia, Selene, Marcus, Riley, Damon, Cedric, Mia, Naomi, Jonah, Levi, Avery, Brandon, Conrad, Bianca, Lucian, Trevor, Alex, Ashley, Craig, Deborah, Dennis, Edward, Elizabeth, Hades, Julia, Pixie, Mark, Olivia, Priya, Ronald, Sarah, Shaun, Theodore, Timothy, Wendy, Dominus, Hana, Clive, Carter, Blake, Luna, Reed, Duncan, Felix, Eleanor, SophieThe ID of the voice to use for synthesizing speech. Defaults to Dennis.

output_format

stringdefault: mp3enum: mp3, opus, wav, flacThe output format for the audio. Supported formats are mp3, opus, wav, and flac. Defaults to mp3.

bit_rate

integerminimum: 0Bits per second of the audio. Only for compressed audio formats (mp3, opus). The default is 128,000.

sample_rate

integerenum: 8000, 16000, 22050, 24000, 32000, 44100, 48000The synthesis sample rate in hertz. Accepts: 8000, 16000, 22050, 24000, 32000, 44100, 48000. The default is 48,000.

speaking_rate

numberminimum: 0.5maximum: 1.5Speaking rate/speed, in the range [0.5, 1.5]. The default is 1.0. We recommend using values above 0.8 to ensure high quality.

temperature

numberdefault: 1minimum: 0.01maximum: 2Determines the degree of randomness when sampling audio tokens. Defaults to 1.0. Accepts values between 0 (exclusive) and 2 (inclusive). Higher values = more expressive, lower values = more deterministic.

timestamp_type

stringdefault: noneenum: none, word, characterControls timestamp metadata returned with the audio. "word" returns word-level timing, "character" returns character-level timing. Note: adds latency. Defaults to none.

apply_text_normalization

booleanWhen enabled, text normalization expands numbers, dates, times, and abbreviations before converting to speech. Turning this off may reduce latency.

audio

stringURL to the generated audio file

API Schemas

Input
Output

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "text": {
      "description": "The text to be synthesized into speech. Maximum input of 2,000 characters.",
      "type": "string",
      "maxLength": 2000
    },
    "voice_id": {
      "description": "The ID of the voice to use for synthesizing speech. Defaults to Dennis.",
      "default": "Dennis",
      "type": "string",
      "enum": [
        "Loretta",
        "Darlene",
        "Marlene",
        "Hank",
        "Evelyn",
        "Celeste",
        "Pippa",
        "Tessa",
        "Liam",
        "Callum",
        "Hamish",
        "Abby",
        "Graham",
        "Rupert",
        "Mortimer",
        "Snik",
        "Anjali",
        "Saanvi",
        "Arjun",
        "Claire",
        "Oliver",
        "Simon",
        "Elliot",
        "James",
        "Serena",
        "Gareth",
        "Vinny",
        "Lauren",
        "Jessica",
        "Ethan",
        "Tyler",
        "Jason",
        "Chloe",
        "Veronica",
        "Victoria",
        "Miranda",
        "Sebastian",
        "Victor",
        "Malcolm",
        "Nate",
        "Brian",
        "Amina",
        "Kelsey",
        "Derek",
        "Evan",
        "Kayla",
        "Jake",
        "Grant",
        "Tristan",
        "Nadia",
        "Selene",
        "Marcus",
        "Riley",
        "Damon",
        "Cedric",
        "Mia",
        "Naomi",
        "Jonah",
        "Levi",
        "Avery",
        "Brandon",
        "Conrad",
        "Bianca",
        "Lucian",
        "Trevor",
        "Alex",
        "Ashley",
        "Craig",
        "Deborah",
        "Dennis",
        "Edward",
        "Elizabeth",
        "Hades",
        "Julia",
        "Pixie",
        "Mark",
        "Olivia",
        "Priya",
        "Ronald",
        "Sarah",
        "Shaun",
        "Theodore",
        "Timothy",
        "Wendy",
        "Dominus",
        "Hana",
        "Clive",
        "Carter",
        "Blake",
        "Luna",
        "Reed",
        "Duncan",
        "Felix",
        "Eleanor",
        "Sophie"
      ]
    },
    "output_format": {
      "description": "The output format for the audio. Supported formats are mp3, opus, wav, and flac. Defaults to mp3.",
      "default": "mp3",
      "type": "string",
      "enum": [
        "mp3",
        "opus",
        "wav",
        "flac"
      ]
    },
    "bit_rate": {
      "description": "Bits per second of the audio. Only for compressed audio formats (mp3, opus). The default is 128,000.",
      "type": "integer",
      "minimum": 0
    },
    "sample_rate": {
      "description": "The synthesis sample rate in hertz. Accepts: 8000, 16000, 22050, 24000, 32000, 44100, 48000. The default is 48,000.",
      "type": "integer",
      "enum": [
        8000,
        16000,
        22050,
        24000,
        32000,
        44100,
        48000
      ]
    },
    "speaking_rate": {
      "description": "Speaking rate/speed, in the range [0.5, 1.5]. The default is 1.0. We recommend using values above 0.8 to ensure high quality.",
      "type": "number",
      "minimum": 0.5,
      "maximum": 1.5
    },
    "temperature": {
      "description": "Determines the degree of randomness when sampling audio tokens. Defaults to 1.0. Accepts values between 0 (exclusive) and 2 (inclusive). Higher values = more expressive, lower values = more deterministic.",
      "default": 1,
      "type": "number",
      "minimum": 0.01,
      "maximum": 2
    },
    "timestamp_type": {
      "description": "Controls timestamp metadata returned with the audio. \"word\" returns word-level timing, \"character\" returns character-level timing. Note: adds latency. Defaults to none.",
      "default": "none",
      "type": "string",
      "enum": [
        "none",
        "word",
        "character"
      ]
    },
    "apply_text_normalization": {
      "description": "When enabled, text normalization expands numbers, dates, times, and abbreviations before converting to speech. Turning this off may reduce latency.",
      "type": "boolean"
    }
  },
  "required": [
    "text"
  ],
  "additionalProperties": false
}

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "audio": {
      "description": "URL to the generated audio file",
      "type": "string"
    }
  },
  "required": [
    "audio"
  ],
  "additionalProperties": false
}