
TTS 1.5 Max

Text-to-Speech · Inworld · Proxied

Highest-quality text-to-speech with under 200ms latency, emotion control, and 15-language support.

Model Info
Terms and License
More information

Usage

TypeScript
const response = await env.AI.run(
  'inworld/tts-1.5-max',
  {
    text: 'Hello! Welcome to Cloudflare AI Gateway. Let me show you what we can do.',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)
Response 200
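
The `text` field is capped at 2,000 characters (see Parameters), so longer input has to be split before it is sent. The sketch below shows one way to do that. `chunkText` and `MAX_TEXT_LENGTH` are hypothetical names, not part of any Cloudflare or Inworld API, and the sentence-boundary heuristic is an assumption chosen for illustration. It measures length with JavaScript's `String.length` (UTF-16 code units), which is never smaller than the code-point count, so the limit is applied conservatively.

```typescript
// Taken from the `text` parameter's maxLength on this page.
const MAX_TEXT_LENGTH = 2000;

// Hypothetical helper: split text into pieces of at most `maxLen` characters,
// preferring to break after sentence punctuation, then at a space.
function chunkText(text: string, maxLen: number = MAX_TEXT_LENGTH): string[] {
  if (maxLen < 1) throw new RangeError("maxLen must be at least 1");
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    const head = rest.slice(0, maxLen);
    // Index just past the last sentence-ending punctuation inside `head`.
    let cut = Math.max(
      head.lastIndexOf(". "),
      head.lastIndexOf("! "),
      head.lastIndexOf("? ")
    ) + 1;
    // No sentence boundary found: fall back to the last space.
    if (cut <= 0) cut = head.lastIndexOf(" ");
    // No space either: hard cut (this may split a surrogate pair).
    if (cut <= 0) cut = maxLen;
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk could then be passed as `text` in its own `env.AI.run` call. Joining the resulting audio segments is outside the scope of this sketch.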

Examples

Slow Narration — Slower speech for narration
TypeScript
const response = await env.AI.run(
  'inworld/tts-1.5-max',
  {
    text: 'In the beginning, the universe was a singularity of infinite density. Then, in a fraction of a second, it expanded into everything we know today.',
    speaking_rate: 0.85,
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)
Response 200
High Quality Audio — Higher sample rate for studio quality
TypeScript
const response = await env.AI.run(
  'inworld/tts-1.5-max',
  {
    text: 'This recording is generated at studio quality for the best possible listening experience.',
    sample_rate: 48000,
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)
Response 200
With Text Normalization — Expand numbers and abbreviations before synthesis
TypeScript
const response = await env.AI.run(
  'inworld/tts-1.5-max',
  {
    text: 'The meeting is at 3:30 PM on Jan 15th, 2026. Please confirm by calling 555-0123.',
    apply_text_normalization: true,
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)
Response 200

Parameters

text
string, required, maxLength: 2000
The text to be synthesized into speech. Maximum input of 2,000 characters.

voice_id
string, default: Dennis, enum: Loretta, Darlene, Marlene, Hank, Evelyn, Celeste, Pippa, Tessa, Liam, Callum, Hamish, Abby, Graham, Rupert, Mortimer, Snik, Anjali, Saanvi, Arjun, Claire, Oliver, Simon, Elliot, James, Serena, Gareth, Vinny, Lauren, Jessica, Ethan, Tyler, Jason, Chloe, Veronica, Victoria, Miranda, Sebastian, Victor, Malcolm, Nate, Brian, Amina, Kelsey, Derek, Evan, Kayla, Jake, Grant, Tristan, Nadia, Selene, Marcus, Riley, Damon, Cedric, Mia, Naomi, Jonah, Levi, Avery, Brandon, Conrad, Bianca, Lucian, Trevor, Alex, Ashley, Craig, Deborah, Dennis, Edward, Elizabeth, Hades, Julia, Pixie, Mark, Olivia, Priya, Ronald, Sarah, Shaun, Theodore, Timothy, Wendy, Dominus, Hana, Clive, Carter, Blake, Luna, Reed, Duncan, Felix, Eleanor, Sophie
The ID of the voice to use for synthesizing speech. Defaults to Dennis.

output_format
string, default: mp3, enum: mp3, opus, wav, flac
The output format for the audio. Supported formats are mp3, opus, wav, and flac. Defaults to mp3.

bit_rate
integer, minimum: 0
Bits per second of the audio. Only for compressed audio formats (mp3, opus). The default is 128,000.

sample_rate
integer, enum: 8000, 16000, 22050, 24000, 32000, 44100, 48000
The synthesis sample rate in hertz. Accepts: 8000, 16000, 22050, 24000, 32000, 44100, 48000. The default is 48,000.

speaking_rate
number, minimum: 0.5, maximum: 1.5
Speaking rate/speed, in the range [0.5, 1.5]. The default is 1.0. We recommend using values above 0.8 to ensure high quality.

temperature
number, default: 1, minimum: 0.01, maximum: 2
Determines the degree of randomness when sampling audio tokens. Defaults to 1.0. Accepts values between 0 (exclusive) and 2 (inclusive). Higher values = more expressive, lower values = more deterministic.

timestamp_type
string, default: none, enum: none, word, character
Controls timestamp metadata returned with the audio. "word" returns word-level timing, "character" returns character-level timing. Note: adds latency. Defaults to none.

apply_text_normalization
boolean
When enabled, text normalization expands numbers, dates, times, and abbreviations before converting to speech. Turning this off may reduce latency.
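
Several of the parameters above have hard limits, so user-supplied values may need adjusting before a request is built. The sketch below is one possible approach, not part of the API. `normalizeOptions`, `TtsOptions`, and `clamp` are hypothetical names, and silently adjusting values (rather than rejecting them) is a design assumption. It clamps `speaking_rate` and `temperature` to their documented ranges, snaps `sample_rate` to the nearest supported value, and drops `bit_rate` for the uncompressed formats where it does not apply.

```typescript
// Supported sample rates, copied from the sample_rate enum above.
const SAMPLE_RATES = [8000, 16000, 22050, 24000, 32000, 44100, 48000] as const;

type OutputFormat = "mp3" | "opus" | "wav" | "flac";

// Hypothetical type covering only the options this sketch adjusts.
interface TtsOptions {
  output_format?: OutputFormat;
  bit_rate?: number;
  sample_rate?: number;
  speaking_rate?: number;
  temperature?: number;
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Hypothetical helper: returns a copy of `options` adjusted to the documented limits.
function normalizeOptions(options: TtsOptions): TtsOptions {
  const out: TtsOptions = { ...options };
  if (out.speaking_rate !== undefined) {
    out.speaking_rate = clamp(out.speaking_rate, 0.5, 1.5);
  }
  if (out.temperature !== undefined) {
    out.temperature = clamp(out.temperature, 0.01, 2);
  }
  if (out.sample_rate !== undefined) {
    const requested = out.sample_rate;
    // Pick the supported rate closest to the requested one.
    out.sample_rate = SAMPLE_RATES.reduce<number>(
      (best, rate) =>
        Math.abs(rate - requested) < Math.abs(best - requested) ? rate : best,
      SAMPLE_RATES[0]
    );
  }
  // bit_rate only applies to compressed formats (mp3, opus); mp3 is the default.
  const format = out.output_format ?? "mp3";
  if (format === "wav" || format === "flac") {
    delete out.bit_rate;
  }
  return out;
}
```

An application that prefers to surface mistakes to the caller could throw on out-of-range values instead of adjusting them.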

API Schemas

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "text": {
      "description": "The text to be synthesized into speech. Maximum input of 2,000 characters.",
      "type": "string",
      "maxLength": 2000
    },
    "voice_id": {
      "description": "The ID of the voice to use for synthesizing speech. Defaults to Dennis.",
      "default": "Dennis",
      "type": "string",
      "enum": [
        "Loretta",
        "Darlene",
        "Marlene",
        "Hank",
        "Evelyn",
        "Celeste",
        "Pippa",
        "Tessa",
        "Liam",
        "Callum",
        "Hamish",
        "Abby",
        "Graham",
        "Rupert",
        "Mortimer",
        "Snik",
        "Anjali",
        "Saanvi",
        "Arjun",
        "Claire",
        "Oliver",
        "Simon",
        "Elliot",
        "James",
        "Serena",
        "Gareth",
        "Vinny",
        "Lauren",
        "Jessica",
        "Ethan",
        "Tyler",
        "Jason",
        "Chloe",
        "Veronica",
        "Victoria",
        "Miranda",
        "Sebastian",
        "Victor",
        "Malcolm",
        "Nate",
        "Brian",
        "Amina",
        "Kelsey",
        "Derek",
        "Evan",
        "Kayla",
        "Jake",
        "Grant",
        "Tristan",
        "Nadia",
        "Selene",
        "Marcus",
        "Riley",
        "Damon",
        "Cedric",
        "Mia",
        "Naomi",
        "Jonah",
        "Levi",
        "Avery",
        "Brandon",
        "Conrad",
        "Bianca",
        "Lucian",
        "Trevor",
        "Alex",
        "Ashley",
        "Craig",
        "Deborah",
        "Dennis",
        "Edward",
        "Elizabeth",
        "Hades",
        "Julia",
        "Pixie",
        "Mark",
        "Olivia",
        "Priya",
        "Ronald",
        "Sarah",
        "Shaun",
        "Theodore",
        "Timothy",
        "Wendy",
        "Dominus",
        "Hana",
        "Clive",
        "Carter",
        "Blake",
        "Luna",
        "Reed",
        "Duncan",
        "Felix",
        "Eleanor",
        "Sophie"
      ]
    },
    "output_format": {
      "description": "The output format for the audio. Supported formats are mp3, opus, wav, and flac. Defaults to mp3.",
      "default": "mp3",
      "type": "string",
      "enum": [
        "mp3",
        "opus",
        "wav",
        "flac"
      ]
    },
    "bit_rate": {
      "description": "Bits per second of the audio. Only for compressed audio formats (mp3, opus). The default is 128,000.",
      "type": "integer",
      "minimum": 0
    },
    "sample_rate": {
      "description": "The synthesis sample rate in hertz. Accepts: 8000, 16000, 22050, 24000, 32000, 44100, 48000. The default is 48,000.",
      "type": "integer",
      "enum": [
        8000,
        16000,
        22050,
        24000,
        32000,
        44100,
        48000
      ]
    },
    "speaking_rate": {
      "description": "Speaking rate/speed, in the range [0.5, 1.5]. The default is 1.0. We recommend using values above 0.8 to ensure high quality.",
      "type": "number",
      "minimum": 0.5,
      "maximum": 1.5
    },
    "temperature": {
      "description": "Determines the degree of randomness when sampling audio tokens. Defaults to 1.0. Accepts values between 0 (exclusive) and 2 (inclusive). Higher values = more expressive, lower values = more deterministic.",
      "default": 1,
      "type": "number",
      "minimum": 0.01,
      "maximum": 2
    },
    "timestamp_type": {
      "description": "Controls timestamp metadata returned with the audio. \"word\" returns word-level timing, \"character\" returns character-level timing. Note: adds latency. Defaults to none.",
      "default": "none",
      "type": "string",
      "enum": [
        "none",
        "word",
        "character"
      ]
    },
    "apply_text_normalization": {
      "description": "When enabled, text normalization expands numbers, dates, times, and abbreviations before converting to speech. Turning this off may reduce latency.",
      "type": "boolean"
    }
  },
  "required": [
    "text"
  ],
  "additionalProperties": false
}
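
Because the schema sets `additionalProperties` to false and requires `text`, it can be useful to check a request object locally before sending it. The sketch below hand-codes a subset of the schema's constraints. `validateInput`, `ValidationResult`, and the lookup tables are hypothetical names, and the `voice_id` enum is checked only for type to keep the example short. A general-purpose JSON Schema validator would be the more complete option.

```typescript
interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// Property names allowed by the schema (additionalProperties is false).
const KNOWN_KEYS = new Set<string>([
  "text", "voice_id", "output_format", "bit_rate", "sample_rate",
  "speaking_rate", "temperature", "timestamp_type", "apply_text_normalization",
]);

// Enum constraints copied from the schema (voice_id omitted for brevity).
const ENUMS: Record<string, ReadonlyArray<string | number>> = {
  output_format: ["mp3", "opus", "wav", "flac"],
  sample_rate: [8000, 16000, 22050, 24000, 32000, 44100, 48000],
  timestamp_type: ["none", "word", "character"],
};

// Numeric constraints copied from the schema.
const RANGES: Record<string, { min: number; max?: number; integer?: boolean }> = {
  bit_rate: { min: 0, integer: true },
  speaking_rate: { min: 0.5, max: 1.5 },
  temperature: { min: 0.01, max: 2 },
};

// Hypothetical helper: collects every violation instead of stopping at the first.
function validateInput(input: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  for (const key of Object.keys(input)) {
    if (!KNOWN_KEYS.has(key)) errors.push(`unknown property: ${key}`);
  }
  const text = input["text"];
  if (typeof text !== "string") {
    errors.push("text is required and must be a string");
  } else if (Array.from(text).length > 2000) {
    // Array.from counts code points, matching JSON Schema's maxLength.
    errors.push("text exceeds 2000 characters");
  }
  const voice = input["voice_id"];
  if (voice !== undefined && typeof voice !== "string") {
    errors.push("voice_id must be a string");
  }
  for (const key of Object.keys(ENUMS)) {
    const value = input[key];
    if (value !== undefined && ENUMS[key].indexOf(value as string | number) === -1) {
      errors.push(`${key} must be one of: ${ENUMS[key].join(", ")}`);
    }
  }
  for (const key of Object.keys(RANGES)) {
    const value = input[key];
    if (value === undefined) continue;
    const range = RANGES[key];
    if (typeof value !== "number" || Number.isNaN(value)) {
      errors.push(`${key} must be a number`);
      continue;
    }
    if (range.integer && !Number.isInteger(value)) errors.push(`${key} must be an integer`);
    if (value < range.min || (range.max !== undefined && value > range.max)) {
      errors.push(`${key} is out of range`);
    }
  }
  const normalize = input["apply_text_normalization"];
  if (normalize !== undefined && typeof normalize !== "boolean") {
    errors.push("apply_text_normalization must be a boolean");
  }
  return { valid: errors.length === 0, errors };
}
```

For example, `validateInput({ text: "Hello", speaking_rate: 1.6 })` reports `speaking_rate is out of range`, since the schema's maximum is 1.5.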