aura-2-es Text-to-Speech • Deepgram

Date: 2025-01-01

@cf/deepgram/aura-2-es

Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

Model Info

Batch: Yes

Partner: Yes

Real-time: Yes

Unit Pricing: $0.03 per 1k chars input
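The unit price above can be turned into a rough per-request estimate. The sketch below is illustrative only: `estimateCostUsd` is a hypothetical helper, and it assumes billing counts characters the same way JavaScript's `string.length` does (UTF-16 code units), which this page does not confirm.

```typescript
// Unit price taken from the Model Info above: $0.03 per 1,000 input characters.
const PRICE_PER_1K_CHARS_USD = 0.03;

// Hypothetical helper: estimate the cost of synthesizing `text`.
// Assumption: billed characters equal `text.length` (UTF-16 code units).
function estimateCostUsd(text: string): number {
  return (text.length / 1000) * PRICE_PER_1K_CHARS_USD;
}

const sample = "Hola, gracias por llamar. ¿En qué puedo ayudarle hoy?";
console.log(`Estimated cost: $${estimateCostUsd(sample).toFixed(6)}`);
```

Under these assumptions, a 2,500-character script would cost roughly $0.075.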

Parameters

* indicates a required field

Input

speaker (string, default: aquila): Speaker used to produce the audio.

encoding (string): Encoding of the output audio.

container (string): File format wrapper for the output audio. The available options depend on the encoding type.

text * (string, required): The text content to be converted to speech.

sample_rate (number): Sample rate of the output audio. The supported sample rates depend on the encoding; for some encodings, the sample rate is not configurable.

bit_rate (number): Bitrate of the audio in bits per second. Choose from predefined ranges or specific values based on the encoding type.
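As a minimal sketch of how these fields fit together, the input can be modeled as a typed object in which only `text` is mandatory. The `Aura2Input` type and the `buildAura2Input` helper are hypothetical names introduced for illustration; the field names and the `aquila` default come from the list above. No `encoding` or `container` values are shown because this page does not list the accepted ones.

```typescript
// Shape of the model input, mirroring the Input fields above.
interface Aura2Input {
  text: string;          // required
  speaker?: string;      // the service defaults to "aquila" when omitted
  encoding?: string;     // accepted values are not listed on this page
  container?: string;    // available options depend on the encoding
  sample_rate?: number;  // not configurable for some encodings
  bit_rate?: number;     // bits per second
}

// Hypothetical helper: reject empty text early, then merge optional fields.
function buildAura2Input(
  text: string,
  options: Omit<Aura2Input, "text"> = {},
): Aura2Input {
  if (text.trim().length === 0) {
    throw new Error("text is required and must not be empty");
  }
  return { ...options, text };
}

const input = buildAura2Input("Hola, bienvenido a nuestro servicio.", {
  speaker: "aquila",
});
console.log(JSON.stringify(input));
```

Validating `text` before the request is a design choice rather than a requirement: it avoids paying for a round trip that the required-field check would reject anyway.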

Output

The binding returns a ReadableStream containing the generated audio. The audio format depends on the requested encoding and container (check the model's output schema).
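The following sketch shows one way to forward that stream to a client. It is not the official usage example: `AiBinding` is a minimal stand-in for the Workers AI binding, `speak` is a hypothetical helper, and the `Content-Type` value is an assumption that must match whatever encoding and container you request.

```typescript
// Minimal stand-in for the AI binding; only the method used here is modeled.
// Assumption: run() resolves to the audio ReadableStream described above.
interface AiBinding {
  run(
    model: string,
    input: { text: string; speaker?: string },
  ): Promise<ReadableStream<Uint8Array>>;
}

const MODEL = "@cf/deepgram/aura-2-es";

// Hypothetical helper: synthesize `text` and wrap the audio stream in a
// Response without buffering it in memory.
async function speak(
  ai: AiBinding,
  text: string,
  contentType: string, // assumption: caller knows the output format
): Promise<Response> {
  const audio = await ai.run(MODEL, { text });
  return new Response(audio, { headers: { "Content-Type": contentType } });
}
```

Inside a Worker, a fetch handler could call `speak(env.AI, text, contentType)` and return the result directly, assuming the binding is configured under the name `AI`.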

API Schemas

The following schemas are based on JSON Schema.