aura-1

Text-to-Speech • Deepgram

Aura is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

Model Info
Terms and License	link ↗
Batch	Yes
Partner	Yes
Real-time	Yes
Unit Pricing	$0.015 per 1k characters
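
As a rough illustration of the unit pricing above, the following sketch estimates what a request might cost from its character count. The helper name `estimateCostUsd` is hypothetical, and the assumption that billing scales linearly per character (no rounding up to whole 1k blocks, no minimum charge) is not confirmed by this page.

```typescript
// Price taken from the Model Info table: $0.015 per 1k characters.
const USD_PER_1K_CHARACTERS = 0.015;

// Hypothetical helper. ASSUMPTION: cost is strictly proportional to the
// number of characters; the actual billing granularity may differ.
function estimateCostUsd(characterCount: number): number {
  if (characterCount < 0) {
    throw new Error("characterCount must not be negative");
  }
  return (characterCount / 1000) * USD_PER_1K_CHARACTERS;
}

// Example: estimate for the sample text used on this page.
// Note that String.length counts UTF-16 code units, which may not match
// how characters are counted for billing.
const sampleEstimate = estimateCostUsd("Hello World!".length);
console.log(sampleEstimate);
```

Treat the result as an estimate for budgeting only, and check the account's actual usage for billed amounts.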

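export default {
  async fetch(request, env, ctx): Promise<Response> {
    // Run the model through the Workers AI binding.
    const resp = await env.AI.run("@cf/deepgram/aura-1", {
      text: "Hello World!"
    }, {
      // Return the raw Response so the audio body can be passed through as-is.
      returnRawResponse: true
    });

    return resp;
  },
} satisfies ExportedHandler<Env>;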

curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/deepgram/aura-1' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "Hello world!"
}'

Parameters

Input

speaker

string (default: angus; enum: angus, asteria, arcas, orion, orpheus, athena, luna, zeus, perseus, helios, hera, stella)
Speaker used to produce the audio.

encoding

string (enum: linear16, flac, mulaw, alaw, mp3, opus, aac)
Encoding of the output audio.

container

string (enum: none, wav, ogg)
The file format wrapper for the output audio. The available options depend on the encoding type.

text

string (required)
The text content to be converted to speech.

sample_rate

number
The sample rate of the output audio. The supported sample rates depend on the encoding; for some encodings, the sample rate is not configurable.

bit_rate

number
The bitrate of the audio in bits per second. Choose from predefined ranges or specific values based on the encoding type.
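
To make the input parameters concrete, here is a minimal sketch that checks a request object against the values listed above before it is sent. The names `LooseAuraInput` and `validateAuraInput` are hypothetical and not part of Workers AI. The check covers enum membership only; it does not model which `container`, `sample_rate`, or `bit_rate` values are valid for a given `encoding`, since this page does not list those combinations. Requiring positive numbers is an assumption.

```typescript
// Allowed values copied from the parameter list on this page.
const SPEAKERS: readonly string[] = [
  "angus", "asteria", "arcas", "orion", "orpheus", "athena",
  "luna", "zeus", "perseus", "helios", "hera", "stella",
];
const ENCODINGS: readonly string[] = [
  "linear16", "flac", "mulaw", "alaw", "mp3", "opus", "aac",
];
const CONTAINERS: readonly string[] = ["none", "wav", "ogg"];

// Hypothetical shape mirroring the documented input parameters.
interface LooseAuraInput {
  text: string;
  speaker?: string;
  encoding?: string;
  container?: string;
  sample_rate?: number;
  bit_rate?: number;
}

// Returns a list of problems; an empty list means the input passed the checks.
function validateAuraInput(input: LooseAuraInput): string[] {
  const problems: string[] = [];
  if (input.text.trim().length === 0) {
    problems.push("text is required and must not be empty");
  }
  if (input.speaker !== undefined && SPEAKERS.indexOf(input.speaker) === -1) {
    problems.push("unknown speaker: " + input.speaker);
  }
  if (input.encoding !== undefined && ENCODINGS.indexOf(input.encoding) === -1) {
    problems.push("unknown encoding: " + input.encoding);
  }
  if (input.container !== undefined && CONTAINERS.indexOf(input.container) === -1) {
    problems.push("unknown container: " + input.container);
  }
  // ASSUMPTION: numeric parameters must be positive, finite numbers.
  if (input.sample_rate !== undefined && !(isFinite(input.sample_rate) && input.sample_rate > 0)) {
    problems.push("sample_rate must be a positive number");
  }
  if (input.bit_rate !== undefined && !(isFinite(input.bit_rate) && input.bit_rate > 0)) {
    problems.push("bit_rate must be a positive number");
  }
  return problems;
}

// Example: a request that selects a speaker and an encoding.
console.log(validateAuraInput({ text: "Hello World!", speaker: "asteria", encoding: "mp3" }));
```

An input that passes can be handed to `env.AI.run("@cf/deepgram/aura-1", input)` in the same way as the `text`-only example on this page.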

The binding returns a ReadableStream with the audio in MPEG format (check the model's output schema).
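
If the audio needs to be held in memory (for example, to store it or inspect it) rather than streamed straight back to the client, the stream can be drained into a single byte array. The sketch below shows one way to do that with the standard Streams API; `collectAudio` is a hypothetical helper name, and the code assumes the stream yields `Uint8Array` chunks.

```typescript
// Hypothetical helper: drains a byte stream into one contiguous Uint8Array.
async function collectAudio(stream: ReadableStream<Uint8Array>): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;

  // Read chunk by chunk until the stream reports it is done.
  while (true) {
    const result = await reader.read();
    if (result.done) {
      break;
    }
    chunks.push(result.value);
    total += result.value.length;
  }

  // Copy every chunk into a single buffer, preserving order.
  const out = new Uint8Array(total);
  let offset = 0;
  for (let i = 0; i < chunks.length; i++) {
    out.set(chunks[i], offset);
    offset += chunks[i].length;
  }
  return out;
}
```

Buffering gives up the benefit of streaming, so when the Worker only needs to forward the audio, returning the response directly is the simpler option.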
