nova-3

Automatic Speech Recognition • Deepgram

Transcribe audio using Deepgram’s speech-to-text model

Model Info
Terms and License	link ↗
Batch	Yes
Partner	Yes
Real-time	Yes
Unit Pricing	$0.0052 per audio minute, $0.0092 per audio minute (websocket)

Supported languages

Nova-3 on Workers AI supports the following languages for transcription:

Language	Code(s)
English	`en`, `en-US`, `en-AU`, `en-GB`, `en-IN`, `en-NZ`
Spanish	`es`, `es-419`
French	`fr`, `fr-CA`
German	`de`, `de-CH`
Hindi	`hi`
Russian	`ru`
Portuguese	`pt`, `pt-BR`, `pt-PT`
Japanese	`ja`
Italian	`it`
Dutch	`nl`

Use `multi` for automatic multilingual detection across all of the languages listed above.

If no language is specified, the model defaults to `en-US`. For best accuracy, explicitly set the language code that matches your audio.
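One way to follow that advice is to validate a caller-supplied code against the table above before sending it. The sketch below assumes a hypothetical helper, chooseLanguage, and one possible policy: fall back to `multi` when the code is missing or unsupported, rather than silently getting the `en-US` default.

```typescript
// Language codes listed in the Supported languages table above.
const SUPPORTED_LANGUAGES = new Set<string>([
  "en", "en-US", "en-AU", "en-GB", "en-IN", "en-NZ",
  "es", "es-419",
  "fr", "fr-CA",
  "de", "de-CH",
  "hi",
  "ru",
  "pt", "pt-BR", "pt-PT",
  "ja",
  "it",
  "nl",
]);

// Hypothetical helper: returns the `language` value to send to Nova-3.
// Uses "multi" (automatic multilingual detection) when the caller's code
// is missing or not in the supported list. This is a design choice, not
// documented model behavior.
function chooseLanguage(code?: string): string {
  if (code !== undefined && SUPPORTED_LANGUAGES.has(code)) {
    return code;
  }
  return "multi";
}

console.log(chooseLanguage("pt-BR")); // "pt-BR"
console.log(chooseLanguage("ko")); // "multi"
```

Note that `multi` only detects the languages in the table, so audio in an unlisted language will still not be transcribed accurately.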

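export default {
  async fetch(request, env, ctx): Promise<Response> {
    // Fetch the audio file to transcribe (replace with your own URL).
    const audioUrl = "https://URL_TO_MP3_FILE/audio.mp3";
    const mp3 = await fetch(audioUrl);
    if (!mp3.ok || !mp3.body) {
      return new Response(`Failed to fetch audio: ${mp3.status}`, { status: 502 });
    }

    // Stream the MP3 body to Nova-3 and ask it to detect the spoken language.
    const resp = await env.AI.run("@cf/deepgram/nova-3", {
      audio: {
        body: mp3.body,
        contentType: "audio/mpeg",
      },
      detect_language: true,
    }, {
      // Return the model's raw HTTP response instead of the parsed result.
      returnRawResponse: true,
    });
    return resp;
  },
} satisfies ExportedHandler<Env>;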
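The Worker above relies on detect_language. The sketch below is a variant that sets the language and formatting options explicitly. It is written against a minimal, hypothetical AiBinding interface (standing in for env.AI) so it can be exercised outside the Workers runtime; the option names come from the Input parameters list, while TranscribeOptions and transcribe are illustrative names.

```typescript
// Minimal structural type for the part of the Workers AI binding used here.
// In a real Worker you would pass `env.AI`; this interface is an assumption.
interface AiBinding {
  run(
    model: string,
    inputs: Record<string, unknown>,
    options?: Record<string, unknown>,
  ): Promise<unknown>;
}

// Hypothetical subset of Nova-3 input parameters.
interface TranscribeOptions {
  language?: string;
  smart_format?: boolean;
  diarize?: boolean;
  punctuate?: boolean;
}

// Sends an audio stream to Nova-3 with explicit options and returns
// whatever the binding resolves to (the parsed result by default).
async function transcribe(
  ai: AiBinding,
  body: ReadableStream<Uint8Array>,
  contentType: string,
  opts: TranscribeOptions = {},
): Promise<unknown> {
  return ai.run("@cf/deepgram/nova-3", {
    audio: { body, contentType },
    ...opts,
  });
}
```

Inside a Worker this might be called as `await transcribe(env.AI, mp3.body, "audio/mpeg", { language: "en-US", smart_format: true })`. Passing the binding in, rather than reading env directly, keeps the function easy to test with a fake.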

curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/deepgram/nova-3?detect_language=true' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: audio/mpeg' \
  --data-binary "@/path/to/your-mp3-file.mp3"
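The curl example passes detect_language as a query parameter. A minimal sketch of building that URL programmatically follows, assuming other input parameters are passed the same way. buildNova3Url is a hypothetical helper, and appending array values once per element is an assumption for parameters that may take several values.

```typescript
type QueryScalar = string | number | boolean;
type QueryValue = QueryScalar | QueryScalar[];

// Builds the Workers AI REST URL for Nova-3, encoding options as query
// parameters the same way the curl example encodes detect_language=true.
function buildNova3Url(accountId: string, params: Record<string, QueryValue> = {}): string {
  const url = new URL(
    `https://api.cloudflare.com/client/v4/accounts/${encodeURIComponent(accountId)}/ai/run/@cf/deepgram/nova-3`,
  );
  for (const [key, value] of Object.entries(params)) {
    // Repeat the parameter once per element for array values.
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      url.searchParams.append(key, String(v));
    }
  }
  return url.toString();
}

console.log(buildNova3Url("abc", { detect_language: true }));
// https://api.cloudflare.com/client/v4/accounts/abc/ai/run/@cf/deepgram/nova-3?detect_language=true
```

The resulting URL can be sent with fetch using the same Authorization and Content-Type headers as the curl command, with the audio bytes as the request body.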

Parameters

Input

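audio

object (required). The audio to transcribe. With the Workers binding, pass an object with body (the audio data, such as a fetch response body) and contentType (its MIME type, such as audio/mpeg).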

custom_topic_mode

string (enum: extended, strict). Sets how the model interprets strings submitted in the custom_topic parameter. When strict, the model returns only topics submitted using custom_topic. When extended, the model returns its own detected topics in addition to those submitted using custom_topic.

custom_topic

string. Custom topics you want the model to detect within your input audio or text, if present. Submit up to 100.

custom_intent_mode

string (enum: extended, strict). Sets how the model interprets intents submitted in the custom_intent parameter. When strict, the model returns only intents submitted using custom_intent. When extended, the model returns its own detected intents in addition to those submitted using custom_intent.

custom_intent

string. Custom intents you want the model to detect within your input audio, if present.
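For the REST endpoint, custom topics and intents could be encoded as query parameters along with their modes. The sketch below enforces the 100-topic limit stated for custom_topic; the helper name customTopicIntentParams is hypothetical, and repeating the parameter once per value is an assumption based on how Deepgram's own API accepts multiple values.

```typescript
type CustomMode = "extended" | "strict";

// Hypothetical helper: encodes custom topics and intents as repeated query
// parameters, setting each mode only when values of that kind are present.
function customTopicIntentParams(
  topics: string[],
  intents: string[],
  mode: CustomMode = "extended",
): URLSearchParams {
  // The custom_topic description allows up to 100 topics.
  if (topics.length > 100) {
    throw new RangeError(`custom_topic accepts up to 100 topics, got ${topics.length}`);
  }
  const params = new URLSearchParams();
  for (const topic of topics) params.append("custom_topic", topic);
  for (const intent of intents) params.append("custom_intent", intent);
  if (topics.length > 0) params.set("custom_topic_mode", mode);
  if (intents.length > 0) params.set("custom_intent_mode", mode);
  return params;
}
```

Using strict mode restricts results to the supplied labels, which is convenient when downstream code switches on a fixed set of topics or intents.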

detect_entities

boolean. Identifies and extracts key entities from content in the submitted audio.

detect_language

boolean. Identifies the dominant language spoken in the submitted audio.

diarize

boolean. Recognizes speaker changes. Each word in the transcript is assigned a speaker number, starting at 0.
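Because diarization labels individual words, a common post-processing step is merging consecutive words from the same speaker into turns. The sketch below assumes each word carries word, start, end, and speaker fields, following Deepgram's response format; verify this against the actual output before relying on it. DiarizedWord, SpeakerTurn, and toSpeakerTurns are illustrative names.

```typescript
// Assumed shape of one diarized word (Deepgram-style fields).
interface DiarizedWord {
  word: string;
  start: number; // seconds
  end: number; // seconds
  speaker: number;
}

interface SpeakerTurn {
  speaker: number;
  start: number;
  end: number;
  text: string;
}

// Merges consecutive words with the same speaker number into one turn.
function toSpeakerTurns(words: DiarizedWord[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === w.speaker) {
      // Same speaker as the previous word: extend the current turn.
      last.text += ` ${w.word}`;
      last.end = w.end;
    } else {
      // Speaker changed (or first word): start a new turn.
      turns.push({ speaker: w.speaker, start: w.start, end: w.end, text: w.word });
    }
  }
  return turns;
}
```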

dictation

boolean. Converts spoken dictation commands, such as "comma" or "period", into the corresponding punctuation marks.

encoding

string (enum: linear16, flac, mulaw, amr-nb, amr-wb, opus, speex, g729). Specifies the expected encoding of the submitted audio.

extra

string. Arbitrary key-value pairs that are attached to the API response for use in downstream processing.

filler_words

boolean. Includes filler words, such as "uh" and "um", in the transcript. These can help capture interruptions and hesitations in your audio.

keyterm

string. Key term prompting can boost or suppress specialized terminology and brands.

keywords

string. Keywords can boost or suppress specialized terminology and brands.

language

string. The BCP-47 language tag that hints at the primary spoken language. Depending on the model and API endpoint you choose, only certain languages are available.

measurements

boolean. Converts spoken measurements to their corresponding abbreviations.

mip_opt_out

boolean. Opts requests out of the Deepgram Model Improvement Program. Refer to Deepgram's documentation for pricing impacts before setting this to true (https://dpgr.am/deepgram-mip).

mode

string (enum: general, medical, finance). Mode of operation for the model, representing the broad topic area discussed in the supplied audio.

multichannel

boolean. Transcribes each audio channel independently.

numerals

boolean. Converts numbers from written format (for example, "nine") to numerical format ("9").

paragraphs

boolean. Splits the transcript into paragraphs to improve readability.

profanity_filter

boolean. Looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript entirely.

punctuate

boolean. Adds punctuation and capitalization to the transcript.

redact

string. Removes sensitive information from your transcripts.

replace

string. Searches for terms or phrases in the submitted audio and replaces them.

search

string. Searches for terms or phrases in the submitted audio.
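The replace and search descriptions do not show a value format. The sketch below assumes the "term:replacement" syntax that Deepgram documents for its own replace feature, and that multiple values are sent as repeated query parameters; both are assumptions for Workers AI. replaceAndSearchParams is a hypothetical helper.

```typescript
// Hypothetical helper: encodes find/replace pairs as "term:replacement"
// (assumed syntax) and plain search terms, each as a repeated query parameter.
function replaceAndSearchParams(
  replacements: Record<string, string>,
  searchTerms: string[] = [],
): URLSearchParams {
  const params = new URLSearchParams();
  for (const [term, replacement] of Object.entries(replacements)) {
    // A colon inside the term would make the pair ambiguous.
    if (term.includes(":")) {
      throw new Error(`replace term must not contain ":": ${term}`);
    }
    params.append("replace", `${term}:${replacement}`);
  }
  for (const term of searchTerms) {
    params.append("search", term);
  }
  return params;
}
```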

sentiment

boolean. Recognizes the sentiment throughout a transcript or text.

smart_format

boolean. Applies additional formatting to the transcript output to improve readability.

topics

boolean. Detects topics throughout a transcript or text.

utterances

boolean. Segments speech into meaningful semantic units.

utt_split

number. Seconds to wait before detecting a pause between words in the submitted audio.

channels

number. The number of channels in the submitted audio.

interim_results

boolean. Specifies whether the streaming endpoint provides ongoing transcription updates as more audio is received. When set to true, the endpoint sends continuous updates, so transcription results may evolve over time. Supported only over WebSockets.

endpointing

string. Indicates how long the model waits to detect whether a speaker has finished speaking or has paused for a significant period. When set to a value, the streaming endpoint immediately finalizes the transcription for the processed time range and returns the transcript with speech_final set to true. Set to false to disable endpointing.

vad_events

boolean. Indicates that speech has started: when true, you receive Speech Started messages as soon as speech begins. Supported only over WebSockets.

utterance_end_ms

boolean. Indicates how long the model waits to send an UtteranceEnd message after a word has been transcribed. Use with interim_results. Supported only over WebSockets.
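Three of the parameters above are WebSocket-only, and utterance_end_ms is meant to be paired with interim_results. A small client-side sanity check can catch misconfigured requests before they are sent. This is a sketch under those two rules only; StreamingOptions and checkStreamingOptions are hypothetical names, and the server may validate differently.

```typescript
type Transport = "http" | "websocket";

// Hypothetical subset of the streaming-related options listed above.
// utterance_end_ms is typed loosely because only its presence is checked.
interface StreamingOptions {
  interim_results?: boolean;
  endpointing?: string | false;
  vad_events?: boolean;
  utterance_end_ms?: unknown;
}

// Options whose descriptions say they are supported only over WebSockets.
const WEBSOCKET_ONLY = ["interim_results", "vad_events", "utterance_end_ms"] as const;

// Returns human-readable warnings; an empty array means no issues found.
function checkStreamingOptions(opts: StreamingOptions, transport: Transport): string[] {
  const warnings: string[] = [];
  if (transport === "http") {
    for (const key of WEBSOCKET_ONLY) {
      if (opts[key] !== undefined) {
        warnings.push(`${key} is only supported over WebSockets`);
      }
    }
  }
  if (opts.utterance_end_ms !== undefined && opts.interim_results !== true) {
    warnings.push("utterance_end_ms should be used with interim_results: true");
  }
  return warnings;
}
```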

Output

results

object
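The output schema above only names a results object. The sketch below assumes it follows Deepgram's pre-recorded response format (results.channels[].alternatives[].transcript, with detected_language on the channel when detect_language is true); check the actual output before depending on these paths. Nova3Result and firstTranscript are hypothetical names.

```typescript
// Assumed response shape, with every level optional so missing fields
// do not throw.
interface Nova3Result {
  results?: {
    channels?: Array<{
      detected_language?: string;
      alternatives?: Array<{ transcript?: string; confidence?: number }>;
    }>;
  };
}

// Returns the top transcript and detected language of the first channel,
// or an empty transcript if the expected fields are absent.
function firstTranscript(res: Nova3Result): { transcript: string; language?: string } {
  const channel = res.results?.channels?.[0];
  const transcript = channel?.alternatives?.[0]?.transcript ?? "";
  return { transcript, language: channel?.detected_language };
}
```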
