
nova-3

Automatic Speech Recognition • Deepgram
@cf/deepgram/nova-3

Transcribe audio using Deepgram’s speech-to-text model

Model Info
Batch: Yes
Partner: Yes
Real-time: Yes
Unit Pricing: $0.0052 per audio minute
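
As a rough illustration of the unit price listed above, the sketch below estimates the cost of transcribing a clip of a given length. The helper name `estimateCostUsd` is hypothetical, and the calculation assumes strictly linear billing with no rounding or minimum charge, which this page does not confirm.

```typescript
// Listed unit price in USD per audio minute (from Model Info).
const PRICE_PER_AUDIO_MINUTE_USD = 0.0052;

// Hypothetical helper: linear estimate with no rounding or minimum charge.
// That billing model is an assumption; actual invoices may differ.
function estimateCostUsd(audioSeconds: number): number {
  if (!Number.isFinite(audioSeconds) || audioSeconds < 0) {
    throw new RangeError("audioSeconds must be a non-negative finite number");
  }
  return (audioSeconds / 60) * PRICE_PER_AUDIO_MINUTE_USD;
}

// One hour of audio is 60 minutes, so the estimate is 60 x 0.0052 USD.
console.log(estimateCostUsd(3600).toFixed(4));
```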

Usage

Worker

export default {
  async fetch(request, env, ctx): Promise<Response> {
    // Placeholder URL: replace with the location of your MP3 file.
    const audioUrl = "https://URL_TO_MP3_FILE/audio.mp3";
    const mp3 = await fetch(audioUrl);
    const resp = await env.AI.run("@cf/deepgram/nova-3", {
      audio: {
        body: mp3.body,
        contentType: "audio/mpeg",
      },
      detect_language: true,
    }, {
      // Ask for the raw response so it can be returned directly.
      returnRawResponse: true,
    });
    return resp;
  },
} satisfies ExportedHandler<Env>;

curl

curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/deepgram/nova-3?detect_language=true' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: audio/mpeg' \
  --data-binary "@/path/to/your-mp3-file.mp3"

Parameters

Fields marked "required" must be provided.

Input

  • audio object required

    • body object required

    • contentType string required

  • custom_topic_mode string

    Sets how the model will interpret strings submitted to the custom_topic param. When strict, the model will only return topics submitted using the custom_topic param. When extended, the model will return its own detected topics in addition to those submitted using the custom_topic param.

  • custom_topic string

    Custom topics you want the model to detect within your input audio or text, if present. Submit up to 100.

  • custom_intent_mode string

    Sets how the model will interpret intents submitted to the custom_intent param. When strict, the model will only return intents submitted using the custom_intent param. When extended, the model will return its own detected intents in addition to those submitted using the custom_intent param.

  • custom_intent string

    Custom intents you want the model to detect within your input audio if present

  • detect_entities boolean

    Identifies and extracts key entities from content in submitted audio

  • detect_language boolean

    Identifies the dominant language spoken in submitted audio

  • diarize boolean

    Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0

  • dictation boolean

    Converts spoken punctuation commands (such as "comma" or "period") into the corresponding punctuation marks

  • encoding string

    Specify the expected encoding of your submitted audio. One of: linear16, flac, mulaw, amr-nb, amr-wb, opus, speex, g729

  • extra string

    Arbitrary key-value pairs that are attached to the API response for usage in downstream processing

  • filter_words boolean

    Transcribes filler words that interrupt speech in your audio, such as 'uh' and 'um'

  • keyterm string

    Key term prompting can boost or suppress specialized terminology and brands.

  • keywords string

    Keywords can boost or suppress specialized terminology and brands.

  • language string

    The BCP-47 language tag that hints at the primary spoken language. Depending on the model and API endpoint you choose, only certain languages are available.

  • measurements boolean

    Spoken measurements will be converted to their corresponding abbreviations.

  • mip_opt_out boolean

    Opts requests out of the Deepgram Model Improvement Program. Refer to the Deepgram docs at https://dpgr.am/deepgram-mip for pricing impacts before setting this to true.

  • mode string

    Mode of operation for the model, representing the broad topic area discussed in the supplied audio. One of: general, medical, finance

  • multichannel boolean

    Transcribe each audio channel independently.

  • numerals boolean

    Numerals converts numbers from written format to numerical format.

  • paragraphs boolean

    Splits audio into paragraphs to improve transcript readability.

  • profanity_filter boolean

    Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely.

  • punctuate boolean

    Add punctuation and capitalization to the transcript.

  • redact string

    Redaction removes sensitive information from your transcripts.

  • replace string

    Searches for terms or phrases in submitted audio and replaces them.

  • search string

    Search for terms or phrases in submitted audio.

  • sentiment boolean

    Recognizes the sentiment throughout a transcript or text.

  • smart_format boolean

    Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability.

  • topics boolean

    Detect topics throughout a transcript or text.

  • utterances boolean

    Segments speech into meaningful semantic units.

  • utt_split number

    Seconds to wait before detecting a pause between words in submitted audio.
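
The curl example passes `detect_language=true` as a query parameter. The sketch below builds the same REST URL from a small options object. The helper `buildNova3Url` and the `Nova3QueryOptions` type are hypothetical, only a few of the parameters above are included, and it assumes the other options can be sent as query parameters in the same way as `detect_language`, which this page only demonstrates for that one flag.

```typescript
// A subset of the optional parameters listed above; names and types follow the list.
interface Nova3QueryOptions {
  detect_language?: boolean;
  diarize?: boolean;
  punctuate?: boolean;
  smart_format?: boolean;
  language?: string;
  utt_split?: number;
}

// Hypothetical helper that builds the REST URL used in the curl example.
// Assumption: each option is accepted as a query parameter like detect_language.
function buildNova3Url(accountId: string, options: Nova3QueryOptions = {}): string {
  const base =
    "https://api.cloudflare.com/client/v4/accounts/" +
    encodeURIComponent(accountId) +
    "/ai/run/@cf/deepgram/nova-3";
  const pairs: string[] = [];
  for (const [key, value] of Object.entries(options)) {
    // Skip options that were left unset.
    if (value !== undefined) {
      pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
  }
  return pairs.length > 0 ? `${base}?${pairs.join("&")}` : base;
}

// "ACCOUNT_ID" is a placeholder, as in the curl example.
console.log(buildNova3Url("ACCOUNT_ID", { detect_language: true, punctuate: true }));
```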

Output

  • results object

    • channels array

      • items object

        • alternatives array

          • items object

            • confidence number

            • transcript string

            • words array

              • items object

                • confidence number

                • end number

                • start number

                • word string

    • summary object

      • result string

      • short string

    • sentiments object

      • segments array

        • items object

          • text string

          • start_word number

          • end_word number

          • sentiment string

          • sentiment_score number

      • average object

        • sentiment string

        • sentiment_score number
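
To show how the nested output above might be consumed, the sketch below picks the highest-confidence transcript for each channel. The type names and the helper `bestTranscripts` are hypothetical and cover only the `results.channels` branch of the structure; the sample object is made-up data, not real model output.

```typescript
// Types mirroring the results.channels branch of the Output list.
interface Nova3Word { confidence: number; start: number; end: number; word: string; }
interface Nova3Alternative { confidence: number; transcript: string; words?: Nova3Word[]; }
interface Nova3Channel { alternatives: Nova3Alternative[]; }
interface Nova3Output { results: { channels: Nova3Channel[] }; }

// Hypothetical helper: returns the highest-confidence transcript of each channel,
// or an empty string for a channel with no alternatives.
function bestTranscripts(output: Nova3Output): string[] {
  return output.results.channels.map((channel) => {
    let best: Nova3Alternative | undefined;
    for (const alt of channel.alternatives) {
      if (best === undefined || alt.confidence > best.confidence) {
        best = alt;
      }
    }
    return best ? best.transcript : "";
  });
}

// Made-up sample shaped like the Output list, for illustration only.
const sampleOutput: Nova3Output = {
  results: {
    channels: [
      {
        alternatives: [
          { confidence: 0.42, transcript: "hollow word" },
          { confidence: 0.97, transcript: "hello world" },
        ],
      },
    ],
  },
};
console.log(bestTranscripts(sampleOutput));
```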

API Schemas

The following schemas are based on JSON Schema

{
  "type": "object",
  "properties": {
    "audio": {
      "type": "object",
      "properties": {
        "body": {
          "type": "object"
        },
        "contentType": {
          "type": "string"
        }
      },
      "required": [
        "body",
        "contentType"
      ]
    },
    "custom_topic_mode": {
      "type": "string",
      "enum": [
        "extended",
        "strict"
      ],
      "description": "Sets how the model will interpret strings submitted to the custom_topic param. When strict, the model will only return topics submitted using the custom_topic param. When extended, the model will return its own detected topics in addition to those submitted using the custom_topic param."
    },
    "custom_topic": {
      "type": "string",
      "description": "Custom topics you want the model to detect within your input audio or text if present Submit up to 100"
    },
    "custom_intent_mode": {
      "type": "string",
      "description": "Sets how the model will interpret intents submitted to the custom_intent param. When strict, the model will only return intents submitted using the custom_intent param. When extended, the model will return its own detected intents in addition those submitted using the custom_intents param",
      "enum": [
        "extended",
        "strict"
      ]
    },
    "custom_intent": {
      "type": "string",
      "description": "Custom intents you want the model to detect within your input audio if present"
    },
    "detect_entities": {
      "type": "boolean",
      "description": "Identifies and extracts key entities from content in submitted audio"
    },
    "detect_language": {
      "type": "boolean",
      "description": "Identifies the dominant language spoken in submitted audio"
    },
    "diarize": {
      "type": "boolean",
      "description": "Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0"
    },
    "dictation": {
      "type": "boolean",
      "description": "Identify and extract key entities from content in submitted audio"
    },
    "encoding": {
      "type": "string",
      "description": "Specify the expected encoding of your submitted audio",
      "enum": [
        "linear16",
        "flac",
        "mulaw",
        "amr-nb",
        "amr-wb",
        "opus",
        "speex",
        "g729"
      ]
    },
    "extra": {
      "type": "string",
      "description": "Arbitrary key-value pairs that are attached to the API response for usage in downstream processing"
    },
    "filter_words": {
      "type": "boolean",
      "description": "Filler Words can help transcribe interruptions in your audio, like 'uh' and 'um'"
    },
    "keyterm": {
      "type": "string",
      "description": "Key term prompting can boost or suppress specialized terminology and brands."
    },
    "keywords": {
      "type": "string",
      "description": "Keywords can boost or suppress specialized terminology and brands."
    },
    "language": {
      "type": "string",
      "description": "The BCP-47 language tag that hints at the primary spoken language. Depending on the Model and API endpoint you choose only certain languages are available."
    },
    "measurements": {
      "type": "boolean",
      "description": "Spoken measurements will be converted to their corresponding abbreviations."
    },
    "mip_opt_out": {
      "type": "boolean",
      "description": "Opts out requests from the Deepgram Model Improvement Program. Refer to our Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip."
    },
    "mode": {
      "type": "string",
      "description": "Mode of operation for the model representing broad area of topic that will be talked about in the supplied audio",
      "enum": [
        "general",
        "medical",
        "finance"
      ]
    },
    "multichannel": {
      "type": "boolean",
      "description": "Transcribe each audio channel independently."
    },
    "numerals": {
      "type": "boolean",
      "description": "Numerals converts numbers from written format to numerical format."
    },
    "paragraphs": {
      "type": "boolean",
      "description": "Splits audio into paragraphs to improve transcript readability."
    },
    "profanity_filter": {
      "type": "boolean",
      "description": "Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely."
    },
    "punctuate": {
      "type": "boolean",
      "description": "Add punctuation and capitalization to the transcript."
    },
    "redact": {
      "type": "string",
      "description": "Redaction removes sensitive information from your transcripts."
    },
    "replace": {
      "type": "string",
      "description": "Search for terms or phrases in submitted audio and replaces them."
    },
    "search": {
      "type": "string",
      "description": "Search for terms or phrases in submitted audio."
    },
    "sentiment": {
      "type": "boolean",
      "description": "Recognizes the sentiment throughout a transcript or text."
    },
    "smart_format": {
      "type": "boolean",
      "description": "Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability."
    },
    "topics": {
      "type": "boolean",
      "description": "Detect topics throughout a transcript or text."
    },
    "utterances": {
      "type": "boolean",
      "description": "Segments speech into meaningful semantic units."
    },
    "utt_split": {
      "type": "number",
      "description": "Seconds to wait before detecting a pause between words in submitted audio."
    }
  },
  "required": [
    "audio"
  ]
}
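
The schema above restricts four string parameters to enumerated values. As a minimal sketch, the check below reports any value that falls outside those lists before a request is sent. The helper `findInvalidEnumValues` is hypothetical and not part of any Cloudflare or Deepgram SDK; the allowed values are copied from the schema.

```typescript
// Allowed values copied from the "enum" entries in the schema.
const NOVA3_ENUMS = {
  custom_topic_mode: ["extended", "strict"],
  custom_intent_mode: ["extended", "strict"],
  encoding: ["linear16", "flac", "mulaw", "amr-nb", "amr-wb", "opus", "speex", "g729"],
  mode: ["general", "medical", "finance"],
} as const;

type Nova3EnumField = keyof typeof NOVA3_ENUMS;

// Hypothetical helper: returns one message per field whose value is not allowed.
function findInvalidEnumValues(
  options: Partial<Record<Nova3EnumField, string>>,
): string[] {
  const problems: string[] = [];
  for (const field of Object.keys(NOVA3_ENUMS) as Nova3EnumField[]) {
    const value = options[field];
    const allowed: readonly string[] = NOVA3_ENUMS[field];
    if (value !== undefined && allowed.indexOf(value) === -1) {
      problems.push(`${field}: "${value}" is not one of ${allowed.join(", ")}`);
    }
  }
  return problems;
}

// "mp3" is not in the encoding enum, so one problem is reported for it.
console.log(findInvalidEnumValues({ mode: "medical", encoding: "mp3" }));
```

Running a check like this client-side is a design choice rather than a requirement: it surfaces a typo in an option value without spending a request on it.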