nova-3
Automatic Speech Recognition • Deepgram

Transcribe audio using Deepgram's speech-to-text model.
| Model Info | |
|---|---|
| Batch | Yes |
| Partner | Yes |
| Real-time | Yes |
| Unit Pricing | $0.0052 per audio minute |
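Billing scales linearly with audio duration at the listed rate; for example, a 30-minute file costs 30 × $0.0052 = $0.156. A minimal sketch of that arithmetic (the helper function is illustrative, not part of any API):

```typescript
// Estimate the cost of transcribing audio with nova-3 at the listed
// per-minute rate of $0.0052.
const NOVA3_RATE_PER_MINUTE = 0.0052;

function estimateCostUSD(audioMinutes: number): number {
  return audioMinutes * NOVA3_RATE_PER_MINUTE;
}

// A 30-minute recording:
console.log(estimateCostUSD(30).toFixed(4)); // "0.1560"
```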
Usage
Worker
```typescript
export default {
  async fetch(request, env, ctx): Promise<Response> {
    const URL = "https://URL_TO_MP3_FILE/audio.mp3";
    const mp3 = await fetch(URL);

    const resp = await env.AI.run(
      "@cf/deepgram/nova-3",
      {
        audio: {
          body: mp3.body,
          contentType: "audio/mpeg",
        },
        detect_language: true,
      },
      { returnRawResponse: true },
    );

    return resp;
  },
} satisfies ExportedHandler<Env>;
```
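When you parse the response JSON yourself instead of returning the raw response, the transcript text sits at `results.channels[0].alternatives[0].transcript` (see the Output section on this page). A minimal sketch, using a hypothetical trimmed-down sample response:

```typescript
// Sketch: pulling the transcript out of a parsed nova-3 response.
// `sample` is a hypothetical, trimmed-down object matching the Output
// schema on this page; a real response carries more fields.
interface Nova3Response {
  results?: {
    channels?: { alternatives?: { transcript?: string; confidence?: number }[] }[];
  };
}

function extractTranscript(resp: Nova3Response): string {
  // First channel, first (highest-confidence) alternative.
  return resp.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
}

const sample: Nova3Response = {
  results: {
    channels: [{ alternatives: [{ transcript: "hello world", confidence: 0.98 }] }],
  },
};

console.log(extractTranscript(sample)); // "hello world"
```

The optional chaining makes the helper safe against partial responses: any missing level falls through to the empty-string default.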
curl
```shell
curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/deepgram/nova-3?detect_language=true' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: audio/mpeg' \
  --data-binary "@/path/to/your-mp3-file.mp3"
```
Parameters
* indicates a required field
Input

- `audio` * (object)
  - `body` * (object)
  - `contentType` * (string)
- `custom_topic_mode` (string): Sets how the model interprets strings submitted to the `custom_topic` param. When `strict`, the model only returns topics submitted using the `custom_topic` param. When `extended`, the model returns its own detected topics in addition to those submitted using the `custom_topic` param. One of `extended`, `strict`.
- `custom_topic` (string): Custom topics you want the model to detect in your input audio or text, if present. Submit up to 100.
- `custom_intent_mode` (string): Sets how the model interprets intents submitted to the `custom_intent` param. When `strict`, the model only returns intents submitted using the `custom_intent` param. When `extended`, the model returns its own detected intents in addition to those submitted using the `custom_intent` param. One of `extended`, `strict`.
- `custom_intent` (string): Custom intents you want the model to detect in your input audio, if present.
- `detect_entities` (boolean): Identifies and extracts key entities from content in submitted audio.
- `detect_language` (boolean): Identifies the dominant language spoken in submitted audio.
- `diarize` (boolean): Recognizes speaker changes. Each word in the transcript is assigned a speaker number, starting at 0.
- `dictation` (boolean): Converts spoken dictation commands into their corresponding punctuation marks.
- `encoding` (string): Specifies the expected encoding of your submitted audio. One of `linear16`, `flac`, `mulaw`, `amr-nb`, `amr-wb`, `opus`, `speex`, `g729`.
- `extra` (string): Arbitrary key-value pairs that are attached to the API response for use in downstream processing.
- `filter_words` (boolean): Transcribes filler words and interruptions in your audio, such as 'uh' and 'um'.
- `keyterm` (string): Key term prompting can boost or suppress specialized terminology and brands.
- `keywords` (string): Keywords can boost or suppress specialized terminology and brands.
- `language` (string): The BCP-47 language tag that hints at the primary spoken language. Depending on the model and API endpoint you choose, only certain languages are available.
- `measurements` (boolean): Converts spoken measurements to their corresponding abbreviations.
- `mip_opt_out` (boolean): Opts requests out of the Deepgram Model Improvement Program. Refer to the Deepgram docs (https://dpgr.am/deepgram-mip) for pricing impacts before setting this to true.
- `mode` (string): Mode of operation for the model, representing the broad topic area of the supplied audio. One of `general`, `medical`, `finance`.
- `multichannel` (boolean): Transcribes each audio channel independently.
- `numerals` (boolean): Converts numbers from written format to numerical format.
- `paragraphs` (boolean): Splits audio into paragraphs to improve transcript readability.
- `profanity_filter` (boolean): Looks for recognized profanity and converts it to the nearest recognized non-profane word, or removes it from the transcript completely.
- `punctuate` (boolean): Adds punctuation and capitalization to the transcript.
- `redact` (string): Removes sensitive information from your transcripts.
- `replace` (string): Searches for terms or phrases in submitted audio and replaces them.
- `search` (string): Searches for terms or phrases in submitted audio.
- `sentiment` (boolean): Recognizes the sentiment throughout a transcript or text.
- `smart_format` (boolean): Applies formatting to transcript output. When set to true, additional formatting is applied to transcripts to improve readability.
- `topics` (boolean): Detects topics throughout a transcript or text.
- `utterances` (boolean): Segments speech into meaningful semantic units.
- `utt_split` (number): Seconds to wait before detecting a pause between words in submitted audio.
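Only `audio` is required; every other parameter above is an optional flag on the same request object. A minimal sketch of assembling that payload in a Worker (the `Nova3Options` type and `buildOptions` helper are illustrative, covering only a subset of the parameters):

```typescript
// Sketch: typing a subset of the nova-3 input parameters so a Worker can
// assemble the request object safely. Names mirror the Input list above;
// only `audio` is required.
interface Nova3Options {
  audio: { body: unknown; contentType: string };
  detect_language?: boolean;
  diarize?: boolean;
  punctuate?: boolean;
  smart_format?: boolean;
  utt_split?: number;
}

function buildOptions(
  body: unknown,
  contentType: string,
  extra: Omit<Nova3Options, "audio"> = {},
): Nova3Options {
  return { audio: { body, contentType }, ...extra };
}

// In a real Worker, `body` would be a ReadableStream or ArrayBuffer of audio.
const opts = buildOptions(null, "audio/mpeg", { diarize: true, punctuate: true });
console.log(Object.keys(opts)); // only the options actually set are present
```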
Output
- `results` (object)
  - `channels` (array)
    - items (object)
      - `alternatives` (array)
        - items (object)
          - `confidence` (number)
          - `transcript` (string)
          - `words` (array)
            - items (object)
              - `confidence` (number)
              - `end` (number)
              - `start` (number)
              - `word` (string)
  - `summary` (object)
    - `result` (string)
    - `short` (string)
  - `sentiments` (object)
    - `segments` (array)
      - items (object)
        - `text` (string)
        - `start_word` (number)
        - `end_word` (number)
        - `sentiment` (string)
        - `sentiment_score` (number)
    - `average` (object)
      - `sentiment` (string)
      - `sentiment_score` (number)
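The `sentiments` block carries per-span scores in `segments`, and `average` summarizes them across the transcript. A minimal sketch of working with those segments client-side (the sample data is hypothetical):

```typescript
// Sketch: aggregating the per-segment sentiment scores from the
// `sentiments` block of a nova-3 response. Field names mirror the
// Output schema above; the sample segments are made up.
interface SentimentSegment {
  text: string;
  start_word: number;
  end_word: number;
  sentiment: string;
  sentiment_score: number;
}

function meanScore(segments: SentimentSegment[]): number {
  if (segments.length === 0) return 0;
  return segments.reduce((sum, s) => sum + s.sentiment_score, 0) / segments.length;
}

const segments: SentimentSegment[] = [
  { text: "great service", start_word: 0, end_word: 1, sentiment: "positive", sentiment_score: 0.8 },
  { text: "long wait", start_word: 2, end_word: 3, sentiment: "negative", sentiment_score: -0.4 },
];

console.log(meanScore(segments)); // mean of the two scores
```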
API Schemas
The following input and output schemas are based on JSON Schema.
Input:

```json
{
  "type": "object",
  "properties": {
    "audio": {
      "type": "object",
      "properties": {
        "body": { "type": "object" },
        "contentType": { "type": "string" }
      },
      "required": ["body", "contentType"]
    },
    "custom_topic_mode": {
      "type": "string",
      "enum": ["extended", "strict"],
      "description": "Sets how the model will interpret strings submitted to the custom_topic param. When strict, the model will only return topics submitted using the custom_topic param. When extended, the model will return its own detected topics in addition to those submitted using the custom_topic param."
    },
    "custom_topic": {
      "type": "string",
      "description": "Custom topics you want the model to detect within your input audio or text if present. Submit up to 100."
    },
    "custom_intent_mode": {
      "type": "string",
      "enum": ["extended", "strict"],
      "description": "Sets how the model will interpret intents submitted to the custom_intent param. When strict, the model will only return intents submitted using the custom_intent param. When extended, the model will return its own detected intents in addition to those submitted using the custom_intent param."
    },
    "custom_intent": {
      "type": "string",
      "description": "Custom intents you want the model to detect within your input audio if present"
    },
    "detect_entities": {
      "type": "boolean",
      "description": "Identifies and extracts key entities from content in submitted audio"
    },
    "detect_language": {
      "type": "boolean",
      "description": "Identifies the dominant language spoken in submitted audio"
    },
    "diarize": {
      "type": "boolean",
      "description": "Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0"
    },
    "dictation": {
      "type": "boolean",
      "description": "Converts spoken dictation commands into their corresponding punctuation marks"
    },
    "encoding": {
      "type": "string",
      "enum": ["linear16", "flac", "mulaw", "amr-nb", "amr-wb", "opus", "speex", "g729"],
      "description": "Specify the expected encoding of your submitted audio"
    },
    "extra": {
      "type": "string",
      "description": "Arbitrary key-value pairs that are attached to the API response for usage in downstream processing"
    },
    "filter_words": {
      "type": "boolean",
      "description": "Filler Words can help transcribe interruptions in your audio, like 'uh' and 'um'"
    },
    "keyterm": {
      "type": "string",
      "description": "Key term prompting can boost or suppress specialized terminology and brands."
    },
    "keywords": {
      "type": "string",
      "description": "Keywords can boost or suppress specialized terminology and brands."
    },
    "language": {
      "type": "string",
      "description": "The BCP-47 language tag that hints at the primary spoken language. Depending on the Model and API endpoint you choose only certain languages are available."
    },
    "measurements": {
      "type": "boolean",
      "description": "Spoken measurements will be converted to their corresponding abbreviations."
    },
    "mip_opt_out": {
      "type": "boolean",
      "description": "Opts out requests from the Deepgram Model Improvement Program. Refer to our Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip."
    },
    "mode": {
      "type": "string",
      "enum": ["general", "medical", "finance"],
      "description": "Mode of operation for the model representing broad area of topic that will be talked about in the supplied audio"
    },
    "multichannel": {
      "type": "boolean",
      "description": "Transcribe each audio channel independently."
    },
    "numerals": {
      "type": "boolean",
      "description": "Numerals converts numbers from written format to numerical format."
    },
    "paragraphs": {
      "type": "boolean",
      "description": "Splits audio into paragraphs to improve transcript readability."
    },
    "profanity_filter": {
      "type": "boolean",
      "description": "Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely."
    },
    "punctuate": {
      "type": "boolean",
      "description": "Add punctuation and capitalization to the transcript."
    },
    "redact": {
      "type": "string",
      "description": "Redaction removes sensitive information from your transcripts."
    },
    "replace": {
      "type": "string",
      "description": "Search for terms or phrases in submitted audio and replaces them."
    },
    "search": {
      "type": "string",
      "description": "Search for terms or phrases in submitted audio."
    },
    "sentiment": {
      "type": "boolean",
      "description": "Recognizes the sentiment throughout a transcript or text."
    },
    "smart_format": {
      "type": "boolean",
      "description": "Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability."
    },
    "topics": {
      "type": "boolean",
      "description": "Detect topics throughout a transcript or text."
    },
    "utterances": {
      "type": "boolean",
      "description": "Segments speech into meaningful semantic units."
    },
    "utt_split": {
      "type": "number",
      "description": "Seconds to wait before detecting a pause between words in submitted audio."
    }
  },
  "required": ["audio"]
}
```
Output:

```json
{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "results": {
      "type": "object",
      "properties": {
        "channels": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "alternatives": {
                "type": "array",
                "items": {
                  "type": "object",
                  "properties": {
                    "confidence": { "type": "number" },
                    "transcript": { "type": "string" },
                    "words": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "confidence": { "type": "number" },
                          "end": { "type": "number" },
                          "start": { "type": "number" },
                          "word": { "type": "string" }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "summary": {
          "type": "object",
          "properties": {
            "result": { "type": "string" },
            "short": { "type": "string" }
          }
        },
        "sentiments": {
          "type": "object",
          "properties": {
            "segments": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "text": { "type": "string" },
                  "start_word": { "type": "number" },
                  "end_word": { "type": "number" },
                  "sentiment": { "type": "string" },
                  "sentiment_score": { "type": "number" }
                }
              }
            },
            "average": {
              "type": "object",
              "properties": {
                "sentiment": { "type": "string" },
                "sentiment_score": { "type": "number" }
              }
            }
          }
        }
      }
    }
  }
}
```