nova-3

Automatic Speech Recognition • Deepgram
@cf/deepgram/nova-3

Transcribe audio using Deepgram’s speech-to-text model

Model Info
Batch: Yes
Partner: Yes
Real-time: Yes
Unit Pricing: $0.0052 per audio minute

Usage

Worker

 
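export default {
  async fetch(request, env, ctx): Promise<Response> {
    // Fetch the audio to transcribe. Replace the placeholder with a real MP3 URL.
    // Named audioUrl so it does not shadow the global URL class.
    const audioUrl = "https://URL_TO_MP3_FILE/audio.mp3";
    const mp3 = await fetch(audioUrl);

    // Stream the audio body to the model and ask it to detect the spoken language.
    const resp = await env.AI.run("@cf/deepgram/nova-3", {
      audio: {
        body: mp3.body,
        contentType: "audio/mpeg",
      },
      detect_language: true,
    }, {
      // Return the raw Response so it can be passed straight back to the caller.
      returnRawResponse: true,
    });
    return resp;
  },
} satisfies ExportedHandler<Env>;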

curl
curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/deepgram/nova-3?detect_language=true' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: audio/mpeg' \
  --data-binary "@/path/to/your-mp3-file.mp3"
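
The curl request passes `detect_language` as a query-string parameter. The following TypeScript sketch builds the same endpoint URL programmatically. `buildRunUrl` is a hypothetical helper, not part of any SDK, and it assumes other input parameters can be sent as query parameters in the same way, which the example only demonstrates for `detect_language`.

```typescript
// Hypothetical helper: builds the REST endpoint URL used in the curl example.
// Assumption: options are sent as query-string parameters, as detect_language is.
type QueryValue = string | number | boolean;

function buildRunUrl(
  accountId: string,
  options: { [key: string]: QueryValue } = {},
): string {
  const base =
    `https://api.cloudflare.com/client/v4/accounts/${encodeURIComponent(accountId)}` +
    `/ai/run/@cf/deepgram/nova-3`;

  // URLSearchParams takes care of escaping parameter names and values.
  const query = new URLSearchParams();
  for (const key of Object.keys(options)) {
    query.set(key, String(options[key]));
  }

  const qs = query.toString();
  return qs ? `${base}?${qs}` : base;
}
```

The returned string can be passed to `fetch` together with the `Authorization` and `Content-Type` headers and the audio bytes as the request body, mirroring the curl flags.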

Parameters

Fields marked "required" must be provided.

Input

  • audio object required

    • body object required

    • contentType string required

  • custom_topic_mode string

    Sets how the model will interpret strings submitted to the custom_topic param. When strict, the model will only return topics submitted using the custom_topic param. When extended, the model will return its own detected topics in addition to those submitted using the custom_topic param.

  • custom_topic string

    Custom topics you want the model to detect within your input audio or text, if present. Submit up to 100.

  • custom_intent_mode string

    Sets how the model will interpret intents submitted to the custom_intent param. When strict, the model will only return intents submitted using the custom_intent param. When extended, the model will return its own detected intents in addition to those submitted using the custom_intent param.

  • custom_intent string

    Custom intents you want the model to detect within your input audio if present

  • detect_entities boolean

    Identifies and extracts key entities from content in submitted audio

  • detect_language boolean

    Identifies the dominant language spoken in submitted audio

  • diarize boolean

    Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0

  • dictation boolean

    Converts spoken punctuation commands into their corresponding punctuation marks

  • encoding string

    Specify the expected encoding of your submitted audio

  • extra string

    Arbitrary key-value pairs that are attached to the API response for usage in downstream processing

  • filter_words boolean

    Filler Words can help transcribe interruptions in your audio, like 'uh' and 'um'

  • keyterm string

    Key term prompting can boost or suppress specialized terminology and brands.

  • keywords string

    Keywords can boost or suppress specialized terminology and brands.

  • language string

    The BCP-47 language tag that hints at the primary spoken language. Depending on the model and API endpoint you choose, only certain languages are available.

  • measurements boolean

    Spoken measurements will be converted to their corresponding abbreviations.

  • mip_opt_out boolean

    Opts out requests from the Deepgram Model Improvement Program. Refer to our Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip.

  • mode string

    Mode of operation for the model representing broad area of topic that will be talked about in the supplied audio

  • multichannel boolean

    Transcribe each audio channel independently.

  • numerals boolean

    Numerals converts numbers from written format to numerical format.

  • paragraphs boolean

    Splits audio into paragraphs to improve transcript readability.

  • profanity_filter boolean

    Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely.

  • punctuate boolean

    Add punctuation and capitalization to the transcript.

  • redact string

    Redaction removes sensitive information from your transcripts.

  • replace string

    Search for terms or phrases in submitted audio and replace them.

  • search string

    Search for terms or phrases in submitted audio.

  • sentiment boolean

    Recognizes the sentiment throughout a transcript or text.

  • smart_format boolean

    Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability.

  • topics boolean

    Detect topics throughout a transcript or text.

  • utterances boolean

    Segments speech into meaningful semantic units.

  • utt_split number

    Seconds to wait before detecting a pause between words in submitted audio.
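
As an illustration of how the input fields fit together, the sketch below types a small subset of the parameters listed above and merges them with the required `audio` object. `Nova3Options`, `Nova3Input`, and `buildNova3Input` are hypothetical names introduced for this example, and only a few of the optional parameters are included.

```typescript
// Subset of the optional parameters listed above; names and types follow the list.
interface Nova3Options {
  detect_language?: boolean;
  diarize?: boolean;
  punctuate?: boolean;
  smart_format?: boolean;
  language?: string;
  keyterm?: string;
}

// The audio object is required, with both body and contentType required inside it.
interface Nova3Input<B> extends Nova3Options {
  audio: { body: B; contentType: string };
}

// Hypothetical helper: combines the required audio object with optional flags.
function buildNova3Input<B>(
  body: B,
  contentType: string,
  options: Nova3Options = {},
): Nova3Input<B> {
  if (body === null || body === undefined) {
    throw new Error("audio.body is required");
  }
  if (!contentType) {
    throw new Error("audio.contentType is required");
  }
  return { ...options, audio: { body, contentType } };
}
```

In a Worker, the result could be passed as the second argument to `env.AI.run("@cf/deepgram/nova-3", ...)`, with `mp3.body` as `body` and `"audio/mpeg"` as `contentType`, as in the Worker example.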

Output

  • results object

    • channels array

      • items object

        • alternatives array

          • items object

            • confidence number

            • transcript string

            • words array

              • items object

                • confidence number

                • end number

                • start number

                • word string

    • summary object

      • result string

      • short string

    • sentiments object

      • segments array

        • items object

          • text string

          • start_word number

          • end_word number

          • sentiment string

          • sentiment_score number

      • average object

        • sentiment string

        • sentiment_score number
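
The nesting of the output (channels, then alternatives, then words) can be easier to follow in code. The sketch below declares types that mirror the transcript part of the output structure above and reads from it. The type and function names are hypothetical, and picking the alternative with the highest `confidence` is an assumption made for this example, not documented behavior.

```typescript
// Types mirroring the transcript part of the output structure listed above.
interface Nova3Word {
  word: string;
  start: number;
  end: number;
  confidence: number;
}

interface Nova3Alternative {
  transcript: string;
  confidence: number;
  words: Nova3Word[];
}

interface Nova3Output {
  results: {
    channels: { alternatives: Nova3Alternative[] }[];
    summary?: { result: string; short: string };
  };
}

// Pick the alternative with the highest confidence, or undefined if there are none.
function bestAlternative(alts: Nova3Alternative[]): Nova3Alternative | undefined {
  let best: Nova3Alternative | undefined = undefined;
  for (const alt of alts) {
    if (best === undefined || alt.confidence > best.confidence) {
      best = alt;
    }
  }
  return best;
}

// One transcript per channel; an empty string when a channel has no alternatives.
function channelTranscripts(output: Nova3Output): string[] {
  return output.results.channels.map((channel) => {
    const best = bestAlternative(channel.alternatives);
    return best ? best.transcript : "";
  });
}

// Words that lie entirely inside the window [fromSec, toSec].
function wordsBetween(alt: Nova3Alternative, fromSec: number, toSec: number): Nova3Word[] {
  return alt.words.filter((w) => w.start >= fromSec && w.end <= toSec);
}
```

When the Worker uses `returnRawResponse: true`, the body would first need to be parsed, for example with `await resp.json()`, before being passed to these helpers.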

API Schemas

The following schemas are based on JSON Schema

{
    "type": "object",
    "properties": {
        "audio": {
            "type": "object",
            "properties": {
                "body": {
                    "type": "object"
                },
                "contentType": {
                    "type": "string"
                }
            },
            "required": [
                "body",
                "contentType"
            ]
        },
        "custom_topic_mode": {
            "type": "string",
            "enum": [
                "extended",
                "strict"
            ],
            "description": "Sets how the model will interpret strings submitted to the custom_topic param. When strict, the model will only return topics submitted using the custom_topic param. When extended, the model will return its own detected topics in addition to those submitted using the custom_topic param."
        },
        "custom_topic": {
            "type": "string",
            "description": "Custom topics you want the model to detect within your input audio or text, if present. Submit up to 100."
        },
        "custom_intent_mode": {
            "type": "string",
            "description": "Sets how the model will interpret intents submitted to the custom_intent param. When strict, the model will only return intents submitted using the custom_intent param. When extended, the model will return its own detected intents in addition to those submitted using the custom_intent param.",
            "enum": [
                "extended",
                "strict"
            ]
        },
        "custom_intent": {
            "type": "string",
            "description": "Custom intents you want the model to detect within your input audio if present"
        },
        "detect_entities": {
            "type": "boolean",
            "description": "Identifies and extracts key entities from content in submitted audio"
        },
        "detect_language": {
            "type": "boolean",
            "description": "Identifies the dominant language spoken in submitted audio"
        },
        "diarize": {
            "type": "boolean",
            "description": "Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0"
        },
        "dictation": {
            "type": "boolean",
            "description": "Converts spoken punctuation commands into their corresponding punctuation marks"
        },
        "encoding": {
            "type": "string",
            "description": "Specify the expected encoding of your submitted audio",
            "enum": [
                "linear16",
                "flac",
                "mulaw",
                "amr-nb",
                "amr-wb",
                "opus",
                "speex",
                "g729"
            ]
        },
        "extra": {
            "type": "string",
            "description": "Arbitrary key-value pairs that are attached to the API response for usage in downstream processing"
        },
        "filter_words": {
            "type": "boolean",
            "description": "Filler Words can help transcribe interruptions in your audio, like 'uh' and 'um'"
        },
        "keyterm": {
            "type": "string",
            "description": "Key term prompting can boost or suppress specialized terminology and brands."
        },
        "keywords": {
            "type": "string",
            "description": "Keywords can boost or suppress specialized terminology and brands."
        },
        "language": {
            "type": "string",
            "description": "The BCP-47 language tag that hints at the primary spoken language. Depending on the Model and API endpoint you choose only certain languages are available."
        },
        "measurements": {
            "type": "boolean",
            "description": "Spoken measurements will be converted to their corresponding abbreviations."
        },
        "mip_opt_out": {
            "type": "boolean",
            "description": "Opts out requests from the Deepgram Model Improvement Program. Refer to our Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip."
        },
        "mode": {
            "type": "string",
            "description": "Mode of operation for the model representing broad area of topic that will be talked about in the supplied audio",
            "enum": [
                "general",
                "medical",
                "finance"
            ]
        },
        "multichannel": {
            "type": "boolean",
            "description": "Transcribe each audio channel independently."
        },
        "numerals": {
            "type": "boolean",
            "description": "Numerals converts numbers from written format to numerical format."
        },
        "paragraphs": {
            "type": "boolean",
            "description": "Splits audio into paragraphs to improve transcript readability."
        },
        "profanity_filter": {
            "type": "boolean",
            "description": "Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely."
        },
        "punctuate": {
            "type": "boolean",
            "description": "Add punctuation and capitalization to the transcript."
        },
        "redact": {
            "type": "string",
            "description": "Redaction removes sensitive information from your transcripts."
        },
        "replace": {
            "type": "string",
            "description": "Search for terms or phrases in submitted audio and replace them."
        },
        "search": {
            "type": "string",
            "description": "Search for terms or phrases in submitted audio."
        },
        "sentiment": {
            "type": "boolean",
            "description": "Recognizes the sentiment throughout a transcript or text."
        },
        "smart_format": {
            "type": "boolean",
            "description": "Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability."
        },
        "topics": {
            "type": "boolean",
            "description": "Detect topics throughout a transcript or text."
        },
        "utterances": {
            "type": "boolean",
            "description": "Segments speech into meaningful semantic units."
        },
        "utt_split": {
            "type": "number",
            "description": "Seconds to wait before detecting a pause between words in submitted audio."
        }
    },
    "required": [
        "audio"
    ]
}
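
The schema restricts four string fields to fixed `enum` values and requires `audio`. A client could check those constraints before sending a request, as in the sketch below. `findEnumErrors` is a hypothetical helper that covers only the `required` and `enum` rules shown above; it is not a full JSON Schema validator.

```typescript
// Allowed values copied from the "enum" lists in the schema above.
const NOVA3_ENUMS = {
  custom_topic_mode: ["extended", "strict"],
  custom_intent_mode: ["extended", "strict"],
  encoding: ["linear16", "flac", "mulaw", "amr-nb", "amr-wb", "opus", "speex", "g729"],
  mode: ["general", "medical", "finance"],
} as const;

type EnumField = keyof typeof NOVA3_ENUMS;

// Hypothetical helper: returns human-readable problems, or an empty array if none.
function findEnumErrors(input: { [key: string]: unknown }): string[] {
  const errors: string[] = [];

  // "audio" is the only top-level required field in the schema.
  if (input.audio === undefined) {
    errors.push('missing required field "audio"');
  }

  for (const field of Object.keys(NOVA3_ENUMS) as EnumField[]) {
    const value = input[field];
    if (value === undefined) continue; // optional field not supplied
    const allowed: readonly string[] = NOVA3_ENUMS[field];
    if (typeof value !== "string" || allowed.indexOf(value) === -1) {
      errors.push(`${field} must be one of: ${allowed.join(", ")}`);
    }
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a caller report every invalid field at once.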