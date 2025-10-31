
Automatic Speech Recognition • Deepgram
@cf/deepgram/flux

Flux is the first conversational speech recognition model built specifically for voice agents.

Model Info
Partner: Yes
Real-time: Yes

Usage

Step 1: Create a Worker that establishes a WebSocket connection
TypeScript
export default {
  async fetch(request, env, ctx): Promise<Response> {
    const resp = await env.AI.run("@cf/deepgram/flux", {
      encoding: "linear16",
      sample_rate: "16000"
    }, {
      websocket: true
    });
    return resp;
  },
} satisfies ExportedHandler<Env>;
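
Because the Worker above returns a WebSocket response, it may be worth rejecting plain HTTP requests before calling the model. The following is a minimal sketch, not documented Workers AI behavior: `isWebSocketUpgrade` and `upgradeRequiredResponse` are hypothetical helper names, and answering with status 426 is an assumed design choice.

```typescript
// Hypothetical helper (not part of the Workers AI API): returns true when
// the request headers ask for a WebSocket upgrade.
function isWebSocketUpgrade(headers: Headers): boolean {
  const upgrade = headers.get("Upgrade");
  return upgrade !== null && upgrade.toLowerCase() === "websocket";
}

// Assumed response for non-upgrade requests; 426 means "Upgrade Required".
function upgradeRequiredResponse(): Response {
  return new Response("Expected a WebSocket upgrade request", {
    status: 426,
    headers: { Upgrade: "websocket" },
  });
}
```

Inside `fetch`, the check would run before `env.AI.run`, for example `if (!isWebSocketUpgrade(request.headers)) return upgradeRequiredResponse();`.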

Step 2: Deploy your Worker
Terminal window
npx wrangler deploy

Step 3: Write a client script to connect to your Worker and send audio
JavaScript
const ws = new WebSocket('wss://<your-worker-url.com>');

ws.onopen = () => {
  console.log('Connected to WebSocket');

  // Generate and send random audio bytes.
  // Replace this with a function that reads from
  // your microphone or another audio source.
  const audioData = generateRandomAudio();
  ws.send(audioData);
  console.log('Audio data sent');
};

ws.onmessage = (event) => {
  // Transcription events are received here.
  // Add your own logic to parse the data.
  console.log('Received:', event.data);
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('WebSocket closed');
};

// Generate random audio data: 1 second of 16-bit mono noise at 16 kHz,
// matching the sample_rate ("16000") that the Worker passes to the model.
function generateRandomAudio() {
  const sampleRate = 16000;
  const duration = 1; // seconds
  const numSamples = sampleRate * duration;
  const buffer = new ArrayBuffer(numSamples * 2); // 2 bytes per sample
  const view = new Int16Array(buffer);

  for (let i = 0; i < numSamples; i++) {
    // Uniform random integer in [-32768, 32767]
    view[i] = Math.floor(Math.random() * 65536 - 32768);
  }

  return buffer;
}
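
When the random-noise generator is replaced with a real source, the samples usually need converting to linear16 and are typically sent in small pieces rather than one large buffer. The sketch below shows one way to do both. The helper names `floatToLinear16` and `chunkPcm` are hypothetical, the Float32 input format is an assumption about the audio source (it is what the Web Audio API produces), and the chunk duration is an assumed tuning value, not a documented requirement.

```typescript
// Convert Float32 samples in [-1, 1] to signed 16-bit PCM.
// Int16Array uses the platform byte order, which is little-endian on
// virtually all current hardware.
function floatToLinear16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return out;
}

// Split PCM samples into chunks of roughly chunkMs milliseconds each.
// slice() copies, so every chunk owns exactly its own bytes.
function chunkPcm(pcm: Int16Array, sampleRate: number, chunkMs: number): Int16Array[] {
  const samplesPerChunk = Math.max(1, Math.floor((sampleRate * chunkMs) / 1000));
  const chunks: Int16Array[] = [];
  for (let start = 0; start < pcm.length; start += samplesPerChunk) {
    chunks.push(pcm.slice(start, start + samplesPerChunk));
  }
  return chunks;
}
```

With a hypothetical `micSamples` buffer captured at 16000 Hz, the client could then call `ws.send(chunk)` for each element of `chunkPcm(floatToLinear16(micSamples), 16000, 80)`.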

Parameters

* indicates a required field

Input

  • encoding string required

    Encoding of the audio stream. Currently only supports raw signed little-endian 16-bit PCM.

  • sample_rate string required

    Sample rate of the audio stream in Hz.

  • eager_eot_threshold string

    End-of-turn confidence required to fire an eager end-of-turn event. When set, enables EagerEndOfTurn and TurnResumed events. Valid values: 0.3 to 0.9.

  • eot_threshold string default 0.7

    End-of-turn confidence required to finish a turn. Valid values: 0.5 to 0.9.

  • eot_timeout_ms string default 5000

    A turn is finished once this many milliseconds have passed after speech, regardless of end-of-turn (EOT) confidence.

  • keyterm string

    Keyterm prompting can improve recognition of specialized terminology. Pass multiple keyterm query parameters to boost multiple keyterms.

  • mip_opt_out string default false

    Opts requests out of the Deepgram Model Improvement Program. Refer to the Deepgram docs for pricing impacts before setting this to true: https://dpgr.am/deepgram-mip

  • tag string

    Label your requests for identification during usage reporting.
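
Since every input is string-typed, numeric settings have to be serialized, and it can help to range-check them before opening a connection. The sketch below is one possible approach, not part of the Workers AI API: `FluxOptions`, `checkRange`, and `buildFluxParams` are hypothetical names, and the ranges are copied from the parameter descriptions above.

```typescript
// Hypothetical options type: numeric settings are numbers here and are
// converted to the string form that the input parameters expect.
interface FluxOptions {
  sampleRate: number;
  eagerEotThreshold?: number; // documented range 0.3 - 0.9
  eotThreshold?: number;      // documented range 0.5 - 0.9, default 0.7
  eotTimeoutMs?: number;      // default 5000
  keyterm?: string;
  mipOptOut?: boolean;
  tag?: string;
}

function checkRange(name: string, value: number, min: number, max: number): void {
  if (!(value >= min && value <= max)) {
    throw new RangeError(`${name} must be between ${min} and ${max}, got ${value}`);
  }
}

function buildFluxParams(opts: FluxOptions): Record<string, string> {
  if (!Number.isInteger(opts.sampleRate) || opts.sampleRate <= 0) {
    throw new RangeError(`sample_rate must be a positive integer, got ${opts.sampleRate}`);
  }
  // encoding and sample_rate are the two required fields.
  const params: Record<string, string> = {
    encoding: "linear16",
    sample_rate: String(opts.sampleRate),
  };
  if (opts.eagerEotThreshold !== undefined) {
    checkRange("eager_eot_threshold", opts.eagerEotThreshold, 0.3, 0.9);
    params.eager_eot_threshold = String(opts.eagerEotThreshold);
  }
  if (opts.eotThreshold !== undefined) {
    checkRange("eot_threshold", opts.eotThreshold, 0.5, 0.9);
    params.eot_threshold = String(opts.eotThreshold);
  }
  if (opts.eotTimeoutMs !== undefined) {
    if (!Number.isInteger(opts.eotTimeoutMs) || opts.eotTimeoutMs < 0) {
      throw new RangeError(`eot_timeout_ms must be a non-negative integer, got ${opts.eotTimeoutMs}`);
    }
    params.eot_timeout_ms = String(opts.eotTimeoutMs);
  }
  if (opts.keyterm !== undefined) params.keyterm = opts.keyterm;
  if (opts.mipOptOut !== undefined) params.mip_opt_out = String(opts.mipOptOut);
  if (opts.tag !== undefined) params.tag = opts.tag;
  return params;
}
```

The returned object could be passed as the second argument to `env.AI.run("@cf/deepgram/flux", ...)`, for example `buildFluxParams({ sampleRate: 16000, eotThreshold: 0.8 })`.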

Output

  • request_id string

    The unique identifier of the request (UUID).

  • sequence_id integer min 0

    Starts at 0 and increments for each message the server sends to the client.

  • event string

    The type of event being reported.

  • turn_index integer min 0

    The index of the current turn.

  • audio_window_start number

    Start time, in seconds, of the audio range that was transcribed.

  • audio_window_end number

    End time, in seconds, of the audio range that was transcribed.

  • transcript string

    Text that was said over the course of the current turn.

  • words array

    The words in the transcript.

    • items object

      • word string required

        The individual punctuated, properly cased word from the transcript.

      • confidence number required

        Confidence that this word was transcribed correctly.

  • end_of_turn_confidence number

    Confidence that no more speech is coming in this turn.
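
The output fields above can be modeled as a type so that client code handles messages safely. This is a rough sketch under two assumptions: that each server message arrives as JSON text, and that every top-level field may be absent, since none is marked as required. `FluxMessage`, `parseFluxMessage`, and `lowConfidenceWords` are hypothetical names.

```typescript
interface FluxWord {
  word: string;
  confidence: number;
}

// Mirrors the Output list; top-level fields are optional by assumption.
interface FluxMessage {
  request_id?: string;
  sequence_id?: number;
  event?: string;
  turn_index?: number;
  audio_window_start?: number;
  audio_window_end?: number;
  transcript?: string;
  words?: FluxWord[];
  end_of_turn_confidence?: number;
}

// Returns null instead of throwing when the payload is not a JSON object.
// Field types are not validated here; this only guards the outer shape.
function parseFluxMessage(raw: string): FluxMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return null;
  }
  return data as FluxMessage;
}

// Lists the words whose confidence falls below a caller-chosen threshold.
function lowConfidenceWords(msg: FluxMessage, threshold: number): string[] {
  return (msg.words ?? [])
    .filter((w) => w.confidence < threshold)
    .map((w) => w.word);
}
```

In a client's `ws.onmessage` handler, this might be used as `const msg = parseFluxMessage(String(event.data));` followed by a check on `msg?.event`.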

API Schemas

The following schemas are based on JSON Schema.

{
    "type": "object",
    "properties": {
        "encoding": {
            "type": "string",
            "description": "Encoding of the audio stream. Currently only supports raw signed little-endian 16-bit PCM.",
            "enum": [
                "linear16"
            ]
        },
        "sample_rate": {
            "type": "string",
            "description": "Sample rate of the audio stream in Hz.",
            "pattern": "^[0-9]+$"
        },
        "eager_eot_threshold": {
            "type": "string",
            "description": "End-of-turn confidence required to fire an eager end-of-turn event. When set, enables EagerEndOfTurn and TurnResumed events. Valid Values 0.3 - 0.9."
        },
        "eot_threshold": {
            "type": "string",
            "description": "End-of-turn confidence required to finish a turn. Valid Values 0.5 - 0.9.",
            "default": "0.7"
        },
        "eot_timeout_ms": {
            "type": "string",
            "description": "A turn will be finished when this much time has passed after speech, regardless of EOT confidence.",
            "default": "5000",
            "pattern": "^[0-9]+$"
        },
        "keyterm": {
            "type": "string",
            "description": "Keyterm prompting can improve recognition of specialized terminology. Pass multiple keyterm query parameters to boost multiple keyterms."
        },
        "mip_opt_out": {
            "type": "string",
            "description": "Opts out requests from the Deepgram Model Improvement Program. Refer to Deepgram Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip",
            "enum": [
                "true",
                "false"
            ],
            "default": "false"
        },
        "tag": {
            "type": "string",
            "description": "Label your requests for the purpose of identification during usage reporting"
        }
    },
    "required": [
        "sample_rate",
        "encoding"
    ]
}