
flux

Automatic Speech Recognition · Deepgram · Hosted

Flux is the first conversational speech recognition model built specifically for voice agents.

Model Info
Terms and License: (link)
Partner: Yes
Real-time: Yes
Unit Pricing: $0.0077 per audio minute (WebSocket)
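To get a rough budget figure from the unit price, you can scale the per-minute rate by the amount of audio you plan to stream. The sketch below assumes billing is strictly linear in audio duration, with no rounding, minimum charge, or other fees. Those rules are not stated here, so treat the result as an estimate. The constant and function names are illustrative.

```typescript
// Assumption: cost is linear in streamed audio duration at the listed rate,
// with no per-session rounding or minimum charge.
const FLUX_USD_PER_AUDIO_MINUTE = 0.0077;

// Estimate the cost in USD for a given number of seconds of streamed audio.
function estimateFluxCostUsd(audioSeconds: number): number {
  if (!Number.isFinite(audioSeconds) || audioSeconds < 0) {
    throw new RangeError("audioSeconds must be a non-negative finite number");
  }
  return (audioSeconds / 60) * FLUX_USD_PER_AUDIO_MINUTE;
}

// Example: one hour of audio.
console.log(estimateFluxCostUsd(3600).toFixed(3)); // prints "0.462"
```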

Usage

Step 1: Create a Worker that establishes a WebSocket connection

TypeScript
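export default {
  async fetch(request, env, ctx): Promise<Response> {
    // Open a streaming Flux session. With `websocket: true`, the returned
    // Response is handed back to the client to complete the WebSocket connection.
    const resp = await env.AI.run("@cf/deepgram/flux", {
      encoding: "linear16", // raw signed 16-bit little-endian PCM
      sample_rate: "16000"  // must match the audio the client actually sends
    }, {
      websocket: true
    });
    return resp;
  },
} satisfies ExportedHandler<Env>;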
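The Worker above forwards every request to the model, including plain HTTP requests that cannot become a WebSocket. The following sketch adds the usual Workers pattern of rejecting non-upgrade requests with status 426 before calling the model. The `AiBinding` and `Env` interfaces are hand-written stand-ins, assumed here so the sketch compiles on its own; in a real project you would use the types generated for your Worker instead.

```typescript
// Hypothetical minimal stand-ins for the Workers AI binding types.
interface AiBinding {
  run(
    model: string,
    inputs: Record<string, string>,
    options: { websocket: boolean },
  ): Promise<Response>;
}
interface Env {
  AI: AiBinding;
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Only WebSocket upgrade requests can be proxied to the streaming model.
    const upgrade = request.headers.get("Upgrade");
    if (upgrade?.toLowerCase() !== "websocket") {
      return new Response("Expected Upgrade: websocket", { status: 426 });
    }
    // Same call as the Worker above: open a Flux session over WebSocket.
    return env.AI.run(
      "@cf/deepgram/flux",
      { encoding: "linear16", sample_rate: "16000" },
      { websocket: true },
    );
  },
};

export default worker;
```

Returning 426 (Upgrade Required) gives clients that forget the WebSocket handshake a clear error instead of an opaque failure from the model call.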

Step 2: Deploy your Worker

Terminal window
npx wrangler deploy

Step 3: Write a client script to connect to your Worker and send audio

JavaScript
const ws = new WebSocket('wss://<your-worker-url.com>');

ws.onopen = () => {
  console.log('Connected to WebSocket');
  // Generate and send random audio bytes.
  // Replace this with a function that reads from your
  // microphone or another audio source.
  const audioData = generateRandomAudio();
  ws.send(audioData);
  console.log('Audio data sent');
};

ws.onmessage = (event) => {
  // Transcription results arrive here.
  // Add your own logic to parse the data.
  console.log('Received:', event.data);
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('WebSocket closed');
};

// Generate 1 second of random noise as mono 16-bit PCM at 16 kHz,
// matching the encoding and sample_rate the Worker passes to Flux.
function generateRandomAudio() {
  const sampleRate = 16000;
  const duration = 1; // seconds
  const numSamples = sampleRate * duration;
  const buffer = new ArrayBuffer(numSamples * 2); // 2 bytes per sample
  // Int16Array uses the platform's byte order, which is little-endian on
  // virtually all current hardware, as linear16 requires.
  const view = new Int16Array(buffer);
  for (let i = 0; i < numSamples; i++) {
    view[i] = Math.floor(Math.random() * 65536 - 32768);
  }
  return buffer;
}
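When replacing the random noise with real microphone input, note that the browser Web Audio API delivers samples as a `Float32Array` in the range [-1, 1], usually at 44.1 or 48 kHz. Flux expects `linear16` at the `sample_rate` the Worker declares, so you would either resample to 16 kHz or change `sample_rate` to match your source. The sketch below shows the conversion to little-endian 16-bit PCM and a way to split the result into small chunks for streaming. The helper names are hypothetical, and the 80 ms chunk size is an illustrative choice rather than a documented requirement.

```typescript
// Convert Float32 samples in [-1, 1] to raw signed 16-bit little-endian PCM.
function floatTo16BitPcm(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range values, then scale so -1 maps to -32768 and 1 to 32767.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buffer;
}

// Split a PCM buffer into chunks covering about `chunkMs` milliseconds each.
// The last chunk may be shorter.
function chunkPcm(pcm: ArrayBuffer, sampleRate: number, chunkMs: number): ArrayBuffer[] {
  const samplesPerChunk = Math.max(1, Math.floor((sampleRate * chunkMs) / 1000));
  const bytesPerChunk = samplesPerChunk * 2; // 2 bytes per 16-bit sample
  const chunks: ArrayBuffer[] = [];
  for (let offset = 0; offset < pcm.byteLength; offset += bytesPerChunk) {
    chunks.push(pcm.slice(offset, offset + bytesPerChunk));
  }
  return chunks;
}
```

For example, given `samples` already resampled to 16 kHz, `chunkPcm(floatTo16BitPcm(samples), 16000, 80).forEach((c) => ws.send(c))` sends the audio in 80 ms pieces instead of one large buffer.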

Parameters

encoding
string, required. Allowed value: linear16. Encoding of the audio stream. Currently only raw signed little-endian 16-bit PCM is supported.
sample_rate
string, required. Pattern: ^[0-9]+$. Sample rate of the audio stream in Hz.
eager_eot_threshold
string. End-of-turn (EOT) confidence required to fire an eager end-of-turn event. When set, enables EagerEndOfTurn and TurnResumed events. Valid values: 0.3 to 0.9.
eot_threshold
string. Default: 0.7. End-of-turn confidence required to finish a turn. Valid values: 0.5 to 0.9.
eot_timeout_ms
string. Default: 5000. Pattern: ^[0-9]+$. A turn is finished once this many milliseconds have passed after speech, regardless of EOT confidence.
keyterm
string. Keyterm prompting can improve recognition of specialized terminology. Pass multiple keyterm query parameters to boost multiple keyterms.
mip_opt_out
string. Default: false. Allowed values: true, false. Opts requests out of the Deepgram Model Improvement Program. Refer to the Deepgram docs for pricing impacts before setting this to true: https://dpgr.am/deepgram-mip
tag
string. A label for identifying your requests in usage reporting.
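Setting `eager_eot_threshold` enables EagerEndOfTurn and TurnResumed events, which lets a voice agent start preparing a reply before the turn is confirmed and throw that draft away if the user keeps talking. The sketch below maps incoming messages to agent actions. Only the event names EagerEndOfTurn and TurnResumed come from this page; the JSON message shape (an `event` field and a `transcript` field) and the `EndOfTurn` event name are assumptions to verify against Deepgram's Flux documentation. `handleFluxMessage` and `AgentAction` are hypothetical names.

```typescript
// Assumed message shape; verify field names against Deepgram's Flux docs.
interface FluxTurnMessage {
  event?: string;
  transcript?: string;
}

type AgentAction =
  | { kind: "draft"; transcript: string }  // speculatively prepare a reply
  | { kind: "cancelDraft" }                 // user resumed speaking: discard the draft
  | { kind: "commit"; transcript: string } // turn finished: send the reply
  | { kind: "none" };

function handleFluxMessage(raw: string): AgentAction {
  let msg: FluxTurnMessage;
  try {
    msg = JSON.parse(raw) as FluxTurnMessage;
  } catch {
    return { kind: "none" }; // ignore frames that are not JSON
  }
  switch (msg.event) {
    case "EagerEndOfTurn":
      return { kind: "draft", transcript: msg.transcript ?? "" };
    case "TurnResumed":
      return { kind: "cancelDraft" };
    case "EndOfTurn": // assumed event name for a finished turn
      return { kind: "commit", transcript: msg.transcript ?? "" };
    default:
      return { kind: "none" };
  }
}
```

In the client script, `ws.onmessage` could call `handleFluxMessage(String(event.data))` and act on the returned `kind`. A lower `eager_eot_threshold` produces earlier drafts at the cost of more cancellations.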

API Schemas

{
  "type": "object",
  "properties": {
    "encoding": {
      "type": "string",
      "description": "Encoding of the audio stream. Currently only supports raw signed little-endian 16-bit PCM.",
      "enum": [
        "linear16"
      ]
    },
    "sample_rate": {
      "type": "string",
      "description": "Sample rate of the audio stream in Hz.",
      "pattern": "^[0-9]+$"
    },
    "eager_eot_threshold": {
      "type": "string",
      "description": "End-of-turn confidence required to fire an eager end-of-turn event. When set, enables EagerEndOfTurn and TurnResumed events. Valid Values 0.3 - 0.9."
    },
    "eot_threshold": {
      "type": "string",
      "description": "End-of-turn confidence required to finish a turn. Valid Values 0.5 - 0.9.",
      "default": "0.7"
    },
    "eot_timeout_ms": {
      "type": "string",
      "description": "A turn will be finished when this much time has passed after speech, regardless of EOT confidence.",
      "default": "5000",
      "pattern": "^[0-9]+$"
    },
    "keyterm": {
      "type": "string",
      "description": "Keyterm prompting can improve recognition of specialized terminology. Pass multiple keyterm query parameters to boost multiple keyterms."
    },
    "mip_opt_out": {
      "type": "string",
      "description": "Opts out requests from the Deepgram Model Improvement Program. Refer to Deepgram Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip",
      "enum": [
        "true",
        "false"
      ],
      "default": "false"
    },
    "tag": {
      "type": "string",
      "description": "Label your requests for the purpose of identification during usage reporting"
    }
  },
  "required": [
    "sample_rate",
    "encoding"
  ]
}
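Because every field in the schema is a string, mistakes such as passing `16000` as a number or an out-of-range threshold are easy to make. The sketch below mirrors the schema as a TypeScript interface and checks the `required`, `enum`, and `pattern` constraints before calling the model. The numeric ranges for the two thresholds come from the prose descriptions rather than from machine-readable schema keywords, so treat them as documented guidance. `FluxInputs` and `validateFluxInputs` are hypothetical names, not part of any Cloudflare or Deepgram API.

```typescript
// Hand-written mirror of the input schema; all values are strings.
interface FluxInputs {
  encoding: "linear16";
  sample_rate: string;
  eager_eot_threshold?: string;
  eot_threshold?: string;
  eot_timeout_ms?: string;
  keyterm?: string;
  mip_opt_out?: "true" | "false";
  tag?: string;
}

const DIGITS = /^[0-9]+$/; // the schema's "pattern" for numeric strings

// Returns a list of problems; an empty list means the inputs look valid.
function validateFluxInputs(inputs: Partial<Record<keyof FluxInputs, string>>): string[] {
  const errors: string[] = [];
  if (inputs.encoding !== "linear16") errors.push('encoding must be "linear16"');
  if (inputs.sample_rate === undefined || !DIGITS.test(inputs.sample_rate)) {
    errors.push("sample_rate is required and must contain only digits");
  }
  if (inputs.eot_timeout_ms !== undefined && !DIGITS.test(inputs.eot_timeout_ms)) {
    errors.push("eot_timeout_ms must contain only digits");
  }
  if (inputs.mip_opt_out !== undefined && inputs.mip_opt_out !== "true" && inputs.mip_opt_out !== "false") {
    errors.push('mip_opt_out must be "true" or "false"');
  }
  checkRange(inputs.eager_eot_threshold, "eager_eot_threshold", 0.3, 0.9, errors);
  checkRange(inputs.eot_threshold, "eot_threshold", 0.5, 0.9, errors);
  return errors;
}

// Appends an error if `value` is present but not a number within [min, max].
function checkRange(value: string | undefined, name: string, min: number, max: number, errors: string[]): void {
  if (value === undefined) return;
  const n = Number(value);
  if (value.trim() === "" || Number.isNaN(n) || n < min || n > max) {
    errors.push(`${name} must be a number between ${min} and ${max}`);
  }
}
```

A Worker could run this check on the inputs it is about to pass to `env.AI.run` and return a 400 response listing the errors, instead of opening a session that the model may reject.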