flux
Automatic Speech Recognition • Deepgram
Flux is the first conversational speech recognition model built specifically for voice agents.
Model Info | |
---|---|
Partner | Yes |
Real-time | Yes |
Usage
Step 1: Create a Worker that establishes a WebSocket connection
export default {
  async fetch(request, env, ctx): Promise<Response> {
    const resp = await env.AI.run(
      "@cf/deepgram/flux",
      { encoding: "linear16", sample_rate: "16000" },
      { websocket: true },
    );
    return resp;
  },
} satisfies ExportedHandler<Env>;
Step 2: Deploy your Worker
npx wrangler deploy
Step 3: Write a client script to connect to your Worker and send audio
const ws = new WebSocket('wss://<your-worker-url.com>');

ws.onopen = () => {
  console.log('Connected to WebSocket');

  // Generate and send random audio bytes.
  // You can replace this part with a function
  // that reads from your mic or other audio source.
  const audioData = generateRandomAudio();
  ws.send(audioData);
  console.log('Audio data sent');
};

ws.onmessage = (event) => {
  // Transcription will be received here.
  // Add your custom logic to parse the data.
  console.log('Received:', event.data);
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('WebSocket closed');
};

// Generate random audio data (1 second of 16-bit noise at 16 kHz, mono).
// The sample rate matches the sample_rate passed to the model in Step 1.
function generateRandomAudio() {
  const sampleRate = 16000;
  const duration = 1;
  const numSamples = sampleRate * duration;
  const buffer = new ArrayBuffer(numSamples * 2); // 2 bytes per sample
  const view = new Int16Array(buffer);

  for (let i = 0; i < numSamples; i++) {
    // Uniform random value in the signed 16-bit range [-32768, 32767]
    view[i] = Math.floor(Math.random() * 65536 - 32768);
  }

  return buffer;
}
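The client script suggests replacing the random noise with real audio. A minimal sketch of that step follows, assuming the source delivers Float32 samples in the range [-1, 1] (as the Web Audio API does) already at the sample rate declared to the model. The helper names `floatToLinear16` and `chunkPcm` are hypothetical, and the chunk duration is an illustrative choice rather than a documented requirement.

```typescript
// Convert Float32 samples in [-1, 1] to raw signed little-endian 16-bit PCM,
// which is the "linear16" encoding the model accepts.
function floatToLinear16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, sample));
    // Scale the negative and positive halves separately to cover the int16 range.
    const int16 = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * 2, int16, true); // true = little-endian
  });
  return buffer;
}

// Split a PCM buffer into chunks of a fixed duration so audio can be streamed
// incrementally. The last chunk may be shorter than the others.
function chunkPcm(pcm: ArrayBuffer, sampleRate: number, chunkMs: number): ArrayBuffer[] {
  const bytesPerChunk = Math.floor((sampleRate * chunkMs) / 1000) * 2;
  if (bytesPerChunk <= 0) {
    throw new RangeError("chunk duration is too short for this sample rate");
  }
  const chunks: ArrayBuffer[] = [];
  for (let offset = 0; offset < pcm.byteLength; offset += bytesPerChunk) {
    chunks.push(pcm.slice(offset, offset + bytesPerChunk));
  }
  return chunks;
}
```

Each chunk could then be passed to `ws.send` in order, in place of the single `generateRandomAudio()` buffer.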
Parameters
* indicates a required field
Input
- encoding * (string): Encoding of the audio stream. Currently only supports raw signed little-endian 16-bit PCM (linear16).
- sample_rate * (string): Sample rate of the audio stream in Hz.
- eager_eot_threshold (string): End-of-turn confidence required to fire an eager end-of-turn event. When set, enables EagerEndOfTurn and TurnResumed events. Valid values: 0.3 to 0.9.
- eot_threshold (string, default 0.7): End-of-turn confidence required to finish a turn. Valid values: 0.5 to 0.9.
- eot_timeout_ms (string, default 5000): A turn will be finished when this much time (in milliseconds) has passed after speech, regardless of EOT confidence.
- keyterm (string): Keyterm prompting can improve recognition of specialized terminology. Pass multiple keyterm query parameters to boost multiple keyterms.
- mip_opt_out (string, default false): Opts out requests from the Deepgram Model Improvement Program. Refer to Deepgram Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip
- tag (string): Label your requests for the purpose of identification during usage reporting.
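Every input parameter is passed as a string, including numeric thresholds and the boolean opt-out. The sketch below shows one way to build the options object from typed values and to check the documented ranges before calling the model. The `FluxOptionsInput` shape and the `buildFluxOptions` helper are hypothetical conveniences, not part of the Workers AI API.

```typescript
// Hypothetical caller-side shape; these camelCase names are not API fields.
interface FluxOptionsInput {
  sampleRate: number;
  eagerEotThreshold?: number;
  eotThreshold?: number;
  eotTimeoutMs?: number;
  keyterm?: string;
  mipOptOut?: boolean;
  tag?: string;
}

function assertInRange(name: string, value: number, min: number, max: number): void {
  if (!Number.isFinite(value) || value < min || value > max) {
    throw new RangeError(`${name} must be between ${min} and ${max}, got ${value}`);
  }
}

function assertNonNegativeInteger(name: string, value: number): void {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError(`${name} must be a non-negative integer, got ${value}`);
  }
}

// Build the string-valued options object described in the Input list.
function buildFluxOptions(input: FluxOptionsInput): Record<string, string> {
  assertNonNegativeInteger("sample_rate", input.sampleRate);
  const options: Record<string, string> = {
    encoding: "linear16", // the only supported encoding
    sample_rate: String(input.sampleRate),
  };
  if (input.eagerEotThreshold !== undefined) {
    assertInRange("eager_eot_threshold", input.eagerEotThreshold, 0.3, 0.9);
    options["eager_eot_threshold"] = String(input.eagerEotThreshold);
  }
  if (input.eotThreshold !== undefined) {
    assertInRange("eot_threshold", input.eotThreshold, 0.5, 0.9);
    options["eot_threshold"] = String(input.eotThreshold);
  }
  if (input.eotTimeoutMs !== undefined) {
    assertNonNegativeInteger("eot_timeout_ms", input.eotTimeoutMs);
    options["eot_timeout_ms"] = String(input.eotTimeoutMs);
  }
  if (input.keyterm !== undefined) options["keyterm"] = input.keyterm;
  if (input.mipOptOut !== undefined) options["mip_opt_out"] = String(input.mipOptOut);
  if (input.tag !== undefined) options["tag"] = input.tag;
  return options;
}
```

The returned object could be passed as the second argument to `env.AI.run("@cf/deepgram/flux", ...)` in the Worker from Step 1.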
Output
- request_id (string): The unique identifier of the request (UUID).
- sequence_id (integer, min 0): Starts at 0 and increments for each message the server sends to the client.
- event (string): The type of event being reported. One of Update, StartOfTurn, EagerEndOfTurn, TurnResumed, EndOfTurn.
- turn_index (integer, min 0): The index of the current turn.
- audio_window_start (number): Start time in seconds of the audio range that was transcribed.
- audio_window_end (number): End time in seconds of the audio range that was transcribed.
- transcript (string): Text that was said over the course of the current turn.
- words (array): The words in the transcript. Each item is an object with:
  - word * (string): The individual punctuated, properly-cased word from the transcript.
  - confidence * (number): Confidence that this word was transcribed correctly.
- end_of_turn_confidence (number): Confidence that no more speech is coming in this turn.
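The output fields above can be mirrored in a TypeScript type with a small runtime check, which gives the client a place to put the parsing logic that Step 3 leaves open. This is a sketch that assumes each WebSocket message arrives as a JSON text frame. The names `FluxMessage`, `FluxWord`, and `parseFluxMessage` are hypothetical. All top-level fields are treated as optional because the schema marks none of them as required.

```typescript
const FLUX_EVENTS = ["Update", "StartOfTurn", "EagerEndOfTurn", "TurnResumed", "EndOfTurn"] as const;
type FluxEvent = (typeof FLUX_EVENTS)[number];

interface FluxWord {
  word: string;
  confidence: number;
}

interface FluxMessage {
  request_id?: string;
  sequence_id?: number;
  event?: FluxEvent;
  turn_index?: number;
  audio_window_start?: number;
  audio_window_end?: number;
  transcript?: string;
  words?: FluxWord[];
  end_of_turn_confidence?: number;
}

function isFluxWord(value: unknown): value is FluxWord {
  if (typeof value !== "object" || value === null) return false;
  const w = value as Record<string, unknown>;
  return typeof w["word"] === "string" && typeof w["confidence"] === "number";
}

// Parse one message; returns null for anything that is not a JSON object
// with a recognized event name and well-formed words.
function parseFluxMessage(raw: string): FluxMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const msg = data as Record<string, unknown>;
  const event = msg["event"];
  if (event !== undefined && !(FLUX_EVENTS as readonly unknown[]).includes(event)) return null;
  const words = msg["words"];
  if (words !== undefined && !(Array.isArray(words) && words.every(isFluxWord))) return null;
  return msg as FluxMessage;
}
```

In the client script, `parseFluxMessage(event.data)` could replace the bare `console.log` inside `ws.onmessage`, under the same text-frame assumption.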
API Schemas
The following schemas are based on JSON Schema
{
  "type": "object",
  "properties": {
    "encoding": {
      "type": "string",
      "description": "Encoding of the audio stream. Currently only supports raw signed little-endian 16-bit PCM.",
      "enum": ["linear16"]
    },
    "sample_rate": {
      "type": "string",
      "description": "Sample rate of the audio stream in Hz.",
      "pattern": "^[0-9]+$"
    },
    "eager_eot_threshold": {
      "type": "string",
      "description": "End-of-turn confidence required to fire an eager end-of-turn event. When set, enables EagerEndOfTurn and TurnResumed events. Valid Values 0.3 - 0.9."
    },
    "eot_threshold": {
      "type": "string",
      "description": "End-of-turn confidence required to finish a turn. Valid Values 0.5 - 0.9.",
      "default": "0.7"
    },
    "eot_timeout_ms": {
      "type": "string",
      "description": "A turn will be finished when this much time has passed after speech, regardless of EOT confidence.",
      "default": "5000",
      "pattern": "^[0-9]+$"
    },
    "keyterm": {
      "type": "string",
      "description": "Keyterm prompting can improve recognition of specialized terminology. Pass multiple keyterm query parameters to boost multiple keyterms."
    },
    "mip_opt_out": {
      "type": "string",
      "description": "Opts out requests from the Deepgram Model Improvement Program. Refer to Deepgram Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip",
      "enum": ["true", "false"],
      "default": "false"
    },
    "tag": {
      "type": "string",
      "description": "Label your requests for the purpose of identification during usage reporting"
    }
  },
  "required": ["sample_rate", "encoding"]
}
{
  "type": "object",
  "description": "Output will be returned as websocket messages.",
  "properties": {
    "request_id": {
      "type": "string",
      "description": "The unique identifier of the request (uuid)"
    },
    "sequence_id": {
      "type": "integer",
      "description": "Starts at 0 and increments for each message the server sends to the client.",
      "minimum": 0
    },
    "event": {
      "type": "string",
      "description": "The type of event being reported.",
      "enum": ["Update", "StartOfTurn", "EagerEndOfTurn", "TurnResumed", "EndOfTurn"]
    },
    "turn_index": {
      "type": "integer",
      "description": "The index of the current turn",
      "minimum": 0
    },
    "audio_window_start": {
      "type": "number",
      "description": "Start time in seconds of the audio range that was transcribed"
    },
    "audio_window_end": {
      "type": "number",
      "description": "End time in seconds of the audio range that was transcribed"
    },
    "transcript": {
      "type": "string",
      "description": "Text that was said over the course of the current turn"
    },
    "words": {
      "type": "array",
      "description": "The words in the transcript",
      "items": {
        "type": "object",
        "required": ["word", "confidence"],
        "properties": {
          "word": {
            "type": "string",
            "description": "The individual punctuated, properly-cased word from the transcript"
          },
          "confidence": {
            "type": "number",
            "description": "Confidence that this word was transcribed correctly"
          }
        }
      }
    },
    "end_of_turn_confidence": {
      "type": "number",
      "description": "Confidence that no more speech is coming in this turn"
    }
  }
}
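The event enum in the output schema lends itself to a small state machine on the client. The sketch below assumes, based only on the event names and the eager_eot_threshold description, that EagerEndOfTurn signals a likely end of turn, TurnResumed means the speaker continued after an eager event, and EndOfTurn marks the turn as finished. These semantics are an assumption and should be checked against the Deepgram documentation. `TurnTracker` and its action names are hypothetical.

```typescript
type TurnAction = "none" | "start_draft" | "cancel_draft" | "commit";

// Only the two fields this sketch needs from an output message.
interface TurnEvent {
  event?: string;
  transcript?: string;
}

class TurnTracker {
  readonly completedTurns: string[] = [];
  private draftTranscript: string | null = null;

  // Transcript of the speculative (eager) turn, or null if there is none.
  get draft(): string | null {
    return this.draftTranscript;
  }

  handle(msg: TurnEvent): TurnAction {
    switch (msg.event) {
      case "EagerEndOfTurn":
        // Assumed meaning: the turn is probably over, so speculative work can begin.
        this.draftTranscript = msg.transcript ?? "";
        return "start_draft";
      case "TurnResumed":
        // Assumed meaning: the speaker kept talking, so speculative work is stale.
        this.draftTranscript = null;
        return "cancel_draft";
      case "EndOfTurn":
        // The turn is finished; record its transcript.
        this.completedTurns.push(msg.transcript ?? "");
        this.draftTranscript = null;
        return "commit";
      default:
        // Update, StartOfTurn, and messages without an event change no state here.
        return "none";
    }
  }
}
```

A voice agent could use the returned action to decide when to begin preparing a reply, when to discard it, and when to send it.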