Universal 3 Pro
Automatic Speech Recognition • AssemblyAI • Proxied
AssemblyAI's Universal 3 Pro speech recognition model for high-accuracy transcription.
| Model Info | |
|---|---|
| Terms and License | link ↗ |
| More information | link ↗ |
Usage

    const response = await env.AI.run(
      'assemblyai/universal-3-pro',
      {
        audio_url: 'https://cdn.openai.com/API/docs/audio/alloy.wav',
      },
      {
        gateway: { id: 'default' },
      },
    )
    console.log(response)

Examples
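Worker Handler: Return only the transcript from a fetch handler

The snippet above logs the whole response object. As a minimal sketch of how the call might sit inside a Worker, the handler below reads an audio URL from the request and returns the `text` field that the output schema defines. The `?url=` query parameter, the `buildInput` helper, and the 400 response are illustrative assumptions, not part of the documented API; only `env.AI.run`, the model name, and the gateway option come from this page.

```javascript
// Hypothetical helper: builds the model input from an incoming request URL.
// Returns null when the required audio_url is missing.
function buildInput(requestUrl) {
  const audioUrl = new URL(requestUrl).searchParams.get('url')
  if (!audioUrl) return null
  return { audio_url: audioUrl }
}

// Sketch of a Worker; `env.AI` is the binding used in the snippet above.
const worker = {
  async fetch(request, env) {
    const input = buildInput(request.url)
    if (input === null) {
      return new Response('Missing ?url= query parameter', { status: 400 })
    }
    const response = await env.AI.run('assemblyai/universal-3-pro', input, {
      gateway: { id: 'default' },
    })
    // The output schema defines a single required field, `text`.
    return new Response(response.text, {
      headers: { 'content-type': 'text/plain; charset=utf-8' },
    })
  },
}

// In a Worker module this object would be the default export:
// export default worker
```

Keeping the input construction in a separate pure function makes it possible to check without a live binding.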
With Language Code: Transcribe with an explicit language code

    const response = await env.AI.run(
      'assemblyai/universal-3-pro',
      {
        audio_url: 'https://cdn.openai.com/API/docs/audio/echo.wav',
        language_code: 'en',
      },
      {
        gateway: { id: 'default' },
      },
    )
    console.log(response)

With Key Terms: Improve accuracy for domain-specific vocabulary

    const response = await env.AI.run(
      'assemblyai/universal-3-pro',
      {
        audio_url: 'https://cdn.openai.com/API/docs/audio/nova.wav',
        keyterms_prompt: [
          'Kubernetes',
          'microservices',
          'containerization',
          'load balancer',
        ],
      },
      {
        gateway: { id: 'default' },
      },
    )
    console.log(response)

Input / Output JSON

Input

    {
      "audio_url": "https://cdn.openai.com/API/docs/audio/nova.wav",
      "keyterms_prompt": [
        "Kubernetes",
        "microservices",
        "containerization",
        "load balancer"
      ]
    }

Output

    {
      "text": "In the kitchen, the aroma of freshly baked bread filled the air. The loaves were golden brown and crusty on the outside and soft and warm on the inside."
    }

Speaker Diarization: Identify different speakers in the audio

    const response = await env.AI.run(
      'assemblyai/universal-3-pro',
      {
        audio_url: 'https://cdn.openai.com/API/docs/audio/onyx.wav',
        speaker_labels: true,
      },
      {
        gateway: { id: 'default' },
      },
    )
    console.log(response)

Parameters

Input

| Parameter | Type | Required | Constraints | Description |
|---|---|---|---|---|
| `audio_url` | string | yes | | The URL of the audio file to transcribe. Can be a publicly accessible URL or a data URI (data:audio/...;base64,...). For data URIs, the audio will be uploaded to AssemblyAI automatically. |
| `language_code` | string | no | | The language code for the audio file (e.g., "en", "es", "fr"). Defaults to automatic language detection. |
| `language_detection` | boolean | no | | Enable automatic language detection. When enabled with speech_models, the system will automatically select the best model for the detected language. |
| `prompt` | string | no | | A custom prompt to guide transcription style, formatting, and output characteristics. Maximum 1,500 words. |
| `keyterms_prompt` | array | no | | An array of up to 1,000 words or phrases (max 6 words per phrase) to improve transcription accuracy. Cannot be used with the `prompt` parameter. |
| `temperature` | number | no | minimum: 0, maximum: 1 | Controls randomness in model output (0.0-1.0). Lower values make output more deterministic. Default is 0.0. |
| `speaker_labels` | boolean | no | | Enable speaker diarization to identify different speakers in the audio. |
| `speakers_expected` | integer | no | minimum: 1, maximum: 50 | Expected number of speakers for speaker diarization. |
| `auto_chapters` | boolean | no | | Enable automatic chapter detection. |
| `entity_detection` | boolean | no | | Enable detection of entities like names, organizations, and locations. |
| `sentiment_analysis` | boolean | no | | Enable sentiment analysis for each sentence. |
| `auto_highlights` | boolean | no | | Enable automatic extraction of key phrases and highlights. |
| `content_safety` | boolean | no | | Enable content safety detection for sensitive content. |
| `iab_categories` | boolean | no | | Enable IAB (Interactive Advertising Bureau) content taxonomy classification. |
| `custom_spelling` | array | no | | Custom spelling rules to replace specific words or phrases in the transcription output. |
| `disfluencies` | boolean | no | | Include filler words like "um", "uh", etc. in the transcript. |
| `multichannel` | boolean | no | | Process each audio channel separately for multi-channel audio files. |
| `dual_channel` | boolean | no | | Process audio as dual-channel (stereo) for better accuracy. |
| `webhook_url` | string | no | format: uri | URL to receive webhook notifications when transcription is complete. |
| `audio_start_from` | integer | no | minimum: 0 | Timestamp (in milliseconds) to start transcription from. |
| `audio_end_at` | integer | no | minimum: 0 | Timestamp (in milliseconds) to end transcription at. |
| `word_boost` | array | no | | Array of words to boost recognition accuracy (legacy; use `keyterms_prompt` instead). |
| `boost_param` | string | no | enum: low, default, high | How much to boost the words in `word_boost`. |
| `filter_profanity` | boolean | no | | Filter profanity from the transcription. |
| `redact_pii` | boolean | no | | Redact personally identifiable information. |
| `redact_pii_audio` | boolean | no | | Generate a redacted audio file with PII removed. |
| `redact_pii_policies` | array | no | | Specific PII policies to apply for redaction. |
| `redact_pii_sub` | string | no | enum: entity_name, hash | Strategy for substituting redacted PII. |
| `speech_threshold` | number | no | minimum: 0, maximum: 1 | Confidence threshold for speech detection. |

Output

| Parameter | Type | Description |
|---|---|---|
| `text` | string | The transcribed text. |

API Schemas
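
Several of the documented constraints (the `prompt` and `keyterms_prompt` limits, their mutual exclusion, and the numeric ranges) can be checked before a request is sent. The following is a minimal sketch rather than anything the API provides: `validateInput` is a hypothetical helper, it covers only a subset of the parameters, and counting words by splitting on whitespace is an assumption about how the service counts them.

```javascript
// Hypothetical client-side check of a few documented constraints.
// Returns a list of human-readable problems; an empty list means none found.
function validateInput(input) {
  const problems = []
  // Assumption: words are counted by splitting on whitespace.
  const countWords = (s) => s.trim().split(/\s+/).filter(Boolean).length

  if (typeof input.audio_url !== 'string' || input.audio_url === '') {
    problems.push('audio_url is required')
  }
  if (input.prompt !== undefined && input.keyterms_prompt !== undefined) {
    problems.push('keyterms_prompt cannot be used with prompt')
  }
  if (typeof input.prompt === 'string' && countWords(input.prompt) > 1500) {
    problems.push('prompt exceeds 1,500 words')
  }
  if (Array.isArray(input.keyterms_prompt)) {
    if (input.keyterms_prompt.length > 1000) {
      problems.push('keyterms_prompt exceeds 1,000 entries')
    }
    for (const term of input.keyterms_prompt) {
      if (typeof term === 'string' && countWords(term) > 6) {
        problems.push(`key term "${term}" exceeds 6 words`)
      }
    }
  }
  const t = input.temperature
  if (t !== undefined && (typeof t !== 'number' || t < 0 || t > 1)) {
    problems.push('temperature must be between 0 and 1')
  }
  const n = input.speakers_expected
  if (n !== undefined && (!Number.isInteger(n) || n < 1 || n > 50)) {
    problems.push('speakers_expected must be an integer from 1 to 50')
  }
  return problems
}
```

For example, passing both `prompt` and `keyterms_prompt` alongside a valid `audio_url` yields a single problem, `'keyterms_prompt cannot be used with prompt'`. A check like this only catches mistakes early; the service remains the authority on what it accepts.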

Input

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "audio_url": {
          "description": "The URL of the audio file to transcribe. Can be a publicly accessible URL or a data URI (data:audio/...;base64,...). For data URIs, the audio will be uploaded to AssemblyAI automatically.",
          "type": "string"
        },
        "language_code": {
          "description": "The language code for the audio file (e.g., \"en\", \"es\", \"fr\"). Defaults to automatic language detection.",
          "type": "string"
        },
        "language_detection": {
          "description": "Enable automatic language detection. When enabled with speech_models, the system will automatically select the best model for the detected language.",
          "type": "boolean"
        },
        "prompt": {
          "description": "A custom prompt to guide transcription style, formatting, and output characteristics. Maximum 1,500 words.",
          "type": "string"
        },
        "keyterms_prompt": {
          "description": "An array of up to 1,000 words or phrases (max 6 words per phrase) to improve transcription accuracy. Cannot be used with the prompt parameter.",
          "type": "array",
          "items": { "type": "string" }
        },
        "temperature": {
          "description": "Controls randomness in model output (0.0-1.0). Lower values make output more deterministic. Default is 0.0.",
          "type": "number",
          "minimum": 0,
          "maximum": 1
        },
        "speaker_labels": {
          "description": "Enable speaker diarization to identify different speakers in the audio.",
          "type": "boolean"
        },
        "speakers_expected": {
          "description": "Expected number of speakers for speaker diarization.",
          "type": "integer",
          "minimum": 1,
          "maximum": 50
        },
        "auto_chapters": {
          "description": "Enable automatic chapter detection.",
          "type": "boolean"
        },
        "entity_detection": {
          "description": "Enable detection of entities like names, organizations, and locations.",
          "type": "boolean"
        },
        "sentiment_analysis": {
          "description": "Enable sentiment analysis for each sentence.",
          "type": "boolean"
        },
        "auto_highlights": {
          "description": "Enable automatic extraction of key phrases and highlights.",
          "type": "boolean"
        },
        "content_safety": {
          "description": "Enable content safety detection for sensitive content.",
          "type": "boolean"
        },
        "iab_categories": {
          "description": "Enable IAB (Interactive Advertising Bureau) content taxonomy classification.",
          "type": "boolean"
        },
        "custom_spelling": {
          "description": "Custom spelling rules to replace specific words or phrases in the transcription output.",
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "from": {
                "type": "array",
                "items": { "type": "string" }
              },
              "to": { "type": "string" }
            },
            "required": ["from", "to"],
            "additionalProperties": false
          }
        },
        "disfluencies": {
          "description": "Include filler words like \"um\", \"uh\", etc. in the transcript.",
          "type": "boolean"
        },
        "multichannel": {
          "description": "Process each audio channel separately for multi-channel audio files.",
          "type": "boolean"
        },
        "dual_channel": {
          "description": "Process audio as dual-channel (stereo) for better accuracy.",
          "type": "boolean"
        },
        "webhook_url": {
          "description": "URL to receive webhook notifications when transcription is complete.",
          "type": "string",
          "format": "uri"
        },
        "audio_start_from": {
          "description": "Timestamp (in milliseconds) to start transcription from.",
          "type": "integer",
          "minimum": 0
        },
        "audio_end_at": {
          "description": "Timestamp (in milliseconds) to end transcription at.",
          "type": "integer",
          "minimum": 0
        },
        "word_boost": {
          "description": "Array of words to boost recognition accuracy (legacy - use keyterms_prompt instead).",
          "type": "array",
          "items": { "type": "string" }
        },
        "boost_param": {
          "description": "How much to boost the words in word_boost.",
          "type": "string",
          "enum": ["low", "default", "high"]
        },
        "filter_profanity": {
          "description": "Filter profanity from the transcription.",
          "type": "boolean"
        },
        "redact_pii": {
          "description": "Redact personally identifiable information.",
          "type": "boolean"
        },
        "redact_pii_audio": {
          "description": "Generate a redacted audio file with PII removed.",
          "type": "boolean"
        },
        "redact_pii_policies": {
          "description": "Specific PII policies to apply for redaction.",
          "type": "array",
          "items": { "type": "string" }
        },
        "redact_pii_sub": {
          "description": "Strategy for substituting redacted PII.",
          "type": "string",
          "enum": ["entity_name", "hash"]
        },
        "speech_threshold": {
          "description": "Confidence threshold for speech detection.",
          "type": "number",
          "minimum": 0,
          "maximum": 1
        }
      },
      "required": ["audio_url"],
      "additionalProperties": false
    }

Output

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "text": {
          "description": "The transcribed text.",
          "type": "string"
        }
      },
      "required": ["text"],
      "additionalProperties": false
    }
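
The input schema allows `audio_url` to be a data URI and gives each `custom_spelling` entry the shape `{ from: string[], to: string }`, but neither appears in the examples. The block below is a minimal sketch under stated assumptions: `toDataUri` and `buildSpellingInput` are hypothetical helpers, the `audio/wav` MIME type and the spelling rule are made-up illustrations, and `btoa` is assumed to be available as a global (as it is in Workers and recent Node.js).

```javascript
// Hypothetical helper: encodes raw audio bytes as a data URI for audio_url.
function toDataUri(bytes, mimeType) {
  let binary = ''
  for (const byte of bytes) {
    binary += String.fromCharCode(byte)
  }
  return `data:${mimeType};base64,${btoa(binary)}`
}

// Hypothetical helper: builds an input object that matches the input schema.
function buildSpellingInput(bytes) {
  return {
    audio_url: toDataUri(bytes, 'audio/wav'),
    // Each rule lists the spellings to replace (`from`) and the replacement (`to`).
    custom_spelling: [{ from: ['k8s', 'kubernetes'], to: 'Kubernetes' }],
  }
}
```

The resulting object could be passed as the second argument to `env.AI.run` in place of the inputs shown in the examples. Base64 inflates the payload by roughly a third, so for large files a publicly accessible URL is likely the lighter option.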