AssemblyAI Universal-3 Pro

Automatic Speech Recognition · AssemblyAI · Proxied

AssemblyAI's Universal 3 Pro speech recognition model for high-accuracy transcription.

Model Info
Terms and License: link
More information: link
Pricing: View pricing in the Cloudflare dashboard

Usage

TypeScript
const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  { audio_url: 'https://cdn.openai.com/API/docs/audio/alloy.wav' },
)
console.log(response)
The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.
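The call above passes a public URL. Per the parameter reference, audio_url also accepts a data URI (data:audio/...;base64,...), which can help when the audio is only available as raw bytes, such as an upload. The following is a minimal sketch of building one; toAudioDataUri is a hypothetical helper name, not part of any SDK, and the default MIME type is an assumption.

```typescript
// Hypothetical helper: wraps raw audio bytes in a base64 data URI
// suitable for the `audio_url` field.
function toAudioDataUri(bytes: Uint8Array, mimeType: string = 'audio/wav'): string {
  // btoa expects a "binary string" with one character per byte.
  let binary = '';
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return `data:${mimeType};base64,${btoa(binary)}`;
}
```

The result could then be passed as { audio_url: toAudioDataUri(bytes) } in the same env.AI.run call. For large files, base64 adds roughly a third to the payload size, so a public URL is likely the better choice when one exists.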

Examples

With Language Code — Transcribe with an explicit language code
TypeScript
const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  { audio_url: 'https://cdn.openai.com/API/docs/audio/echo.wav', language_code: 'en' },
)
console.log(response)
In the heart of the city, there is a large park where people go to relax and enjoy nature. The park has a beautiful pond with ducks and swans.
With Key Terms — Improve accuracy for domain-specific vocabulary
TypeScript
const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  {
    audio_url: 'https://cdn.openai.com/API/docs/audio/nova.wav',
    keyterms_prompt: ['Kubernetes', 'microservices', 'containerization', 'load balancer'],
  },
)
console.log(response)
In the kitchen, the aroma of freshly baked bread filled the air. The loaves were golden brown and crusty on the outside and soft and warm on the inside.
Speaker Diarization — Identify different speakers in the audio
TypeScript
const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  { audio_url: 'https://cdn.openai.com/API/docs/audio/onyx.wav', speaker_labels: true },
)
console.log(response)
The train chugged along the tracks, carrying passengers to their destinations. The rhythmic sound of the wheels on the rails was soothing.
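Each example above sets a single option, but the options are fields of one input object and can presumably be combined. As a sketch of how a Worker might assemble that object from a request, the function below maps query-string values onto the fields used in the examples. The query keys (audio, lang, speakers, terms) and the names TranscriptionInput and inputFromQuery are invented for illustration.

```typescript
// Field names follow the examples above; everything else is hypothetical.
interface TranscriptionInput {
  audio_url: string;
  language_code?: string;
  speaker_labels?: boolean;
  keyterms_prompt?: string[];
}

function inputFromQuery(params: URLSearchParams): TranscriptionInput {
  const audio = params.get('audio');
  if (!audio) {
    throw new Error('missing required "audio" query parameter');
  }
  const input: TranscriptionInput = { audio_url: audio };

  const lang = params.get('lang');
  if (lang) {
    input.language_code = lang;
  }
  if (params.get('speakers') === 'true') {
    input.speaker_labels = true;
  }
  // "terms" is a comma-separated list, e.g. terms=Kubernetes,load balancer
  const terms = params.get('terms');
  if (terms) {
    input.keyterms_prompt = terms
      .split(',')
      .map((t) => t.trim())
      .filter((t) => t.length > 0);
  }
  return input;
}
```

Inside a fetch handler, the result could be passed straight through, for example env.AI.run('assemblyai/universal-3-pro', inputFromQuery(new URL(request.url).searchParams)).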

Parameters

audio_end_at
integer (minimum: 0, maximum: 9007199254740991): Timestamp (in milliseconds) to end transcription at.
audio_start_from
integer (minimum: 0, maximum: 9007199254740991): Timestamp (in milliseconds) to start transcription from.
audio_url
string (required): The URL of the audio file to transcribe. Can be a publicly accessible URL or a data URI (data:audio/...;base64,...). For data URIs, the audio will be uploaded to AssemblyAI automatically.
auto_chapters
boolean: Enable automatic chapter detection.
auto_highlights
boolean: Enable automatic extraction of key phrases and highlights.
boost_param
string (enum: low, default, high): How much to boost the words in word_boost.
content_safety
boolean: Enable content safety detection for sensitive content.
disfluencies
boolean: Include filler words like "um", "uh", etc. in the transcript.
domain
string (enum: medical-v1): Domain-specific transcription mode. "medical-v1" enables medical terminology optimization.
dual_channel
boolean: Process audio as dual-channel (stereo) for better accuracy.
entity_detection
boolean: Enable detection of entities like names, organizations, and locations.
filter_profanity
boolean: Filter profanity from the transcription.
iab_categories
boolean: Enable IAB (Interactive Advertising Bureau) content taxonomy classification.
language_code
string: The language code for the audio file (e.g., "en", "es", "fr"). Defaults to automatic language detection.
language_detection
boolean: Enable automatic language detection. When enabled with speech_models, the system will automatically select the best model for the detected language.
multichannel
boolean: Process each audio channel separately for multi-channel audio files.
prompt
string: A custom prompt to guide transcription style, formatting, and output characteristics. Maximum 1,500 words.
redact_pii
boolean: Redact personally identifiable information.
redact_pii_audio
boolean: Generate a redacted audio file with PII removed.
redact_pii_sub
string (enum: entity_name, hash): Strategy for substituting redacted PII.
sentiment_analysis
boolean: Enable sentiment analysis for each sentence.
speaker_labels
boolean: Enable speaker diarization to identify different speakers in the audio.
speakers_expected
integer (minimum: 1, maximum: 9007199254740991): Expected number of speakers for speaker diarization.
speech_threshold
number (minimum: 0, maximum: 1): Confidence threshold for speech detection.
temperature
number (minimum: 0, maximum: 1): Controls randomness in model output (0.0-1.0). Lower values make output more deterministic. Default is 0.0.
webhook_url
string (format: uri): URL to receive webhook notifications when transcription is complete.
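
The numeric ranges and enum values listed above can be checked on the client before calling the model, which may surface mistakes earlier than a failed request would. The sketch below encodes only the constraints stated in this parameter list; validateInput and the rule tables are hypothetical names, and the service may enforce additional rules not captured here.

```typescript
interface RangeRule {
  name: string;
  min: number;
  max: number;
  integer: boolean;
}

const MAX_INT = 9007199254740991; // upper bound given in the parameter list

const RANGE_RULES: RangeRule[] = [
  { name: 'audio_start_from', min: 0, max: MAX_INT, integer: true },
  { name: 'audio_end_at', min: 0, max: MAX_INT, integer: true },
  { name: 'speakers_expected', min: 1, max: MAX_INT, integer: true },
  { name: 'speech_threshold', min: 0, max: 1, integer: false },
  { name: 'temperature', min: 0, max: 1, integer: false },
];

const ENUM_RULES: { name: string; allowed: string[] }[] = [
  { name: 'boost_param', allowed: ['low', 'default', 'high'] },
  { name: 'domain', allowed: ['medical-v1'] },
  { name: 'redact_pii_sub', allowed: ['entity_name', 'hash'] },
];

// Returns a list of human-readable problems; an empty list means the
// input satisfies every constraint encoded above.
function validateInput(input: Record<string, unknown>): string[] {
  const errors: string[] = [];

  const audioUrl = input['audio_url'];
  if (typeof audioUrl !== 'string' || audioUrl.length === 0) {
    errors.push('audio_url is required and must be a non-empty string');
  }

  RANGE_RULES.forEach((rule) => {
    const value = input[rule.name];
    if (value === undefined) return;
    if (typeof value !== 'number' || isNaN(value)) {
      errors.push(`${rule.name} must be a number`);
    } else if (rule.integer && value % 1 !== 0) {
      errors.push(`${rule.name} must be an integer`);
    } else if (value < rule.min || value > rule.max) {
      errors.push(`${rule.name} must be between ${rule.min} and ${rule.max}`);
    }
  });

  ENUM_RULES.forEach((rule) => {
    const value = input[rule.name];
    if (value === undefined) return;
    if (typeof value !== 'string' || rule.allowed.indexOf(value) === -1) {
      errors.push(`${rule.name} must be one of: ${rule.allowed.join(', ')}`);
    }
  });

  return errors;
}
```

A caller might run validateInput(input) first and only invoke env.AI.run when the returned list is empty.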