AssemblyAI Universal-3 Pro

Automatic Speech Recognition • AssemblyAI

AssemblyAI's Universal-3 Pro speech recognition model for high-accuracy transcription.

Model Info
Terms and License	link ↗
More information	link ↗
Zero data retention	Yes
Pricing	View pricing in the Cloudflare dashboard ↗

Usage

TypeScript

const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  { audio_url: 'https://cdn.openai.com/API/docs/audio/alloy.wav' },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "assemblyai/universal-3-pro",
  "input": {
    "audio_url": "https://cdn.openai.com/API/docs/audio/alloy.wav"
  }
}'

Output

The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.

Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "confidence": 0.99276465,
    "language_code": "en",
    "language_confidence": 0.9998,
    "text": "The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.",
    "utterances": null,
    "words": [
      {
        "confidence": 0.9713957,
        "end": 129,
        "speaker": null,
        "start": 32,
        "text": "The"
      },
      {
        "confidence": 0.97053415,
        "end": 404,
        "speaker": null,
        "start": 129,
        "text": "sun"
      },
      {
        "confidence": 0.9998932,
        "end": 809,
        "speaker": null,
        "start": 420,
        "text": "rises"
      },
      {
        "confidence": 0.999092,
        "end": 922,
        "speaker": null,
        "start": 841,
        "text": "in"
      },
      {
        "confidence": 0.9997658,
        "end": 1068,
        "speaker": null,
        "start": 922,
        "text": "the"
      },
      {
        "confidence": 0.9684294,
        "end": 1456,
        "speaker": null,
        "start": 1149,
        "text": "east"
      },
      {
        "confidence": 0.9894344,
        "end": 1634,
        "speaker": null,
        "start": 1570,
        "text": "and"
      },
      {
        "confidence": 0.9999058,
        "end": 2055,
        "speaker": null,
        "start": 1715,
        "text": "sets"
      },
      {
        "confidence": 0.9997663,
        "end": 2104,
        "speaker": null,
        "start": 2055,
        "text": "in"
      },
      {
        "confidence": 0.9999552,
        "end": 2217,
        "speaker": null,
        "start": 2120,
        "text": "the"
      },
      {
        "confidence": 0.9913442,
        "end": 2638,
        "speaker": null,
        "start": 2217,
        "text": "west."
      },
      {
        "confidence": 0.9974367,
        "end": 3221,
        "speaker": null,
        "start": 3107,
        "text": "This"
      },
      {
        "confidence": 0.99965656,
        "end": 3560,
        "speaker": null,
        "start": 3269,
        "text": "simple"
      },
      {
        "confidence": 0.999713,
        "end": 3997,
        "speaker": null,
        "start": 3593,
        "text": "fact"
      },
      {
        "confidence": 0.99924207,
        "end": 4175,
        "speaker": null,
        "start": 3997,
        "text": "has"
      },
      {
        "confidence": 0.9995851,
        "end": 4289,
        "speaker": null,
        "start": 4224,
        "text": "been"
      },
      {
        "confidence": 0.9984724,
        "end": 4807,
        "speaker": null,
        "start": 4337,
        "text": "observed"
      },
      {
        "confidence": 0.9997143,
        "end": 4952,
        "speaker": null,
        "start": 4807,
        "text": "by"
      },
      {
        "confidence": 0.9997894,
        "end": 5422,
        "speaker": null,
        "start": 4969,
        "text": "humans"
      },
      {
        "confidence": 0.99947494,
        "end": 5519,
        "speaker": null,
        "start": 5422,
        "text": "for"
      },
      {
        "confidence": 0.99950385,
        "end": 6118,
        "speaker": null,
        "start": 5616,
        "text": "thousands"
      },
      {
        "confidence": 0.9995235,
        "end": 6231,
        "speaker": null,
        "start": 6118,
        "text": "of"
      },
      {
        "confidence": 0.9519594,
        "end": 6636,
        "speaker": null,
        "start": 6328,
        "text": "years."
      }
    ]
  },
  "state": "Completed"
}
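
The raw response above wraps the transcript in a `result` object next to a `state` field. A minimal sketch of unwrapping it and flagging low-confidence words might look like the following. The type names and the helpers `unwrapResult` and `lowConfidenceWords` are hypothetical, the shapes are inferred from the sample rather than taken from an official type definition, and treating every state other than "Completed" as a failure is an assumption.

```typescript
// Shapes inferred from the sample response above; not an official type definition.
interface TranscriptWord {
  text: string;
  start: number;
  end: number;
  confidence: number;
  speaker: string | null;
}

interface TranscriptResult {
  text: string;
  confidence: number | null;
  words: TranscriptWord[] | null;
}

interface RunResponse {
  state: string;
  result: TranscriptResult;
}

// Assumption: any state other than "Completed" means there is no usable transcript.
function unwrapResult(response: RunResponse): TranscriptResult {
  if (response.state !== 'Completed') {
    throw new Error(`Transcription not completed (state: ${response.state})`);
  }
  return response.result;
}

// Return the words whose confidence falls below the given threshold.
function lowConfidenceWords(result: TranscriptResult, threshold: number): TranscriptWord[] {
  return (result.words ?? []).filter((w) => w.confidence < threshold);
}

// A trimmed copy of the first three words of the sample response above.
const sampleResponse: RunResponse = {
  state: 'Completed',
  result: {
    text: 'The sun rises',
    confidence: 0.99276465,
    words: [
      { text: 'The', start: 32, end: 129, confidence: 0.9713957, speaker: null },
      { text: 'sun', start: 129, end: 404, confidence: 0.97053415, speaker: null },
      { text: 'rises', start: 420, end: 809, confidence: 0.9998932, speaker: null },
    ],
  },
};

// The 0.971 threshold is arbitrary and chosen only to flag one word here.
const flagged = lowConfidenceWords(unwrapResult(sampleResponse), 0.971);
console.log(flagged.map((w) => w.text)); // only "sun" is below 0.971
```

In a Worker, the object returned by `env.AI.run` would take the place of `sampleResponse`, provided it has the shape shown in the raw response.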

Examples

With Language Code — Transcribe with an explicit language code

TypeScript

const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  { audio_url: 'https://cdn.openai.com/API/docs/audio/echo.wav', language_code: 'en' },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "assemblyai/universal-3-pro",
  "input": {
    "audio_url": "https://cdn.openai.com/API/docs/audio/echo.wav",
    "language_code": "en"
  }
}'

Output

In the heart of the city, there is a large park where people go to relax and enjoy nature. The park has a beautiful pond with ducks and swans.

Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "confidence": 0.9927905,
    "language_code": "en_us",
    "language_confidence": null,
    "text": "In the heart of the city, there is a large park where people go to relax and enjoy nature. The park has a beautiful pond with ducks and swans.",
    "utterances": null,
    "words": [
      {
        "confidence": 0.88134426,
        "end": 80,
        "speaker": null,
        "start": 32,
        "text": "In"
      },
      {
        "confidence": 0.9984907,
        "end": 241,
        "speaker": null,
        "start": 177,
        "text": "the"
      },
      {
        "confidence": 0.99956447,
        "end": 500,
        "speaker": null,
        "start": 258,
        "text": "heart"
      },
      {
        "confidence": 0.9995684,
        "end": 548,
        "speaker": null,
        "start": 500,
        "text": "of"
      },
      {
        "confidence": 0.99977916,
        "end": 677,
        "speaker": null,
        "start": 596,
        "text": "the"
      },
      {
        "confidence": 0.9956655,
        "end": 967,
        "speaker": null,
        "start": 677,
        "text": "city,"
      },
      {
        "confidence": 0.9987048,
        "end": 1435,
        "speaker": null,
        "start": 1322,
        "text": "there"
      },
      {
        "confidence": 0.99971443,
        "end": 1516,
        "speaker": null,
        "start": 1467,
        "text": "is"
      },
      {
        "confidence": 0.99948585,
        "end": 1596,
        "speaker": null,
        "start": 1564,
        "text": "a"
      },
      {
        "confidence": 0.9987669,
        "end": 2016,
        "speaker": null,
        "start": 1709,
        "text": "large"
      },
      {
        "confidence": 0.9981509,
        "end": 2467,
        "speaker": null,
        "start": 2129,
        "text": "park"
      },
      {
        "confidence": 0.9559358,
        "end": 2838,
        "speaker": null,
        "start": 2693,
        "text": "where"
      },
      {
        "confidence": 0.99979085,
        "end": 3145,
        "speaker": null,
        "start": 2854,
        "text": "people"
      },
      {
        "confidence": 0.9993555,
        "end": 3338,
        "speaker": null,
        "start": 3177,
        "text": "go"
      },
      {
        "confidence": 0.9998317,
        "end": 3467,
        "speaker": null,
        "start": 3338,
        "text": "to"
      },
      {
        "confidence": 0.99991953,
        "end": 4064,
        "speaker": null,
        "start": 3500,
        "text": "relax"
      },
      {
        "confidence": 0.9988979,
        "end": 4161,
        "speaker": null,
        "start": 4064,
        "text": "and"
      },
      {
        "confidence": 0.9999237,
        "end": 4484,
        "speaker": null,
        "start": 4161,
        "text": "enjoy"
      },
      {
        "confidence": 0.998528,
        "end": 4887,
        "speaker": null,
        "start": 4484,
        "text": "nature."
      },
      {
        "confidence": 0.990198,
        "end": 5758,
        "speaker": null,
        "start": 5597,
        "text": "The"
      },
      {
        "confidence": 0.99979144,
        "end": 6016,
        "speaker": null,
        "start": 5758,
        "text": "park"
      },
      {
        "confidence": 0.99926263,
        "end": 6177,
        "speaker": null,
        "start": 6064,
        "text": "has"
      },
      {
        "confidence": 0.9992211,
        "end": 6242,
        "speaker": null,
        "start": 6177,
        "text": "a"
      },
      {
        "confidence": 0.99989605,
        "end": 6774,
        "speaker": null,
        "start": 6322,
        "text": "beautiful"
      },
      {
        "confidence": 0.9998628,
        "end": 7193,
        "speaker": null,
        "start": 6790,
        "text": "pond"
      },
      {
        "confidence": 0.99960047,
        "end": 7355,
        "speaker": null,
        "start": 7193,
        "text": "with"
      },
      {
        "confidence": 0.99963534,
        "end": 7806,
        "speaker": null,
        "start": 7371,
        "text": "ducks"
      },
      {
        "confidence": 0.99866796,
        "end": 7919,
        "speaker": null,
        "start": 7855,
        "text": "and"
      },
      {
        "confidence": 0.9833702,
        "end": 8629,
        "speaker": null,
        "start": 7935,
        "text": "swans."
      }
    ]
  },
  "state": "Completed"
}
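
In the sample above, the request passes `language_code: 'en'` while the response reports `en_us` with `language_confidence: null`. Code that compares the requested and returned codes may therefore want to compare only the base language. The sketch below assumes codes are separated by `_` or `-`; the helper names are hypothetical.

```typescript
// Reduce a code such as "en_us" or "en-US" to its base language ("en").
function baseLanguage(code: string): string {
  return code.toLowerCase().split(/[_-]/)[0];
}

// True when both codes share a base language; a null response code never matches.
function sameBaseLanguage(requested: string, returned: string | null): boolean {
  return returned !== null && baseLanguage(requested) === baseLanguage(returned);
}

console.log(sameBaseLanguage('en', 'en_us')); // true
console.log(sameBaseLanguage('en', 'es')); // false
```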

With Key Terms — Improve accuracy for domain-specific vocabulary

TypeScript

const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  {
    audio_url: 'https://cdn.openai.com/API/docs/audio/nova.wav',
    keyterms_prompt: ['Kubernetes', 'microservices', 'containerization', 'load balancer'],
  },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "assemblyai/universal-3-pro",
  "input": {
    "audio_url": "https://cdn.openai.com/API/docs/audio/nova.wav",
    "keyterms_prompt": [
      "Kubernetes",
      "microservices",
      "containerization",
      "load balancer"
    ]
  }
}'

Output

In the kitchen, the aroma of freshly baked bread filled the air. The loaves were golden brown and crusty on the outside and soft and warm on the inside.

Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "confidence": 0.9901139,
    "language_code": "en",
    "language_confidence": 0.9969,
    "text": "In the kitchen, the aroma of freshly baked bread filled the air. The loaves were golden brown and crusty on the outside and soft and warm on the inside.",
    "utterances": null,
    "words": [
      {
        "confidence": 0.9785539,
        "end": 80,
        "speaker": null,
        "start": 32,
        "text": "In"
      },
      {
        "confidence": 0.99962807,
        "end": 242,
        "speaker": null,
        "start": 177,
        "text": "the"
      },
      {
        "confidence": 0.99617165,
        "end": 565,
        "speaker": null,
        "start": 258,
        "text": "kitchen,"
      },
      {
        "confidence": 0.9991928,
        "end": 839,
        "speaker": null,
        "start": 743,
        "text": "the"
      },
      {
        "confidence": 0.99992657,
        "end": 1292,
        "speaker": null,
        "start": 839,
        "text": "aroma"
      },
      {
        "confidence": 0.99955577,
        "end": 1405,
        "speaker": null,
        "start": 1308,
        "text": "of"
      },
      {
        "confidence": 0.9996594,
        "end": 1889,
        "speaker": null,
        "start": 1405,
        "text": "freshly"
      },
      {
        "confidence": 0.99850214,
        "end": 2261,
        "speaker": null,
        "start": 1970,
        "text": "baked"
      },
      {
        "confidence": 0.9999217,
        "end": 2584,
        "speaker": null,
        "start": 2293,
        "text": "bread"
      },
      {
        "confidence": 0.9999,
        "end": 2859,
        "speaker": null,
        "start": 2600,
        "text": "filled"
      },
      {
        "confidence": 0.99993885,
        "end": 3004,
        "speaker": null,
        "start": 2859,
        "text": "the"
      },
      {
        "confidence": 0.9961201,
        "end": 3262,
        "speaker": null,
        "start": 3020,
        "text": "air."
      },
      {
        "confidence": 0.99501073,
        "end": 4119,
        "speaker": null,
        "start": 4054,
        "text": "The"
      },
      {
        "confidence": 0.9997483,
        "end": 4522,
        "speaker": null,
        "start": 4215,
        "text": "loaves"
      },
      {
        "confidence": 0.9998282,
        "end": 4781,
        "speaker": null,
        "start": 4619,
        "text": "were"
      },
      {
        "confidence": 0.99248224,
        "end": 5249,
        "speaker": null,
        "start": 4878,
        "text": "golden"
      },
      {
        "confidence": 0.9700398,
        "end": 5718,
        "speaker": null,
        "start": 5362,
        "text": "brown"
      },
      {
        "confidence": 0.9419883,
        "end": 5992,
        "speaker": null,
        "start": 5928,
        "text": "and"
      },
      {
        "confidence": 0.9994146,
        "end": 6541,
        "speaker": null,
        "start": 6089,
        "text": "crusty"
      },
      {
        "confidence": 0.9997141,
        "end": 6703,
        "speaker": null,
        "start": 6574,
        "text": "on"
      },
      {
        "confidence": 0.9999218,
        "end": 6784,
        "speaker": null,
        "start": 6719,
        "text": "the"
      },
      {
        "confidence": 0.9993179,
        "end": 7365,
        "speaker": null,
        "start": 6881,
        "text": "outside"
      },
      {
        "confidence": 0.8661144,
        "end": 7462,
        "speaker": null,
        "start": 7365,
        "text": "and"
      },
      {
        "confidence": 0.9996922,
        "end": 7882,
        "speaker": null,
        "start": 7462,
        "text": "soft"
      },
      {
        "confidence": 0.9998481,
        "end": 7995,
        "speaker": null,
        "start": 7882,
        "text": "and"
      },
      {
        "confidence": 0.99996424,
        "end": 8270,
        "speaker": null,
        "start": 8028,
        "text": "warm"
      },
      {
        "confidence": 0.9998791,
        "end": 8399,
        "speaker": null,
        "start": 8270,
        "text": "on"
      },
      {
        "confidence": 0.99982506,
        "end": 8512,
        "speaker": null,
        "start": 8415,
        "text": "the"
      },
      {
        "confidence": 0.9834439,
        "end": 8964,
        "speaker": null,
        "start": 8512,
        "text": "inside."
      }
    ]
  },
  "state": "Completed"
}

Speaker Diarization — Identify different speakers in the audio

TypeScript

const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  { audio_url: 'https://cdn.openai.com/API/docs/audio/onyx.wav', speaker_labels: true },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "assemblyai/universal-3-pro",
  "input": {
    "audio_url": "https://cdn.openai.com/API/docs/audio/onyx.wav",
    "speaker_labels": true
  }
}'

Output

The train chugged along the tracks, carrying passengers to their destinations. The rhythmic sound of the wheels on the rails was soothing.

Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "confidence": 0.99781793,
    "language_code": "en",
    "language_confidence": 0.9906,
    "text": "The train chugged along the tracks, carrying passengers to their destinations. The rhythmic sound of the wheels on the rails was soothing.",
    "utterances": [
      {
        "confidence": 0.99781793,
        "end": 7719,
        "speaker": "A",
        "start": 32,
        "text": "The train chugged along the tracks, carrying passengers to their destinations. The rhythmic sound of the wheels on the rails was soothing."
      }
    ],
    "words": [
      {
        "confidence": 0.9742124,
        "end": 113,
        "speaker": "A",
        "start": 32,
        "text": "The"
      },
      {
        "confidence": 0.99997795,
        "end": 403,
        "speaker": "A",
        "start": 177,
        "text": "train"
      },
      {
        "confidence": 0.99713653,
        "end": 904,
        "speaker": "A",
        "start": 516,
        "text": "chugged"
      },
      {
        "confidence": 0.9999881,
        "end": 1130,
        "speaker": "A",
        "start": 904,
        "text": "along"
      },
      {
        "confidence": 0.9999676,
        "end": 1308,
        "speaker": "A",
        "start": 1227,
        "text": "the"
      },
      {
        "confidence": 0.9995983,
        "end": 1808,
        "speaker": "A",
        "start": 1308,
        "text": "tracks,"
      },
      {
        "confidence": 0.9998933,
        "end": 2341,
        "speaker": "A",
        "start": 2131,
        "text": "carrying"
      },
      {
        "confidence": 0.999992,
        "end": 3068,
        "speaker": "A",
        "start": 2454,
        "text": "passengers"
      },
      {
        "confidence": 0.99999034,
        "end": 3229,
        "speaker": "A",
        "start": 3100,
        "text": "to"
      },
      {
        "confidence": 0.9999908,
        "end": 3423,
        "speaker": "A",
        "start": 3262,
        "text": "their"
      },
      {
        "confidence": 0.9992286,
        "end": 4198,
        "speaker": "A",
        "start": 3423,
        "text": "destinations."
      },
      {
        "confidence": 0.99871373,
        "end": 5119,
        "speaker": "A",
        "start": 5038,
        "text": "The"
      },
      {
        "confidence": 0.9999517,
        "end": 5523,
        "speaker": "A",
        "start": 5184,
        "text": "rhythmic"
      },
      {
        "confidence": 0.99993813,
        "end": 5926,
        "speaker": "A",
        "start": 5523,
        "text": "sound"
      },
      {
        "confidence": 0.99991894,
        "end": 6007,
        "speaker": "A",
        "start": 5926,
        "text": "of"
      },
      {
        "confidence": 0.99993825,
        "end": 6088,
        "speaker": "A",
        "start": 6007,
        "text": "the"
      },
      {
        "confidence": 0.99995935,
        "end": 6459,
        "speaker": "A",
        "start": 6169,
        "text": "wheels"
      },
      {
        "confidence": 0.99997675,
        "end": 6605,
        "speaker": "A",
        "start": 6556,
        "text": "on"
      },
      {
        "confidence": 0.99999475,
        "end": 6718,
        "speaker": "A",
        "start": 6637,
        "text": "the"
      },
      {
        "confidence": 0.9999932,
        "end": 7105,
        "speaker": "A",
        "start": 6718,
        "text": "rails"
      },
      {
        "confidence": 0.999851,
        "end": 7299,
        "speaker": "A",
        "start": 7138,
        "text": "was"
      },
      {
        "confidence": 0.98378325,
        "end": 7719,
        "speaker": "A",
        "start": 7299,
        "text": "soothing."
      }
    ]
  },
  "state": "Completed"
}
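
With `speaker_labels` enabled, the `utterances` array in the response above carries one entry per speaker turn. A minimal sketch of turning it into a readable, speaker-labelled transcript is shown below. The helper names are hypothetical, and the code assumes that `start` is measured in milliseconds, which matches the roughly 7.7-second sample but is not stated for this field.

```typescript
// Shape inferred from the sample response above.
interface TranscriptUtterance {
  speaker: string | null;
  start: number;
  end: number;
  text: string;
  confidence: number;
}

// Left-pad a number to two digits.
function pad2(n: number): string {
  return (n < 10 ? '0' : '') + String(n);
}

// Assumption: timestamps are in milliseconds.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  return `${pad2(Math.floor(totalSeconds / 60))}:${pad2(totalSeconds % 60)}`;
}

// One line per utterance, e.g. "[00:00] Speaker A: ...".
function toSpeakerTranscript(utterances: TranscriptUtterance[] | null): string {
  return (utterances ?? [])
    .map((u) => `[${formatTimestamp(u.start)}] Speaker ${u.speaker ?? '?'}: ${u.text}`)
    .join('\n');
}

// Made-up utterances used only to exercise the helper.
const demoUtterances: TranscriptUtterance[] = [
  { speaker: 'A', start: 32, end: 4198, text: 'Hello.', confidence: 0.99 },
  { speaker: 'B', start: 65000, end: 66000, text: 'Hi.', confidence: 0.98 },
];

console.log(toSpeakerTranscript(demoUtterances));
```

When `speaker_labels` is not set, `utterances` is `null` in the samples on this page, and the helper returns an empty string.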

Input parameters

audio_url (string)

The URL of the audio file to transcribe. Can be a publicly accessible URL or a data URI (data:audio/...;base64,...). For data URIs, the audio will be uploaded to AssemblyAI automatically. Required for pre-recorded transcription (when stream is false or not set).
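
Since `audio_url` accepts a data URI of the form `data:audio/...;base64,...`, audio bytes that are already in memory can be passed without hosting them. The sketch below builds such a URI. `toAudioDataUri` and `base64Encode` are hypothetical helpers, and the base64 encoder is written by hand only to keep the example independent of any runtime; a runtime with a built-in encoder would normally use that instead.

```typescript
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Standard base64 with "=" padding, processing three input bytes at a time.
function base64Encode(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = hasSecond ? bytes[i + 1] : 0;
    const b2 = hasThird ? bytes[i + 2] : 0;
    out += BASE64_ALPHABET.charAt(b0 >> 2);
    out += BASE64_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += hasSecond ? BASE64_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : '=';
    out += hasThird ? BASE64_ALPHABET.charAt(b2 & 0x3f) : '=';
  }
  return out;
}

// Build a data URI suitable for the audio_url field.
function toAudioDataUri(bytes: Uint8Array, mimeType: string): string {
  if (mimeType.indexOf('audio/') !== 0) {
    throw new Error(`Expected an audio/* MIME type, got "${mimeType}"`);
  }
  return `data:${mimeType};base64,${base64Encode(bytes)}`;
}

// Three placeholder bytes stand in for real audio data.
const demoUri = toAudioDataUri(new Uint8Array([77, 97, 110]), 'audio/wav');
console.log(demoUri); // data:audio/wav;base64,TWFu
```

The resulting string would be passed as `audio_url` in place of an HTTPS URL. Size limits for data URIs are not stated on this page, so large files may still be better served from a URL.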

websocket (boolean)

Enable real-time WebSocket streaming for live audio transcription. When true, a WebSocket connection is established instead of submitting a pre-recorded transcription job. Cannot be used with audio_url.
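
The two descriptions above imply that a request uses either `audio_url` (pre-recorded) or `websocket: true` (streaming), but not both. A small pre-flight check along those lines is sketched here. `checkInputMode` is a hypothetical helper, and it assumes that the "stream" flag mentioned in the `audio_url` description refers to the `websocket` parameter.

```typescript
interface TranscriptionInput {
  audio_url?: string;
  websocket?: boolean;
}

// Return a list of problems; an empty list means the combination looks valid.
function checkInputMode(input: TranscriptionInput): string[] {
  const errors: string[] = [];
  const streaming = input.websocket === true;
  if (streaming && input.audio_url !== undefined) {
    errors.push('websocket cannot be used with audio_url');
  }
  if (!streaming && !input.audio_url) {
    errors.push('audio_url is required for pre-recorded transcription');
  }
  return errors;
}

console.log(checkInputMode({ audio_url: 'https://example.com/audio.wav' })); // []
console.log(checkInputMode({ websocket: true, audio_url: 'https://example.com/audio.wav' }));
```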

language_code (string)

The language code for the audio file (e.g., "en", "es", "fr"). Defaults to automatic language detection.

language_detection (boolean)

Enable automatic language detection. When enabled with speech_models, the system will automatically select the best model for the detected language.

prompt (string)

A custom prompt to guide transcription style, formatting, and output characteristics. Maximum 1,500 words.

keyterms_prompt (array)

An array of up to 1,000 words or phrases (max 6 words per phrase) to improve transcription accuracy. Cannot be used with the prompt parameter.
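
The limits stated for `prompt` and `keyterms_prompt` can be checked before a request is sent. The sketch below encodes them as written above: at most 1,500 words in `prompt`, at most 1,000 key terms, at most 6 words per term, and never both parameters together. `checkPromptOptions` is a hypothetical helper, and counting words by splitting on whitespace is an assumption; the service may count differently.

```typescript
interface PromptOptions {
  prompt?: string;
  keyterms_prompt?: string[];
}

// Assumption: a "word" is a run of non-whitespace characters.
function countWords(value: string): number {
  const trimmed = value.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Return a list of violations; an empty list means the options pass the checks.
function checkPromptOptions(options: PromptOptions): string[] {
  const errors: string[] = [];
  if (options.prompt !== undefined && options.keyterms_prompt !== undefined) {
    errors.push('keyterms_prompt cannot be used with prompt');
  }
  if (options.prompt !== undefined && countWords(options.prompt) > 1500) {
    errors.push('prompt exceeds 1,500 words');
  }
  const terms = options.keyterms_prompt ?? [];
  if (terms.length > 1000) {
    errors.push('keyterms_prompt has more than 1,000 entries');
  }
  for (const term of terms) {
    if (countWords(term) > 6) {
      errors.push(`key term has more than 6 words: "${term}"`);
    }
  }
  return errors;
}

// The key terms from the example earlier on this page pass the checks.
console.log(
  checkPromptOptions({
    keyterms_prompt: ['Kubernetes', 'microservices', 'containerization', 'load balancer'],
  }),
); // []
```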

temperature (number, minimum: 0, maximum: 1)

Controls randomness in model output (0.0-1.0). Lower values make output more deterministic. Default is 0.0.

speaker_labels (boolean)

Enable speaker diarization to identify different speakers in the audio.

speakers_expected (integer, minimum: 1, maximum: 9007199254740991)

Expected number of speakers for speaker diarization.

auto_chapters (boolean)

Enable automatic chapter detection.

entity_detection (boolean)

Enable detection of entities like names, organizations, and locations.

sentiment_analysis (boolean)

Enable sentiment analysis for each sentence.

auto_highlights (boolean)

Enable automatic extraction of key phrases and highlights.

content_safety (boolean)

Enable content safety detection for sensitive content.

iab_categories (boolean)

Enable IAB (Interactive Advertising Bureau) content taxonomy classification.

custom_spelling (array)

Custom spelling rules to replace specific words or phrases in the transcription output.

disfluencies (boolean)

Include filler words like "um", "uh", etc. in the transcript.

multichannel (boolean)

Process each audio channel separately for multi-channel audio files.

dual_channel (boolean)

Process audio as dual-channel (stereo) for better accuracy.

webhook_url (string, format: uri)

URL to receive webhook notifications when transcription is complete.

audio_start_from (integer, minimum: 0, maximum: 9007199254740991)

Timestamp (in milliseconds) to start transcription from.

audio_end_at (integer, minimum: 0, maximum: 9007199254740991)

Timestamp (in milliseconds) to end transcription at.
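
Both `audio_start_from` and `audio_end_at` take integer millisecond values, so callers working in seconds need a conversion step. A minimal sketch follows. `audioWindow` is a hypothetical helper, and rejecting an end time that is not after the start time is an assumption; the schema above only requires non-negative integers.

```typescript
interface AudioWindow {
  audio_start_from: number;
  audio_end_at: number;
}

// Convert seconds to whole milliseconds, since both fields are integers.
function secondsToMs(seconds: number): number {
  return Math.round(seconds * 1000);
}

// Build the two window fields from a start and end time given in seconds.
function audioWindow(startSeconds: number, endSeconds: number): AudioWindow {
  const start = secondsToMs(startSeconds);
  const end = secondsToMs(endSeconds);
  if (start < 0 || end < 0) {
    throw new RangeError('timestamps must be non-negative');
  }
  // Assumption: an empty or reversed window is a caller error.
  if (end <= start) {
    throw new RangeError('audio_end_at must be after audio_start_from');
  }
  return { audio_start_from: start, audio_end_at: end };
}

const demoWindow = audioWindow(1.5, 10);
console.log(demoWindow); // { audio_start_from: 1500, audio_end_at: 10000 }
```

The returned object can be spread into the input alongside `audio_url`, for example `{ audio_url, ...audioWindow(1.5, 10) }`.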

word_boost (array)

Array of words to boost recognition accuracy (legacy - use keyterms_prompt instead).

boost_param (string, enum: low, default, high)

How much to boost the words in word_boost.

filter_profanity (boolean)

Filter profanity from the transcription.

redact_pii (boolean)

Redact personally identifiable information.

redact_pii_audio (boolean)

Generate a redacted audio file with PII removed.

redact_pii_policies (array)

Specific PII policies to apply for redaction.

redact_pii_sub (string, enum: entity_name, hash)

Strategy for substituting redacted PII.

speech_threshold (number, minimum: 0, maximum: 1)

Confidence threshold for speech detection.

domain (string, enum: medical-v1)

Domain-specific transcription mode. "medical-v1" enables medical terminology optimization.

Output fields

text (string)

The transcribed text.

words (array | null)

Word-level timestamps and confidence scores.

utterances (array | null)

Speaker-separated utterances (when speaker_labels is enabled).

confidence (number | null)

Overall confidence score for the transcription.

language_code (string | null)

Detected or specified language code.

language_confidence (number | null)

Confidence score for language detection.
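
The word-level timestamps in `words` can be regrouped into larger timed units, for example for captions. The sketch below splits on sentence-ending punctuation, relying on the fact that punctuation is attached to the word text in the samples on this page (such as "west." and "years."). `toTimedSentences` and the interface names are hypothetical.

```typescript
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

interface TimedSentence {
  text: string;
  start: number;
  end: number;
}

// Group consecutive words into sentences, closing a sentence on ".", "!" or "?".
function toTimedSentences(words: TimedWord[] | null): TimedSentence[] {
  const sentences: TimedSentence[] = [];
  let current: TimedWord[] = [];

  const flush = (): void => {
    if (current.length === 0) return;
    sentences.push({
      text: current.map((w) => w.text).join(' '),
      start: current[0].start,
      end: current[current.length - 1].end,
    });
    current = [];
  };

  for (const word of words ?? []) {
    current.push(word);
    if (/[.!?]$/.test(word.text)) flush();
  }
  flush(); // keep any trailing words that lack closing punctuation
  return sentences;
}

// Five consecutive words taken from the first sample response on this page.
const demoSentences = toTimedSentences([
  { text: 'in', start: 2055, end: 2104 },
  { text: 'the', start: 2120, end: 2217 },
  { text: 'west.', start: 2217, end: 2638 },
  { text: 'This', start: 3107, end: 3221 },
  { text: 'simple', start: 3269, end: 3560 },
]);
console.log(demoSentences);
```

Each sentence takes its `start` from its first word and its `end` from its last word, so gaps between sentences are preserved.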
