GPT-4o Transcribe

Automatic Speech Recognition • OpenAI • Proxied

A speech-to-text model that uses GPT-4o to transcribe audio, with a lower word error rate and better language recognition than the original Whisper models.

Model Info
Terms and License	link ↗
More information	link ↗
Pricing	View pricing in the Cloudflare dashboard ↗

TypeScript

const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  { file: 'data:audio/wav;base64,<...>' },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "openai/gpt-4o-transcribe",
  "input": {
    "file": "data:audio/wav;base64,<...>"
  }
}'
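
The requests above pass the audio as a base64 data URI in the `file` field. As a minimal sketch of how such a string could be built from raw bytes, the helper below encodes a `Uint8Array` and prepends the `data:<mime>;base64,` prefix. The function name `toAudioDataUri` and the default MIME type are assumptions made for illustration; they are not part of the documented API. It relies on the global `btoa`, which is available in Workers, browsers, and recent Node.js versions.

```typescript
// Hypothetical helper (not part of the documented API): encode raw audio
// bytes as a base64 data URI suitable for the `file` field.
function toAudioDataUri(bytes: Uint8Array, mimeType: string = "audio/wav"): string {
  // btoa expects a "binary string" in which each character is one byte.
  let binary = "";
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Example with two bytes ("Hi"); real audio would come from a file or request body,
// e.g. new Uint8Array(await request.arrayBuffer()) inside a Worker.
const exampleUri = toAudioDataUri(new Uint8Array([72, 105]));
console.log(exampleUri);
```

For large recordings, passing an HTTPS URL in `file` may be preferable to holding the whole base64 string in memory.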

Output

Hello

Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "text": "Hello"
  },
  "state": "Completed"
}
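
The raw response wraps the transcript in an envelope with `result.text` and a `state` field. The sketch below shows one way to check that shape before reading the text. It assumes the caller has the raw JSON (for example, from the REST call) and that `"Completed"` is the only success state, which is inferred from the sample rather than documented here. The names `TranscriptionEnvelope` and `extractText` are hypothetical.

```typescript
// Shape inferred from the sample raw response; other fields may exist.
interface TranscriptionEnvelope {
  gatewayMetadata?: { keySource?: string };
  result: { text: string };
  state: string;
}

// Type guard: true only if `state` is a string and `result.text` is a string.
function isTranscriptionEnvelope(value: unknown): value is TranscriptionEnvelope {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const result = v["result"];
  if (typeof result !== "object" || result === null) return false;
  const r = result as Record<string, unknown>;
  return typeof v["state"] === "string" && typeof r["text"] === "string";
}

// Hypothetical helper: return the transcript or throw a descriptive error.
// Treating "Completed" as the only success state is an assumption.
function extractText(raw: unknown): string {
  if (!isTranscriptionEnvelope(raw)) {
    throw new Error("Unexpected response shape");
  }
  if (raw.state !== "Completed") {
    throw new Error(`Transcription not completed: ${raw.state}`);
  }
  return raw.result.text;
}

const sampleEnvelope = {
  gatewayMetadata: { keySource: "Unified" },
  result: { text: "Hello" },
  state: "Completed",
};
console.log(extractText(sampleEnvelope));
```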

Examples

With Language Hint — Transcribe with a language hint for better accuracy

TypeScript

const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  { file: 'data:audio/wav;base64,<...>', language: 'en' },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "openai/gpt-4o-transcribe",
  "input": {
    "file": "data:audio/wav;base64,<...>",
    "language": "en"
  }
}'

Output

Hello

Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "text": "Hello"
  },
  "state": "Completed"
}

Guided Transcription — Use a prompt to guide transcription style and context

TypeScript

const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  {
    file: 'data:audio/wav;base64,<...>',
    prompt: 'This is a technical discussion about Kubernetes and cloud-native architecture.',
    language: 'en',
  },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "openai/gpt-4o-transcribe",
  "input": {
    "file": "data:audio/wav;base64,<...>",
    "prompt": "This is a technical discussion about Kubernetes and cloud-native architecture.",
    "language": "en"
  }
}'

Output

This is a technical discussion about Kubernetes and cloud-native architecture.

Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "text": "This is a technical discussion about Kubernetes and cloud-native architecture."
  },
  "state": "Completed"
}
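
Besides setting context, the `prompt` field can be used to continue a previous audio segment. One possible approach when transcribing a long recording in pieces is to pass the tail of the previous transcript as the prompt for the next piece. The sketch below illustrates that idea only; the helper name `promptFromPrevious` and the 200-character limit are arbitrary assumptions, not documented limits.

```typescript
// Hypothetical helper: take the tail of the previous transcript to use as
// the `prompt` for the next audio segment. `maxChars` is an illustrative
// value, not a documented limit.
function promptFromPrevious(previousText: string, maxChars: number = 200): string {
  const trimmed = previousText.trim();
  if (trimmed.length <= maxChars) return trimmed;
  const tail = trimmed.slice(trimmed.length - maxChars);
  // The cut may land mid-word, so drop everything up to the first space.
  const firstSpace = tail.indexOf(" ");
  return firstSpace === -1 ? tail : tail.slice(firstSpace + 1);
}

// Building the input for the next segment (the file value is a placeholder).
const previousTranscript = "We then deployed the service to the staging cluster.";
const nextInput = {
  file: "data:audio/wav;base64,<...>",
  prompt: promptFromPrevious(previousTranscript),
  language: "en",
};
console.log(nextInput.prompt);
```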

High Temperature — Higher temperature for more varied transcription

TypeScript

const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  { file: 'data:audio/wav;base64,<...>', temperature: 0.5 },
)
console.log(response)

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "openai/gpt-4o-transcribe",
  "input": {
    "file": "data:audio/wav;base64,<...>",
    "temperature": 0.5
  }
}'

Output

Hello, world!

Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "text": "Hello, world!"
  },
  "state": "Completed"
}

Parameters

Input

file

string, required. The audio file as a data URI (data:audio/...;base64,...) or HTTPS URL. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.

language

string. The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.

prompt

string. An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

temperature

number, minimum: 0, maximum: 1. The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Defaults to 0 if omitted.

Output

text

string. The transcribed text.
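
The input constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch under a literal reading of those descriptions: `file` must start with `data:audio/...;base64,` or `https://`, `language` is a two-letter lowercase code, and `temperature` lies between 0 and 1. The `TranscribeInput` type and `validateTranscribeInput` function are hypothetical names, and the service may accept or reject inputs differently from this check.

```typescript
// Hypothetical input type mirroring the documented input parameters.
interface TranscribeInput {
  file: string;
  language?: string;
  prompt?: string;
  temperature?: number;
}

// Returns a list of problems; an empty list means the input passed these checks.
function validateTranscribeInput(input: TranscribeInput): string[] {
  const problems: string[] = [];

  // `file`: a base64 audio data URI or an HTTPS URL.
  const isDataUri = /^data:audio\/[^;,]+;base64,/.test(input.file);
  const isHttps = input.file.startsWith("https://");
  if (!isDataUri && !isHttps) {
    problems.push("file must be an audio data URI or an HTTPS URL");
  }

  // `language`: two lowercase letters, the usual form of an ISO 639-1 code.
  // This checks the pattern only, not whether the code is actually assigned.
  if (input.language !== undefined && !/^[a-z]{2}$/.test(input.language)) {
    problems.push("language should be a two-letter ISO 639-1 code");
  }

  // `temperature`: a number in [0, 1]; the negated form also rejects NaN.
  if (
    input.temperature !== undefined &&
    !(input.temperature >= 0 && input.temperature <= 1)
  ) {
    problems.push("temperature must be between 0 and 1");
  }

  return problems;
}

console.log(
  validateTranscribeInput({ file: "http://example.com/a.wav", temperature: 1.5 }),
);
```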
