GPT-4o Transcribe
Automatic Speech Recognition • OpenAI • Proxied

A speech-to-text model that uses GPT-4o to transcribe audio, with a lower word error rate and better language recognition than the original Whisper models.
| Model Info | |
|---|---|
| Terms and License | link ↗ |
| More information | link ↗ |
| Pricing | View pricing in the Cloudflare dashboard ↗ |
Usage
    const response = await env.AI.run(
      'openai/gpt-4o-transcribe',
      { file: 'data:audio/wav;base64,<...>' },
    )
    console.log(response)

    curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
      --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "model": "openai/gpt-4o-transcribe",
        "input": {
          "file": "data:audio/wav;base64,<...>"
        }
      }'

Hello

    {
      "gatewayMetadata": { "keySource": "Unified" },
      "result": { "text": "Hello" },
      "state": "Completed"
    }

Examples
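Reading the Raw Response: a minimal sketch, assuming the JSON envelope shown in the raw response above (a result object holding text, plus a state field). The helper name extractTranscript is hypothetical, and treating any state other than "Completed" as a failure is an assumption, since this page does not list the other possible states.

```javascript
// Hypothetical helper, not part of the Workers AI API.
// Pulls the transcript out of a response envelope shaped like the
// raw response shown on this page.
function extractTranscript(envelope) {
  // Assumption: "Completed" is the only state that carries a usable result.
  if (!envelope || envelope.state !== 'Completed') {
    const state = envelope?.state ?? 'unknown'
    throw new Error(`Transcription not completed (state: ${state})`)
  }
  const text = envelope.result?.text
  if (typeof text !== 'string') {
    throw new Error('Response has no result.text string')
  }
  return text
}

// Usage with the sample response from this page:
const sample = {
  gatewayMetadata: { keySource: 'Unified' },
  result: { text: 'Hello' },
  state: 'Completed',
}
console.log(extractTranscript(sample)) // prints "Hello"
```

Failing loudly on an unexpected state or a missing text field keeps an incomplete response from being mistaken for an empty transcript.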
With Language Hint: Transcribe with a language hint for better accuracy

    const response = await env.AI.run(
      'openai/gpt-4o-transcribe',
      { file: 'data:audio/wav;base64,<...>', language: 'en' },
    )
    console.log(response)

    curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
      --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "model": "openai/gpt-4o-transcribe",
        "input": {
          "file": "data:audio/wav;base64,<...>",
          "language": "en"
        }
      }'

Hello

    {
      "gatewayMetadata": { "keySource": "Unified" },
      "result": { "text": "Hello" },
      "state": "Completed"
    }

Guided Transcription: Use a prompt to guide transcription style and context
    const response = await env.AI.run(
      'openai/gpt-4o-transcribe',
      {
        file: 'data:audio/wav;base64,<...>',
        language: 'en',
        prompt: 'This is a technical discussion about Kubernetes and cloud-native architecture.',
      },
    )
    console.log(response)

    curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
      --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "model": "openai/gpt-4o-transcribe",
        "input": {
          "file": "data:audio/wav;base64,<...>",
          "language": "en",
          "prompt": "This is a technical discussion about Kubernetes and cloud-native architecture."
        }
      }'

This is a technical discussion about Kubernetes and cloud-native architecture.

    {
      "gatewayMetadata": { "keySource": "Unified" },
      "result": { "text": "This is a technical discussion about Kubernetes and cloud-native architecture." },
      "state": "Completed"
    }

High Temperature: Higher temperature for more varied transcription
    const response = await env.AI.run(
      'openai/gpt-4o-transcribe',
      { file: 'data:audio/wav;base64,<...>', temperature: 0.5 },
    )
    console.log(response)

    curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
      --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "model": "openai/gpt-4o-transcribe",
        "input": {
          "file": "data:audio/wav;base64,<...>",
          "temperature": 0.5
        }
      }'

Hello, world!

    {
      "gatewayMetadata": { "keySource": "Unified" },
      "result": { "text": "Hello, world!" },
      "state": "Completed"
    }

Parameters
file
string, required
The audio file as a data URI (data:audio/...;base64,...) or an HTTPS URL. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.

language
string
The language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency.

prompt
string
Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in the same language as the audio.

temperature
number, minimum: 0, maximum: 1
The sampling temperature, between 0 and 1. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic. Defaults to 0 if omitted.

text
string
The transcribed text (returned in the response under result).
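
The parameter constraints above can be checked before a request is sent. The following is a sketch only: buildTranscribeInput and toDataUri are hypothetical helpers, not part of the Workers AI API. The two-letter check is a rough stand-in for ISO-639-1 validation, and audio/wav is assumed as the default MIME type only because the examples on this page use it.

```javascript
// Hypothetical helpers, not part of the Workers AI API.

// Encode raw audio bytes (a Uint8Array) as a base64 data URI.
function toDataUri(bytes, mimeType = 'audio/wav') {
  let binary = ''
  const chunkSize = 0x8000 // encode in chunks to avoid huge argument lists
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize))
  }
  return `data:${mimeType};base64,${btoa(binary)}`
}

// Build the input object, enforcing the documented parameter limits.
function buildTranscribeInput(bytes, options = {}) {
  const { mimeType = 'audio/wav', language, prompt, temperature } = options
  const input = { file: toDataUri(bytes, mimeType) }

  if (language !== undefined) {
    // Assumption: a two-lowercase-letter code approximates ISO-639-1.
    if (!/^[a-z]{2}$/.test(language)) {
      throw new RangeError(`language must be a two-letter code, got "${language}"`)
    }
    input.language = language
  }

  if (prompt !== undefined) input.prompt = prompt

  if (temperature !== undefined) {
    // Documented range: minimum 0, maximum 1.
    if (typeof temperature !== 'number' || !(temperature >= 0 && temperature <= 1)) {
      throw new RangeError('temperature must be a number between 0 and 1')
    }
    input.temperature = temperature
  }

  return input
}

// Usage: two bytes spelling "Hi", with a language hint.
const example = buildTranscribeInput(new Uint8Array([72, 105]), { language: 'en' })
console.log(example.file) // prints "data:audio/wav;base64,SGk="
```

The returned object has the same shape as the second argument to env.AI.run in the usage example, so it could be passed there directly. Optional fields are left out when not supplied, so the defaults described above (such as temperature 0) still apply.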