
GPT-4o Transcribe

Automatic Speech Recognition · OpenAI · Proxied

A speech-to-text model that uses GPT-4o to transcribe audio, with a lower word error rate and better language recognition than the original Whisper models.

Model Info
Terms and License: link
More information: link
Pricing: View pricing in the Cloudflare dashboard

Usage

TypeScript
const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  { file: 'data:audio/wav;base64,<...>' },
)
console.log(response)
Output: Hello
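
The usage snippet above leaves the base64 payload elided. As a minimal sketch, the helper below builds a data URI from raw audio bytes and passes it to the model. The names `AiBinding`, `toAudioDataUri`, and `transcribeBytes` are hypothetical and not part of the Workers AI API. The interface only mirrors the `run(model, input)` call shape used on this page, and the response is left as `unknown` because its shape is not documented here.

```typescript
// Hypothetical structural type: only the call shape used in this document.
interface AiBinding {
  run(model: string, input: Record<string, unknown>): Promise<unknown>
}

const MODEL = 'openai/gpt-4o-transcribe'

// Encode raw audio bytes as a data URI (data:<mime>;base64,<payload>).
function toAudioDataUri(bytes: Uint8Array, mimeType: string): string {
  // btoa expects a binary string, so map each byte to one character first.
  let binary = ''
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i])
  }
  return `data:${mimeType};base64,${btoa(binary)}`
}

// Send the encoded audio to the model and return the response untouched.
async function transcribeBytes(
  ai: AiBinding,
  bytes: Uint8Array,
  mimeType: string,
): Promise<unknown> {
  return ai.run(MODEL, { file: toAudioDataUri(bytes, mimeType) })
}

// Inside a Worker, assuming `env.AI` is the binding shown above:
//   const response = await transcribeBytes(env.AI, audioBytes, 'audio/wav')
```

Accepting the binding as a parameter keeps the helper testable with a stub object, without depending on the Workers runtime.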

Examples

With Language Hint — Transcribe with a language hint for better accuracy
TypeScript
const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  { file: 'data:audio/wav;base64,<...>', language: 'en' },
)
console.log(response)
Output: Hello
Guided Transcription — Use a prompt to guide transcription style and context
TypeScript
const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  {
    file: 'data:audio/wav;base64,<...>',
    language: 'en',
    prompt: 'This is a technical discussion about Kubernetes and cloud-native architecture.',
  },
)
console.log(response)
Output: This is a technical discussion about Kubernetes and cloud-native architecture.
High Temperature — Higher temperature for more varied transcription
TypeScript
const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  { file: 'data:audio/wav;base64,<...>', temperature: 0.5 },
)
console.log(response)
Output: Hello, world!
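
The Guided Transcription example passes a fixed prompt, but the prompt parameter can also continue a previous audio segment. The sketch below shows one possible way to chain segments: the tail of each transcript becomes the prompt for the next call. Everything here is an assumption for illustration. `AiBinding`, `transcribeSegments`, and `getText` are hypothetical names, the 200-character tail is an arbitrary choice, and because the response shape is not documented on this page, the caller supplies the function that extracts text from it.

```typescript
// Hypothetical structural type: only the call shape used in this document.
interface AiBinding {
  run(model: string, input: Record<string, unknown>): Promise<unknown>
}

// Transcribe consecutive segments in order, carrying context forward
// through the `prompt` parameter.
async function transcribeSegments(
  ai: AiBinding,
  segments: string[], // data URIs or HTTPS URLs, in playback order
  getText: (response: unknown) => string, // caller-supplied extraction
  maxPromptChars = 200, // assumed limit, not taken from the source
): Promise<string[]> {
  const texts: string[] = []
  let previous = ''
  for (const file of segments) {
    const input: Record<string, unknown> = { file }
    // Keep only the end of the previous transcript as context.
    const tail = previous.slice(Math.max(0, previous.length - maxPromptChars))
    if (tail) input.prompt = tail
    const text = getText(await ai.run('openai/gpt-4o-transcribe', input))
    texts.push(text)
    previous = text
  }
  return texts
}
```

Segments are processed sequentially rather than in parallel because each call depends on the text returned by the one before it.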

Parameters

file
string, required. The audio file as a data URI (data:audio/...;base64,...) or HTTPS URL. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
language
string. The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
prompt
string. Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
temperature
number, minimum: 0, maximum: 1. The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Defaults to 0 if omitted.
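
The constraints listed above can be checked before a request is sent. The following is a sketch only: `TranscribeInput` and `buildTranscribeInput` are hypothetical names, and the language check only tests for the two-lowercase-letter shape of an ISO-639-1 code, not whether the code is actually assigned.

```typescript
// Hypothetical input type mirroring the four parameters documented above.
interface TranscribeInput {
  file: string
  language?: string
  prompt?: string
  temperature?: number
}

// Validate the documented constraints and return a clean input object.
function buildTranscribeInput(input: TranscribeInput): TranscribeInput {
  const { file, language, prompt, temperature } = input
  if (!file.startsWith('data:') && !file.startsWith('https://')) {
    throw new TypeError('file must be a data URI or an HTTPS URL')
  }
  // Shape check only: two lowercase letters, as in ISO-639-1 codes.
  if (language !== undefined && !/^[a-z]{2}$/.test(language)) {
    throw new RangeError('language must be a two-letter ISO-639-1 code')
  }
  // Written as a negated range test so NaN is rejected as well.
  if (temperature !== undefined && !(temperature >= 0 && temperature <= 1)) {
    throw new RangeError('temperature must be between 0 and 1')
  }
  // Copy only the keys that were set, so omitted parameters keep their defaults.
  const out: TranscribeInput = { file }
  if (language !== undefined) out.language = language
  if (prompt !== undefined) out.prompt = prompt
  if (temperature !== undefined) out.temperature = temperature
  return out
}
```

The returned object can be passed as the second argument to `env.AI.run('openai/gpt-4o-transcribe', ...)`.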
