
GPT-4o Transcribe

Automatic Speech Recognition | OpenAI | Proxied

A speech-to-text model that uses GPT-4o to transcribe audio, with a lower word error rate and better language recognition than the original Whisper models.

Model Info
Terms and License
More information

Usage

TypeScript
const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  {
    file: 'https://cdn.openai.com/API/docs/audio/alloy.wav',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)
Input / Output JSON
{
  "file": "https://cdn.openai.com/API/docs/audio/alloy.wav"
}
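To reuse the call above across handlers, it can be wrapped in a small function. The following is a minimal sketch, not the official client: the AiBinding interface and the transcribe function are hypothetical names introduced here, and the result is typed as unknown because the response shape is not shown on this page.

```typescript
// Hypothetical structural type covering only the call shape used on this page.
// In a real Worker, the binding type comes from the runtime's own type definitions.
interface AiBinding {
  run(
    model: string,
    input: Record<string, unknown>,
    options?: Record<string, unknown>
  ): Promise<unknown>;
}

// Hypothetical wrapper: sends one audio file to the model through a gateway.
// The result stays `unknown` because this page does not document the output.
async function transcribe(
  ai: AiBinding,
  file: string,
  gatewayId = 'default'
): Promise<unknown> {
  try {
    return await ai.run(
      'openai/gpt-4o-transcribe',
      { file },
      { gateway: { id: gatewayId } }
    );
  } catch (err) {
    // Re-throw with the file reference so failures are easier to trace.
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Transcription failed for ${file}: ${reason}`);
  }
}
```

Inside a Worker this would be called as transcribe(env.AI, url), assuming env.AI is the binding used in the usage example.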

Examples

With Language Hint — Transcribe with a language hint for better accuracy
TypeScript
const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  {
    file: 'https://cdn.openai.com/API/docs/audio/shimmer.wav',
    language: 'en',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)
Input / Output JSON
{
  "file": "https://cdn.openai.com/API/docs/audio/shimmer.wav",
  "language": "en"
}
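Applications often hold a locale tag such as en-US rather than a bare two-letter code. The sketch below shows one way to derive a language hint from such a tag. The function name toLanguageHint is hypothetical, and the check is purely syntactic: it confirms the two-letter shape of an ISO-639-1 code, not that the code is actually assigned.

```typescript
// Hypothetical helper: derives a two-letter language hint from a locale tag.
// Returns undefined when the primary subtag is not two letters (for example,
// a three-letter code), so the caller can omit `language` entirely.
function toLanguageHint(tag: string): string | undefined {
  const primary = (tag.trim().split(/[-_]/)[0] ?? '').toLowerCase();
  return /^[a-z]{2}$/.test(primary) ? primary : undefined;
}
```

When the helper returns undefined, leaving the language field out of the request is the safer choice, since the parameter is optional.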
Guided Transcription — Use a prompt to guide transcription style and context
TypeScript
const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  {
    file: 'https://cdn.openai.com/API/docs/audio/fable.wav',
    language: 'en',
    prompt:
      'This is a technical discussion about Kubernetes and cloud-native architecture.',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)
Input / Output JSON
{
  "file": "https://cdn.openai.com/API/docs/audio/fable.wav",
  "language": "en",
  "prompt": "This is a technical discussion about Kubernetes and cloud-native architecture."
}
High Temperature — Higher temperature for more varied transcription
TypeScript
const response = await env.AI.run(
  'openai/gpt-4o-transcribe',
  {
    file: 'https://cdn.openai.com/API/docs/audio/echo.wav',
    temperature: 0.5,
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)
Input / Output JSON
{
  "file": "https://cdn.openai.com/API/docs/audio/echo.wav",
  "temperature": 0.5
}

Parameters

file
string, required. The audio file as a URL or data URI (data:audio/...;base64,...). Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
language
string. The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
prompt
string. An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
temperature
number, required, default: 0, minimum: 0, maximum: 1. The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
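The file parameter accepts a data URI as well as a URL, which is useful when the audio is already in memory (for example, from an upload). A minimal sketch of building such a URI follows. The helper name toAudioDataUri is hypothetical, and the MIME type is supplied by the caller rather than detected.

```typescript
// Hypothetical helper: wraps raw audio bytes in a data URI of the form
// data:<mime>;base64,<payload>, matching the pattern given for `file`.
function toAudioDataUri(bytes: Uint8Array, mimeType: string): string {
  let binary = '';
  // Build a binary string one byte at a time so btoa can encode it.
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return `data:${mimeType};base64,${btoa(binary)}`;
}
```

The returned string can be passed as the file value in place of a URL. Base64 encoding grows the payload by roughly a third, so a URL may be preferable for large recordings.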

API Schemas

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "file": {
      "description": "The audio file as a URL or data URI (data:audio/...;base64,...). Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.",
      "type": "string"
    },
    "language": {
      "description": "The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.",
      "type": "string"
    },
    "prompt": {
      "description": "An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.",
      "type": "string"
    },
    "temperature": {
      "description": "The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.",
      "default": 0,
      "type": "number",
      "minimum": 0,
      "maximum": 1
    }
  },
  "required": [
    "file",
    "temperature"
  ],
  "additionalProperties": false
}
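The constraints in this schema can be checked on the client before a request is sent. The sketch below is a hand-written check, not a general JSON Schema validator, and the names TranscribeInput and normalizeTranscribeInput are hypothetical. It assumes that an omitted temperature falls back to the schema default of 0, as the usage examples that omit it suggest. That behavior is an assumption, not something this page states.

```typescript
// Hypothetical type mirroring the schema's four properties.
interface TranscribeInput {
  file: string;
  language?: string;
  prompt?: string;
  temperature: number;
}

const ALLOWED_KEYS = ['file', 'language', 'prompt', 'temperature'];

// Hypothetical helper: checks a raw object against the schema's constraints
// and fills in the default temperature. Throws on the first violation.
function normalizeTranscribeInput(raw: Record<string, unknown>): TranscribeInput {
  // additionalProperties: false
  for (const key of Object.keys(raw)) {
    if (ALLOWED_KEYS.indexOf(key) === -1) {
      throw new Error(`Unknown property: ${key}`);
    }
  }
  const file = raw['file'];
  if (typeof file !== 'string') {
    throw new Error('"file" is required and must be a string');
  }
  const language = raw['language'];
  if (language !== undefined && typeof language !== 'string') {
    throw new Error('"language" must be a string');
  }
  const prompt = raw['prompt'];
  if (prompt !== undefined && typeof prompt !== 'string') {
    throw new Error('"prompt" must be a string');
  }
  // Assumption: an omitted temperature takes the schema default of 0.
  const temperature = raw['temperature'] === undefined ? 0 : raw['temperature'];
  if (
    typeof temperature !== 'number' ||
    Number.isNaN(temperature) ||
    temperature < 0 ||
    temperature > 1
  ) {
    throw new Error('"temperature" must be a number between 0 and 1');
  }
  const input: TranscribeInput = { file, temperature };
  if (language !== undefined) input.language = language;
  if (prompt !== undefined) input.prompt = prompt;
  return input;
}
```

Failing early in this way avoids spending a network round trip on a request that the schema would reject.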