GPT-4o Transcribe
Automatic Speech Recognition • OpenAI • Proxied

A speech-to-text model that uses GPT-4o to transcribe audio with improved word error rate and better language recognition compared to original Whisper models.

| Model Info | |
|---|---|
| Terms and License | link ↗ |
| More information | link ↗ |
Usage

    const response = await env.AI.run(
      'openai/gpt-4o-transcribe',
      {
        file: 'https://cdn.openai.com/API/docs/audio/alloy.wav',
      },
      {
        gateway: { id: 'default' },
      },
    )
    console.log(response)

Examples
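Worker Handler: A minimal sketch of how the basic call from Usage might sit inside a Worker's fetch handler. The `file` query parameter, the plain-text reply, and the name `worker` are assumptions made for illustration; only the `env.AI.run` call and its arguments come from this page.

```javascript
// Hypothetical Worker: reads an audio URL from the `file` query parameter
// (an assumption for this sketch) and replies with the transcript as text.
const worker = {
  async fetch(request, env) {
    const file = new URL(request.url).searchParams.get('file')
    if (!file) {
      return new Response('Missing "file" query parameter', { status: 400 })
    }

    // Same call shape as the Usage example, with the file supplied by the caller.
    const response = await env.AI.run(
      'openai/gpt-4o-transcribe',
      { file },
      { gateway: { id: 'default' } },
    )

    // The output schema defines a single string property, `text`.
    return new Response(response.text, {
      headers: { 'content-type': 'text/plain; charset=utf-8' },
    })
  },
}

// In a Worker module, this object would typically be the default export:
// export default worker
```
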
With Language Hint: Transcribe with a language hint for better accuracy

    const response = await env.AI.run(
      'openai/gpt-4o-transcribe',
      {
        file: 'https://cdn.openai.com/API/docs/audio/shimmer.wav',
        language: 'en',
      },
      {
        gateway: { id: 'default' },
      },
    )
    console.log(response)

Guided Transcription: Use a prompt to guide transcription style and context

    const response = await env.AI.run(
      'openai/gpt-4o-transcribe',
      {
        file: 'https://cdn.openai.com/API/docs/audio/fable.wav',
        language: 'en',
        prompt: 'This is a technical discussion about Kubernetes and cloud-native architecture.',
      },
      {
        gateway: { id: 'default' },
      },
    )
    console.log(response)

Input / Output JSON
Input:

    {
      "file": "https://cdn.openai.com/API/docs/audio/fable.wav",
      "language": "en",
      "prompt": "This is a technical discussion about Kubernetes and cloud-native architecture."
    }

Output:

    {
      "text": "The library is a quiet and peaceful place where people go to read, study, and learn. The shelves are filled with books on every subject imaginable."
    }

High Temperature: Higher temperature for more varied transcription

    const response = await env.AI.run(
      'openai/gpt-4o-transcribe',
      {
        file: 'https://cdn.openai.com/API/docs/audio/echo.wav',
        temperature: 0.5,
      },
      {
        gateway: { id: 'default' },
      },
    )
    console.log(response)

Parameters
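The `file` parameter accepts either a URL or a data URI (data:audio/...;base64,...). When the audio exists only as raw bytes, for example an uploaded request body, one way to build such a URI is sketched here. The helper name `toAudioDataUri` and the default MIME type `audio/wav` are assumptions made for illustration and are not part of the model's API.

```javascript
// Hypothetical helper: encode raw audio bytes (a Uint8Array) as a base64
// data URI of the form data:audio/...;base64,...
function toAudioDataUri(bytes, mimeType = 'audio/wav') {
  // Build the binary string in chunks so that very large inputs do not
  // exceed the engine's limit on function arguments.
  const CHUNK = 0x8000
  let binary = ''
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK))
  }
  return `data:${mimeType};base64,${btoa(binary)}`
}

// Possible use inside a Worker (names other than env.AI.run are assumptions):
// const audio = new Uint8Array(await request.arrayBuffer())
// const response = await env.AI.run(
//   'openai/gpt-4o-transcribe',
//   { file: toAudioDataUri(audio) },
//   { gateway: { id: 'default' } },
// )
```

Base64 encoding increases the payload by roughly a third, so passing a URL may be preferable when the audio is already hosted somewhere reachable.
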

Input

| Name | Type | Required | Description |
|---|---|---|---|
| file | string | Yes | The audio file as a URL or data URI (data:audio/...;base64,...). Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. |
| language | string | No | The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency. |
| prompt | string | No | An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. |
| temperature | number | Yes (default: 0) | The sampling temperature, between 0 and 1 (minimum: 0, maximum: 1). Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |

Output

| Name | Type | Description |
|---|---|---|
| text | string | The transcribed text. |

API Schemas
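The input schema in this section can be mirrored by a small pre-flight check before calling `env.AI.run`. The sketch below is a hand-written approximation, not a JSON Schema validator, and the name `validateTranscribeInput` is hypothetical. One assumption is worth flagging: the schema lists `temperature` as required, but it also declares a default of 0 and the examples on this page omit it, so this sketch treats it as optional.

```javascript
// Hypothetical pre-flight check that approximates the documented input schema.
const ALLOWED_KEYS = ['file', 'language', 'prompt', 'temperature']

function validateTranscribeInput(input) {
  const errors = []

  // The schema sets additionalProperties to false.
  for (const key of Object.keys(input)) {
    if (!ALLOWED_KEYS.includes(key)) errors.push(`unknown property: ${key}`)
  }

  // `file` is required and must be a string (URL or data URI).
  if (typeof input.file !== 'string') errors.push('file must be a string')

  // `language` and `prompt` are optional strings.
  for (const key of ['language', 'prompt']) {
    if (key in input && typeof input[key] !== 'string') {
      errors.push(`${key} must be a string`)
    }
  }

  // Assumption: `temperature` may be omitted because it has a default of 0.
  if ('temperature' in input) {
    const t = input.temperature
    if (typeof t !== 'number' || Number.isNaN(t) || t < 0 || t > 1) {
      errors.push('temperature must be a number between 0 and 1')
    }
  }

  return errors
}
```

An empty array means the input passed every check; otherwise each entry names one problem, which makes the result easy to return in a 400 response.
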
Input

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "file": {
          "description": "The audio file as a URL or data URI (data:audio/...;base64,...). Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.",
          "type": "string"
        },
        "language": {
          "description": "The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.",
          "type": "string"
        },
        "prompt": {
          "description": "An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.",
          "type": "string"
        },
        "temperature": {
          "description": "The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.",
          "default": 0,
          "type": "number",
          "minimum": 0,
          "maximum": 1
        }
      },
      "required": [
        "file",
        "temperature"
      ],
      "additionalProperties": false
    }

Output

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "text": {
          "description": "The transcribed text.",
          "type": "string"
        }
      },
      "required": [
        "text"
      ],
      "additionalProperties": false
    }
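
The output schema defines a single required string property, `text`. A caller that wants to fail loudly on an unexpected response shape, rather than pass `undefined` along, could guard the access as in this sketch. The function name `extractTranscript` is hypothetical.

```javascript
// Hypothetical guard: return the transcript only if the response matches
// the documented output shape { text: string }.
function extractTranscript(response) {
  if (
    response === null ||
    typeof response !== 'object' ||
    typeof response.text !== 'string'
  ) {
    throw new TypeError('expected a response of the form { text: string }')
  }
  return response.text
}

// Possible use with the call shown in Usage:
// const transcript = extractTranscript(response)
```
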