whisper

Automatic Speech Recognition • OpenAI

Whisper is a general-purpose speech recognition model. Trained on a large dataset of diverse audio, it is a multitask model that can perform multilingual speech recognition, speech translation, and language identification.

Model Info
Unit Pricing	$0.00045 per audio minute

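export interface Env {
  // Workers AI binding
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    // Download a sample WAV file to transcribe
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    if (!res.ok) {
      // Stop early rather than sending an error page to the model as audio
      return new Response(`Failed to fetch audio: ${res.status}`, { status: 502 });
    }
    const blob = await res.arrayBuffer();

    // Pass the audio bytes as a plain array of numbers
    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await env.AI.run(
      "@cf/openai/whisper",
      input
    );

    // Echo an empty audio array instead of the full payload to keep the response small
    return Response.json({ input: { audio: [] }, response });
  },
} satisfies ExportedHandler<Env>;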

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/openai/whisper \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --data-binary "@talking-llama.mp3"
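
The same REST call can be assembled in TypeScript. The sketch below only builds the pieces that mirror the curl command above (the URL, the bearer token header, and the raw audio bytes as the body). `WhisperRequest` and `buildWhisperRequest`, along with their parameter names, are hypothetical helpers introduced for illustration and are not part of any Cloudflare SDK.

```typescript
// Hypothetical shape describing the HTTP request that the curl command sends.
interface WhisperRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: Uint8Array; // raw audio bytes, like curl's --data-binary
}

// Hypothetical helper: builds the request parts from an account ID, API token, and audio bytes.
function buildWhisperRequest(
  accountId: string,
  apiToken: string,
  audio: Uint8Array
): WhisperRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/openai/whisper`,
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: audio,
  };
}
```

The returned parts can then be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, with the audio bytes read from a file or an upload.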

Parameters

Input

Option 1: string (format: binary)
Option 2: object

Output

object
  text (string): The transcription
  word_count (number)
  words (array)
  vtt (string)
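
In TypeScript, the output fields listed above can be captured in an interface with a small runtime check. This is a minimal sketch, not an official type: the names `WhisperOutput` and `isWhisperOutput` are hypothetical, the element type of `words` is left as `unknown` because the listing does not expand it, and treating every field except `text` as optional is an assumption.

```typescript
// Hypothetical type mirroring the output fields listed above.
interface WhisperOutput {
  text: string; // the transcription
  word_count?: number; // assumed optional
  words?: unknown[]; // element shape is not shown in the listing
  vtt?: string; // assumed optional
}

// Runtime check that a value has the expected field types.
function isWhisperOutput(value: unknown): value is WhisperOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["text"] === "string" &&
    (v["word_count"] === undefined || typeof v["word_count"] === "number") &&
    (v["words"] === undefined || Array.isArray(v["words"])) &&
    (v["vtt"] === undefined || typeof v["vtt"] === "string")
  );
}
```

A Worker could call `isWhisperOutput(response)` on the value returned by `env.AI.run` before reading `response.text`.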
