Cloudflare Docs

whisper-tiny-en Beta

Automatic Speech Recognition • OpenAI
@cf/openai/whisper-tiny-en

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models generalize well to many datasets and domains without fine-tuning. This is the English-only version of the Whisper Tiny model, which was trained on the task of speech recognition.

Features
Beta: Yes

Usage

Workers - TypeScript

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    // Download a sample WAV file to transcribe.
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    const blob = await res.arrayBuffer();

    // The model takes the audio as an array of 8-bit unsigned integers.
    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await env.AI.run(
      "@cf/openai/whisper-tiny-en",
      input
    );

    // The audio array is left out of the echoed input.
    return Response.json({ input: { audio: [] }, response });
  },
} satisfies ExportedHandler<Env>;

curl

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/openai/whisper-tiny-en \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --data-binary "@talking-llama.mp3"
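
For callers who would rather issue this request from TypeScript, the following is a minimal sketch of a roughly equivalent call. It assumes only the endpoint and Authorization header shown in the curl command; `buildWhisperRequest` and `WhisperRequest` are hypothetical names for illustration, not part of any Cloudflare SDK.

```typescript
// Hypothetical helper: describes the same POST request as the curl example.
// The URL shape and the Authorization header are taken from that command.
interface WhisperRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: Uint8Array;
  };
}

function buildWhisperRequest(
  accountId: string,
  apiToken: string,
  audio: Uint8Array
): WhisperRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/openai/whisper-tiny-en`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: audio, // raw audio bytes, as with --data-binary
    },
  };
}
```

The result can be passed to `fetch(req.url, req.init)`. How the audio bytes are loaded (from disk, an upload, or another request) is left to the caller.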

Parameters

* indicates a required field

Input

  • 0 string (binary)

  • 1 object

    • audio * array

      An array of integers that represent the audio data constrained to 8-bit unsigned integer values

      • items number

        A value between 0 and 255
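
The object form of the input can be produced from raw audio bytes with a small conversion step. The sketch below assumes the bytes are already in memory as an `ArrayBuffer` or `Uint8Array`; `WhisperInput` and `toWhisperInput` are hypothetical names used here for illustration.

```typescript
// Shape of the object input described above: every item of `audio`
// is an integer between 0 and 255.
interface WhisperInput {
  audio: number[];
}

// Hypothetical helper: converts raw audio bytes into the object input.
function toWhisperInput(bytes: ArrayBuffer | Uint8Array): WhisperInput {
  const view = bytes instanceof Uint8Array ? bytes : new Uint8Array(bytes);
  // Array.from copies each byte into a plain number in the range 0..255.
  return { audio: Array.from(view) };
}
```

Because the values come from a `Uint8Array`, they already satisfy the 0 to 255 constraint without extra validation.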

Output

  • text * string

    The transcription

  • word_count number

  • words array

    • items object

      • word string

      • start number

        The second this word begins in the recording

      • end number

        The ending second when the word completes

  • vtt string
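
The output fields above can be modelled as TypeScript types, which makes the word timestamps easier to work with. This is a sketch based only on the field list; the type names and the `wordsBetween` helper are hypothetical, and every field except `text` is treated as optional because only `text` is marked required.

```typescript
// One entry of the `words` array.
interface WhisperWord {
  word?: string;
  start?: number; // second at which the word begins
  end?: number; // second at which the word completes
}

// Output fields as listed above; only `text` is required.
interface WhisperOutput {
  text: string;
  word_count?: number;
  words?: WhisperWord[];
  vtt?: string;
}

// Hypothetical helper: returns the words that start and end
// inside the window [fromSec, toSec].
function wordsBetween(
  out: WhisperOutput,
  fromSec: number,
  toSec: number
): string[] {
  const result: string[] = [];
  for (const w of out.words ?? []) {
    // Skip entries that lack a word or either timestamp.
    if (w.word === undefined || w.start === undefined || w.end === undefined) {
      continue;
    }
    if (w.start >= fromSec && w.end <= toSec) {
      result.push(w.word);
    }
  }
  return result;
}
```

A caller could use `wordsBetween(output, 0, 5)` to pull out what was said in the first five seconds of a recording.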

API Schemas

The following schemas are based on JSON Schema

{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "audio": {
          "type": "array",
          "description": "An array of integers that represent the audio data constrained to 8-bit unsigned integer values",
          "items": {
            "type": "number",
            "description": "A value between 0 and 255"
          }
        }
      },
      "required": [
        "audio"
      ]
    }
  ]
}
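
The two `oneOf` branches of the schema above can be mirrored by a TypeScript union and a runtime check. This is a minimal sketch, not a full JSON Schema validator; `WhisperSchemaInput` and `matchesInputSchema` are hypothetical names. The 0 to 255 range appears only in a `description`, so the check below does not enforce it.

```typescript
// The two input alternatives from the schema: a binary string,
// or an object with a required `audio` array of numbers.
type WhisperSchemaInput = string | { audio: number[] };

// Hypothetical runtime check mirroring the schema's structure.
function matchesInputSchema(value: unknown): value is WhisperSchemaInput {
  // First branch: any string.
  if (typeof value === "string") return true;
  // Second branch: a non-null object with an `audio` array of numbers.
  if (typeof value !== "object" || value === null) return false;
  const audio = (value as { audio?: unknown }).audio;
  return Array.isArray(audio) && audio.every((n) => typeof n === "number");
}
```

A check like this could run before calling the model so that malformed payloads are rejected early.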