Date: 2023-09-27
Cloudflare Docs
Workers AI
Speech to text

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

Examples


import { Ai } from "@cloudflare/ai";

export interface Env {
	// Workers AI binding
	AI: any;
}

export default {
	async fetch(request: Request, env: Env): Promise<Response> {
		// Download a sample WAV file to transcribe.
		const res = await fetch("https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav");
		const blob = await res.arrayBuffer();

		const ai = new Ai(env.AI);
		// Pass the audio to the model as a plain array of byte values.
		const input = {
			audio: [...new Uint8Array(blob)],
		};

		const response = await ai.run("@cf/openai/whisper", input);

		// Echo an empty audio array rather than the raw bytes to keep the response small.
		return new Response(JSON.stringify({ input: { audio: [] }, response }));
	}
};
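
The `audio` field in the Worker above is built by spreading a `Uint8Array` view of the downloaded file into a plain array of numbers. A minimal sketch of that conversion in isolation follows; the helper name `toByteArray` is hypothetical and is not part of `@cloudflare/ai`.

```typescript
// Hypothetical helper (assumption, not part of @cloudflare/ai):
// turns raw audio bytes into a plain array of numbers in the range 0-255,
// the same shape the Worker example passes as `audio`.
function toByteArray(buffer: ArrayBuffer): number[] {
  // Equivalent to [...new Uint8Array(buffer)]
  return Array.from(new Uint8Array(buffer));
}

// Three example bytes standing in for real audio data.
const sampleBytes = toByteArray(new Uint8Array([0, 127, 255]).buffer);
console.log(sampleBytes.length);
```

Each element of the resulting array is one byte of the original file, so the array has as many entries as the file has bytes.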


$ curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/openai/whisper \
    -X POST \
    -H "Authorization: Bearer {API_TOKEN}" \
    --data-binary @talking-llama.mp3
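
The same REST call could also be assembled from TypeScript. The sketch below only builds the URL and request options that mirror the curl command; the function name `buildWhisperRequest` and the `WhisperRequest` type are hypothetical, and the account ID and token are placeholders you would supply yourself.

```typescript
// Hypothetical shape for the request pieces (assumption, not an official type).
interface WhisperRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: ArrayBuffer;
  };
}

// Builds the URL and options matching the curl command above:
// POST the raw audio bytes with a Bearer token.
function buildWhisperRequest(
  accountId: string,
  apiToken: string,
  audio: ArrayBuffer
): WhisperRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/openai/whisper`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: audio, // raw bytes, like curl's --data-binary
    },
  };
}
```

Assuming a runtime with a global `fetch`, the result can be sent with `fetch(req.url, req.init)`, where `req` is the object returned by `buildWhisperRequest`.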

API schema

The following schema is based on JSON Schema.


{
    "task": "speech-recognition",
    "tsClass": "AiSpeechRecognition",
    "jsonSchema": {
        "input": {
            "type": "object",
            "properties": {
                "audio": {
                    "type": "string",
                    "format": "binary"
                }
            },
            "required": ["audio"]
        },
        "output": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string"
                }
            }
        }
    }
}
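
On the consuming side, the `output` part of the schema can be mirrored as a TypeScript type with a runtime check. This is a sketch under assumptions: the names `WhisperOutput` and `isWhisperOutput` are hypothetical, and `text` is treated as optional because the schema does not list it as required.

```typescript
// Hypothetical type mirroring the "output" schema above (assumption).
// `text` is optional because the output schema has no "required" list.
interface WhisperOutput {
  text?: string;
}

// Runtime check that a parsed JSON value matches the output schema:
// an object whose `text`, when present, is a string.
function isWhisperOutput(value: unknown): value is WhisperOutput {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const text = (value as { text?: unknown }).text;
  return text === undefined || typeof text === "string";
}

// Example: validate a parsed response before reading the transcript.
const parsed: unknown = JSON.parse('{"text": "hello world"}');
if (isWhisperOutput(parsed)) {
  console.log(parsed.text ?? "(no transcript)");
}
```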