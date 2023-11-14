Cloudflare Docs
Workers AI
Automatic Speech Recognition

Automatic speech recognition (ASR) models convert a speech signal, typically an audio input, to text.

  • Task type: speech-recognition
  • TypeScript class: AiSpeechRecognition

​​ Available Models

Models available for this task type:

Model ID              Description
@cf/openai/whisper    Automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data

​​ Examples


import { Ai } from "@cloudflare/ai";

export interface Env {
  // Workers AI binding made available to this Worker.
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    // Download a sample WAV file to transcribe.
    const res = await fetch("https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav");
    const blob = await res.arrayBuffer();

    // The object form of the input takes the audio as an array of byte values.
    const ai = new Ai(env.AI);
    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await ai.run("@cf/openai/whisper", input);

    // Return an empty audio array in place of the full byte array to keep the response small.
    return Response.json({ input: { audio: [] }, response });
  },
};


$ curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/openai/whisper \
  -X POST \
  -H "Authorization: Bearer {API_TOKEN}" \
  --data-binary @talking-llama.mp3
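
The same REST call can also be prepared from TypeScript. The following is a minimal sketch, not an official client: `buildWhisperCall` and `WhisperRestCall` are hypothetical names introduced here for illustration, and the account ID and API token are assumed to be supplied by the caller. It only assembles the URL, method, header, and binary body that the curl command above sends, leaving the network request itself to `fetch`.

```typescript
// Hypothetical container for the arguments that would be passed to fetch().
interface WhisperRestCall {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: ArrayBuffer;
  };
}

// Assemble the same request as the curl example: a POST with a bearer token
// and the raw audio bytes as the request body.
function buildWhisperCall(
  accountId: string,
  apiToken: string,
  audio: ArrayBuffer
): WhisperRestCall {
  const url =
    "https://api.cloudflare.com/client/v4/accounts/" +
    encodeURIComponent(accountId) +
    "/ai/run/@cf/openai/whisper";
  return {
    url,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: audio,
    },
  };
}

// Usage (performs a network request, so it is left commented out):
// const { url, init } = buildWhisperCall(ACCOUNT_ID, API_TOKEN, audioBytes);
// const res = await fetch(url, init);
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect before anything is sent.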

​​ API schema

The following schemas are based on JSON Schema.

​​ Input


{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "audio": {
          "type": "array",
          "items": {
            "type": "number"
          }
        }
      }
    }
  ]
}

TypeScript class: AiSpeechRecognitionInput
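
As an illustration of the object branch of this schema, the sketch below converts raw audio bytes into the `{ audio: number[] }` shape and checks a value against it at runtime. `AudioObjectInput`, `toAudioInput`, and `isAudioInput` are hypothetical names used only for this example; they are not part of `@cloudflare/ai`. The check is also stricter than the schema, which does not mark `audio` as required.

```typescript
// Hypothetical local type mirroring the object branch of the input schema.
interface AudioObjectInput {
  audio: number[];
}

// Convert raw audio bytes into an array of byte values (0-255).
function toAudioInput(buffer: ArrayBuffer): AudioObjectInput {
  return { audio: Array.from(new Uint8Array(buffer)) };
}

// Runtime check for the object branch. Assumption: `audio` must be present,
// which is stricter than the schema itself.
function isAudioInput(value: unknown): value is AudioObjectInput {
  if (typeof value !== "object" || value === null) return false;
  const audio = (value as { audio?: unknown }).audio;
  return Array.isArray(audio) && audio.every((n) => typeof n === "number");
}
```

The binary branch of the schema needs no conversion: the raw bytes are sent as the request body, as in the curl example.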

​​ Output


{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "text": {
      "type": "string"
    }
  }
}

TypeScript class: AiSpeechRecognitionOutput
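
A caller may want to validate the output before using it. The sketch below is one way to do that, assuming the value being checked has the shape described by the output schema; `extractTranscript` is a hypothetical helper name, and treating a missing `text` property as an error is a choice made for this example, since the schema does not mark it as required.

```typescript
// Hypothetical helper: return the transcript from a value shaped like the
// output schema, or throw if `text` is missing or not a string.
function extractTranscript(output: unknown): string {
  if (typeof output === "object" && output !== null) {
    const text = (output as { text?: unknown }).text;
    if (typeof text === "string") {
      return text;
    }
  }
  throw new TypeError("Expected an object with a string `text` property");
}
```

In the Worker example, this could be applied to the value returned by `ai.run` before building the response.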