Speech to text

Date: 2023-09-27

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

ID: @cf/openai/whisper

Name: Automatic speech recognition (ASR) system from OpenAI

Task: speech-recognition

License type: MIT

import { Ai } from "@cloudflare/ai";

export interface Env {
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    // Download a sample WAV file to transcribe.
    const res: any = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    const blob = await res.arrayBuffer();

    const ai = new Ai(env.AI);
    // The model input is the audio as a plain array of byte values.
    const input = {
      audio: [...new Uint8Array(blob)],
    };
    const response = await ai.run("@cf/openai/whisper", input);

    // The audio bytes are left out of the echoed input to keep the response small.
    return new Response(JSON.stringify({ input: { audio: [] }, response }));
  },
};
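The Worker example above passes the audio to the model as an array of byte values built from an ArrayBuffer. As a minimal sketch, that conversion can be pulled into a small helper so that audio from any source (a remote file, an uploaded request body) is prepared the same way. The name toWhisperInput is hypothetical and not part of @cloudflare/ai; the input shape is taken from the example above.

```typescript
// Hypothetical helper (not part of @cloudflare/ai): wraps raw audio bytes in the
// { audio: number[] } shape used by the Worker example.
function toWhisperInput(buffer: ArrayBuffer): { audio: number[] } {
  // Array.from copies each byte into a plain number array, equivalent to
  // [...new Uint8Array(buffer)] but independent of the compiler's iteration settings.
  return { audio: Array.from(new Uint8Array(buffer)) };
}

// Possible use inside the Worker's fetch handler, assuming the caller uploads
// the audio as the raw request body:
//   const input = toWhisperInput(await request.arrayBuffer());
//   const response = await ai.run("@cf/openai/whisper", input);
```

Spreading a large buffer into an array allocates one number per byte, so this approach is best suited to short clips like the sample file.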

$ curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/openai/whisper \
    -X POST \
    -H "Authorization: Bearer {API_TOKEN}" \
    --data-binary @talking-llama.mp3
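The same REST call can be sketched in TypeScript with the standard fetch API. The function names buildWhisperRequest and transcribe are hypothetical, as are the accountId and apiToken parameters, which stand in for the {account_id} and {API_TOKEN} placeholders in the curl command. The response is returned as untyped JSON because its exact shape is not shown in this section.

```typescript
// Hypothetical request description, mirroring the curl command above.
interface WhisperRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: ArrayBuffer };
}

// Builds the URL and fetch options without sending anything.
function buildWhisperRequest(
  accountId: string,
  apiToken: string,
  audio: ArrayBuffer
): WhisperRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/openai/whisper`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: audio, // raw audio bytes, like curl's --data-binary
    },
  };
}

// Sends the audio and returns the parsed JSON body as-is.
async function transcribe(
  accountId: string,
  apiToken: string,
  audio: ArrayBuffer
): Promise<unknown> {
  const { url, init } = buildWhisperRequest(accountId, apiToken, audio);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Whisper request failed with status ${res.status}`);
  }
  return res.json();
}
```

Keeping request construction separate from the network call makes it possible to inspect or log the request before sending it.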

API schema

The following schema is based on JSON Schema