Using AI Models
Agents can call AI models from any provider. Workers AI is built in and requires no API keys. You can also use OpenAI ↗, Anthropic ↗, Google Gemini ↗, or any service that exposes an OpenAI-compatible API.
The AI SDK ↗ provides a unified interface across all of these providers, and is what AIChatAgent and the starter template use under the hood. You can also use the model routing features in AI Gateway to route across providers, eval responses, and manage rate limits.
You can call models from any method within an Agent, including from HTTP requests using the onRequest handler, when a scheduled task runs, when handling a WebSocket message in the onMessage handler, or from any of your own methods.
Agents can call AI models autonomously and can handle long-running responses that take minutes (or longer) to complete. If a client disconnects mid-stream, the Agent keeps running and can catch the client up when it reconnects.
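The catch-up on reconnect can be as simple as replaying whatever the client has not yet acknowledged from the stored message history. A minimal sketch of that selection logic, written as a pure function so it can live anywhere; the function name and the last-seen-index convention are illustrative, not part of the Agents SDK:

```typescript
// Returns the slice of stored messages a reconnecting client still needs,
// given the index of the last message it acknowledged (-1 if none).
// Illustrative helper, not an Agents SDK API.
function messagesToReplay<T>(history: T[], lastSeenIndex: number): T[] {
  // Clamp so a stale or corrupt cursor cannot produce an invalid slice.
  const start = Math.min(Math.max(lastSeenIndex + 1, 0), history.length);
  return history.slice(start);
}
```

On reconnect, the Agent would read its stored history, call this with the cursor the client sent, and send the returned messages before resuming the live stream.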
Modern reasoning models can take significant time to generate and stream back a response. Instead of buffering the entire response, stream it back to the client over WebSockets.
```ts
import {
  Agent,
  type Connection,
  type ConnectionContext,
  type WSMessage,
} from "agents";
import { streamText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

interface Env {
  AI: Ai;
}

export class MyAgent extends Agent<Env> {
  async onConnect(connection: Connection, ctx: ConnectionContext) {
    // Called when a client opens a WebSocket connection to this Agent.
  }

  async onMessage(connection: Connection, message: WSMessage) {
    const msg = JSON.parse(message as string);
    await this.queryReasoningModel(connection, msg.prompt);
  }

  async queryReasoningModel(connection: Connection, userPrompt: string) {
    try {
      const workersai = createWorkersAI({ binding: this.env.AI });
      const result = streamText({
        model: workersai("@cf/zai-org/glm-4.7-flash"),
        prompt: userPrompt,
      });

      // Send each chunk to the client as it arrives instead of
      // buffering the full response.
      for await (const chunk of result.textStream) {
        if (chunk) {
          connection.send(JSON.stringify({ type: "chunk", content: chunk }));
        }
      }

      connection.send(JSON.stringify({ type: "done" }));
    } catch (error) {
      // Error objects do not survive JSON.stringify, so send the message text.
      connection.send(
        JSON.stringify({
          type: "error",
          error: error instanceof Error ? error.message : String(error),
        }),
      );
    }
  }
}
```

You can also persist AI model responses back to Agent state using `this.setState`. If a user disconnects, read the message history back and send it to the user when they reconnect.
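On the client side, a WebSocket handler for this protocol only needs to fold "chunk" messages into a buffer until "done" arrives. A minimal, framework-free sketch, assuming the three message types the Agent sends above; the class and type names are illustrative:

```typescript
// Message shapes matching the protocol the Agent sends above.
// The union name is illustrative, not part of the SDK.
type AgentMessage =
  | { type: "chunk"; content: string }
  | { type: "done" }
  | { type: "error"; error: string };

// Accumulates streamed chunks into the full response text.
// handle() returns the completed text once "done" arrives, else null.
class StreamAccumulator {
  private buffer = "";

  handle(raw: string): string | null {
    const msg = JSON.parse(raw) as AgentMessage;
    switch (msg.type) {
      case "chunk":
        this.buffer += msg.content;
        return null;
      case "done":
        return this.buffer;
      case "error":
        throw new Error(msg.error);
    }
  }
}
```

In a browser you would call `handle(event.data)` from the socket's `message` listener and render the buffer incrementally as chunks arrive.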
You can use any of the models available in Workers AI within your Agent by configuring a binding. No API keys are required.
Workers AI supports streaming responses by setting stream: true. Use streaming to avoid buffering and delaying responses, especially for larger models or reasoning models.
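The body you get back with stream: true is a Server-Sent Events byte stream in which, as far as the Workers AI wire format goes, each `data:` line carries a JSON payload with a `response` field, terminated by a `data: [DONE]` sentinel. A client-side sketch of parsing that format once the body has been read as text; the function name is ours, and you should verify the payload shape against the Workers AI docs for your model:

```typescript
// Parses Workers AI SSE output ("data: {...}" lines plus a "data: [DONE]"
// sentinel) and returns the concatenated model text. Illustrative helper;
// the payload shape is an assumption about the Workers AI stream format.
function parseWorkersAiSse(body: string): string {
  let text = "";
  for (const line of body.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break;
    const event = JSON.parse(payload) as { response?: string };
    text += event.response ?? "";
  }
  return text;
}
```

In practice you would parse incrementally as bytes arrive rather than after the full body, but the framing is the same.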
```ts
import { Agent } from "agents";

interface Env {
  AI: Ai;
}

export class MyAgent extends Agent<Env> {
  async onRequest(request: Request) {
    const stream = await this.env.AI.run(
      "@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
      {
        prompt: "Build me a Cloudflare Worker that returns JSON.",
        stream: true,
      },
    );

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  }
}
```

Your Wrangler configuration needs an `ai` binding:
```jsonc
{
  "ai": {
    "binding": "AI"
  }
}
```

```toml
[ai]
binding = "AI"
```

You can use AI Gateway directly from an Agent by specifying a gateway configuration when calling the AI binding. Model routing lets you route requests across providers based on availability, rate limits, or cost budgets.
```ts
import { Agent } from "agents";

interface Env {
  AI: Ai;
}

export class MyAgent extends Agent<Env> {
  async onRequest(request: Request) {
    const response = await this.env.AI.run(
      "@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
      {
        prompt: "Build me a Cloudflare Worker that returns JSON.",
      },
      {
        gateway: {
          id: "{gateway_id}", // Replace with your AI Gateway ID
          skipCache: false,
          cacheTtl: 3360,
        },
      },
    );

    return Response.json(response);
  }
}
```

The `ai` binding in your Wrangler configuration is shared across both Workers AI and AI Gateway.
```jsonc
{
  "ai": {
    "binding": "AI"
  }
}
```

```toml
[ai]
binding = "AI"
```

Visit the AI Gateway documentation to learn how to configure a gateway and retrieve a gateway ID.
The AI SDK ↗ provides a unified API for text generation, tool calling, structured responses, and more. It works with any provider that has an AI SDK adapter, including Workers AI via workers-ai-provider ↗.
Install the AI SDK and the Workers AI provider adapter:

```sh
npm i ai workers-ai-provider
# or: yarn add ai workers-ai-provider
# or: pnpm add ai workers-ai-provider
```

```ts
import { Agent } from "agents";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

interface Env {
  AI: Ai;
}

export class MyAgent extends Agent<Env> {
  async onRequest(request: Request): Promise<Response> {
    const workersai = createWorkersAI({ binding: this.env.AI });
    const { text } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: "Build me an AI agent on Cloudflare Workers",
    });

    return Response.json({ modelResponse: text });
  }
}
```

You can swap the provider to use OpenAI, Anthropic, or any other AI SDK-compatible adapter:
```sh
npm i ai @ai-sdk/openai
# or: yarn add ai @ai-sdk/openai
# or: pnpm add ai @ai-sdk/openai
```

```ts
import { Agent } from "agents";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

export class MyAgent extends Agent {
  async onRequest(request: Request): Promise<Response> {
    // The default openai provider reads its API key from the
    // OPENAI_API_KEY environment variable.
    const { text } = await generateText({
      model: openai("gpt-4o"),
      prompt: "Build me an AI agent on Cloudflare Workers",
    });

    return Response.json({ modelResponse: text });
  }
}
```

Agents can call models from any service that supports the OpenAI API. For example, you can use the OpenAI SDK to call one of Google's Gemini models ↗ directly from your Agent.
Agents can stream responses back over HTTP using Server-Sent Events (SSE) from within an onRequest handler, or by using the native WebSocket API to stream responses back to a client.
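When streaming over SSE, each chunk you write must be framed as a `data:` event terminated by a blank line, with multi-line chunks split across one `data:` line per line of payload. A small encoder for that framing; the helper name is ours:

```typescript
// Frames a text chunk as a Server-Sent Events "data" event.
// Illustrative helper, not part of any SDK.
function toSseEvent(chunk: string): string {
  // Multi-line chunks need one "data:" line per line of payload,
  // and every event ends with a blank line.
  return (
    chunk
      .split("\n")
      .map((line) => `data: ${line}`)
      .join("\n") + "\n\n"
  );
}
```

You would pass each framed event through a TextEncoder before writing it to the response stream, as the example below does with raw text.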
```ts
import { Agent } from "agents";
import { OpenAI } from "openai";

interface Env {
  GEMINI_API_KEY: string;
}

export class MyAgent extends Agent<Env> {
  async onRequest(request: Request): Promise<Response> {
    const client = new OpenAI({
      apiKey: this.env.GEMINI_API_KEY,
      baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
    });

    const { readable, writable } = new TransformStream();
    const writer = writable.getWriter();
    const textEncoder = new TextEncoder();

    // Keep writing to the stream after the Response has been returned.
    this.ctx.waitUntil(
      (async () => {
        const stream = await client.chat.completions.create({
          model: "gemini-2.0-flash",
          messages: [
            { role: "user", content: "Write me a Cloudflare Worker." },
          ],
          stream: true,
        });

        for await (const part of stream) {
          writer.write(
            textEncoder.encode(part.choices[0]?.delta?.content || ""),
          );
        }
        writer.close();
      })(),
    );

    return new Response(readable);
  }
}
```