Build agents on Cloudflare
Build AI-powered agents that can autonomously perform tasks, persist state, browse the web, and communicate back to users in real-time over any channel.
- Serverless inference that scales up and down: run AI directly on Cloudflare, without pre-provisioning VMs for peak demand or worrying about utilization. Call the latest open-source models on Workers AI, and pay just for what you use.
- Non I/O bound pricing: don't pay for long-running processes when your code is not executing. Cloudflare Workers is designed to scale down and only charge you for CPU time ↗, as opposed to wall-clock time.
- Designed for durable execution: Durable Objects and Workflows are built for a programming model that enables guaranteed execution of async tasks like long-running "deep thinking" LLM calls, human-in-the-loop steps, or unreliable API calls.
- Scalable and reliable, without compromising on performance: by running on Cloudflare's network, agents can execute tasks close to the user without introducing latency for real-time experiences.
Build agents that can execute complex tasks, progressively save state, and call out to any third-party API they need using Workflows. Send emails or text messages, browse the web, process and summarize documents, and/or query your database.
npm create cloudflare@latest workflows-starter -- --template "cloudflare/workflows-starter"
cd workflows-starter
npm i resend
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from 'cloudflare:workers';
import { Resend } from 'resend';

type Env = {
  MY_WORKFLOW: Workflow;
  MY_BUCKET: R2Bucket;
  WORKERS_AI: Ai;
  RESEND_API_KEY: string;
};

type Params = {
  email: string;
  metadata: Record<string, string>;
};

export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
  async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
    const files = await step.do('my first step', async () => {
      // Fetch a list of files from $SOME_SERVICE
      return {
        files: [
          'doc_7392_rev3.pdf',
          'report_x29_final.pdf',
          'memo_2024_05_12.pdf',
          'file_089_update.pdf',
          'proj_alpha_v2.pdf',
          'data_analysis_q2.pdf',
          'notes_meeting_52.pdf',
          'summary_fy24_draft.pdf',
        ],
      };
    });

    const summaries = await step.do('summarize text', async () => {
      const results: Record<string, string> = {};
      for (const filename of files.files) {
        const fileContent = await this.env.MY_BUCKET.get(filename);
        if (!fileContent) continue;

        const text = await fileContent.text();
        const summary = await this.env.WORKERS_AI.run('@cf/meta/llama-3.2-3b-instruct', {
          messages: [{ role: 'user', content: `Please summarize the following text concisely: ${text}` }],
        });
        results[filename] = summary.response;
      }
      return results;
    });

    await step.sleep('wait on something', '1 minute');

    let summaryKey = await step.do('store summaries in R2', async () => {
      const summaryKey = `summaries-${Date.now()}.json`;
      await this.env.MY_BUCKET.put(summaryKey, JSON.stringify(summaries));
      return summaryKey;
    });

    await step.do(
      'email summaries',
      {
        retries: {
          limit: 3,
          delay: '5 second',
          backoff: 'exponential',
        },
      },
      async () => {
        const summaryText = Object.entries(summaries)
          .map(([filename, summary]) => `${filename}:\n${summary}\n\n`)
          .join('');

        const resend = new Resend(this.env.RESEND_API_KEY);

        await resend.emails.send({
          from: 'notifications@yourdomain.com',
          to: event.payload.email,
          subject: 'Your Document Summaries',
          text: summaryText,
        });
      },
    );

    return summaryKey;
  }
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    let id = new URL(req.url).searchParams.get('instanceId');

    if (id) {
      let instance = await env.MY_WORKFLOW.get(id);
      return Response.json({
        status: await instance.status(),
      });
    }

    // Params are delivered to the Workflow as event.payload;
    // the recipient address below is a placeholder.
    let instance = await env.MY_WORKFLOW.create({
      params: { email: 'user@example.com', metadata: {} },
    });
    return Response.json({
      id: instance.id,
      details: await instance.status(),
    });
  },
};
Use Durable Objects — stateful, serverless, long-running micro-servers — to ship interactive, real-time agents that can connect to the latest AI models.
Stream responses over WebSockets, and don't time out while waiting for the latest chain-of-thought models, including o1 or deepseek-r1, to respond.
npm i openai
import { DurableObject } from "cloudflare:workers";
import OpenAI from "openai";

export interface Env {
  DURABLE_AGENT: DurableObjectNamespace<DurableAgent>;
  OPENAI_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    if (request.url.endsWith("/agent/chat")) {
      const upgradeHeader = request.headers.get("Upgrade");
      if (!upgradeHeader || upgradeHeader !== "websocket") {
        return Response.json(
          { error: "Durable Object expected Upgrade: websocket" },
          { status: 426 },
        );
      }

      const url = new URL(request.url);
      const agentId = url.searchParams.get("id") || crypto.randomUUID();

      let id = env.DURABLE_AGENT.idFromName(agentId);
      let agent = env.DURABLE_AGENT.get(id);

      return agent.fetch(request);
    }

    return Response.json({ message: "Bad Request" }, { status: 400 });
  },
};

export class DurableAgent extends DurableObject<Env> {
  constructor(ctx: DurableObjectState, env: Env) {
    // The base class stores these as this.ctx and this.env.
    super(ctx, env);
  }

  async fetch(request: Request): Promise<Response> {
    const webSocketPair = new WebSocketPair();
    const [client, server] = Object.values(webSocketPair);

    // Accept the server side of the pair so this Durable Object receives
    // webSocketMessage/webSocketClose events (with hibernation support).
    this.ctx.acceptWebSocket(server);

    return new Response(null, {
      status: 101,
      webSocket: client,
    });
  }

  async webSocketMessage(ws: WebSocket, message: ArrayBuffer | string) {
    try {
      const openai = new OpenAI({
        apiKey: this.env.OPENAI_API_KEY,
        timeout: 10 * 60 * 1000, // Don't let it think TOO long.
      });

      // Stream the response to immediately start sending chunks to the client,
      // rather than buffering the entire response and making the user wait
      const stream = await openai.chat.completions.create({
        model: "o1",
        messages: [{ role: "user", content: message.toString() }],
        stream: true,
      });

      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content;
        if (content) {
          ws.send(content);
        }
      }
    } catch (error) {
      ws.send(
        JSON.stringify({
          error: "OpenAI request failed",
          message: error instanceof Error ? error.message : String(error),
        }),
      );
    }
  }

  async webSocketClose(ws: WebSocket, code: number, reason: string, wasClean: boolean) {
    ws.close(code, "Durable Object is closing WebSocket");
  }
}
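To talk to the agent from a browser or another client, open a WebSocket against the /agent/chat endpoint. A minimal client sketch, assuming the Worker above is deployed at a placeholder hostname:

// Hypothetical client for the Worker above: connect, send a prompt, and
// print streamed chunks as the Durable Object forwards them.
const socket = new WebSocket("wss://your-worker.your-subdomain.workers.dev/agent/chat?id=session-123");

socket.addEventListener("open", () => {
  socket.send("Explain chain-of-thought prompting in two sentences.");
});

socket.addEventListener("message", (event) => {
  // Each message is either a streamed content chunk or a JSON error payload.
  console.log(event.data);
});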
Use the Browser Rendering API to allow your agents to search the web, take screenshots, and directly interact with websites.
npm install @cloudflare/puppeteer --save-dev
import puppeteer from "@cloudflare/puppeteer";

interface Env {
  MYBROWSER: Fetcher;
  BROWSER_KV_DEMO: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { searchParams } = new URL(request.url);
    const url = searchParams.get("url");

    if (!url) {
      return new Response("Please add an ?url=https://example.com/ parameter");
    }

    const normalizedUrl = new URL(url).toString();
    let img = await env.BROWSER_KV_DEMO.get(normalizedUrl, { type: "arrayBuffer" });

    if (img === null) {
      const browser = await puppeteer.launch(env.MYBROWSER);
      const page = await browser.newPage();
      await page.goto(normalizedUrl);
      img = (await page.screenshot()) as Buffer;

      await env.BROWSER_KV_DEMO.put(normalizedUrl, img, {
        expirationTtl: 60 * 60 * 24, // 24 hours
      });

      await browser.close();
    }

    return new Response(img, {
      headers: {
        "content-type": "image/jpeg",
      },
    });
  },
};
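Screenshots are only one option: the same rendering session can hand an agent the rendered content of a page. A minimal sketch, assuming the same MYBROWSER binding, that extracts the visible text of a page so it can be passed to a model:

import puppeteer from "@cloudflare/puppeteer";

interface Env {
  MYBROWSER: Fetcher;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const target = new URL(request.url).searchParams.get("url") ?? "https://example.com/";

    const browser = await puppeteer.launch(env.MYBROWSER);
    const page = await browser.newPage();
    await page.goto(target);

    // Pull the rendered text out of the page so an agent can summarize or search it.
    const text = await page.evaluate(() => document.body.innerText);
    await browser.close();

    return new Response(text);
  },
};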
Use AI Gateway to cache, log, retry and run evals (evaluations) for your agents, no matter where they're deployed.
from anthropic import Anthropic
anthropic = Anthropic(
    api_key="<your_anthropic_api_key>",
    # Route, cache, fallback and log prompt-response pairs between your app
    # and your AI model provider.
    base_url="https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/anthropic",
)

message = anthropic.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1000,
    messages=[
        {
            "role": "user",
            "content": "Generate a Cloudflare Worker that returns a simple JSON payload based on a query param",
        }
    ],
)
print(message.content)
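The same gateway URL pattern works from a Worker. A sketch using the OpenAI SDK, assuming a gateway configured for OpenAI and an OPENAI_API_KEY secret; the account and gateway IDs below are placeholders:

import OpenAI from "openai";

export interface Env {
  OPENAI_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const openai = new OpenAI({
      apiKey: env.OPENAI_API_KEY,
      // Requests routed through AI Gateway pick up caching, logging, and retries.
      baseURL: "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai",
    });

    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Summarize what AI Gateway does in one sentence." }],
    });

    return Response.json(completion.choices[0].message);
  },
};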
Build agents using your favorite AI frameworks, and deploy them directly to Cloudflare Workers.
Use LangChain ↗ to build Retrieval-Augmented Generation (RAG) applications using Workers AI and Vectorize.
Give your agents more context and the ability to search across content, reply to user queries, and expand their domain knowledge.
npm i @langchain/cloudflare hono
import {
  CloudflareVectorizeStore,
  CloudflareWorkersAIEmbeddings,
} from "@langchain/cloudflare";
import { VectorizeIndex } from "@cloudflare/workers-types";
import { Ai } from "@cloudflare/ai";
import { Hono } from "hono";

export interface Env {
  VECTORIZE_INDEX: VectorizeIndex;
  AI: Ai;
}

const app = new Hono<{ Bindings: Env }>();

app.get("/", async (c) => {
  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: c.env.AI,
    model: "@cf/baai/bge-small-en-v1.5",
  });

  const store = new CloudflareVectorizeStore(embeddings, {
    index: c.env.VECTORIZE_INDEX,
  });

  const results = await store.similaritySearch("hello", 5);
  return c.json(results);
});

app.post("/load", async (c) => {
  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: c.env.AI,
    model: "@cf/baai/bge-small-en-v1.5",
  });

  const store = new CloudflareVectorizeStore(embeddings, {
    index: c.env.VECTORIZE_INDEX,
  });

  const documents = [
    { pageContent: "hello", metadata: {} },
    { pageContent: "world", metadata: {} },
    { pageContent: "hi", metadata: {} },
  ];

  await store.addDocuments(documents, {
    ids: ["id1", "id2", "id3"],
  });

  return c.json({ success: true });
});

app.delete("/clear", async (c) => {
  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: c.env.AI,
    model: "@cf/baai/bge-small-en-v1.5",
  });

  const store = new CloudflareVectorizeStore(embeddings, {
    index: c.env.VECTORIZE_INDEX,
  });

  await store.delete({ ids: ["id1", "id2", "id3"] });
  return c.json({ success: true });
});

export default app;
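To go from retrieval to a reply, feed the retrieved documents to a chat model as context. A minimal sketch of an additional route, assuming the same bindings as above and that the AI binding exposes run() directly:

// Hypothetical route: retrieve context from Vectorize, then answer with Workers AI.
app.get("/ask", async (c) => {
  const question = c.req.query("q") ?? "hello";

  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: c.env.AI,
    model: "@cf/baai/bge-small-en-v1.5",
  });
  const store = new CloudflareVectorizeStore(embeddings, {
    index: c.env.VECTORIZE_INDEX,
  });

  // Use the top matches as grounding context for the model.
  const docs = await store.similaritySearch(question, 3);
  const context = docs.map((d) => d.pageContent).join("\n");

  const answer = await c.env.AI.run("@cf/meta/llama-3.2-3b-instruct", {
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });

  return c.json(answer);
});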
Ship faster with the AI SDK ↗: it makes it easier to generate text, call tools, and get structured output from your AI models (and then deploy to Workers).
npm i ai workers-ai-provider
import { createWorkersAI } from 'workers-ai-provider';
import { streamText } from 'ai';

type Env = {
  AI: Ai;
};

export default {
  async fetch(_: Request, env: Env) {
    const workersai = createWorkersAI({ binding: env.AI });
    const result = streamText({
      model: workersai('@cf/meta/llama-3.2-3b-instruct'),
      prompt: 'Write short essay on why you like Cloudflare Durable Objects.',
    });

    return result.toTextStreamResponse({
      headers: {
        'Content-Type': 'text/x-unknown',
        'content-encoding': 'identity',
        'transfer-encoding': 'chunked',
      },
    });
  },
};
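The AI SDK also covers the structured-output case mentioned above. A sketch using generateObject with a zod schema (zod is an extra dependency, and structured output depends on the model and provider supporting JSON output):

import { createWorkersAI } from 'workers-ai-provider';
import { generateObject } from 'ai';
import { z } from 'zod';

type Env = {
  AI: Ai;
};

export default {
  async fetch(_: Request, env: Env) {
    const workersai = createWorkersAI({ binding: env.AI });

    // Ask the model for output matching a schema instead of free-form text.
    const { object } = await generateObject({
      model: workersai('@cf/meta/llama-3.1-8b-instruct'),
      schema: z.object({
        title: z.string(),
        bulletPoints: z.array(z.string()),
      }),
      prompt: 'Outline the benefits of running agents on Cloudflare Workers.',
    });

    return Response.json(object);
  },
};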
Use any model provider with OpenAI compatible endpoints, including ChatGPT ↗, DeepSeek ↗ and Workers AI, directly from Cloudflare Workers.
npm i openai
import OpenAI from "openai";

export interface Env {
  OPENAI_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env) {
    const url = new URL(request.url);
    const prompt = url.searchParams.get('prompt') || "Make some robot noises";

    const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });

    const chatCompletion = await openai.chat.completions.create({
      messages: [{ role: "user", content: prompt }],
      model: "gpt-3.5-turbo",
    });

    const embeddings = await openai.embeddings.create({
      model: "text-embedding-ada-002",
      input: "Cloudflare Agents documentation",
    });

    return new Response(JSON.stringify({ chatCompletion, embeddings }));
  },
};
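Because the SDK only needs an API key and a base URL, the same pattern reaches any OpenAI-compatible endpoint. A sketch pointing the client at Workers AI's OpenAI-compatible endpoint, assuming a Cloudflare API token stored as a secret; the account ID is a placeholder:

import OpenAI from "openai";

export interface Env {
  CLOUDFLARE_API_TOKEN: string;
}

export default {
  async fetch(request: Request, env: Env) {
    // Swap the base URL to call Workers AI through the OpenAI SDK.
    const client = new OpenAI({
      apiKey: env.CLOUDFLARE_API_TOKEN,
      baseURL: "https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1",
    });

    const completion = await client.chat.completions.create({
      model: "@cf/meta/llama-3.1-8b-instruct",
      messages: [{ role: "user", content: "Make some robot noises" }],
    });

    return Response.json(completion.choices[0].message);
  },
};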
Observe and control your AI applications with caching, rate limiting, request retries, model fallback, and more.
Build full-stack AI applications with Vectorize, Cloudflare's vector database. Adding Vectorize enables you to perform tasks such as semantic search, recommendations, and anomaly detection, or to provide context and memory to an LLM; a minimal semantic search sketch follows below.
Run machine learning models, powered by serverless GPUs, on Cloudflare's global network.
Build real-time serverless video, audio and data applications with WebRTC running on Cloudflare's network.
Build stateful agents that guarantee execution, with automatic retries and persistent state that runs for minutes, hours, days, or weeks.
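A minimal semantic search sketch for Vectorize, assuming a Vectorize index bound as VECTORIZE and a Workers AI binding named AI (both binding names are placeholders):

export interface Env {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const query = new URL(request.url).searchParams.get("q") ?? "agents on Cloudflare";

    // Embed the query with Workers AI, then find the closest stored vectors.
    const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });
    const results = await env.VECTORIZE.query(embedding.data[0], { topK: 5 });

    return Response.json(results.matches);
  },
};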