Build a Durable AI Agent
Date: 2026-01-22
In this guide, you will build an AI agent that researches GitHub repositories. Give it a task like "Compare open-source LLM projects" and it will:
- Search GitHub for relevant repositories
- Fetch details about each one (stars, forks, activity)
- Analyze and compare them
- Return a recommendation
Each LLM call and tool call becomes a step — a self-contained, individually retriable unit of work. If any step fails, Workflows retries it automatically. If the entire Workflow crashes mid-task, it resumes from the last successful step.
| Challenge | Solution with Workflows |
| Long-running agent loops | Durable execution that survives any interruption |
| Unreliable LLM and API calls | Automatic retry with independent checkpoints |
| Waiting for human approval | waitForEvent() pauses for hours or days |
| Polling for job completion | step.sleep() between checks without consuming resources |
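The polling row of the table can be sketched as follows. This is a minimal illustration, not the guide's own code: `StepLike` is a hand-written stand-in for the parts of the Workflows `step` object it uses, and `pollUntilDone`, `checkJob`, `JobStatus`, and the 30-second interval are hypothetical names and values chosen for the example.

```typescript
// Stand-in for the Workflows `step` object (assumption: the real object
// exposes do(name, callback) and sleep(name, duration) with these shapes).
interface StepLike {
  do<T>(name: string, callback: () => Promise<T>): Promise<T>;
  sleep(name: string, duration: string | number): Promise<void>;
}

type JobStatus = { done: boolean; result?: string };

// Polls a job until it reports completion, sleeping between checks.
// `checkJob` is a hypothetical function that queries an external job API.
async function pollUntilDone(
  step: StepLike,
  checkJob: () => Promise<JobStatus>,
  maxAttempts = 10,
): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Each check is its own step, so its result is persisted individually.
    const status = await step.do(`check job (attempt ${attempt})`, checkJob);
    if (status.done) return status.result ?? "";
    // The attempt number keeps every sleep step name distinct.
    await step.sleep(`wait before attempt ${attempt + 1}`, "30 seconds");
  }
  throw new Error(`Job did not finish after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a job that never completes from polling forever.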
This guide uses the Anthropic SDK, but the same patterns apply to any LLM SDK (OpenAI, Google AI, Mistral, etc.).
To skip the steps and pull down the complete agent, which uses AI Gateway, run the following command:
Use this option if you are familiar with Cloudflare Workflows or want to explore the code first.
Follow the steps below to learn how to build a durable AI agent from scratch.
- Sign up for a Cloudflare account ↗.
- Install Node.js ↗.
Node.js version manager
Use a Node version manager like Volta ↗ or nvm ↗ to avoid permission issues and change Node.js versions. Wrangler, discussed later in this guide, requires Node.js 16.17.0 or later.
You will also need an Anthropic API key ↗ for LLM calls. New accounts include free credits.
- Create a new Worker project by running the following command:
For setup, select the following options:
- For What would you like to start with?, choose Hello World example.
- For Which template would you like to use?, choose Worker only.
- For Which language do you want to use?, choose TypeScript.
- For Do you want to use git for version control?, choose Yes.
- For Do you want to deploy your application?, choose No (we will be making some changes before deploying).
- Move into your project:
- Install dependencies:
Tools are functions the LLM can call to interact with external systems. You define the schema (what inputs the tool accepts) and the implementation (what it does). The LLM decides when to use each tool based on the task.
- Create src/tools.ts with two complementary tools:
These tools complement each other: search_repos finds repositories, and get_repo fetches details about specific ones.
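As a rough sketch of what such a tools file could contain, the example below pairs a schema with an implementation for each tool. The names `search_repos` and `get_repo` come from this guide; everything else (`ToolDefinition`, `FetchLike`, `toolDefinitions`, `runTool`, `githubJson`, the `User-Agent` value, and the `per_page=5` limit) is assumed for illustration. The schema shape follows the Anthropic Messages API tool format, and the URLs are GitHub's public REST endpoints.

```typescript
// Tool definition shape used by the Anthropic Messages API.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

// Minimal fetch signature (hypothetical) so the tools can run against a fake in tests.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const toolDefinitions: ToolDefinition[] = [
  {
    name: "search_repos",
    description: "Search GitHub repositories by keyword, sorted by stars.",
    input_schema: {
      type: "object",
      properties: { query: { type: "string", description: "Search terms" } },
      required: ["query"],
    },
  },
  {
    name: "get_repo",
    description: "Fetch stars, forks, and recent activity for one repository.",
    input_schema: {
      type: "object",
      properties: {
        owner: { type: "string", description: "Repository owner" },
        repo: { type: "string", description: "Repository name" },
      },
      required: ["owner", "repo"],
    },
  },
];

// GitHub rejects API requests that lack a User-Agent header.
const GITHUB_HEADERS = {
  "User-Agent": "durable-agent-sketch",
  Accept: "application/vnd.github+json",
};

async function githubJson(fetchImpl: FetchLike, url: string): Promise<any> {
  const res = await fetchImpl(url, { headers: GITHUB_HEADERS });
  if (!res.ok) throw new Error(`GitHub request failed with status ${res.status}`);
  return res.json();
}

// Dispatches a tool call by name and returns a compact JSON string for the LLM.
async function runTool(
  name: string,
  input: Record<string, string>,
  fetchImpl: FetchLike,
): Promise<string> {
  if (name === "search_repos") {
    const q = encodeURIComponent(input.query ?? "");
    const data = await githubJson(
      fetchImpl,
      `https://api.github.com/search/repositories?q=${q}&sort=stars&per_page=5`,
    );
    return JSON.stringify(
      data.items.map((r: any) => ({
        name: r.full_name,
        stars: r.stargazers_count,
        description: r.description,
      })),
    );
  }
  if (name === "get_repo") {
    const r = await githubJson(
      fetchImpl,
      `https://api.github.com/repos/${input.owner}/${input.repo}`,
    );
    return JSON.stringify({
      name: r.full_name,
      stars: r.stargazers_count,
      forks: r.forks_count,
      openIssues: r.open_issues_count,
      lastPush: r.pushed_at,
    });
  }
  throw new Error(`Unknown tool: ${name}`);
}
```

In a Worker, the global `fetch` can be passed as `fetchImpl`. Returning only a few fields keeps tool results small, which reduces the tokens sent back to the LLM.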
A Workflow extends WorkflowEntrypoint and implements a run method.
- The step object provides methods to define durable steps.
- step.do(name, callback) executes code and persists the result. If the Workflow is interrupted, it resumes from the last successful step.
For a gentler introduction, refer to Build your first Workflow.
The agent loop sends messages to the LLM, executes any tool calls, and repeats until the task is complete. Each LLM call and tool execution is wrapped in step.do() for durability.
- Create src/workflow.ts:
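The loop described above might look roughly like the sketch below. It is written against injected dependencies so it can run outside Workflows: `StepLike` is a stand-in for the `step` object, and `runAgentLoop`, `CallLlm`, `RunTool`, the step names, and the `maxTurns` limit are all hypothetical. The message and content-block shapes mirror the Anthropic Messages API. In a real Workflow, a function like this would presumably be called from the `run` method of a class that extends `WorkflowEntrypoint`.

```typescript
// Stand-in for the Workflows `step` object (assumed shape).
interface StepLike {
  do<T>(name: string, callback: () => Promise<T>): Promise<T>;
}

// Content blocks mirroring the Anthropic Messages API.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, string> }
  | { type: "tool_result"; tool_use_id: string; content: string };

type Message = { role: "user" | "assistant"; content: string | ContentBlock[] };
type LlmResponse = { stop_reason: string; content: ContentBlock[] };

// Hypothetical injected dependencies: an LLM client call and a tool dispatcher.
type CallLlm = (messages: Message[]) => Promise<LlmResponse>;
type RunTool = (name: string, input: Record<string, string>) => Promise<string>;

async function runAgentLoop(
  step: StepLike,
  callLlm: CallLlm,
  runTool: RunTool,
  task: string,
  maxTurns = 10,
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: task }];

  for (let turn = 1; turn <= maxTurns; turn++) {
    // One durable step per LLM call: the response is checkpointed.
    const response = await step.do(`llm call ${turn}`, () => callLlm(messages));
    messages.push({ role: "assistant", content: response.content });

    // No tool requests means the model has produced its final answer.
    if (response.stop_reason !== "tool_use") {
      return response.content
        .map((block) => (block.type === "text" ? block.text : ""))
        .join("");
    }

    // One durable step per tool call, so a failing tool does not re-run the others.
    const results: ContentBlock[] = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      const output = await step.do(
        `tool ${turn}: ${block.name} (${block.id})`,
        () => runTool(block.name, block.input),
      );
      results.push({ type: "tool_result", tool_use_id: block.id, content: output });
    }
    messages.push({ role: "user", content: results });
  }
  throw new Error(`Agent did not finish within ${maxTurns} turns`);
}
```

The turn limit is a design choice for the sketch: it bounds cost if the model keeps requesting tools without converging.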
Why separate steps for LLM and tools?
Each step.do() creates a checkpoint. If your Workflow crashes or the Worker restarts:
- After LLM step: The response is persisted. On resume, it skips the LLM call and moves to tool execution.
- After tool step: The result is persisted. If a later tool fails, earlier tools do not re-run.
This is especially important for:
- LLM calls: Expensive and slow, should not repeat unnecessarily
- External APIs: May have rate limits or side effects
- Idempotency: Some tools (like sending emails) should not run twice
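The checkpoint behavior described above can be imitated with a toy in-memory model. This is not how Workflows is implemented; `CheckpointedStep`, `agentRun`, and `demo` are hypothetical names, and the `Map` stands in for durable storage. The sketch only shows the observable effect: after a simulated crash in the tool step, the second attempt reuses the stored LLM result instead of calling the LLM again.

```typescript
// Toy model of step checkpointing: results are cached by step name.
class CheckpointedStep {
  store: Map<string, unknown>;

  constructor(store: Map<string, unknown>) {
    this.store = store;
  }

  async do<T>(name: string, callback: () => Promise<T>): Promise<T> {
    // A step that already succeeded returns its stored result without re-running.
    if (this.store.has(name)) return this.store.get(name) as T;
    const result = await callback();
    this.store.set(name, result);
    return result;
  }
}

let llmCalls = 0;
let crashOnce = true;

async function agentRun(step: CheckpointedStep): Promise<string> {
  const plan = await step.do("llm call 1", async () => {
    llmCalls++; // counts how often the "expensive" call really executes
    return "search for llm frameworks";
  });
  const found = await step.do("tool: search_repos", async () => {
    if (crashOnce) {
      crashOnce = false;
      throw new Error("simulated crash");
    }
    return "3 repositories";
  });
  return `${plan} -> ${found}`;
}

async function demo(): Promise<{ output: string; llmCalls: number }> {
  const durableStore = new Map<string, unknown>();
  try {
    await agentRun(new CheckpointedStep(durableStore));
  } catch {
    // The first attempt fails in the tool step, after the LLM step was stored.
  }
  // The second attempt resumes: the LLM step is skipped, the tool step runs.
  const output = await agentRun(new CheckpointedStep(durableStore));
  return { output, llmCalls };
}
```

Calling `demo()` runs the agent twice against the same store, and the LLM callback executes only once across both attempts.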
- Open wrangler.jsonc and add the workflow configuration:
The class_name must match your exported class, and binding is the variable name you use to access the Workflow in your code (like env.AGENT_WORKFLOW).
- Generate types for your bindings:
This creates a worker-configuration.d.ts file with the Env type that includes your AGENT_WORKFLOW binding.
The Worker exposes an HTTP API to start new agent instances and check their status. Each instance runs independently and can be polled for results.
- Replace src/index.ts:
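A minimal sketch of such an HTTP API is shown below, assuming a `POST /agents` route to start an instance and a `GET /agents/:id` route to check it. The routes, the `worker` constant, the `jsonResponse` helper, and the `*Like` interfaces are hypothetical; the interfaces are hand-written stand-ins for the subset of the Workflow binding used here (`create`, `get`, and `status`), and `AGENT_WORKFLOW` is the binding name from this guide. In an actual Worker, the object would be the module's default export.

```typescript
// Stand-ins for the subset of the Workflow binding used by this sketch.
interface InstanceStatusLike { status: string; output?: unknown; error?: unknown }
interface WorkflowInstanceLike { id: string; status(): Promise<InstanceStatusLike> }
interface WorkflowBindingLike {
  create(options: { params: { task: string } }): Promise<WorkflowInstanceLike>;
  get(id: string): Promise<WorkflowInstanceLike>;
}
interface EnvLike { AGENT_WORKFLOW: WorkflowBindingLike }

// Small helper for JSON responses.
function jsonResponse(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

const ROUTE_PREFIX = "/agents/"; // hypothetical route layout

const worker = {
  async fetch(request: Request, env: EnvLike): Promise<Response> {
    const url = new URL(request.url);

    // POST /agents starts a new agent instance and returns its ID.
    if (request.method === "POST" && url.pathname === "/agents") {
      const body = (await request.json()) as { task?: string };
      if (!body.task) return jsonResponse({ error: "Missing task" }, 400);
      const instance = await env.AGENT_WORKFLOW.create({ params: { task: body.task } });
      return jsonResponse({ id: instance.id }, 202);
    }

    // GET /agents/:id reports the instance's current status and output.
    if (request.method === "GET" && url.pathname.startsWith(ROUTE_PREFIX)) {
      const id = url.pathname.slice(ROUTE_PREFIX.length);
      const instance = await env.AGENT_WORKFLOW.get(id);
      return jsonResponse(await instance.status());
    }

    return new Response("Not found", { status: 404 });
  },
};
```

Returning 202 for the start request signals that the work was accepted but has not finished, which matches the polling model described above.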
- Create a .env file for local development:
- Start the dev server:
- Start an agent that searches and compares repositories:
- Check progress (may take a few seconds to complete):
The agent will search for repositories, fetch details, and return a comparison.
- Deploy the Worker:
- Add your API key as a secret:
- Start an agent on your deployed Worker:
- Inspect agent runs with the CLI:
This shows every step the agent took, including LLM calls, tool executions, timing, and any retries.
You can also view this in the Cloudflare dashboard: go to Workflows and select agent-workflow.
The polling approach works well for simple use cases, but for real-time UIs you can combine Workflows with the Agents SDK. The pattern is as follows:
- Agent handles WebSocket connections and client state
- Workflow runs the durable agent loop and pushes updates to the Agent
- Agent broadcasts state changes to all connected clients
In your Workflow, push updates to the Agent:
In your Agent, receive updates and broadcast to clients:
Clients use useAgent() to subscribe to state changes:
This gives you durable execution (Workflows) with real-time UI updates (Agents SDK). For a complete example with a React UI, refer to the durable-ai-agent template ↗.
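The division of labor in this pattern can be illustrated without the Agents SDK. The sketch below is an in-memory simulation only: `BroadcastingAgent`, `pushUpdate`, `subscribe`, `runWorkflow`, and `AgentState` are hypothetical names, not Agents SDK APIs. A plain class plays the role of the Agent, a function plays the role of the Workflow, and listeners play the role of connected clients.

```typescript
type AgentState = {
  status: "running" | "complete";
  steps: string[];
  result?: string;
};
type Listener = (state: AgentState) => void;

// Plays the role of the Agent: holds state and broadcasts every change.
class BroadcastingAgent {
  private state: AgentState = { status: "running", steps: [] };
  private listeners = new Set<Listener>();

  // Plays the role of a client subscribing; returns an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    listener(this.state); // new subscribers receive the current state at once
    return () => { this.listeners.delete(listener); };
  }

  // Called by the workflow to merge an update and notify all subscribers.
  pushUpdate(update: Partial<AgentState>): void {
    this.state = { ...this.state, ...update };
    this.listeners.forEach((listener) => listener(this.state));
  }
}

// Plays the role of the Workflow: reports progress after each unit of work.
async function runWorkflow(agent: BroadcastingAgent): Promise<void> {
  const steps: string[] = [];
  for (const name of ["search_repos", "get_repo", "analyze"]) {
    steps.push(name);
    agent.pushUpdate({ steps: [...steps] });
  }
  agent.pushUpdate({ status: "complete", result: "recommendation ready" });
}
```

Keeping all client-facing state in one object means a client that connects late still sees the full progress so far, not just later events.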
