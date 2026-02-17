Build AI-powered chat interfaces with
AIChatAgent and useAgentChat. Messages are automatically persisted to SQLite, streams resume on disconnect, and tool calls work across server and client.
The @cloudflare/ai-chat package provides two main exports:

| Export | Import | Purpose |
| --- | --- | --- |
| AIChatAgent | @cloudflare/ai-chat | Server-side agent class with message persistence and streaming |
| useAgentChat | @cloudflare/ai-chat/react | React hook for building chat UIs |
Because it is built on the AI SDK and Cloudflare Durable Objects, you get:

- Automatic message persistence: conversations are stored in SQLite and survive restarts
- Resumable streaming: disconnected clients resume mid-stream without data loss
- Real-time sync: messages are broadcast to all connected clients via WebSocket
- Tool support: server-side, client-side, and human-in-the-loop tool patterns
- Data parts: attach typed JSON (citations, progress, usage) to messages alongside text
- Row size protection: automatic compaction when messages approach SQLite limits
npm install @cloudflare/ai-chat agents ai
Server:

    import { AIChatAgent } from "@cloudflare/ai-chat";
    import { createWorkersAI } from "workers-ai-provider";
    import { streamText, convertToModelMessages } from "ai";

    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        // Use any provider such as workers-ai-provider, openai, anthropic, google, etc.
        const workersai = createWorkersAI({ binding: this.env.AI });

        const result = streamText({
          model: workersai("@cf/zai-org/glm-4.7-flash"),
          messages: await convertToModelMessages(this.messages),
        });

        return result.toUIMessageStreamResponse();
      }
    }
Client:

    import { useAgent } from "agents/react";
    import { useAgentChat } from "@cloudflare/ai-chat/react";

    function Chat() {
      const agent = useAgent({ agent: "ChatAgent" });
      const { messages, sendMessage, status } = useAgentChat({ agent });

      return (
        <div>
          {messages.map((msg) => (
            <div key={msg.id}>
              <strong>{msg.role}:</strong>
              {msg.parts.map((part, i) =>
                part.type === "text" ? <span key={i}>{part.text}</span> : null,
              )}
            </div>
          ))}

          <form
            onSubmit={(e) => {
              e.preventDefault();
              const input = e.currentTarget.elements.namedItem(
                "input",
              ) as HTMLInputElement;
              sendMessage({ text: input.value });
            }}
          >
            <input name="input" placeholder="Type a message..." />
            <button type="submit" disabled={status === "streaming"}>
              Send
            </button>
          </form>
        </div>
      );
    }

Wrangler configuration:

    {
      "ai": { "binding": "AI" },
      "durable_objects": {
        "bindings": [{ "name": "ChatAgent", "class_name": "ChatAgent" }]
      },
      "migrations": [{ "tag": "v1", "new_sqlite_classes": ["ChatAgent"] }]
    }
The new_sqlite_classes migration is required because AIChatAgent uses SQLite for message persistence and stream chunk buffering.
sequenceDiagram
participant Client as Client (useAgentChat)
participant Agent as AIChatAgent
participant DB as SQLite
Client->>Agent: CF_AGENT_USE_CHAT_REQUEST (WebSocket)
Agent->>DB: Persist messages
Agent->>Agent: onChatMessage()
loop Streaming response
Agent-->>Client: CF_AGENT_USE_CHAT_RESPONSE (chunks)
Agent->>DB: Buffer chunks
end
Agent->>DB: Persist final message
Agent-->>Client: CF_AGENT_CHAT_MESSAGES (broadcast to all clients)
1. The client sends a message via WebSocket.
2. AIChatAgent persists messages to SQLite and calls your onChatMessage method.
3. Your method returns a streaming Response (typically from streamText).
4. Chunks stream back over WebSocket in real time.
5. When the stream completes, the final message is persisted and broadcast to all connections.
AIChatAgent extends Agent from the agents package. It manages conversation state, persistence, and streaming.
    import { AIChatAgent } from "@cloudflare/ai-chat";

    export class ChatAgent extends AIChatAgent {
      // Access current messages
      // this.messages: UIMessage[]

      // Limit stored messages (optional)
      maxPersistedMessages = 200;

      async onChatMessage(onFinish?, options?) {
        // onFinish: optional callback for streamText (cleanup is automatic)
        // options.abortSignal: cancel signal
        // options.body: custom data from client
        // Return a Response (streaming or plain text)
      }
    }
onChatMessage is the main method you override. It receives the conversation context and should return a Response.
Streaming response (most common):
    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        const workersai = createWorkersAI({ binding: this.env.AI });

        const result = streamText({
          model: workersai("@cf/zai-org/glm-4.7-flash"),
          system: "You are a helpful assistant.",
          messages: await convertToModelMessages(this.messages),
        });

        return result.toUIMessageStreamResponse();
      }
    }
Plain text response:
    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        return new Response("Hello! I am a simple agent.", {
          headers: { "Content-Type": "text/plain" },
        });
      }
    }
Accessing custom body data:
    export class ChatAgent extends AIChatAgent {
      async onChatMessage(_onFinish, options) {
        const { timezone, userId } = options?.body ?? {};
        // Use these values in your LLM call or business logic
      }
    }
this.messages holds the current conversation history, loaded from SQLite. It is an array of UIMessage objects from the AI SDK. Messages are automatically persisted after each interaction.
maxPersistedMessages caps the number of messages stored in SQLite. When the limit is exceeded, the oldest messages are deleted. This controls storage only; it does not affect what is sent to the LLM.

    export class ChatAgent extends AIChatAgent {
      maxPersistedMessages = 200;
    }
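To make the storage cap concrete, the sketch below models the documented behavior (the oldest messages are dropped once the limit is exceeded) on a plain array. It is an illustration only, not the library's implementation; StoredMessage and trimToLimit are hypothetical names that do not exist in @cloudflare/ai-chat.

```typescript
// Hypothetical minimal message shape, for illustration only.
interface StoredMessage {
  id: string;
  text: string;
}

// Keep only the newest `max` messages, dropping the oldest first.
// Assumes `messages` is ordered oldest to newest.
function trimToLimit(messages: StoredMessage[], max: number): StoredMessage[] {
  if (max <= 0) return [];
  return messages.length > max
    ? messages.slice(messages.length - max)
    : messages;
}

const history: StoredMessage[] = [
  { id: "m1", text: "first" },
  { id: "m2", text: "second" },
  { id: "m3", text: "third" },
];

const kept = trimToLimit(history, 2);
console.log(kept.map((m) => m.id).join(",")); // m2,m3
```

Note that trimming of this kind only affects what is stored; anything sent to the model is chosen separately.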
To control what is sent to the model, use the AI SDK's pruneMessages():

    import { streamText, convertToModelMessages, pruneMessages } from "ai";

    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        const workersai = createWorkersAI({ binding: this.env.AI });

        const result = streamText({
          model: workersai("@cf/zai-org/glm-4.7-flash"),
          messages: pruneMessages({
            messages: await convertToModelMessages(this.messages),
            reasoning: "before-last-message",
            toolCalls: "before-last-2-messages",
          }),
        });

        return result.toUIMessageStreamResponse();
      }
    }
persistMessages and saveMessages
For advanced cases, you can manually persist messages:
    // Persist messages without triggering a new response
    await this.persistMessages(messages);

    // Persist messages AND trigger onChatMessage (e.g., programmatic messages)
    await this.saveMessages(messages);
Override onConnect and onClose to add custom logic. Stream resumption and message sync are handled for you:

    export class ChatAgent extends AIChatAgent {
      async onConnect(connection, ctx) {
        // Your custom logic (e.g., logging, auth checks)
        console.log("Client connected:", connection.id);
        // Stream resumption and message sync are handled automatically
      }

      async onClose(connection, code, reason, wasClean) {
        console.log("Client disconnected:", connection.id);
        // Connection cleanup is handled automatically
      }
    }
The destroy() method cancels any pending chat requests and cleans up stream state. It is called automatically when the Durable Object is evicted, but you can call it manually if needed.
When a user clicks "stop" in the chat UI, the client sends a CF_AGENT_CHAT_REQUEST_CANCEL message. The server propagates this to the abortSignal in options:

    export class ChatAgent extends AIChatAgent {
      async onChatMessage(_onFinish, options) {
        const result = streamText({
          model: workersai("@cf/zai-org/glm-4.7-flash"),
          messages: await convertToModelMessages(this.messages),
          abortSignal: options?.abortSignal, // Pass through for cancellation
        });

        return result.toUIMessageStreamResponse();
      }
    }
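The abortSignal passed in options is a standard AbortSignal, so any code that produces output can check it and stop early. The sketch below shows that general mechanism with a hypothetical streamChunks generator standing in for a model stream; it does not use the AI SDK and only illustrates how a cancelled signal ends a stream.

```typescript
// Hypothetical stand-in for a model stream: yields chunks until the signal aborts.
function* streamChunks(chunks: string[], signal: AbortSignal): Generator<string> {
  for (const chunk of chunks) {
    if (signal.aborted) return; // stop producing once cancellation is requested
    yield chunk;
  }
}

const controller = new AbortController();
const received: string[] = [];

for (const chunk of streamChunks(["Hel", "lo", ", ", "world"], controller.signal)) {
  received.push(chunk);
  // Simulate the user clicking "stop" after the second chunk arrives.
  if (received.length === 2) controller.abort();
}

console.log(received.join("")); // "Hello"
```

Passing the signal through to streamText, as in the example above, lets the SDK perform the equivalent check for you.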
useAgentChat is a React hook that connects to an AIChatAgent over WebSocket. It wraps the AI SDK's useChat with a native WebSocket transport.

    import { useAgent } from "agents/react";
    import { useAgentChat } from "@cloudflare/ai-chat/react";

    const agent = useAgent({ agent: "ChatAgent" });
    const {
      messages,
      sendMessage,
      clearHistory,
      addToolOutput,
      addToolApprovalResponse,
      setMessages,
      status,
    } = useAgentChat({ agent });
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| agent | ReturnType<typeof useAgent> | Required | Agent connection from useAgent |
| onToolCall | ({ toolCall, addToolOutput }) => void | - | Handle client-side tool execution |
| autoContinueAfterToolResult | boolean | true | Auto-continue conversation after client tool results and approvals |
| resume | boolean | true | Enable automatic stream resumption on reconnect |
| body | object \| () => object | - | Custom data sent with every request |
| prepareSendMessagesRequest | (options) => { body?, headers? } | - | Advanced per-request customization |
| getInitialMessages | (options) => Promise<UIMessage[]> or null | - | Custom initial message loader. Set to null to skip the HTTP fetch entirely (useful when providing messages directly) |
| Property | Type | Description |
| --- | --- | --- |
| messages | UIMessage[] | Current conversation messages |
| sendMessage | (message) => void | Send a message |
| clearHistory | () => void | Clear conversation (client and server) |
| addToolOutput | ({ toolCallId, output }) => void | Provide output for a client-side tool |
| addToolApprovalResponse | ({ id, approved }) => void | Approve or reject a tool requiring approval |
| setMessages | (messages \| updater) => void | Set messages directly (syncs to server) |
| status | string | "idle", "submitted", "streaming", or "error" |
AIChatAgent supports three tool patterns, all using the AI SDK's tool() function:

| Pattern | Where it runs | When to use |
| --- | --- | --- |
| Server-side | Server (automatic) | API calls, database queries, computations |
| Client-side | Browser (via onToolCall) | Geolocation, clipboard, camera, local storage |
| Approval | Server (after user approval) | Payments, deletions, external actions |
Tools with an execute function run automatically on the server:

    import { streamText, convertToModelMessages, tool, stepCountIs } from "ai";
    import { z } from "zod";

    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        const workersai = createWorkersAI({ binding: this.env.AI });

        const result = streamText({
          model: workersai("@cf/zai-org/glm-4.7-flash"),
          messages: await convertToModelMessages(this.messages),
          tools: {
            getWeather: tool({
              description: "Get weather for a city",
              inputSchema: z.object({ city: z.string() }),
              execute: async ({ city }) => {
                // fetchWeather is your own helper
                const data = await fetchWeather(city);
                return { temperature: data.temp, condition: data.condition };
              },
            }),
          },
          stopWhen: stepCountIs(5),
        });

        return result.toUIMessageStreamResponse();
      }
    }
Define a tool on the server without execute, then handle it on the client with onToolCall. Use this for tools that need browser APIs.

Server:

    tools: {
      getLocation: tool({
        description: "Get the user's location from the browser",
        inputSchema: z.object({}),
        // No execute: the client handles it
      }),
    },
Client:
    const { messages, sendMessage } = useAgentChat({
      agent,
      onToolCall: async ({ toolCall, addToolOutput }) => {
        if (toolCall.toolName === "getLocation") {
          const pos = await new Promise((resolve, reject) =>
            navigator.geolocation.getCurrentPosition(resolve, reject),
          );
          addToolOutput({
            toolCallId: toolCall.toolCallId,
            output: { lat: pos.coords.latitude, lng: pos.coords.longitude },
          });
        }
      },
    });
When the LLM invokes getLocation, the stream pauses. The onToolCall callback fires, your code provides the output, and the conversation continues.
Use needsApproval for tools that require user confirmation before executing.

Server:

    tools: {
      processPayment: tool({
        description: "Process a payment",
        inputSchema: z.object({
          amount: z.number(),
          recipient: z.string(),
        }),
        needsApproval: async ({ amount }) => amount > 100,
        execute: async ({ amount, recipient }) => charge(amount, recipient),
      }),
    },
Client:
    const { messages, addToolApprovalResponse } = useAgentChat({ agent });

    // Render pending approvals from message parts
    {messages.map((msg) =>
      msg.parts
        .filter(
          (part) => part.type === "tool" && part.state === "approval-required",
        )
        .map((part) => (
          <div key={part.toolCallId}>
            <p>Approve {part.toolName}?</p>
            <button
              onClick={() =>
                addToolApprovalResponse({ id: part.toolCallId, approved: true })
              }
            >
              Approve
            </button>
            <button
              onClick={() =>
                addToolApprovalResponse({ id: part.toolCallId, approved: false })
              }
            >
              Reject
            </button>
          </div>
        )),
    )}
For more patterns, refer to Human-in-the-loop.
Include custom data with every chat request using the body option:

    const { messages, sendMessage } = useAgentChat({
      agent,
      body: {
        timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
      },
    });
For dynamic values, use a function:
Access these fields on the server:
    export class ChatAgent extends AIChatAgent {
      async onChatMessage(_onFinish, options) {
        const { timezone, userId } = options?.body ?? {};
      }
    }
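Because the body is supplied by the client, it can be worth narrowing it before use. The following is a minimal sketch under the assumption that the body carries the optional timezone and userId strings from the example above; ChatRequestBody and parseBody are hypothetical helpers, not part of @cloudflare/ai-chat.

```typescript
// Hypothetical body shape matching the fields used in the example above.
interface ChatRequestBody {
  timezone?: string;
  userId?: string;
}

// Narrow an untrusted value to ChatRequestBody, keeping only string fields.
function parseBody(body: unknown): ChatRequestBody {
  if (typeof body !== "object" || body === null) return {};
  const record = body as Record<string, unknown>;
  const parsed: ChatRequestBody = {};
  const timezone = record["timezone"];
  const userId = record["userId"];
  if (typeof timezone === "string") parsed.timezone = timezone;
  if (typeof userId === "string") parsed.userId = userId;
  return parsed;
}

// userId is dropped here because it is not a string.
console.log(parseBody({ timezone: "Europe/Berlin", userId: 42 }));
```

Inside onChatMessage, this would be called as parseBody(options?.body) before the values reach your LLM call or business logic.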
For advanced per-request customization (custom headers, different body per request), use prepareSendMessagesRequest:

    const { messages, sendMessage } = useAgentChat({
      agent,
      prepareSendMessagesRequest: async ({ messages, trigger }) => ({
        headers: { Authorization: `Bearer ${await getToken()}` },
        body: { requestedAt: Date.now() },
      }),
    });
Data parts let you attach typed JSON to messages alongside text — progress indicators, source citations, token usage, or any structured data your UI needs.
Writing data parts (server)
Use createUIMessageStream with writer.write() to send data parts from the server:

    import {
      streamText,
      convertToModelMessages,
      createUIMessageStream,
      createUIMessageStreamResponse,
    } from "ai";

    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        const workersai = createWorkersAI({ binding: this.env.AI });

        const stream = createUIMessageStream({
          execute: async ({ writer }) => {
            const result = streamText({
              model: workersai("@cf/zai-org/glm-4.7-flash"),
              messages: await convertToModelMessages(this.messages),
            });

            writer.merge(result.toUIMessageStream());

            // Write a data part: persisted to message.parts
            writer.write({
              type: "data-sources",
              id: "src-1",
              data: { query: "agents", status: "searching", results: [] },
            });

            // Later: update the same part in-place (same type + id)
            writer.write({
              type: "data-sources",
              id: "src-1",
              data: {
                query: "agents",
                status: "found",
                results: ["Agents SDK docs", "Durable Objects guide"],
              },
            });
          },
        });

        return createUIMessageStreamResponse({ stream });
      }
    }
| Pattern | How | Persisted? | Use case |
| --- | --- | --- | --- |
| Reconciliation | Same type + id → updates in-place | Yes | Progressive state (searching → found) |
| Append | No id, or different id → appends | Yes | Log entries, multiple citations |
| Transient | transient: true → not added to message.parts | No | Ephemeral status (thinking indicator) |
Transient parts are broadcast to connected clients in real time but excluded from SQLite persistence and message.parts. Use the onData callback to consume them.
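The three rules in the table above can be read as a small reducer over a message's parts list. The sketch below is one way to model them; DataPart and applyDataPart are hypothetical names, and the code mirrors the documented behavior rather than the library's actual implementation.

```typescript
// Hypothetical shape of a data part, modelled on the table above.
interface DataPart {
  type: string; // e.g. "data-sources"
  id?: string;
  data: unknown;
  transient?: boolean;
}

// Apply one incoming data part to a message's persisted parts list.
function applyDataPart(parts: DataPart[], incoming: DataPart): DataPart[] {
  // Transient: delivered to onData only, never stored.
  if (incoming.transient) return parts;

  // Reconciliation: same type + id replaces the existing part in place.
  if (incoming.id !== undefined) {
    const index = parts.findIndex(
      (p) => p.type === incoming.type && p.id === incoming.id,
    );
    if (index !== -1) {
      return parts.map((p, i) => (i === index ? incoming : p));
    }
  }

  // Append: no id, or an id not yet seen for this type.
  return [...parts, incoming];
}

let parts: DataPart[] = [];
parts = applyDataPart(parts, { type: "data-sources", id: "s1", data: { status: "searching" } });
parts = applyDataPart(parts, { type: "data-sources", id: "s1", data: { status: "found" } });
parts = applyDataPart(parts, { type: "data-log", data: "step 1" });
parts = applyDataPart(parts, { type: "data-thinking", transient: true, data: {} });

console.log(parts.length); // 2
```

The first two writes collapse into a single part whose status is "found", the log entry is appended, and the transient part leaves the list untouched.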
Reading data parts (client)
Non-transient data parts appear in message.parts. Use the UIMessage generic to type them:

    import { useAgentChat } from "@cloudflare/ai-chat/react";
    import type { UIMessage } from "ai";

    type ChatMessage = UIMessage<
      unknown,
      {
        sources: { query: string; status: string; results: string[] };
        usage: { model: string; inputTokens: number; outputTokens: number };
      }
    >;

    const { messages } = useAgentChat<unknown, ChatMessage>({ agent });

    // Typed access: no casts needed
    for (const msg of messages) {
      for (const part of msg.parts) {
        if (part.type === "data-sources") {
          console.log(part.data.results); // string[]
        }
      }
    }
Transient parts with onData

Transient data parts are not in message.parts. Use the onData callback instead:

    const [thinking, setThinking] = useState(false);

    const { messages } = useAgentChat<unknown, ChatMessage>({
      agent,
      onData(part) {
        if (part.type === "data-thinking") {
          setThinking(true);
        }
      },
    });
On the server, write transient parts with transient: true:

    writer.write({
      type: "data-thinking",
      transient: true,
      data: { model: "glm-4.7-flash", startedAt: new Date().toISOString() },
    });
onData fires on all code paths — new messages, stream resumption, and cross-tab broadcasts.
Streams automatically resume when a client disconnects and reconnects. No configuration is needed — it works out of the box.
When streaming is active:

1. All chunks are buffered in SQLite as they are generated.
2. If the client disconnects, the server continues streaming and buffering.
3. When the client reconnects, it receives all buffered chunks and resumes live streaming.
Disable with resume: false:

    const { messages } = useAgentChat({ agent, resume: false });
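The buffering steps described in this section can be pictured with a small in-memory model. This is a sketch only: ChunkBuffer is a hypothetical stand-in for the SQLite-backed buffer, and the real wire protocol and storage layout are not shown here.

```typescript
// Hypothetical in-memory stand-in for the SQLite chunk buffer.
class ChunkBuffer {
  private chunks: string[] = [];

  // Called for every chunk the model produces, whether or not a client is connected.
  append(chunk: string): void {
    this.chunks.push(chunk);
  }

  // Return a copy of everything buffered so far, for a reconnecting client.
  replay(): string[] {
    return [...this.chunks];
  }
}

const buffer = new ChunkBuffer();
const liveView: string[] = [];

// The client is connected for the first two chunks.
for (const chunk of ["The ", "quick "]) {
  buffer.append(chunk);
  liveView.push(chunk);
}

// The client disconnects; the server keeps streaming and buffering.
for (const chunk of ["brown ", "fox"]) {
  buffer.append(chunk);
}

// On reconnect, the client receives every buffered chunk and rebuilds its view.
const resumedView = buffer.replay();
console.log(resumedView.join("")); // "The quick brown fox"
```

The disconnected client saw only the first two chunks live, yet ends up with the full text after the replay.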
SQLite rows have a maximum size of 2 MB. When a message approaches this limit (for example, a tool returning a very large output), AIChatAgent automatically compacts the message:

1. Tool output compaction: large tool outputs are replaced with an LLM-friendly summary that instructs the model to suggest re-running the tool.
2. Text truncation: if the message is still too large after tool compaction, text parts are truncated with a note.

Compacted messages include metadata.compactedToolOutputs so clients can detect and display this gracefully.
Controlling LLM context vs storage
Storage (maxPersistedMessages) and LLM context are independent:

| Concern | Control | Scope |
| --- | --- | --- |
| How many messages SQLite stores | maxPersistedMessages | Persistence |
| What the model sees | pruneMessages() | LLM context |
| Row size limits | Automatic compaction | Per-message |

    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        const result = streamText({
          model: workersai("@cf/zai-org/glm-4.7-flash"),
          messages: pruneMessages({
            messages: await convertToModelMessages(this.messages),
            reasoning: "before-last-message",
            toolCalls: "before-last-2-messages",
          }),
        });

        return result.toUIMessageStreamResponse();
      }
    }
Using different AI providers
AIChatAgent works with any AI SDK-compatible provider. The server code determines which model to use, so the client does not need to change.

Workers AI:

    import { createWorkersAI } from "workers-ai-provider";

    const workersai = createWorkersAI({ binding: this.env.AI });
    const result = streamText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      messages: await convertToModelMessages(this.messages),
    });

OpenAI:

    import { createOpenAI } from "@ai-sdk/openai";

    const openai = createOpenAI({ apiKey: this.env.OPENAI_API_KEY });
    const result = streamText({
      model: openai.chat("gpt-4o"),
      messages: await convertToModelMessages(this.messages),
    });

Anthropic:

    import { createAnthropic } from "@ai-sdk/anthropic";

    const anthropic = createAnthropic({ apiKey: this.env.ANTHROPIC_API_KEY });
    const result = streamText({
      model: anthropic("claude-sonnet-4-20250514"),
      messages: await convertToModelMessages(this.messages),
    });
Since onChatMessage gives you full control over the streamText call, you can use any AI SDK feature directly. The patterns below all work out of the box; no special AIChatAgent configuration is needed.
Use prepareStep to change the model, available tools, or system prompt between steps in a multi-step agent loop:

    import { streamText, convertToModelMessages, tool, stepCountIs } from "ai";

    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        const result = streamText({
          model: cheapModel, // Default model for simple steps
          messages: await convertToModelMessages(this.messages),
          tools: {
            search: searchTool,
            analyze: analyzeTool,
            summarize: summarizeTool,
          },
          stopWhen: stepCountIs(10),
          prepareStep: async ({ stepNumber, messages }) => {
            // Phase 1: Search (steps 0-2)
            if (stepNumber <= 2) {
              return {
                activeTools: ["search"],
                toolChoice: "required", // Force tool use
              };
            }

            // Phase 2: Analyze with a stronger model (steps 3-5)
            if (stepNumber <= 5) {
              return {
                model: expensiveModel,
                activeTools: ["analyze"],
              };
            }

            // Phase 3: Summarize
            return { activeTools: ["summarize"] };
          },
        });

        return result.toUIMessageStreamResponse();
      }
    }
prepareStep runs before each step and can return overrides for model, activeTools, toolChoice, system, and messages. Use it to:

- Switch models: use a cheap model for simple steps, escalate for reasoning
- Phase tools: restrict which tools are available at each step
- Manage context: prune or transform messages to stay within token limits
- Force tool calls: use toolChoice: { type: "tool", toolName: "search" } to require a specific tool
Language model middleware
Use wrapLanguageModel to add guardrails, RAG, caching, or logging without modifying your chat logic:

    import { streamText, convertToModelMessages, wrapLanguageModel } from "ai";
    import type { LanguageModelV3Middleware } from "@ai-sdk/provider";

    const guardrailMiddleware: LanguageModelV3Middleware = {
      wrapGenerate: async ({ doGenerate }) => {
        const { text, ...rest } = await doGenerate();
        // Filter PII or sensitive content from the response
        const cleaned = text?.replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[REDACTED]");
        return { text: cleaned, ...rest };
      },
    };

    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        const workersai = createWorkersAI({ binding: this.env.AI });

        const model = wrapLanguageModel({
          model: workersai("@cf/zai-org/glm-4.7-flash"),
          middleware: [guardrailMiddleware],
        });

        const result = streamText({
          model,
          messages: await convertToModelMessages(this.messages),
        });

        return result.toUIMessageStreamResponse();
      }
    }
The AI SDK includes built-in middlewares:

- extractReasoningMiddleware: surface chain-of-thought from models like DeepSeek R1
- defaultSettingsMiddleware: apply default temperature, max tokens, etc.
- simulateStreamingMiddleware: add streaming to non-streaming models

Multiple middlewares compose in order: middleware: [first, second] applies as first(second(model)).
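The ordering rule is easier to see with plain functions. The sketch below uses simplified stand-ins (a Model is just a function from prompt to text, and a Middleware wraps a Model); these types and the wrap helper are illustrative assumptions, not the AI SDK's own interfaces.

```typescript
// Simplified stand-ins: a "model" maps a prompt to text, and a middleware wraps a model.
type Model = (prompt: string) => string;
type Middleware = (next: Model) => Model;

const calls: string[] = [];

// Build a middleware that records when it runs relative to the wrapped model.
const tracing = (name: string): Middleware => (next) => (prompt) => {
  calls.push(`${name}:before`);
  const output = next(prompt);
  calls.push(`${name}:after`);
  return output;
};

// Compose so that [first, second] becomes first(second(model)).
function wrap(model: Model, middleware: Middleware[]): Model {
  return middleware.reduceRight((inner, mw) => mw(inner), model);
}

const baseModel: Model = (prompt) => {
  calls.push("model");
  return prompt.toUpperCase();
};

const wrappedModel = wrap(baseModel, [tracing("first"), tracing("second")]);

console.log(wrappedModel("hi")); // "HI"
console.log(calls.join(" > "));
// first:before > second:before > model > second:after > first:after
```

The first middleware in the array is outermost: it sees the request first and the response last.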
Use generateObject inside tools for structured data extraction:

    import {
      streamText,
      generateObject,
      convertToModelMessages,
      tool,
      stepCountIs,
    } from "ai";
    import { z } from "zod";

    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        const result = streamText({
          model: myModel,
          messages: await convertToModelMessages(this.messages),
          tools: {
            extractContactInfo: tool({
              description:
                "Extract structured contact information from the conversation",
              inputSchema: z.object({
                text: z.string().describe("The text to extract contact info from"),
              }),
              execute: async ({ text }) => {
                const { object } = await generateObject({
                  model: myModel,
                  schema: z.object({
                    email: z.string().email(),
                    phone: z.string().optional(),
                  }),
                  prompt: `Extract contact information from: ${text}`,
                });
                return object;
              },
            }),
          },
          stopWhen: stepCountIs(5),
        });

        return result.toUIMessageStreamResponse();
      }
    }
Tools can delegate work to focused sub-calls with their own context. Use ToolLoopAgent to define a reusable agent, then call it from a tool's execute:

    import {
      ToolLoopAgent,
      streamText,
      convertToModelMessages,
      tool,
      stepCountIs,
    } from "ai";
    import { z } from "zod";

    // Define a reusable research agent with its own tools and instructions
    const researchAgent = new ToolLoopAgent({
      model: researchModel,
      instructions: "You are a research assistant. Be thorough and cite sources.",
      tools: { webSearch: webSearchTool },
      stopWhen: stepCountIs(10),
    });

    export class ChatAgent extends AIChatAgent {
      async onChatMessage() {
        const result = streamText({
          model: orchestratorModel,
          messages: await convertToModelMessages(this.messages),
          tools: {
            deepResearch: tool({
              description: "Research a topic in depth",
              inputSchema: z.object({
                topic: z.string().describe("The topic to research"),
              }),
              execute: async ({ topic }) => {
                const { text } = await researchAgent.generate({ prompt: topic });
                return { summary: text };
              },
            }),
          },
          stopWhen: stepCountIs(5),
        });

        return result.toUIMessageStreamResponse();
      }
    }
The research agent runs in its own context — its token budget is separate from the orchestrator's. Only the summary goes back to the parent model.
Note
ToolLoopAgent is best suited for subagents, not as a replacement for streamText in onChatMessage itself. The main onChatMessage benefits from direct access to this.env, this.messages, and options.body, none of which a pre-configured ToolLoopAgent instance can reference.
Streaming progress with preliminary results
By default, a tool part appears as loading until execute returns. Use an async generator (async function*) to stream progress updates to the client while the tool is still working:
tool({
  description: "Research a topic in depth",
  inputSchema: z.object({
    topic: z.string().describe("The topic to research"),
  }),
  async *execute({ topic }) {
    // Preliminary result: the client sees "searching" immediately
    yield { status: "searching", topic, summary: undefined };
    const { text } = await researchAgent.generate({ prompt: topic });
    // Final result: sent to the model for its next step
    yield { status: "done", topic, summary: text };
  },
})
Each yield updates the tool part on the client in real time (with preliminary: true). The last yielded value becomes the final output that the model sees.
This pattern is useful when:
A task requires exploring large amounts of information that would bloat the main context
You want to show real-time progress for long-running tools
You want to parallelize independent research (multiple tool calls run concurrently)
You need different models or system prompts for different subtasks
For more, refer to the AI SDK Agents docs, Subagents, and Preliminary Tool Results.
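To make the preliminary-result rule described earlier in this section concrete, here is a minimal, self-contained sketch. It is not the library's implementation: `research` is a stand-in for a real tool's generator-style `execute`, and the `ToolUpdate` shape and `collectToolUpdates` helper are hypothetical names introduced only for illustration. The sketch shows one thing: every yielded value except the last is treated as preliminary, and the last one is the final output.

```typescript
// Hypothetical shape of one tool-part update as a client might see it.
interface ToolUpdate<T> {
  output: T;
  preliminary: boolean; // true for every yield except the last
}

// Stand-in for a generator-style tool `execute` (assumption, not a real tool).
async function* research(topic: string) {
  yield { status: "searching" as const, topic, summary: undefined as string | undefined };
  await Promise.resolve(); // stands in for slow work such as a subagent call
  yield { status: "done" as const, topic, summary: `Findings about ${topic}` };
}

// Drain the generator, then mark only the last yielded value as final.
async function collectToolUpdates<T>(gen: AsyncIterable<T>): Promise<ToolUpdate<T>[]> {
  const updates: ToolUpdate<T>[] = [];
  for await (const output of gen) {
    updates.push({ output, preliminary: true });
  }
  const last = updates[updates.length - 1];
  if (last !== undefined) {
    last.preliminary = false;
  }
  return updates;
}
```

Calling `collectToolUpdates(research("durable objects"))` resolves to two updates: the "searching" value flagged as preliminary, followed by the "done" value as the final output. A real client receives these updates incrementally rather than as a collected array; collecting them here only keeps the sketch easy to inspect.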
When multiple clients connect to the same agent instance, messages are automatically broadcast to all connections. If one client sends a message, all other connected clients receive the updated message list.
Client A ──── sendMessage("Hello") ────▶ AIChatAgent
Client A ◀── CF_AGENT_USE_CHAT_RESPONSE ──────┤
Client B ◀── CF_AGENT_CHAT_MESSAGES ──────────┘
The originating client receives the streaming response. All other clients receive the final messages via a CF_AGENT_CHAT_MESSAGES broadcast.
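The routing rule above can be simulated without a network. The following is a rough sketch, not the agent's actual code: `FakeConnection`, `Frame`, and `routeTurn` are hypothetical names, the payload is simplified to a plain string, and the real wire format is not shown. Only the two message names are taken from the description above.

```typescript
// Simplified frame: the real protocol carries typed JSON, not a bare string.
type Frame = { type: string; payload: string };

// In-memory stand-in for a WebSocket connection (assumption for illustration).
class FakeConnection {
  id: string;
  received: Frame[] = [];

  constructor(id: string) {
    this.id = id;
  }

  send(frame: Frame): void {
    this.received.push(frame);
  }
}

// Route one chat turn: stream chunks go to the originating client only,
// and every other connection gets a single broadcast with the final text.
function routeTurn(
  connections: FakeConnection[],
  originId: string,
  chunks: string[],
): void {
  const finalText = chunks.join("");
  for (const conn of connections) {
    if (conn.id === originId) {
      for (const chunk of chunks) {
        conn.send({ type: "CF_AGENT_USE_CHAT_RESPONSE", payload: chunk });
      }
    } else {
      conn.send({ type: "CF_AGENT_CHAT_MESSAGES", payload: finalText });
    }
  }
}
```

With two connections `A` and `B` and `routeTurn([a, b], "A", ["Hel", "lo"])`, connection `A` records two response chunks while `B` records one broadcast carrying the joined text.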
| Import path | Exports |
| --- | --- |
| @cloudflare/ai-chat | AIChatAgent, createToolsFromClientSchemas |
| @cloudflare/ai-chat/react | useAgentChat |
| @cloudflare/ai-chat/types | MessageType, OutgoingMessage, IncomingMessage |
The chat protocol uses typed JSON messages over WebSocket:
| Message | Direction | Purpose |
| --- | --- | --- |
| CF_AGENT_USE_CHAT_REQUEST | Client → Server | Send a chat message |
| CF_AGENT_USE_CHAT_RESPONSE | Server → Client | Stream response chunks |
| CF_AGENT_CHAT_MESSAGES | Server → Client | Broadcast updated messages |
| CF_AGENT_CHAT_CLEAR | Bidirectional | Clear conversation |
| CF_AGENT_CHAT_REQUEST_CANCEL | Client → Server | Cancel active stream |
| CF_AGENT_TOOL_RESULT | Client → Server | Provide tool output |
| CF_AGENT_TOOL_APPROVAL | Client → Server | Approve or reject a tool |
| CF_AGENT_MESSAGE_UPDATED | Server → Client | Notify of message update |
| CF_AGENT_STREAM_RESUMING | Server → Client | Notify of stream resumption |
| CF_AGENT_STREAM_RESUME_REQUEST | Client → Server | Request stream resume check |
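The direction column lends itself to a small lookup table, for example to reject frames that arrive from the wrong side. This is a sketch under stated assumptions: `MESSAGE_DIRECTIONS`, `Direction`, and `mayBeSentBy` are hypothetical names, the keys are the message names as listed above (the strings actually sent on the wire may differ), and payload shapes are deliberately left out because they are not described here.

```typescript
type Direction = "client-to-server" | "server-to-client" | "bidirectional";

// Directions copied from the protocol table; the key spelling is an assumption.
const MESSAGE_DIRECTIONS = {
  CF_AGENT_USE_CHAT_REQUEST: "client-to-server",
  CF_AGENT_USE_CHAT_RESPONSE: "server-to-client",
  CF_AGENT_CHAT_MESSAGES: "server-to-client",
  CF_AGENT_CHAT_CLEAR: "bidirectional",
  CF_AGENT_CHAT_REQUEST_CANCEL: "client-to-server",
  CF_AGENT_TOOL_RESULT: "client-to-server",
  CF_AGENT_TOOL_APPROVAL: "client-to-server",
  CF_AGENT_MESSAGE_UPDATED: "server-to-client",
  CF_AGENT_STREAM_RESUMING: "server-to-client",
  CF_AGENT_STREAM_RESUME_REQUEST: "client-to-server",
} as const;

type ChatMessageType = keyof typeof MESSAGE_DIRECTIONS;

// Returns true when the given side is allowed to send this message type.
function mayBeSentBy(sender: "client" | "server", type: ChatMessageType): boolean {
  const direction: Direction = MESSAGE_DIRECTIONS[type];
  if (direction === "bidirectional") {
    return true;
  }
  return sender === "client"
    ? direction === "client-to-server"
    : direction === "server-to-client";
}
```

Deriving `ChatMessageType` from the table keeps the union of message names and their directions in one place, so adding a row updates both.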
The following APIs are deprecated and will emit a console warning when used. They will be removed in a future release.
| Deprecated | Replacement | Notes |
| --- | --- | --- |
| addToolResult({ toolCallId, result }) | addToolOutput({ toolCallId, output }) | Renamed for consistency with AI SDK terminology |
| createToolsFromClientSchemas() | Client tools are now registered automatically | No manual schema conversion needed |
| extractClientToolSchemas() | Client tools are now registered automatically | Schemas are sent with tool results |
| detectToolsRequiringConfirmation() | Use needsApproval on the tool definition | Approval is now per-tool, not a global filter |
| tools option on useAgentChat | Define tools in onChatMessage on the server | All tool definitions belong on the server |
| toolsRequiringConfirmation option | Use needsApproval on individual tools | Per-tool approval replaces global list |
If you are upgrading from an earlier version, replace deprecated calls with their replacements. The deprecated APIs still work but will be removed in a future major version.
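Two of these migrations are mechanical and can be sketched as plain data transformations. The helpers below are hypothetical and not part of `@cloudflare/ai-chat`: `toToolOutputArgs` renames the `result` field to `output`, and `toPerToolApproval` turns a global confirmation list into per-tool `needsApproval` flags. The argument shapes follow the table above, while the value types (`unknown`, `string[]`) are assumptions. In real code the `needsApproval` flag belongs on each server-side tool definition.

```typescript
// Shapes taken from the deprecation table; value types are assumptions.
interface LegacyToolResultArgs {
  toolCallId: string;
  result: unknown;
}

interface ToolOutputArgs {
  toolCallId: string;
  output: unknown;
}

// Hypothetical helper: map addToolResult-style arguments to addToolOutput-style ones.
function toToolOutputArgs(legacy: LegacyToolResultArgs): ToolOutputArgs {
  return { toolCallId: legacy.toolCallId, output: legacy.result };
}

// Hypothetical helper: derive per-tool approval flags from a global list.
function toPerToolApproval(
  toolNames: string[],
  toolsRequiringConfirmation: string[],
): Record<string, { needsApproval: boolean }> {
  const flagged = new Set(toolsRequiringConfirmation);
  const flags: Record<string, { needsApproval: boolean }> = {};
  for (const name of toolNames) {
    flags[name] = { needsApproval: flagged.has(name) };
  }
  return flags;
}
```

For example, `toPerToolApproval(["search", "deleteFile"], ["deleteFile"])` marks only `deleteFile` as needing approval, which mirrors moving from the global list to a flag on that one tool.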