
Changelog

New updates and improvements at Cloudflare.

Developer platform
  1. You can now manage Cloudflare Tunnels directly from Wrangler, the CLI for the Cloudflare Developer Platform. The new wrangler tunnel commands let you create, run, and manage tunnels without leaving your terminal.

    Wrangler tunnel commands demo

    Available commands:

    • wrangler tunnel create — Create a new remotely managed tunnel.
    • wrangler tunnel list — List all tunnels in your account.
    • wrangler tunnel info — Display details about a specific tunnel.
    • wrangler tunnel delete — Delete a tunnel.
    • wrangler tunnel run — Run a tunnel using the cloudflared daemon.
    • wrangler tunnel quick-start — Start a free, temporary tunnel without an account using Quick Tunnels.

    Wrangler handles downloading and managing the cloudflared binary automatically. On first use, you will be prompted to download cloudflared to a local cache directory.
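
    For example, a typical flow might look like this (a sketch; the tunnel name is illustrative, and arguments may change while the commands are experimental):

    Terminal window
    # Create a remotely managed tunnel, then run it with the managed cloudflared binary
    npx wrangler tunnel create my-tunnel
    npx wrangler tunnel run my-tunnel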

    These commands are currently experimental and may change without notice.

    To get started, refer to the Wrangler tunnel commands documentation.

  1. Workers AI is officially in the big models game. @cf/moonshotai/kimi-k2.5 is the first frontier-scale open-source model on our AI inference platform — a large model with a full 256k context window, multi-turn tool calling, vision inputs, and structured outputs. With a frontier-scale model running directly on the Cloudflare Developer Platform, you can now run the entire agent lifecycle on a single, unified platform.

    The model has proven to be a fast, efficient alternative to larger proprietary models without sacrificing quality. As AI adoption increases, the volume of inference is skyrocketing — now you can access frontier intelligence at a fraction of the cost.

    Key capabilities

    • 256,000 token context window for retaining full conversation history, tool definitions, and entire codebases across long-running agent sessions
    • Multi-turn tool calling for building agents that invoke tools across multiple conversation turns
    • Vision inputs for processing images alongside text
    • Structured outputs with JSON mode and JSON Schema support for reliable downstream parsing
    • Function calling for integrating external tools and APIs into agent workflows

    Prefix caching and session affinity

    When an agent sends a new prompt, it resends all previous prompts, tools, and context from the session. The delta between consecutive requests is usually just a few new lines of input. Prefix caching avoids reprocessing the shared context, saving time and compute from the prefill stage. This means faster Time to First Token (TTFT) and higher Tokens Per Second (TPS) throughput.

    Workers AI already performs prefix caching; what's new is that we now surface cached tokens as a usage metric and offer a discount on cached tokens compared to input tokens (pricing is listed on the model page).

    Terminal window
    curl -X POST \
      "https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/moonshotai/kimi-k2.5" \
      -H "Authorization: Bearer {api_token}" \
      -H "Content-Type: application/json" \
      -H "x-session-affinity: ses_12345678" \
      -d '{
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "What is prefix caching and why does it matter?"
          }
        ],
        "max_tokens": 2400,
        "stream": true
      }'

    Some clients like OpenCode implement session affinity automatically. The Agents SDK starter also sets up the wiring for you.

    Redesigned asynchronous API

    For volumes of requests that exceed synchronous rate limits, you can submit batches of inferences to be completed asynchronously. We have revamped the Asynchronous Batch API with a pull-based system that processes queued requests as soon as capacity is available. In internal testing, async requests usually complete within 5 minutes, though this depends on live traffic.

    The async API is the best way to avoid capacity errors in durable workflows. It is ideal for use cases that are not real-time, such as code scanning agents or research agents.

    To use the asynchronous API, pass queueRequest: true:

    JavaScript
    // 1. Push a batch of requests into the queue
    const res = await env.AI.run(
      "@cf/moonshotai/kimi-k2.5",
      {
        requests: [
          { messages: [{ role: "user", content: "Tell me a joke" }] },
          { messages: [{ role: "user", content: "Explain the Pythagoras theorem" }] },
        ],
      },
      { queueRequest: true },
    );

    // 2. Grab the request ID
    const requestId = res.request_id;

    // 3. Poll for the result
    const result = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
      request_id: requestId,
    });
    if (result.status === "queued" || result.status === "running") {
      // Retry by polling again
    } else {
      return Response.json(result);
    }

    You can also set up event notifications to know when inference is complete instead of polling.

    Get started

    Use Kimi K2.5 through the Workers AI binding (env.AI.run()), the REST API at /run or /v1/chat/completions, AI Gateway, or the OpenAI-compatible endpoint.

    For more information, refer to the Kimi K2.5 model page, pricing, and prompt caching.

  1. You can now use a Workers binding to transform videos with Media Transformations. This allows you to resize, crop, extract frames, and extract audio from videos stored anywhere, even in private locations like R2 buckets.

    The Media Transformations binding is useful when you want to:

    • Transform videos stored in private or protected sources
    • Optimize videos and store the output directly back to R2 for re-use
    • Extract still frames for classification or description with Workers AI
    • Extract audio tracks for transcription using Workers AI

    To get started, add the Media binding to your Wrangler configuration:

    JSONC
    {
      "$schema": "./node_modules/wrangler/config-schema.json",
      "media": {
        "binding": "MEDIA"
      }
    }

    Then use the binding in your Worker to transform videos:

    JavaScript
    export default {
      async fetch(request, env) {
        const video = await env.R2_BUCKET.get("input.mp4");
        const result = env.MEDIA.input(video.body)
          .transform({ width: 480, height: 270 })
          .output({ mode: "video", duration: "5s" });
        return await result.response();
      },
    };

    Output modes include video for optimized MP4 clips, frame for still images, spritesheet for multiple frames, and audio for M4A extraction.
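
    For example, extracting the audio track for transcription might look like this (a sketch reusing the binding above; the R2 object key is illustrative):

    JavaScript
    const video = await env.R2_BUCKET.get("input.mp4");
    // "audio" mode returns an M4A track suitable for a speech-to-text model
    const audio = env.MEDIA.input(video.body).output({ mode: "audio" });
    return await audio.response();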

    For more information, refer to the Media Transformations binding documentation.

  1. The latest releases of @cloudflare/codemode add a new MCP barrel export, remove ai and zod as required peer dependencies from the main entry point, and give you more control over the sandbox.

    New @cloudflare/codemode/mcp export

    A new @cloudflare/codemode/mcp entry point provides two functions that wrap MCP servers with Code Mode:

    • codeMcpServer({ server, executor }) — wraps an existing MCP server with a single code tool where each upstream tool becomes a typed codemode.* method.
    • openApiMcpServer({ spec, executor, request }) — creates search and execute MCP tools from an OpenAPI spec with host-side request proxying and automatic $ref resolution.

    JavaScript
    import { codeMcpServer } from "@cloudflare/codemode/mcp";
    import { DynamicWorkerExecutor } from "@cloudflare/codemode";
    const executor = new DynamicWorkerExecutor({ loader: env.LOADER });
    // Wrap an existing MCP server — all its tools become
    // typed methods the LLM can call from generated code
    const server = await codeMcpServer({ server: upstreamMcp, executor });

    Zero-dependency main entry point

    Breaking change in v0.2.0: generateTypes and the ToolDescriptor / ToolDescriptors types have moved to @cloudflare/codemode/ai:

    JavaScript
    // Before
    import { generateTypes } from "@cloudflare/codemode";
    // After
    import { generateTypes } from "@cloudflare/codemode/ai";

    The main entry point (@cloudflare/codemode) no longer requires the ai or zod peer dependencies. It now exports:

    • sanitizeToolName — Sanitize tool names into valid JS identifiers
    • normalizeCode — Normalize LLM-generated code into async arrow functions
    • generateTypesFromJsonSchema — Generate TypeScript type definitions from plain JSON Schema
    • jsonSchemaToType — Convert a single JSON Schema to a TypeScript type string
    • DynamicWorkerExecutor — Sandboxed code execution via Dynamic Worker Loader
    • ToolDispatcher — RPC target for dispatching tool calls from sandbox to host

    The ai and zod peer dependencies are now optional — only required when importing from @cloudflare/codemode/ai.

    Custom sandbox modules

    DynamicWorkerExecutor now accepts an optional modules option to inject custom ES modules into the sandbox:

    JavaScript
    const executor = new DynamicWorkerExecutor({
      loader: env.LOADER,
      modules: {
        "utils.js": `export function add(a, b) { return a + b; }`,
      },
    });
    // Sandbox code can then: import { add } from "utils.js"

    Internal normalization and sanitization

    DynamicWorkerExecutor now normalizes code and sanitizes tool names internally. You no longer need to call normalizeCode() or sanitizeToolName() before passing code and functions to execute().
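
    In practice this removes a pre-processing step. A before/after sketch, assuming the execute(code, functions) shape described above:

    JavaScript
    // Before: callers normalized LLM output themselves
    await executor.execute(normalizeCode(rawLlmCode), tools);

    // Now: normalization and tool-name sanitization happen inside execute()
    await executor.execute(rawLlmCode, tools);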

    Upgrade

    Terminal window
    npm i @cloudflare/codemode@latest

    See the Code Mode documentation for the full API reference.

  1. AI Gateway now supports the cf-aig-collect-log-payload header, which controls whether request and response bodies are stored in logs. The header defaults to true, so payloads are stored alongside metadata. Set it to false to skip payload storage while still logging metadata such as token counts, model, provider, status code, cost, and duration.

    This is useful when you need usage metrics but do not want to persist sensitive prompt or response data.

    Terminal window
    curl https://gateway.ai.cloudflare.com/v1/$ACCOUNT_ID/$GATEWAY_ID/openai/chat/completions \
      --header "Authorization: Bearer $TOKEN" \
      --header 'Content-Type: application/json' \
      --header 'cf-aig-collect-log-payload: false' \
      --data '{
        "model": "gpt-4o-mini",
        "messages": [
          {
            "role": "user",
            "content": "What is the email address and phone number of user123?"
          }
        ]
      }'

    For more information, refer to Logging.

  1. You can now set topK up to 50 when a Vectorize query returns values or full metadata. This raises the previous limit of 20 for queries that use returnValues: true or returnMetadata: "all".
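
    For example, a query using the new ceiling (a sketch; the binding name VECTORIZE and the query vector are placeholders):

    JavaScript
    // topK up to 50 is now allowed even when values or full metadata are returned
    const results = await env.VECTORIZE.query(queryVector, {
      topK: 50,
      returnValues: true,
      returnMetadata: "all",
    });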

    Use the higher limit when you need more matches in a single query response without dropping values or metadata. Refer to the Vectorize API reference for query options and current topK limits.

  1. You can now SSH into running Container instances using Wrangler. This is useful for debugging, inspecting running processes, or executing one-off commands inside a Container.

    To connect, enable wrangler_ssh in your Container configuration and add your ssh-ed25519 public key to authorized_keys:

    JSONC
    {
      "containers": [
        {
          "wrangler_ssh": {
            "enabled": true
          },
          "authorized_keys": [
            {
              "name": "<NAME>",
              "public_key": "<YOUR_PUBLIC_KEY_HERE>"
            }
          ]
        }
      ]
    }

    Then connect with:

    Terminal window
    wrangler containers ssh <INSTANCE_ID>

    You can also run a single command without opening an interactive shell:

    Terminal window
    wrangler containers ssh <INSTANCE_ID> -- ls -al

    Use wrangler containers instances <APPLICATION> to find the instance ID for a running Container.

    For more information, refer to the SSH documentation.

  1. A new wrangler containers instances command lists all instances for a given Container application. This mirrors the instances view in the Cloudflare dashboard.

    The command displays each instance's ID, name, state, location, version, and creation time:

    Terminal window
    wrangler containers instances <APPLICATION_ID>

    Use the --json flag for machine-readable output, which is also the default format in non-interactive environments such as CI pipelines.

    For the full list of options, refer to the containers instances command reference.

  1. We're excited to partner with NVIDIA to bring @cf/nvidia/nemotron-3-120b-a12b to Workers AI. NVIDIA Nemotron 3 Super is a Mixture-of-Experts (MoE) model with a hybrid Mamba-transformer architecture, 120B total parameters, and 12B active parameters per forward pass.

    The model is optimized for running many collaborating agents per application. It delivers high accuracy for reasoning, tool calling, and instruction following across complex multi-step tasks.

    Key capabilities:

    • Hybrid Mamba-transformer architecture delivers over 50% higher token generation throughput compared to leading open models, reducing latency for real-world applications
    • Tool calling support for building AI agents that invoke tools across multiple conversation turns
    • Multi-Token Prediction (MTP) accelerates long-form text generation by predicting several future tokens simultaneously in a single forward pass
    • 32,000 token context window for retaining conversation history and plan states across multi-step agent workflows

    Use Nemotron 3 Super through the Workers AI binding (env.AI.run()), the REST API at /run or /v1/chat/completions, or the OpenAI-compatible endpoint.
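
    A minimal call through the Workers AI binding might look like this (the prompt is illustrative):

    JavaScript
    const response = await env.AI.run("@cf/nvidia/nemotron-3-120b-a12b", {
      messages: [
        { role: "user", content: "Plan the steps to migrate a nightly cron job to a queue." },
      ],
    });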

    For more information, refer to the Nemotron 3 Super model page.

  1. Edit: this post has been edited to clarify crawling behavior with respect to site guidance.

    You can now crawl an entire website with a single API call using Browser Rendering's new /crawl endpoint, available in open beta. Submit a starting URL, and pages are automatically discovered, rendered in a headless browser, and returned in multiple formats, including HTML, Markdown, and structured JSON. The endpoint crawls as a signed agent and respects robots.txt and AI Crawl Control by default, making it easy for developers to comply with website rules and less likely for crawlers to ignore web-owner guidance. This is great for training models, building RAG pipelines, and researching or monitoring content across a site.

    Crawl jobs run asynchronously. You submit a URL, receive a job ID, and check back for results as pages are processed.

    Terminal window
    # Initiate a crawl
    curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
    -H 'Authorization: Bearer <apiToken>' \
    -H 'Content-Type: application/json' \
    -d '{
    "url": "https://blog.cloudflare.com/"
    }'
    # Check results
    curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}' \
    -H 'Authorization: Bearer <apiToken>'

    Key features:

    • Multiple output formats - Return crawled content as HTML, Markdown, and structured JSON (powered by Workers AI)
    • Crawl scope controls - Configure crawl depth, page limits, and wildcard patterns to include or exclude specific URL paths
    • Automatic page discovery - Discovers URLs from sitemaps, page links, or both
    • Incremental crawling - Use modifiedSince and maxAge to skip pages that haven't changed or were recently fetched, saving time and cost on repeated crawls
    • Static mode - Set render: false to fetch static HTML without spinning up a browser, for faster crawling of static sites
    • Well-behaved bot - Honors robots.txt directives, including crawl-delay
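
    The scope and caching controls above can be combined in a single request body. A sketch (render and maxAge come from the list above; the maxAge value here assumes seconds):

    Terminal window
    # Static crawl that skips pages fetched within the last day
    curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
      -H 'Authorization: Bearer <apiToken>' \
      -H 'Content-Type: application/json' \
      -d '{
        "url": "https://blog.cloudflare.com/",
        "render": false,
        "maxAge": 86400
      }'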

    Available on both the Workers Free and Paid plans.

    Note: the /crawl endpoint cannot bypass Cloudflare bot detection or captchas, and self-identifies as a bot.

    To get started, refer to the crawl endpoint documentation. If you are setting up your own site to be crawled, review the robots.txt and sitemaps best practices.

  1. Cloudflare Workflows allows you to configure specific retry logic for each step in your workflow execution. Now, you can access which retry attempt is currently executing for calls to step.do():

    TypeScript
    await step.do("my-step", async (ctx) => {
      // ctx.attempt is 1 on first try, 2 on first retry, etc.
      console.log(`Attempt ${ctx.attempt}`);
    });

    You can use the step context for improved logging & observability, progressive backoff, or conditional logic in your workflow definition.
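
    For example, conditional logic with a progressive fallback might look like the following sketch (the retry configuration and URLs are illustrative):

    TypeScript
    await step.do(
      "fetch-upstream",
      { retries: { limit: 4, delay: "5 seconds", backoff: "constant" } },
      async (ctx) => {
        // Fall back to a mirror once the primary endpoint has failed twice
        const url = ctx.attempt <= 2 ? PRIMARY_URL : MIRROR_URL;
        const res = await fetch(url);
        if (!res.ok) throw new Error(`upstream returned ${res.status}`);
        return await res.json();
      },
    );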

    Note that the current attempt number is 1-indexed. For more information on retry behavior, refer to Sleeping and Retrying.

  1. Real-time transcription in RealtimeKit now supports 10 languages with regional variants, powered by Deepgram Nova-3 running on Workers AI.

    During a meeting, participant audio is routed through AI Gateway to Nova-3 on Workers AI — so transcription runs on Cloudflare's network end-to-end, reducing latency compared to routing through external speech-to-text services.

    Set the language when creating a meeting via ai_config.transcription.language:

    JSON
    {
      "ai_config": {
        "transcription": {
          "language": "fr"
        }
      }
    }

    Supported languages include English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch — with regional variants like en-AU, en-GB, en-IN, en-NZ, es-419, fr-CA, de-CH, pt-BR, and pt-PT. Use multi for automatic multilingual detection.

    If you are building voice agents or real-time translation workflows, your agent can now transcribe in the caller's language natively — no extra services or routing logic needed.

  1. Browser Rendering REST API rate limits for Workers Paid plans have been increased from 3 requests per second (180/min) to 10 requests per second (600/min). No action is needed to benefit from the higher limit.

    The REST API lets you perform common browser tasks with a single API call, and you can now make those calls at a higher rate.

    If you use the Browser Sessions method, increases to concurrent browser and new browser limits are coming soon. Stay tuned.

    For full details, refer to the Browser Rendering limits page.

  1. You can now customize how the Markdown Conversion service processes different file types by passing a conversionOptions object.

    Available options:

    • Images: Set the language for AI-generated image descriptions
    • HTML: Use CSS selectors to extract specific content, or provide a hostname to resolve relative links
    • PDF: Exclude metadata from the output

    Use the env.AI binding:

    JavaScript
    await env.AI.toMarkdown(
      { name: "page.html", blob: new Blob([html]) },
      {
        conversionOptions: {
          html: { cssSelector: "article.content" },
          image: { descriptionLanguage: "es" },
        },
      },
    );

    Or call the REST API:

    Terminal window
    curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/tomarkdown \
    -H 'Authorization: Bearer {API_TOKEN}' \
    -F 'files=@index.html' \
    -F 'conversionOptions={"html": {"cssSelector": "article.content"}}'

    For more details, refer to Conversion Options.

  1. Each Workflow on Workers Paid now supports 10,000 steps by default, configurable up to 25,000 steps in your wrangler.jsonc file:

    JSONC
    {
      "workflows": [
        {
          "name": "my-workflow",
          "binding": "MY_WORKFLOW",
          "class_name": "MyWorkflow",
          "limits": {
            "steps": 25000
          }
        }
      ]
    }

    Previously, each instance was limited to 1,024 steps. Now, Workflows can support more complex, long-running executions without the additional complexity of recursive or child workflow calls.

    Note that the maximum persisted state limit per Workflow instance remains 100 MB for Workers Free and 1 GB for Workers Paid. Refer to Workflows limits for more information.

  1. Sandboxes now support real-time filesystem watching via sandbox.watch(). The method returns a Server-Sent Events stream backed by native inotify, so your Worker receives create, modify, delete, and move events as they happen inside the container.

    sandbox.watch(path, options)

    Pass a directory path and optional filters. The returned stream is a standard ReadableStream you can proxy directly to a browser client or consume server-side.

    JavaScript
    // Stream events to a browser client
    const stream = await sandbox.watch("/workspace/src", {
    recursive: true,
    include: ["*.ts", "*.js"],
    });
    return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" },
    });

    Server-side consumption with parseSSEStream

    Use parseSSEStream to iterate over events inside a Worker without forwarding them to a client.

    JavaScript
    import { parseSSEStream } from "@cloudflare/sandbox";

    const stream = await sandbox.watch("/workspace/src", { recursive: true });
    for await (const event of parseSSEStream(stream)) {
      console.log(event.type, event.path);
    }

    Each event includes a type field (create, modify, delete, or move) and the affected path. Move events also include a from field with the original path.

    Options

    • recursive (boolean) — Watch subdirectories. Defaults to false.
    • include (string[]) — Glob patterns to filter events. Omit to receive all events.

    Upgrade

    To update to the latest version:

    Terminal window
    npm i @cloudflare/sandbox@latest

    For full API details, refer to the Sandbox file watching reference.

  1. The latest release of the Agents SDK rewrites observability from scratch with diagnostics_channel, adds keepAlive() to prevent Durable Object eviction during long-running work, and introduces waitForMcpConnections so MCP tools are always available when onChatMessage runs.

    Observability rewrite

    The previous observability system used console.log() with a custom Observability.emit() interface. v0.7.0 replaces it with structured events published to diagnostics channels — silent by default, zero overhead when nobody is listening.

    Every event has a type, payload, and timestamp. Events are routed to seven named channels:

    • agents:state — state:update
    • agents:rpc — rpc, rpc:error
    • agents:message — message:request, message:response, message:clear, message:cancel, message:error, tool:result, tool:approval
    • agents:schedule — schedule:create, schedule:execute, schedule:cancel, schedule:retry, schedule:error, queue:retry, queue:error
    • agents:lifecycle — connect, destroy
    • agents:workflow — workflow:start, workflow:event, workflow:approved, workflow:rejected, workflow:terminated, workflow:paused, workflow:resumed, workflow:restarted
    • agents:mcp — mcp:client:preconnect, mcp:client:connect, mcp:client:authorize, mcp:client:discover

    Use the typed subscribe() helper from agents/observability for type-safe access:

    JavaScript
    import { subscribe } from "agents/observability";

    const unsub = subscribe("rpc", (event) => {
      if (event.type === "rpc") {
        console.log(`RPC call: ${event.payload.method}`);
      }
      if (event.type === "rpc:error") {
        console.error(`RPC failed: ${event.payload.method}: ${event.payload.error}`);
      }
    });

    // Clean up when done
    unsub();

    In production, all diagnostics channel messages are automatically forwarded to Tail Workers — no subscription code needed in the agent itself:

    JavaScript
    export default {
      async tail(events) {
        for (const event of events) {
          for (const msg of event.diagnosticsChannelEvents) {
            // msg.channel is "agents:rpc", "agents:workflow", etc.
            console.log(msg.timestamp, msg.channel, msg.message);
          }
        }
      },
    };

    The custom Observability override interface is still supported for users who need to filter or forward events to external services.

    For the full event reference, refer to the Observability documentation.

    keepAlive() and keepAliveWhile()

    Durable Objects are evicted after a period of inactivity (typically 70-140 seconds with no incoming requests, WebSocket messages, or alarms). During long-running operations — streaming LLM responses, waiting on external APIs, running multi-step computations — the agent can be evicted mid-flight.

    keepAlive() prevents this by creating a 30-second heartbeat schedule; each alarm firing resets the inactivity timer. It returns a disposer function that cancels the heartbeat when called.

    JavaScript
    const dispose = await this.keepAlive();
    try {
      const result = await longRunningComputation();
      await sendResults(result);
    } finally {
      dispose();
    }

    keepAliveWhile() wraps an async function with automatic cleanup — the heartbeat starts before the function runs and stops when it completes:

    JavaScript
    const result = await this.keepAliveWhile(async () => {
      const data = await longRunningComputation();
      return data;
    });

    Key details:

    • Multiple concurrent callers — Each keepAlive() call returns an independent disposer. Disposing one does not affect others.
    • AIChatAgent built-in — AIChatAgent automatically calls keepAlive() during streaming responses. You do not need to add it yourself.
    • Uses the scheduling system — The heartbeat does not conflict with your own schedules. It shows up in getSchedules() if you need to inspect it.

    For the full API reference and when-to-use guidance, refer to Schedule tasks — Keeping the agent alive.

    waitForMcpConnections

    AIChatAgent now waits for MCP server connections to settle before calling onChatMessage. This ensures this.mcp.getAITools() returns the full set of tools, especially after Durable Object hibernation when connections are being restored in the background.

    JavaScript
    export class ChatAgent extends AIChatAgent {
      // Default — waits up to 10 seconds:
      // waitForMcpConnections = { timeout: 10_000 };

      // Wait forever:
      waitForMcpConnections = true;

      // Or disable waiting:
      // waitForMcpConnections = false;
    }

    • { timeout: 10_000 } — Wait up to 10 seconds (default)
    • { timeout: N } — Wait up to N milliseconds
    • true — Wait indefinitely until all connections are ready
    • false — Do not wait (old behavior before 0.2.0)

    For lower-level control, call this.mcp.waitForConnections() directly inside onChatMessage instead.
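
    A sketch of that lower-level approach (the onChatMessage signature is abbreviated here):

    JavaScript
    async onChatMessage(onFinish, options) {
      // Wait for MCP connections explicitly instead of using the class-level setting
      await this.mcp.waitForConnections();
      const tools = this.mcp.getAITools();
      // ... stream a response using `tools`
    }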

    Other improvements

    • MCP deduplication by name and URL — addMcpServer with HTTP transport now deduplicates on both server name and URL. Calling it with the same name but a different URL creates a new connection. URLs are normalized before comparison (trailing slashes, default ports, hostname case).
    • callbackHost optional for non-OAuth servers — addMcpServer no longer requires callbackHost when connecting to MCP servers that do not use OAuth.
    • MCP URL security — Server URLs are validated before connection to prevent SSRF. Private IP ranges, loopback addresses, link-local addresses, and cloud metadata endpoints are blocked.
    • Custom denial messages — addToolOutput now supports state: "output-error" with errorText for custom denial messages in human-in-the-loop tool approval flows.
    • requestId in chat options — onChatMessage options now include a requestId for logging and correlating events.

    Upgrade

    To update to the latest version:

    Terminal window
    npm i agents@latest @cloudflare/ai-chat@latest

  1. You can now start using AI Gateway with a single API call — no setup required. Use default as your gateway ID, and AI Gateway creates one for you automatically on the first request.

    To try it out, create an API token with AI Gateway - Read, AI Gateway - Edit, and Workers AI - Read permissions, then run:

    Terminal window
    curl -X POST https://gateway.ai.cloudflare.com/v1/$CLOUDFLARE_ACCOUNT_ID/default/compat/chat/completions \
      --header "cf-aig-authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      --header 'Content-Type: application/json' \
      --data '{
        "model": "workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast",
        "messages": [
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }'

    AI Gateway gives you logging, caching, rate limiting, and access to multiple AI providers through a single endpoint. For more information, refer to Get started.

  1. The latest release of the Agents SDK lets you define an Agent and an McpAgent in the same Worker and connect them over RPC — no HTTP, no network overhead. It also makes OAuth opt-in for simple MCP connections, hardens the schema converter for production workloads, and ships a batch of @cloudflare/ai-chat reliability fixes.

    RPC transport for MCP

    You can now connect an Agent to an McpAgent in the same Worker using a Durable Object binding instead of an HTTP URL. The connection stays entirely within the Cloudflare runtime — no network round-trips, no serialization overhead.

    Pass the Durable Object namespace directly to addMcpServer:

    JavaScript
    import { Agent } from "agents";

    export class MyAgent extends Agent {
      async onStart() {
        // Connect via DO binding — no HTTP, no network overhead
        await this.addMcpServer("counter", env.MY_MCP);

        // With props for per-user context
        await this.addMcpServer("counter", env.MY_MCP, {
          props: { userId: "user-123", role: "admin" },
        });
      }
    }

    The addMcpServer method now accepts string | DurableObjectNamespace as the second parameter with full TypeScript overloads, so HTTP and RPC paths are type-safe and cannot be mixed.

    Key capabilities:

    • Hibernation support — RPC connections survive Durable Object hibernation automatically. The binding name and props are persisted to storage and restored on wake-up, matching the behavior of HTTP MCP connections.
    • Deduplication — Calling addMcpServer with the same server name returns the existing connection instead of creating duplicates. Connection IDs are stable across hibernation restore.
    • Smaller surface area — The RPC transport internals have been rewritten and reduced from 609 lines to 245 lines. RPCServerTransport now uses JSONRPCMessageSchema from the MCP SDK for validation instead of hand-written checks.

    Optional OAuth for MCP connections

    addMcpServer() no longer eagerly creates an OAuth provider for every connection. For servers that do not require authentication, a simple call is all you need:

    JavaScript
    // No callbackHost, no OAuth config — just works
    await this.addMcpServer("my-server", "https://mcp.example.com");

    If the server responds with a 401, the SDK throws a clear error: "This MCP server requires OAuth authentication. Provide callbackHost in addMcpServer options to enable the OAuth flow." The restore-from-storage flow also handles missing callback URLs gracefully, skipping auth provider creation for non-OAuth servers.

    Hardened JSON Schema to TypeScript converter

    The schema converter used by generateTypes() and getAITools() now handles edge cases that previously caused crashes in production:

    • Depth and circular reference guards — Prevents stack overflows on recursive or deeply nested schemas
    • $ref resolution — Supports internal JSON Pointers (#/definitions/..., #/$defs/..., #)
    • Tuple support — prefixItems (JSON Schema 2020-12) and array items (draft-07)
    • OpenAPI 3.0 nullable: true — Supported across all schema branches
    • Per-tool error isolation — One malformed schema cannot crash the full pipeline in generateTypes() or getAITools()
    • Missing inputSchema fallback — getAITools() falls back to { type: "object" } instead of throwing

    @cloudflare/ai-chat fixes

    • Tool denial flow — Denied tool approvals (approved: false) now transition to output-denied with a tool_result, fixing Anthropic provider compatibility. Custom denial messages are supported via state: "output-error" and errorText.
    • Abort/cancel support — Streaming responses now properly cancel the reader loop when the abort signal fires and send a done signal to the client.
    • Duplicate message persistence — persistMessages() now reconciles assistant messages by content and order, preventing duplicate rows when clients resend full history.
    • requestId in OnChatMessageOptions — Handlers can now send properly-tagged error responses for pre-stream failures.
    • redacted_thinking preservation — The message sanitizer no longer strips Anthropic redacted_thinking blocks.
    • /get-messages reliability — Endpoint handling moved from a prototype onRequest() override to a constructor wrapper, so it works even when users override onRequest without calling super.onRequest().
    • Client tool APIs undeprecated — createToolsFromClientSchemas, clientTools, AITool, extractClientToolSchemas, and the tools option on useAgentChat are restored for SDK use cases where tools are defined dynamically at runtime.
    • jsonSchema initialization — Fixed jsonSchema not initialized error when calling getAITools() in onChatMessage.

    Upgrade

    To update to the latest version:

    Terminal window
    npm i agents@latest @cloudflare/ai-chat@latest

  1. You can now run more Containers concurrently with significantly higher limits on memory, vCPU, and disk.

    • Memory for concurrent live Container instances — increased from 400 GiB to 6 TiB
    • vCPU for concurrent live Container instances — increased from 100 to 1,500
    • Disk for concurrent live Container instances — increased from 2 TB to 30 TB

    This 15x increase enables larger-scale workloads on Containers. You can now run 15,000 instances of the lite instance type, 6,000 instances of basic, over 1,500 instances of standard-1, or over 1,000 instances of standard-2 concurrently.

    Refer to Limits for more details on the available instance types and limits.

  1. Pywrangler, the CLI tool for managing Python Workers and packages, now supports Windows, allowing you to develop and deploy Python Workers from Windows environments. Previously, Pywrangler was only available on macOS and Linux.

    You can install and use Pywrangler on Windows the same way you would on other platforms. Specify your Worker's Python dependencies in your pyproject.toml file, then use the following commands to develop and deploy:

    Terminal window
    uvx --from workers-py pywrangler dev
    uvx --from workers-py pywrangler deploy
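
    The dependencies referenced above live in your pyproject.toml; a minimal sketch (the project name and dependency are placeholders):

    TOML
    [project]
    name = "my-python-worker"
    version = "0.1.0"
    dependencies = ["fastapi"]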

    All existing Pywrangler functionality, including package management, local development, and deployment, works on Windows without any additional configuration.

    Requirements

    This feature requires the following minimum versions:

    • wrangler >= 4.64.0
    • workers-py >= 1.72.0
    • uv >= 0.29.8

    To upgrade workers-py (which includes Pywrangler) in your project, run:

    Terminal window
    uv tool upgrade workers-py

    To upgrade wrangler, run:

    Terminal window
    npm install -g wrangler@latest

    To upgrade uv, run:

    Terminal window
    uv self update

    To get started with Python Workers on Windows, refer to the Python packages documentation for full details on Pywrangler.

  1. Workers Observability now includes a query language that lets you write structured queries directly in the search bar to filter your logs and traces. The search bar doubles as a free text search box — type any term to search across all metadata and attributes, or write field-level queries for precise filtering.

    Workers Observability search bar with autocomplete suggestions and Query Builder sidebar filters

    Queries written in the search bar sync with the Query Builder sidebar, so you can write a query by hand and then refine it visually, or build filters in the Query Builder and see the corresponding query syntax. The search bar provides autocomplete suggestions for metadata fields and operators as you type.

    The query language supports:

    • Free text search — search everywhere with a keyword like error, or match an exact phrase with "exact phrase"
    • Field queries — filter by specific fields using comparison operators (for example, status = 500 or $workers.wallTimeMs > 100)
    • Operators — =, !=, >, >=, <, <=, and : (contains)
    • Functions — contains(field, value), startsWith(field, prefix), regex(field, pattern), and exists(field)
    • Boolean logic — add conditions with AND, OR, and NOT
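
    Putting these together, a search-bar query might look like the following (the $metadata.message field here is illustrative):

    status = 500 AND $workers.wallTimeMs > 100 AND NOT contains($metadata.message, "timeout")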

    Select the help icon next to the search bar to view the full syntax reference, including all supported operators, functions, and keyboard shortcuts.

    Go to the Workers Observability dashboard to try the query language.

  1. You can now deploy any existing project to Cloudflare Workers — even without a Wrangler configuration file — and wrangler deploy will just work.

    Starting with Wrangler 4.68.0, running wrangler deploy automatically configures your project by detecting your framework, installing required adapters, and deploying it to Cloudflare Workers.

    Using Wrangler locally

    Terminal window
    npx wrangler deploy

    When you run wrangler deploy in a project without a configuration file, Wrangler:

    1. Detects your framework from package.json
    2. Prompts you to confirm the detected settings
    3. Installs any required adapters
    4. Generates a wrangler.jsonc configuration file
    5. Deploys your project to Cloudflare Workers

    You can also use wrangler setup to configure without deploying, or pass --yes to skip prompts.

    Using the Cloudflare dashboard

    Automatic configuration pull request created by Workers Builds

    When you connect a repository through the Workers dashboard, a pull request is generated for you with all necessary files, and a preview deployment to check before merging.

    Background

    In December 2025, we introduced automatic configuration as an experimental feature. It is now generally available and the default behavior.

    If you have questions or run into issues, join the GitHub discussion.

  1. deleteAll() now deletes a Durable Object alarm in addition to stored data for Workers with a compatibility date of 2026-02-24 or later. This change simplifies clearing a Durable Object's storage with a single API call.

    Previously, deleteAll() only deleted user-stored data for an object. Alarm usage stores metadata in an object's storage, which required a separate deleteAlarm() call to fully clean up all storage for an object. The deleteAll() change applies to both KV-backed and SQLite-backed Durable Objects.

    JavaScript
    // Before: two API calls required to clear all storage
    await this.ctx.storage.deleteAlarm();
    await this.ctx.storage.deleteAll();
    // Now: a single call clears both data and the alarm
    await this.ctx.storage.deleteAll();

    For more information, refer to the Storage API documentation.

  1. Cloudflare Pipelines ingests streaming data via Workers or HTTP endpoints, transforms it with SQL, and writes it to R2 as Apache Iceberg tables. Today we're shipping three improvements to help you understand why streaming events get dropped, catch data quality issues early, and set up Pipelines faster.

    Dropped event metrics

    When stream events don't match the expected schema, Pipelines accepts them during ingestion but drops them when attempting to deliver them to the sink. To help you identify the root cause of these issues, we are introducing a new dashboard and metrics that surface dropped events with detailed error messages.

    The Errors tab in the Cloudflare dashboard showing deserialization errors grouped by type with individual error details

    Dropped events can also be queried programmatically via the new pipelinesUserErrorsAdaptiveGroups GraphQL dataset. The dataset breaks down failures by specific error type (missing_field, type_mismatch, parse_failure, or null_value) so you can trace issues back to the source.

    GraphQL
    query GetPipelineUserErrors(
      $accountTag: String!
      $pipelineId: String!
      $datetimeStart: Time!
      $datetimeEnd: Time!
    ) {
      viewer {
        accounts(filter: { accountTag: $accountTag }) {
          pipelinesUserErrorsAdaptiveGroups(
            limit: 100
            filter: {
              pipelineId: $pipelineId
              datetime_geq: $datetimeStart
              datetime_leq: $datetimeEnd
            }
            orderBy: [count_DESC]
          ) {
            count
            dimensions {
              errorFamily
              errorType
            }
          }
        }
      }
    }

    For the full list of dimensions, error types, and additional query examples, refer to User error metrics.

    Typed Pipelines bindings

    Sending data to a Pipeline from a Worker previously used a generic Pipeline<PipelineRecord> type, which meant schema mismatches (wrong field names, incorrect types) were only caught at runtime as dropped events.

    Running wrangler types now generates schema-specific TypeScript types for your Pipeline bindings. TypeScript catches missing required fields and incorrect field types at compile time, before your code is deployed.

    TypeScript
    declare namespace Cloudflare {
      type EcommerceStreamRecord = {
        user_id: string;
        event_type: string;
        product_id?: string;
        amount?: number;
      };

      interface Env {
        STREAM: import("cloudflare:pipelines").Pipeline<Cloudflare.EcommerceStreamRecord>;
      }
    }
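
    With the generated type in place, sends are checked at compile time. A sketch (the record mirrors the schema above; send() is the standard Pipelines binding method):

    TypeScript
    // OK: matches EcommerceStreamRecord
    await env.STREAM.send([
      { user_id: "u_123", event_type: "purchase", amount: 19.99 },
    ]);

    // Compile-time error: user_id is missing and amount is not a number
    // await env.STREAM.send([{ event_type: "purchase", amount: "19.99" }]);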

    For more information, refer to Typed Pipeline bindings.

    Improved Pipelines setup

    Setting up a new Pipeline previously required multiple manual steps: creating an R2 bucket, enabling R2 Data Catalog, generating an API token, and configuring format, compression, and rolling policies individually.

    The wrangler pipelines setup command now offers a Simple setup mode that applies recommended defaults and automatically creates the R2 bucket and enables R2 Data Catalog if they do not already exist. Validation errors during setup prompt you to retry inline rather than restarting the entire process.
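
    To try it, run the setup command and choose Simple setup when prompted:

    Terminal window
    npx wrangler pipelines setup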

    For a full walkthrough, refer to the Getting started guide.