Long-running agents

Build agents that persist for days, weeks, or months — surviving restarts, waking on demand, and managing work that spans far longer than any single request.

The short version:

Agents are durable identities, not always-on processes.
State, SQL data, schedules, and fiber checkpoints survive hibernation and restarts.
In-memory variables, timers, open fetches, and local closures do not survive eviction.
Use keepAlive() for active work measured in minutes, runFiber() when work needs recovery, startFiber() when callers need durable acceptance and status, and Workflows for heavyweight multi-step jobs.
Use sub-agents when one parent coordinates many long-lived child contexts.

Why Cloudflare for long-running agents

Agents spend most of their time waiting: for user input (seconds to days), LLM responses (seconds to minutes), tool results (seconds to hours), human approvals (hours to days), or scheduled wake-ups (minutes to months). On a traditional VM or container, you pay for all of that idle time. An agent that is dormant 99% of the time and active 1% of the time still costs you 100% of a server.

Durable Objects invert this model. An agent exists as an addressable entity with persistent state, but consumes zero compute when hibernated. When something happens — an HTTP request, a WebSocket message, a scheduled alarm, an inbound email — the platform wakes the agent, loads its state from SQLite, and hands it the event. The agent does its work, then goes back to sleep.

This is the actor model: each agent has an identity and durable state, and it wakes when a message arrives. You do not manage servers, routing, health checks, or restart logic. The platform handles placement, scaling, and recovery.

The economics follow directly:

	VMs / Containers	Durable Objects
Idle cost	Full compute cost, always	No compute cost (hibernated)
Scaling	Provision and manage capacity	Automatic, per-agent
State	External database required	Built-in SQLite
Recovery	You build it (process managers, health checks)	Platform restarts, state survives
Identity / routing	You build it (load balancers, sticky sessions)	Built-in (name to agent)
10,000 agents, each active 1% of the time	10,000 always-on instances	~100 active at any moment

For agents — which are inherently bursty, stateful, and long-lived — this is a natural fit.
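The last row of the table is plain arithmetic. A rough sketch of the same calculation, extended to compute-hours per month; the 30-day month and the 1% duty cycle are illustrative assumptions, and the model ignores storage, per-request charges, and wake-up overhead:

```typescript
// Illustrative arithmetic only: always-on instances versus agents that use
// compute only while active. Ignores storage, request charges, and wake-up cost.
const HOURS_PER_MONTH = 30 * 24; // assumes a 30-day month (720 hours)

// Expected number of agents active at any instant.
function expectedActive(agentCount: number, dutyCycle: number): number {
  return agentCount * dutyCycle;
}

// Compute-hours per month under each model.
function monthlyComputeHours(agentCount: number, dutyCycle: number) {
  return {
    alwaysOn: agentCount * HOURS_PER_MONTH,
    onDemand: agentCount * dutyCycle * HOURS_PER_MONTH,
  };
}

const fleet = monthlyComputeHours(10_000, 0.01);
console.log(expectedActive(10_000, 0.01)); // 100
console.log(fleet.alwaysOn, fleet.onDemand); // 7200000 72000
```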

The lifecycle of a long-running agent

A long-running agent is not a process that runs continuously. It is an entity that exists continuously but runs intermittently. Understanding the lifecycle is key to building agents that work reliably over long timelines.

Wake → onStart() → handle events → idle (~2 min) → hibernation
  ▲                                                      │
  └──────────────── alarm or request wakes agent ────────┘

Eviction (crash / redeploy) can happen at any point.
State persists in SQLite. Agent restarts on next event.
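The diagram can be read as a small state machine. A minimal TypeScript sketch of the transitions it shows; the state and event names are hypothetical labels for this illustration, not types exported by the Agents SDK:

```typescript
// Hypothetical model of the lifecycle diagram; not an SDK type.
type LifecycleState = "hibernated" | "active" | "idle";
type LifecycleEvent =
  | "event" // request, WebSocket message, alarm, email, or RPC
  | "workDone" // the handler returned
  | "idleTimeout" // roughly two minutes without activity
  | "evicted"; // crash or redeploy, possible from any state

function transition(
  state: LifecycleState,
  ev: LifecycleEvent,
): LifecycleState {
  switch (ev) {
    case "evicted":
      // Durable state lives in SQLite, so eviction just means the agent
      // restarts on its next event.
      return "hibernated";
    case "event":
      // Per the diagram, a wake from hibernation runs onStart() first.
      return "active";
    case "workDone":
      return state === "active" ? "idle" : state;
    case "idleTimeout":
      return state === "idle" ? "hibernated" : state;
  }
}
```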

What survives

this.state — persisted to SQLite on every setState() call
this.sql data — all SQLite tables you create
Scheduled tasks — stored in SQLite, trigger alarms to wake the agent
Connection state — connection.setState() data for each WebSocket client
Fiber checkpoints and ledgers — stash() data from runFiber() and retained startFiber() status rows

Any higher-level abstractions built on SQLite also survive, since they share the same durable storage.

What does not survive

In-memory variables — class fields not stored via setState() or this.sql
Running timers — setTimeout, setInterval are lost on hibernation/eviction
Open fetch requests — in-flight HTTP calls are abandoned
Local closures — callbacks and promise chains are lost

The implication: any work that matters must be persisted or recoverable. The SDK provides primitives for this — schedules, fibers, queues — but understanding the boundary between "in-memory" and "durable" is essential.
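A toy model makes the boundary concrete. The sketch below simulates eviction by discarding one instance and constructing a new one over the same storage. The Map stands in for the agent's SQLite storage, and ToyAgent is a hypothetical class, not part of the SDK:

```typescript
// Toy model of the durable/in-memory boundary. A Map stands in for the
// agent's SQLite storage; ToyAgent is hypothetical, not an SDK class.
type DurableStorage = Map<string, number>;

class ToyAgent {
  // Lost on eviction: lives only in this instance's memory.
  inMemoryCount = 0;
  private readonly storage: DurableStorage;

  constructor(storage: DurableStorage) {
    this.storage = storage;
  }

  // Survives eviction: always read back from storage.
  get durableCount(): number {
    return this.storage.get("count") ?? 0;
  }

  handleEvent(): void {
    this.inMemoryCount += 1;
    this.storage.set("count", this.durableCount + 1);
  }
}

const agentStorage: DurableStorage = new Map();
const original = new ToyAgent(agentStorage);
original.handleEvent();
original.handleEvent();

// Simulated eviction: the instance is discarded, the storage is not.
const revived = new ToyAgent(agentStorage);
console.log(revived.inMemoryCount, revived.durableCount); // 0 2
```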

Running example: a project manager agent

Throughout this doc, we build up a project manager agent that:

Lives for the duration of a project (weeks or months)
Tracks tasks, assigns work to sub-agents, and reports progress
Wakes up on schedule to check deadlines and send reminders
Reacts to external events (webhooks from GitHub, emails from team members)
Handles long-running operations (CI pipelines, code reviews, deployments)
Survives any number of restarts and evictions along the way

import { Agent } from "agents";

type ProjectState = {
  name: string;
  status: "planning" | "active" | "review" | "complete";
  tasks: Task[];
  plan: Plan | null;
};

type Task = {
  id: string;
  title: string;
  status: "pending" | "in_progress" | "blocked" | "complete";
  assignee?: string;
  dueDate?: string;
  completedAt?: number;
  externalJobId?: string;
};

export class ProjectManager extends Agent<ProjectState> {
  initialState: ProjectState = {
    name: "",
    status: "planning",
    tasks: [],
    plan: null,
  };
}

The Plan type is introduced in Planning as a durability strategy. We add capabilities to this agent section by section.

Waking up: how agents get activated

A hibernated agent can be woken by any of these sources:

Wake source	How it works	Example
HTTP request	Any request to the agent's URL triggers `onRequest()`	A webhook from GitHub
WebSocket connection	A client connects, triggering `onConnect()`	A team member opens the dashboard
RPC call	Another Worker or agent calls a method via service binding or `@callable`	A coordinator agent delegates a task
Scheduled alarm	A stored schedule fires, triggering your callback	Daily standup reminder at 9am
Email	An inbound email triggers `onEmail()`	A team member replies to a status email

The pattern extends naturally to any event source that can reach a Worker — anything from telephony webhooks to chat platform bots. An external signal arrives, the platform wakes the agent, and the agent handles it.

The agent does not need to be "started" or "deployed" separately for each wake source — they all route to the same Durable Object instance. The agent's identity (its name) is the routing key.

export class ProjectManager extends Agent<ProjectState> {
  async onStart() {
    // Daily deadline check at 9am UTC — idempotent, safe across restarts
    await this.schedule(
      "0 9 * * *",
      "checkDeadlines",
      {},
      {
        idempotent: true,
      },
    );

    // Progress sync every 30 minutes
    await this.scheduleEvery(1800, "syncProgress");
  }

  async onRequest(request: Request): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname.endsWith("/github-webhook")) {
      const event = await request.json();
      await this.handleGitHubEvent(event);
      return new Response("OK");
    }

    return Response.json({
      project: this.state.name,
      status: this.state.status,
    });
  }

  async checkDeadlines() {
    /* ... find overdue tasks, broadcast alerts ... */
  }
  async syncProgress() {
    /* ... check on sub-agents, update task statuses ... */
  }
}

Staying alive during long work

Sometimes an agent needs to do work that takes longer than the idle eviction window (~70–140 seconds). Streaming an LLM response, orchestrating a multi-step tool chain, or waiting on a slow API all risk the agent being evicted mid-flight.

keepAlive() prevents this by creating a heartbeat that resets the inactivity timer. Its scoped variant, keepAliveWhile(), holds the heartbeat only for the duration of a callback:

export class ProjectManager extends Agent<ProjectState> {
  async generateProjectPlan(goal: string) {
    const result = await this.keepAliveWhile(async () => {
      const plan = await this.callLLM(`Create a project plan for: ${goal}`);
      const tasks = await this.callLLM(
        `Break this into tasks: ${JSON.stringify(plan)}`,
      );
      return { plan, tasks };
    });

    this.setState({
      ...this.state,
      status: "active",
      plan: result.plan,
      tasks: result.tasks,
    });
  }
}

keepAliveWhile() is the recommended approach — it guarantees the heartbeat is cleaned up when the work finishes (or throws). For manual control, keepAlive() returns a disposer:

const dispose = await this.keepAlive();
try {
  await longWork();
} finally {
  dispose();
}
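Conceptually, keepAliveWhile() is keepAlive() plus the try/finally above. A self-contained sketch of that pattern follows; keepAlive and withKeepAlive here are hypothetical stand-ins that only count active holders, whereas the SDK's keepAlive() maintains a real heartbeat:

```typescript
// Hypothetical stand-in for keepAlive(): tracks how many callers hold a
// heartbeat and returns a disposer. Not the SDK implementation.
let activeHeartbeats = 0;

async function keepAlive(): Promise<() => void> {
  activeHeartbeats += 1;
  let disposed = false;
  return () => {
    // Calling the disposer twice must not release another caller's hold.
    if (!disposed) {
      disposed = true;
      activeHeartbeats -= 1;
    }
  };
}

// Scoped form: the disposer runs whether the work resolves or throws.
async function withKeepAlive<T>(work: () => Promise<T>): Promise<T> {
  const dispose = await keepAlive();
  try {
    return await work();
  } finally {
    dispose();
  }
}
```

The finally block is the point of the scoped form: a thrown error still releases the heartbeat, so a failed LLM call cannot leave the agent pinned in memory.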

When keepAlive is not enough

keepAlive is for work measured in minutes, not hours. For truly long-running operations, use a different strategy:

Duration	Strategy
Seconds	Normal request handling
Minutes	`keepAlive()` / `keepAliveWhile()`
Minutes	`startFiber()` when retryable acceptance matters
Minutes to hours	Workflows
Hours to days	Async pattern: start job, hibernate, wake on completion
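The table can be encoded as a small decision helper. A sketch under stated assumptions: the numeric cut-offs below (one minute for plain request handling, 30 minutes for keepAlive() or startFiber(), six hours for Workflows) are illustrative values chosen for this example, not limits published by the SDK:

```typescript
// Illustrative encoding of the duration table. Thresholds are assumptions.
type Strategy =
  | "request" // normal request handling
  | "keepAlive" // keepAlive() / keepAliveWhile()
  | "startFiber" // durable acceptance with an idempotency key
  | "workflow" // Workflows
  | "asyncCallback"; // start job, hibernate, wake on completion

const MINUTE = 60;
const HOUR = 60 * MINUTE;

function chooseStrategy(
  expectedSeconds: number,
  needsDurableAcceptance = false,
): Strategy {
  if (expectedSeconds < MINUTE) return "request";
  if (expectedSeconds < 30 * MINUTE) {
    return needsDurableAcceptance ? "startFiber" : "keepAlive";
  }
  if (expectedSeconds < 6 * HOUR) return "workflow";
  return "asyncCallback";
}
```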

Surviving crashes: fibers and recovery

An agent can be evicted at any time — a deploy, a platform restart, or hitting resource limits. If the agent was mid-task, that work is lost unless it was checkpointed.

runFiber() provides crash-recoverable execution. It persists a row in SQLite for the duration of the work, and lets you stash() intermediate state. If the agent is evicted, the fiber row survives, and onFiberRecovered() is called on the next activation.

Use startFiber() when the important boundary is durable acceptance. It adds an idempotency key, retained status records, inspection, cancellation, and cleanup on top of the same fiber machinery. By default it returns after acceptance; pass waitForCompletion: true when the request should stay open until the accepted job reaches a terminal status. This is a good fit for webhooks where the provider may retry delivery and the agent must avoid starting duplicate visible side effects.

export class ProjectManager extends Agent<ProjectState> {
  async executeTask(task: Task) {
    await this.runFiber(`task:${task.id}`, async (ctx) => {
      const resources = await this.gatherResources(task);
      ctx.stash({ phase: "prepared", resources, task });

      const result = await this.runSubAgent(task, resources);
      ctx.stash({ phase: "executed", result, task });

      await this.updateTaskStatus(task.id, "complete", result);
    });
  }

  async onFiberRecovered(ctx: FiberRecoveryContext) {
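    if (!ctx.name.startsWith("task:")) return;

    // The snapshot is whatever the fiber last passed to stash(). If the agent
    // was evicted before the first stash(), there is no checkpoint to resume.
    const snapshot = ctx.snapshot as
      | { phase: "prepared"; task: Task }
      | { phase: "executed"; task: Task; result: unknown }
      | null
      | undefined;
    if (!snapshot) return;

    if (snapshot.phase === "prepared") {
      // Resources were gathered but the sub-agent run did not finish: re-run the task
      await this.executeTask(snapshot.task);
    } else if (snapshot.phase === "executed") {
      // The sub-agent finished; only the final status update was lost
      await this.updateTaskStatus(snapshot.task.id, "complete", snapshot.result);
    }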
  }
}

The pattern is: checkpoint before expensive work, recover from the last checkpoint. This is not automatic replay — you decide what recovery means for your domain.
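The idea can be exercised without the SDK. The sketch below runs a two-phase job against a Map that plays the role of the durable fiber row, simulates an eviction after the first checkpoint, and then recovers from the stashed phase. runJob, finish, the phase names, and the crash flag are all hypothetical:

```typescript
// Toy model of checkpoint-and-recover. A Map stands in for the durable
// fiber row; each stash() overwrites the latest snapshot.
type Snapshot =
  | { phase: "prepared"; resources: string[] }
  | { phase: "executed"; result: string };

const fiberRows = new Map<string, Snapshot>();
const events: string[] = [];

function finish(id: string, result: string): void {
  events.push(`complete:${result}`);
  fiberRows.delete(id); // the row only exists while the work is in flight
}

function runJob(id: string, crashAfterPrepare = false): void {
  const stash = (s: Snapshot) => {
    fiberRows.set(id, s);
  };
  const saved = fiberRows.get(id);

  if (saved?.phase === "executed") {
    // The work finished before eviction; only the bookkeeping is left.
    finish(id, saved.result);
    return;
  }

  let resources: string[];
  if (saved?.phase === "prepared") {
    // Resume from the checkpoint instead of repeating the expensive step.
    resources = saved.resources;
    events.push("resumed:prepared");
  } else {
    resources = ["spec.md"]; // stands in for an expensive gathering step
    events.push("gathered");
    stash({ phase: "prepared", resources });
  }

  if (crashAfterPrepare) throw new Error("evicted"); // simulated eviction

  const result = `used ${resources.join(",")}`;
  stash({ phase: "executed", result });
  finish(id, result);
}

try {
  runJob("task:1", true); // the first activation is evicted mid-task
} catch {
  // the in-memory call stack is gone; only fiberRows survives
}
runJob("task:1"); // the next activation recovers from "prepared"
console.log(events.join(" -> ")); // gathered -> resumed:prepared -> complete:used spec.md
```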

For the full API reference — FiberContext, FiberRecoveryContext, concurrent fibers, inline vs fire-and-forget patterns — refer to Durable Execution.
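Durable acceptance matters most when a provider retries delivery. The sketch below models only the idempotency idea behind startFiber(), in memory; acceptJob, JobRecord, and the status names are hypothetical and do not mirror the startFiber() signature, which is documented in Durable Execution:

```typescript
// In-memory model of idempotent acceptance. A Map stands in for the
// durable status rows that startFiber() retains.
type JobStatus = "accepted" | "running" | "complete" | "cancelled";

interface JobRecord {
  key: string;
  status: JobStatus;
  acceptedAt: number;
}

const jobs = new Map<string, JobRecord>();
let startedCount = 0;

function acceptJob(
  key: string,
  now: number,
): { record: JobRecord; duplicate: boolean } {
  const existing = jobs.get(key);
  if (existing) {
    // Retried delivery: report the existing job instead of starting another.
    return { record: existing, duplicate: true };
  }
  const record: JobRecord = { key, status: "accepted", acceptedAt: now };
  jobs.set(key, record);
  startedCount += 1; // the visible side effect happens once per key
  return { record, duplicate: false };
}

const first = acceptJob("delivery-abc123", 1_000);
const retry = acceptJob("delivery-abc123", 5_000);
console.log(first.duplicate, retry.duplicate, startedCount); // false true 1
```

Keying on a provider-supplied delivery ID means a retried webhook maps to the same record, so the side effect runs once no matter how many times the request arrives.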

Handling long async operations

The project manager frequently kicks off work that takes far longer than any single activation — a CI pipeline runs for 20 minutes, a design review takes a day, a video asset takes hours to generate. The agent should not stay alive for any of this. Instead, it starts the work, persists the job ID in state, and hibernates. When the result arrives — via a callback, a poll, or a workflow completion — the agent wakes, correlates the result, and moves on.

Pattern: webhook callback

The project manager starts a CI pipeline for a task. The pipeline takes 20 minutes. Rather than holding a connection open, the agent registers its own URL as the callback and goes to sleep:

export class ProjectManager extends Agent<ProjectState> {
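  async startCIPipeline(task: Task) {
    const response = await fetch("https://ci.example.com/api/pipelines", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        repo: "org/project",
        branch: "main",
        // The CI system calls this URL when the pipeline finishes
        callback_url: `${this.url}/ci-callback?taskId=${encodeURIComponent(task.id)}`,
      }),
    });
    if (!response.ok) {
      throw new Error(`CI pipeline request failed: ${response.status}`);
    }

    const { pipelineId } = (await response.json()) as { pipelineId: string };
    this.updateTask(task.id, {
      status: "in_progress",
      externalJobId: pipelineId,
    });
    // Nothing else to do: the agent can hibernate until the callback arrives
  }

  async onRequest(request: Request): Promise<Response> {
    const url = new URL(request.url);
    if (url.pathname.endsWith("/ci-callback")) {
      const taskId = url.searchParams.get("taskId");
      if (!taskId) return new Response("Missing taskId", { status: 400 });

      const result = (await request.json()) as { status: string };
      this.updateTask(taskId, {
        status: result.status === "success" ? "complete" : "blocked",
      });
      return new Response("OK");
    }
    // ... other routes
    return new Response("Not found", { status: 404 });
  }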
}
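The callback URL is what lets the agent correlate the result with the right task after it wakes. A small sketch of building and parsing that URL with the standard URL API, so task IDs are percent-encoded correctly; the /ci-callback path and taskId parameter follow the example above, while the base URL is a placeholder:

```typescript
// Build the URL the CI system will call when it finishes.
// URLSearchParams percent-encodes the task ID.
function buildCallbackUrl(agentUrl: string, taskId: string): string {
  const url = new URL(agentUrl);
  url.pathname = `${url.pathname.replace(/\/$/, "")}/ci-callback`;
  url.searchParams.set("taskId", taskId);
  return url.toString();
}

// Recover the task ID on the way back in; null when the request is not
// a CI callback or the parameter is missing.
function parseCallbackTaskId(requestUrl: string): string | null {
  const url = new URL(requestUrl);
  if (!url.pathname.endsWith("/ci-callback")) return null;
  return url.searchParams.get("taskId");
}

const cb = buildCallbackUrl("https://agent.example.com/acme", "task 7/a");
console.log(cb); // https://agent.example.com/acme/ci-callback?taskId=task+7%2Fa
console.log(parseCallbackTaskId(cb)); // task 7/a
```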

Pattern: polling with schedule

Not every external service supports callbacks. When the project manager submits a video asset for generation, it needs to check back periodically until the job completes:

export class ProjectManager extends Agent<ProjectState> {
  async startVideoGeneration(task: Task) {
    const response = await fetch("https://video-api.example.com/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt: task.title }),
    });
    if (!response.ok) {
      throw new Error(`Video generation request failed: ${response.status}`);
    }
    const { jobId } = (await response.json()) as { jobId: string };
    this.updateTask(task.id, { status: "in_progress", externalJobId: jobId });

    // First status check in 60 seconds; the schedule survives hibernation
    await this.schedule(60, "pollExternalJob", {
      taskId: task.id,
      jobId,
      attempt: 1,
    });
  }

  async pollExternalJob(payload: {
    taskId: string;
    jobId: string;
    attempt: number;
  }) {
    const response = await fetch(
      `https://video-api.example.com/status/${encodeURIComponent(payload.jobId)}`,
    );
    const job = (await response.json()) as { state: string };

    if (job.state === "complete" || job.state === "failed") {
      this.updateTask(payload.taskId, {
        status: job.state === "complete" ? "complete" : "blocked",
      });
      return;
    }

    // Still running: back off linearly (60s, 120s, ...), capped at 10 minutes
    const nextDelay = Math.min(60 * payload.attempt, 600);
    await this.schedule(nextDelay, "pollExternalJob", {
      ...payload,
      attempt: payload.attempt + 1,
    });
  }
}
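The delay formula in pollExternalJob() is a linear backoff capped at ten minutes. Pulling it into pure functions makes the schedule easy to inspect. The MAX_POLL_ATTEMPTS cut-off is an added assumption (the example above polls until the job finishes), meant to stop polling a job that never completes:

```typescript
// Same formula as pollExternalJob(): 60 seconds per attempt, capped at 600.
function nextPollDelay(attempt: number): number {
  return Math.min(60 * attempt, 600);
}

// Hypothetical cap; the example above has no limit.
const MAX_POLL_ATTEMPTS = 20;

// Whether to schedule another poll after `attempt` has run.
function shouldKeepPolling(attempt: number): boolean {
  return attempt < MAX_POLL_ATTEMPTS;
}

// Seconds from job submission until the given attempt runs. The first
// poll waits the initial 60 seconds scheduled by startVideoGeneration().
function secondsUntilAttempt(attempt: number): number {
  let total = 60;
  for (let a = 1; a < attempt; a++) total += nextPollDelay(a);
  return total;
}

console.log([1, 2, 3, 10, 11].map(nextPollDelay).join(", ")); // 60, 120, 180, 600, 600
console.log(secondsUntilAttempt(4)); // 420
```

With these numbers, the twentieth and final poll runs 8,760 seconds (about 2.4 hours) after submission, which bounds how long a stuck job can keep waking the agent.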

Pattern: workflow delegation

A production deployment involves multiple steps that must each retry independently — build, test, stage, promote. The project manager should not manage these steps internally; it delegates to a Workflow that handles retries and step sequencing:

export class ProjectManager extends Agent<ProjectState> {
  async startDeployment(task: Task) {
    const instanceId = await this.runWorkflow("DEPLOY_WORKFLOW", {
      taskId: task.id,
      environment: "production",
    });
    this.updateTask(task.id, {
      status: "in_progress",
      externalJobId: instanceId,
    });
  }

  async onWorkflowComplete(
    workflowName: string,
    instanceId: string,
    result?: unknown,
  ) {
    const task = this.state.tasks.find((t) => t.externalJobId === instanceId);
    if (task) this.updateTask(task.id, { status: "complete" });
  }
}
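The examples above call this.updateTask(), which this page does not define. One possible shape, written as a pure function over the task list so it can be tested on its own; inside the agent you would pass this.state.tasks and write the result back with setState(). The helper, its parameters, and stamping completedAt when a task first becomes complete are assumptions for this sketch:

```typescript
type TaskStatus = "pending" | "in_progress" | "blocked" | "complete";

type Task = {
  id: string;
  title: string;
  status: TaskStatus;
  assignee?: string;
  dueDate?: string;
  completedAt?: number;
  externalJobId?: string;
};

// Hypothetical helper: returns a new array with one task patched.
// Unknown IDs leave the list unchanged.
function updateTask(
  tasks: Task[],
  id: string,
  patch: Partial<Omit<Task, "id">>,
  now: number = Date.now(),
): Task[] {
  return tasks.map((t) => {
    if (t.id !== id) return t;
    const next = { ...t, ...patch };
    // Record when the task first reached "complete" so housekeeping can
    // archive it after the retention window.
    if (next.status === "complete" && next.completedAt === undefined) {
      next.completedAt = now;
    }
    return next;
  });
}

// Inside the agent, roughly:
// this.setState({ ...this.state, tasks: updateTask(this.state.tasks, id, patch) });
```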

Reconstructing context after a long wait

The CI pipeline finishes 20 minutes later. The webhook wakes the project manager. The task status is updated. But now what? If the agent was using an LLM to orchestrate work — deciding which task to run next, drafting a status report, reasoning about blockers — it needs to pick up that reasoning thread. The original prompt, the in-flight tool call, the chain of thought — all gone from memory.

This is the fundamental challenge of long-running AI agents. Most frameworks assume tool calls complete within the LLM's timeout and do not address this directly.

Three approaches work today:

Replay the full conversation history. AIChatAgent persists all messages in SQLite. When the result arrives, append it to the history and re-invoke the LLM. This is the simplest approach, but it re-processes the entire context window.

Stash a continuation summary. Before hibernating, persist a compact description of what the agent was doing and what to do with the result:

ctx.stash({
  task: "Waiting for CI results",
  onSuccess: "Mark task complete, move to next step in plan",
  onFailure: "Notify team, schedule retry in 1 hour",
  relevantContext: { taskId, planStep: 3 },
});

On recovery, use the stash to construct a focused prompt rather than replaying everything.

Use the plan as context. If the agent has a structured plan, the plan itself provides sufficient context: "I am on step 3 of 7, the step was 'run CI pipeline', the result just arrived." This is the most robust approach for long-running agents — the plan is both a recovery mechanism and a context reconstruction strategy. Refer to the next section.
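For the continuation-summary approach, turning the stash into a prompt can be plain string building. A sketch, assuming the stash shape shown earlier and a result object with a status field; CIResult and buildRecoveryPrompt are hypothetical names:

```typescript
// Shape of the continuation summary stashed before hibernating.
type ContinuationSummary = {
  task: string;
  onSuccess: string;
  onFailure: string;
  relevantContext: Record<string, unknown>;
};

// Hypothetical shape of the external result that woke the agent.
type CIResult = { status: "success" | "failure"; details?: string };

// Build a short, focused prompt instead of replaying the whole history.
function buildRecoveryPrompt(
  summary: ContinuationSummary,
  result: CIResult,
): string {
  const next =
    result.status === "success" ? summary.onSuccess : summary.onFailure;
  const details = result.details ? ` (${result.details})` : "";
  return [
    `You were: ${summary.task}.`,
    `The result has arrived: ${result.status}${details}.`,
    `What to do next: ${next}.`,
    `Context: ${JSON.stringify(summary.relevantContext)}`,
  ].join("\n");
}
```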

Planning as a durability strategy

A structured plan is not just useful for showing progress to users — it is a durability mechanism. An agent with a plan can recover from any interruption by looking at where it left off.

type Plan = {
  goal: string;
  steps: PlanStep[];
  currentStep: number;
  createdAt: string;
  updatedAt: string;
};

type PlanStep = {
  id: string;
  description: string;
  status: "pending" | "in_progress" | "complete" | "failed" | "skipped";
  result?: unknown;
};

export class ProjectManager extends Agent<ProjectState> {
  async createPlan(goal: string) {
    const steps = await this.keepAliveWhile(async () => {
      return this.callLLM(`
        Break down this project goal into concrete steps.
        Return a JSON array of { id, description } objects.
        Goal: ${goal}
      `);
    });

    this.setState({
      ...this.state,
      plan: {
        goal,
        steps: steps.map((s: { id: string; description: string }) => ({
          ...s,
          status: "pending" as const,
        })),
        currentStep: 0,
        createdAt: new Date().toISOString(),
        updatedAt: new Date().toISOString(),
      },
    });

    await this.schedule(0, "executeNextStep");
  }

  async executeNextStep() {
    const { plan } = this.state;
    // No plan yet: nothing to execute
    if (!plan) return;

    if (plan.currentStep >= plan.steps.length) {
      // Every step has run, so the project is complete
      this.setState({ ...this.state, status: "complete" });
      return;
    }

    const step = plan.steps[plan.currentStep];

    try {
      const result = await this.keepAliveWhile(() => this.executeStep(step));

      const updatedSteps = plan.steps.map((s) =>
        s.id === step.id ? { ...s, status: "complete" as const, result } : s,
      );
      this.setState({
        ...this.state,
        plan: {
          ...plan,
          steps: updatedSteps,
          currentStep: plan.currentStep + 1,
          updatedAt: new Date().toISOString(),
        },
      });

      await this.schedule(0, "executeNextStep");
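    } catch (error) {
      // Leave currentStep unchanged so the step can be retried or re-planned
      console.error(`Plan step ${step.id} failed`, error);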
      const updatedSteps = plan.steps.map((s) =>
        s.id === step.id ? { ...s, status: "failed" as const } : s,
      );
      this.setState({
        ...this.state,
        plan: {
          ...plan,
          steps: updatedSteps,
          updatedAt: new Date().toISOString(),
        },
      });
    }
  }
}

This pattern has several advantages for long-running agents:

Recovery is trivial — on restart, check plan.currentStep and resume
Progress is visible — clients see which steps are done and what is next
Re-planning is possible — if a step fails or requirements change, the agent can revise the remaining steps without losing completed work
Human oversight — the plan is a natural approval checkpoint ("here is what I am going to do — proceed?")
Context reconstruction — the plan tells the LLM where it is, what happened, and what to do next, without replaying the full conversation
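Because the plan lives in state, the step bookkeeping in executeNextStep() can be factored into pure functions that are easy to test and to reuse from a recovery path. A sketch using the Plan and PlanStep types defined above (repeated so the block stands alone); stepToResume, completeCurrentStep, and failCurrentStep are hypothetical names:

```typescript
type PlanStep = {
  id: string;
  description: string;
  status: "pending" | "in_progress" | "complete" | "failed" | "skipped";
  result?: unknown;
};

type Plan = {
  goal: string;
  steps: PlanStep[];
  currentStep: number;
  createdAt: string;
  updatedAt: string;
};

// The step to resume after any restart, or null once every step has run.
function stepToResume(plan: Plan): PlanStep | null {
  return plan.steps[plan.currentStep] ?? null;
}

// Mark the current step complete and advance the cursor.
function completeCurrentStep(plan: Plan, result: unknown, now: Date): Plan {
  const step = stepToResume(plan);
  if (!step) return plan;
  return {
    ...plan,
    steps: plan.steps.map((s) =>
      s.id === step.id ? { ...s, status: "complete" as const, result } : s,
    ),
    currentStep: plan.currentStep + 1,
    updatedAt: now.toISOString(),
  };
}

// Mark the current step failed without advancing, so it can be retried
// or the remaining steps re-planned.
function failCurrentStep(plan: Plan, now: Date): Plan {
  const step = stepToResume(plan);
  if (!step) return plan;
  return {
    ...plan,
    steps: plan.steps.map((s) =>
      s.id === step.id ? { ...s, status: "failed" as const } : s,
    ),
    updatedAt: now.toISOString(),
  };
}
```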

Delegating to sub-agents

A project manager does not do everything itself. It delegates specialized work to sub-agents — each with their own identity, state, and lifecycle.

export class ProjectManager extends Agent<ProjectState> {
  async delegateTask(task: Task) {
    const researcher = await this.subAgent(
      ResearchAgent,
      `research-${task.id}`,
    );

    const findings = await researcher.research(task.title);

    this.updateTask(task.id, { status: "complete" });
    return findings;
  }
}

Sub-agents have their own state, schedules, durable fibers, and lifecycle. They are colocated under the parent, but each child stores its own SQLite data and runs callbacks with the child as this.

Sub-agents run as facets of the parent, and facets do not have independent alarm slots, so the top-level parent owns the physical Durable Object alarm. The Agents SDK records which sub-agent owns each schedule or fiber recovery lease, wakes the parent, and routes the callback back into the child. The parent does not need to stay active while the sub-agent works: it can start the work, hibernate, and be woken by the child-owned schedule or recovery check.
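Conceptually, one physical alarm has to serve every schedule in the parent and its children. The sketch below models that multiplexing: each entry records its owner, the single alarm is always set to the earliest due time, and firing it returns the entries whose callbacks should be routed to their owners. This illustrates the idea described above, not the SDK's implementation; AlarmTable and its methods are hypothetical:

```typescript
// Conceptual model: many owner-tagged schedules share one alarm slot.
type ScheduledCallback = {
  owner: string; // "parent" or a sub-agent name such as "research-t1"
  callback: string; // method name to invoke on that owner
  runAt: number; // epoch milliseconds
};

class AlarmTable {
  private entries: ScheduledCallback[] = [];

  add(entry: ScheduledCallback): void {
    this.entries.push(entry);
  }

  // The only alarm the platform sees: the earliest pending runAt.
  nextAlarm(): number | null {
    if (this.entries.length === 0) return null;
    return Math.min(...this.entries.map((e) => e.runAt));
  }

  // When the alarm fires, remove and return every due entry so each
  // callback can be routed to the agent that owns it.
  fire(now: number): ScheduledCallback[] {
    const due = this.entries.filter((e) => e.runAt <= now);
    this.entries = this.entries.filter((e) => e.runAt > now);
    return due;
  }
}
```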

For the full subAgent() API — typed RPC stubs, client routing, access control, storage isolation, and alarm-backed APIs — refer to Sub-agents. For AI-specific sub-agent streaming (running full LLM turns through a child agent), refer to Think: Sub-agent RPC.

Recovering interrupted LLM streams

The patterns above handle the project manager's coordination work — scheduling, delegating, polling. But the project manager also uses an LLM directly: generating plans, summarizing progress, drafting status emails. Those LLM calls stream tokens over a connection that cannot be resumed if the agent is evicted mid-response.

For chat-oriented agents built on AIChatAgent, the problem is sharper: the user is watching the response stream in real time and sees it stop mid-sentence. chatRecovery wraps each chat turn in runFiber(), providing automatic keepAlive during streaming and a recovery hook when the agent restarts:

import { AIChatAgent } from "@cloudflare/ai-chat";
import type {
  ChatRecoveryContext,
  ChatRecoveryOptions,
} from "@cloudflare/ai-chat";

export class ProjectChat extends AIChatAgent<Env> {
  override chatRecovery = true;

  override async onChatRecovery(
    ctx: ChatRecoveryContext,
  ): Promise<ChatRecoveryOptions> {
    // ctx.partialText:  text generated before eviction
    // ctx.recoveryData: whatever you stashed via this.stash()
    // ctx.messages:     full conversation history
    // ctx.createdAt:    when the interrupted turn started
    return {};
  }
}

The right recovery strategy depends on the LLM provider:

Provider	Strategy	How it works	Token cost
Workers AI	Continue from partial	`continueLastTurn()` — model continues via assistant prefill	Low
OpenAI (Responses API)	Retrieve completed response	Stash `responseId` during streaming, retrieve on recovery	Zero
Anthropic	Synthetic continuation	Persist partial, send a synthetic user message asking the model to continue	Medium
Other	Try prefill, fall back to synthetic	`continueLastTurn()` if the provider supports it, synthetic message otherwise	Varies

Use ctx.createdAt to suppress stale recoveries. For example, if a recovered chat turn is older than a few minutes, you may persist the partial answer but skip automatic continuation to avoid surprising the user with an old response.
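The staleness check itself is a timestamp comparison. A sketch, assuming createdAt is available as epoch milliseconds (check the ChatRecoveryContext type for the actual representation) and using a five-minute threshold chosen only for illustration:

```typescript
// Illustrative threshold; tune to how long users expect to wait.
const MAX_RECOVERY_AGE_MS = 5 * 60 * 1000;

// True when an interrupted turn is too old to continue automatically.
// Assumes createdAt is epoch milliseconds; convert first if your
// ChatRecoveryContext exposes a Date or ISO string instead.
function isStaleRecovery(createdAt: number, now: number = Date.now()): boolean {
  return now - createdAt > MAX_RECOVERY_AGE_MS;
}

// In onChatRecovery, roughly (assuming { continue: false } skips
// automatic continuation):
// if (isStaleRecovery(ctx.createdAt)) return { continue: false };
// return {};
```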

Think enables chatRecovery by default. The default path persists partial output and auto-continues or retries the turn when safe, so many apps do not need a custom hook. Override onChatRecovery when a provider has a better recovery strategy, or configure chatRecovery = { maxAttempts, terminalMessage, onExhausted } to tune the terminal user experience.

If the agent is interrupted before any assistant stream chunks are written, there is no partial assistant message to continue. When the latest persisted message is still the unanswered user message from that turn, chat recovery retries the turn automatically unless onChatRecovery returns { continue: false }.

Managing state over time

An agent that runs for months accumulates data: conversation history, timeline events, completed tasks, schedule records. Without management, this grows unbounded.

Housekeeping

Schedule periodic cleanup to prune old data and archive completed work:

export class ProjectManager extends Agent<ProjectState> {
  async onStart() {
    await this.schedule("0 0 * * *", "housekeeping", {}, { idempotent: true });
  }

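  async housekeeping() {
    // Archive tasks completed more than 30 days ago. Tasks without a
    // completedAt timestamp are kept rather than archived on the first run.
    const cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000;
    const toArchive = this.state.tasks.filter(
      (t) =>
        t.status === "complete" &&
        t.completedAt !== undefined &&
        t.completedAt < cutoff,
    );

    this.sql`CREATE TABLE IF NOT EXISTS archived_tasks (id TEXT PRIMARY KEY, data TEXT)`;
    for (const task of toArchive) {
      // INSERT OR REPLACE keeps a re-run after a crash from failing on duplicates
      this.sql`INSERT OR REPLACE INTO archived_tasks (id, data) VALUES (${task.id}, ${JSON.stringify(task)})`;
    }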
    this.setState({
      ...this.state,
      tasks: this.state.tasks.filter(
        (t) => !toArchive.some((a) => a.id === t.id),
      ),
    });

    this.deleteWorkflows({
      status: ["complete", "errored"],
      createdBefore: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000),
    });
  }
}

Conversation history management

For agents that use AIChatAgent, conversation history can grow large over extended lifespans. Without management, a 3-month conversation will exhaust the LLM's context window long before the project ends.

Strategies for managing conversation size:

Sliding window — keep only the last N messages in the active context. Simple and predictable.
Summarization — periodically summarize older messages and replace them with a compact summary. Original messages can remain in SQLite for audit.
Selective retention — retain messages that contain decisions, approvals, and key context while pruning routine exchanges.
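The three strategies combine naturally. The sketch below keeps pinned messages (selective retention) plus the last N messages (sliding window) and replaces everything else with a single summary message (summarization). The ChatMessage shape, the pinned flag, and the summarize callback are hypothetical; AIChatAgent's own message type is richer:

```typescript
type ChatMessage = {
  id: string;
  role: "user" | "assistant" | "system";
  text: string;
  pinned?: boolean; // e.g. decisions and approvals worth keeping verbatim
};

// Build the context sent to the LLM. The full history can stay in SQLite;
// this only decides what the model sees on the next turn.
function buildContext(
  history: ChatMessage[],
  windowSize: number,
  summarize: (dropped: ChatMessage[]) => string,
): ChatMessage[] {
  const windowStart = Math.max(0, history.length - windowSize);
  const older = history.slice(0, windowStart);
  const recent = history.slice(windowStart);

  const pinned = older.filter((m) => m.pinned);
  const dropped = older.filter((m) => !m.pinned);

  const summary: ChatMessage[] =
    dropped.length > 0
      ? [{ id: "summary", role: "system", text: summarize(dropped) }]
      : [];

  return [...summary, ...pinned, ...recent];
}
```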

End of life

A long-running agent eventually completes its purpose. The project ships, the investigation concludes, the monitoring window closes. Clean up explicitly:

export class ProjectManager extends Agent<ProjectState> {
  async completeProject() {
    const schedules = await this.listSchedules();
    for (const schedule of schedules) {
      await this.cancelSchedule(schedule.id);
    }

    this.setState({ ...this.state, status: "complete" });

    // All SQLite data, schedules, and state are permanently deleted
    await this.destroy();
  }
}

this.destroy() is permanent. If you may need the agent's data later, archive it to an external store such as R2 or D1, or send it to an external API, before destroying. For agents that might be reactivated, simply mark them as complete and let them hibernate; they consume no compute while idle.

When to use Workflows vs agent-internal patterns

Both Workflows and agent-internal primitives (schedules, fibers, queues) support long-running work. The right choice depends on the nature of the work:

	Agent-internal	Workflows
Best for	Agent-centric work: scheduling, polling, state updates	Independent multi-step pipelines
Durability	SQLite (survives eviction)	Workflow engine (completed steps are persisted and not re-run)
Retries	`this.retry()`, schedule-level retries	Per-step retries with backoff
Max duration	Minutes per activation (with `keepAlive`)	30 minutes per step, unlimited steps
Human approval	Build it yourself (state + WebSocket)	Built-in `waitForApproval()`
Complexity	Lower — everything is in the agent	Higher — separate class, wrangler config

A pragmatic rule: if the work is about the agent managing its own lifecycle (checking deadlines, syncing state, sending reminders), use schedules and fibers. If the work is a discrete pipeline that could fail and retry independently (deploy, data processing, report generation), use a Workflow.

The project manager agent uses both: agent-internal patterns for its own rhythms (daily standups, progress syncs) and for tracking external jobs such as CI pipelines through webhook callbacks, and Workflows for heavyweight multi-step operations such as deployments.

Summary

Long-running agents on Cloudflare are not long-running processes. They are durable entities that wake, work, and sleep — potentially over weeks or months. The key primitives:

Primitive	Purpose
`setState()` / `this.sql`	Persist state across activations
`schedule()` / `scheduleEvery()`	Wake the agent at future times
`keepAlive()` / `keepAliveWhile()`	Prevent eviction during active work
`runFiber()` / `stash()`	Checkpoint and recover long tasks
`startFiber()`	Durably accept, inspect, and cancel jobs
`chatRecovery`	Recover interrupted LLM streams
`onRequest()` / `onEmail()` / RPC	Wake on external events
`runWorkflow()`	Delegate heavyweight multi-step work
`subAgent()`	Delegate specialized work to child agents
Structured plans in state	Enable recovery, visibility, and re-planning

For the project manager agent, these compose into an agent that:

Plans — breaks goals into steps, persists the plan in state
Executes — runs steps one at a time, hibernating between them
Reacts — wakes on webhooks, emails, and schedules
Recovers — resumes from the last checkpoint after any interruption
Delegates — hands off work to sub-agents and Workflows
Maintains — prunes old data, archives completed work, manages its own lifecycle
Ends — cleans up and destroys itself when the project is done

The agent does not need to run continuously to do any of this. It just needs to exist.

Durable Execution — runFiber(), startFiber(), stash(), and crash recovery
Schedule tasks — delayed, cron, and interval tasks
Retries — retry options and patterns
Workflows — durable multi-step processing
Store and sync state — setState() and persistence
WebSockets — lifecycle hooks and hibernation
Callable methods — RPC via @callable and service bindings
Email routing — receiving inbound email
Webhooks — receiving external events
Human in the loop — approval flows