Date: 2026-02-17
Retry failed operations with exponential backoff and jitter. The Agents SDK provides built-in retry support for scheduled tasks, queued tasks, and a general-purpose this.retry() method for your own code.
Overview
Transient failures are common when calling external APIs, interacting with other services, or running background tasks. The retry system handles these automatically:
Exponential backoff — each retry waits longer than the last
If a scheduled callback throws, it is retried according to its retry options. If all attempts fail, the error is logged and routed through onError(). Whether the callback succeeds or fails, the schedule is then removed (one-time schedules) or rescheduled (cron and interval schedules).
Retries in queues
Pass retry options when adding a task to the queue with queue().
Validation resolves partial options against class-level or built-in defaults before checking cross-field constraints. This means { baseDelayMs: 5000 } is caught immediately when the resolved maxDelayMs is 3000, rather than failing later at execution time.
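The resolve-then-validate order described above can be sketched as follows. This is a minimal illustration rather than the SDK's implementation: `RetryOptions`, `BUILT_IN_DEFAULTS`, and `resolveRetryOptions` are hypothetical names, and the only cross-field rule assumed here is that baseDelayMs must not exceed maxDelayMs.

```typescript
// Hypothetical shape for illustration; the SDK's real types may differ.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// Built-in defaults as listed in this document.
const BUILT_IN_DEFAULTS: RetryOptions = {
  maxAttempts: 3,
  baseDelayMs: 100,
  maxDelayMs: 3000,
};

// Step 1: fill in missing fields from the defaults.
// Step 2: check the cross-field constraint on the *resolved* values.
function resolveRetryOptions(
  partial: Partial<RetryOptions>,
  defaults: RetryOptions = BUILT_IN_DEFAULTS,
): RetryOptions {
  const resolved: RetryOptions = { ...defaults, ...partial };
  if (resolved.baseDelayMs > resolved.maxDelayMs) {
    throw new RangeError(
      `baseDelayMs (${resolved.baseDelayMs}) must not exceed maxDelayMs (${resolved.maxDelayMs})`,
    );
  }
  return resolved;
}
```

With this sketch, `resolveRetryOptions({ baseDelayMs: 5000 })` throws as soon as it is called, because the missing maxDelayMs resolves to 3000 before the comparison runs.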
Default behavior
Even without explicit retry options, scheduled and queued callbacks are retried with sensible defaults:
| Setting | Default |
| --- | --- |
| maxAttempts | 3 |
| baseDelayMs | 100 |
| maxDelayMs | 3000 |
These defaults apply to this.retry(), queue(), schedule(), and scheduleEvery(). Per-call-site options override them.
Class-level defaults
Override the defaults for your entire agent via static options:
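As a rough sketch of how class-level defaults can layer between the built-in defaults and per-call options, the standalone TypeScript below models the idea with a static `options` field. `BaseAgent`, `PaymentsAgent`, and `effectiveRetry` are hypothetical stand-ins, and the `{ retry: ... }` shape is an assumption, so check the SDK reference for the exact form.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

const BUILT_IN_DEFAULTS: RetryOptions = { maxAttempts: 3, baseDelayMs: 100, maxDelayMs: 3000 };

// Hypothetical stand-in for the SDK's agent base class.
class BaseAgent {
  // Assumed shape: subclasses override this static field.
  static options: { retry?: Partial<RetryOptions> } = {};

  // Precedence: per-call options > class-level static options > built-in defaults.
  effectiveRetry(perCall: Partial<RetryOptions> = {}): RetryOptions {
    const classLevel = (this.constructor as typeof BaseAgent).options.retry ?? {};
    return { ...BUILT_IN_DEFAULTS, ...classLevel, ...perCall };
  }
}

// Every retry issued by this agent class starts from these values.
class PaymentsAgent extends BaseAgent {
  static options = { retry: { maxAttempts: 5, baseDelayMs: 200 } };
}
```

In this model, `new PaymentsAgent().effectiveRetry({ maxAttempts: 2 })` keeps the class-level baseDelayMs of 200 and the built-in maxDelayMs of 3000, while the per-call maxAttempts wins.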
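The delay between retries uses full jitter exponential backoff, where attempt is the number of the attempt that just failed (starting at 1):
delay = random(0, min(2^attempt * baseDelayMs, maxDelayMs))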
This means early retries are fast (often under 200ms), and later retries back off to avoid overwhelming a failing service. The randomization (jitter) prevents multiple agents from retrying at the exact same moment.
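To make the behavior concrete, here is a minimal, self-contained retry loop with full jitter. It is a sketch under stated assumptions, not the SDK's this.retry(): the function name `retryWithFullJitter`, the injectable `sleep` parameter, and the `shouldRetry(error, attempt)` signature are all hypothetical.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  // Assumed signature: return false to stop retrying immediately.
  shouldRetry?: (error: unknown, attempt: number) => boolean;
}

// Injectable sleep so the loop can be exercised without real waiting.
type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithFullJitter<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions = { maxAttempts: 3, baseDelayMs: 100, maxDelayMs: 3000 },
  sleep: Sleep = realSleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      const exhausted = attempt >= opts.maxAttempts;
      const rejected = opts.shouldRetry?.(error, attempt) === false;
      // Out of attempts, or the caller says this error is not worth retrying.
      if (exhausted || rejected) throw error;
      // Full jitter: wait a uniformly random time below the capped exponential bound.
      const upperBoundMs = Math.min(2 ** attempt * opts.baseDelayMs, opts.maxDelayMs);
      await sleep(Math.random() * upperBoundMs);
    }
  }
}
```

Because each caller draws its own random delay, callers that failed at the same moment are unlikely to retry at the same moment.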
How it works
Backoff strategy
The retry system uses the "Full Jitter" strategy from the AWS Architecture Blog. Given 3 attempts with default settings:
| Attempt | Upper bound | Actual delay |
| --- | --- | --- |
| 1 | min(2^1 * 100, 3000) = 200ms | random(0, 200ms) |
| 2 | min(2^2 * 100, 3000) = 400ms | random(0, 400ms) |
| 3 | (no retry, final attempt) | n/a |
With maxAttempts: 5 and baseDelayMs: 500 (maxDelayMs left at its default of 3000):
| Attempt | Upper bound | Actual delay |
| --- | --- | --- |
| 1 | min(2^1 * 500, 3000) = 1000ms | random(0, 1000ms) |
| 2 | min(2^2 * 500, 3000) = 2000ms | random(0, 2000ms) |
| 3 | min(2^3 * 500, 3000) = 3000ms | random(0, 3000ms) |
| 4 | min(2^4 * 500, 3000) = 3000ms | random(0, 3000ms) |
| 5 | (no retry, final attempt) | n/a |
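The upper bounds in both tables can be reproduced with a few lines of TypeScript. The helper names `backoffUpperBoundMs` and `backoffSchedule` are hypothetical and exist only for this illustration; the formula is the one shown in the tables.

```typescript
// Upper bound for the delay that follows a failed attempt (attempts start at 1).
function backoffUpperBoundMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(2 ** attempt * baseDelayMs, maxDelayMs);
}

// One upper bound per retry gap. The final attempt is not followed by a delay,
// so maxAttempts attempts produce maxAttempts - 1 entries.
function backoffSchedule(maxAttempts: number, baseDelayMs = 100, maxDelayMs = 3000): number[] {
  const bounds: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    bounds.push(backoffUpperBoundMs(attempt, baseDelayMs, maxDelayMs));
  }
  return bounds;
}

console.log(backoffSchedule(3));      // [200, 400]
console.log(backoffSchedule(5, 500)); // [1000, 2000, 3000, 3000]
```

The actual delay for each gap is then a uniformly random value between 0 and the listed bound, so the worst-case total wait is the sum of the bounds: 600ms with the defaults and 9000ms in the second configuration.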
MCP server retries
When adding an MCP server, you can configure retry options for its connection and reconnection attempts.
Limitations
No dead-letter queue. If a queued or scheduled task fails all retry attempts, it is removed. Implement your own persistence if you need to track failed tasks.
Retry delays block the agent. During the backoff delay, the Durable Object is awake but idle. For short delays (under 3 seconds) this is fine. For longer recovery times, use this.schedule() instead.
Queue retries are head-of-line blocking. Queue items are processed sequentially. If one item is being retried with long delays, it blocks all subsequent items. If you need independent retry behavior, use this.retry() inside the callback rather than per-task retry options on queue().
No circuit breaker. The retry system does not track failure rates across calls. If a service is persistently down, each task will exhaust its retry budget independently.
shouldRetry is only available on this.retry(). The shouldRetry predicate cannot be used with schedule() or queue() because functions cannot be serialized to the database. For scheduled/queued tasks, handle non-retryable errors inside the callback itself.
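One way to handle non-retryable errors inside a scheduled or queued callback is to catch them and return normally, so that only transient failures propagate and trigger a retry. The sketch below is standalone and hypothetical: `NonRetryableError`, `isNonRetryable`, and `skipRetryFor` are illustrative names, not SDK exports.

```typescript
// Hypothetical marker for failures that retrying cannot fix (for example, invalid input).
class NonRetryableError extends Error {
  readonly nonRetryable = true;
}

// Check the marker property rather than relying on instanceof alone.
function isNonRetryable(error: unknown): error is NonRetryableError {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { nonRetryable?: unknown }).nonRetryable === true
  );
}

// Wrap a task so that non-retryable failures are reported and swallowed,
// while every other error is rethrown and therefore still triggers a retry.
function skipRetryFor<P>(
  task: (payload: P) => Promise<void>,
  onGiveUp: (error: NonRetryableError, payload: P) => void,
): (payload: P) => Promise<void> {
  return async (payload: P) => {
    try {
      await task(payload);
    } catch (error) {
      if (isNonRetryable(error)) {
        onGiveUp(error, payload); // returning normally ends the retry cycle
        return;
      }
      throw error; // transient failure: let the retry system try again
    }
  };
}
```

The `onGiveUp` hook is also a natural place to persist the failed payload yourself, since there is no dead-letter queue.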