
Request handling

Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable.

Request timeouts

A request timeout allows you to return an error or trigger a retry if a provider takes too long to respond.

These timeouts help:

  • Improve user experience, by preventing users from waiting too long for a response
  • Proactively handle errors, by detecting unresponsive providers

A timeout is set in milliseconds and is measured against the arrival of the first part of the response. As long as the first part of the response arrives within the specified time (for example, when streaming a response), your gateway waits for the rest of the response.

Configuration

For a provider-specific endpoint, configure the timeout value by adding a cf-aig-request-timeout header.

Provider-specific endpoint example
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \
--header 'Authorization: Bearer {cf_api_token}' \
--header 'Content-Type: application/json' \
--header 'cf-aig-request-timeout: 5000' \
--data '{"prompt": "What is Cloudflare?"}'

Request retries

AI Gateway supports automatic retries for failed requests, with a maximum of five retry attempts.

This feature improves your application's resiliency by helping it recover from temporary issues without manual intervention.

With request retries, you can adjust a combination of three properties:

  • Number of attempts (maximum of 5 tries)
  • How long before retrying (in milliseconds, maximum of 5 seconds)
  • Backoff method (constant, linear, or exponential)

On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes.
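To make the three backoff methods concrete, the sketch below computes the wait before a given retry. The formulas are an assumption, not taken from this page: constant keeps the base delay, linear multiplies it by the attempt number, and exponential doubles it on each attempt. The function name backoff_delay is hypothetical, and this page does not say whether the 5-second maximum applies to the computed wait or only to the configured delay, so no cap is applied here.

```shell
# Hypothetical helper: wait (in ms) before retry number $2, given a base
# delay of $1 ms and a backoff method $3.
# ASSUMPTION: these formulas are a common interpretation of the method
# names, not the gateway's documented behavior.
backoff_delay() {
  base_ms=$1
  attempt=$2
  method=$3
  case "$method" in
    constant)
      echo "$base_ms"
      ;;
    linear)
      echo $(( base_ms * attempt ))
      ;;
    exponential)
      # Double the base delay once for every attempt after the first.
      delay=$base_ms
      i=1
      while [ "$i" -lt "$attempt" ]; do
        delay=$(( delay * 2 ))
        i=$(( i + 1 ))
      done
      echo "$delay"
      ;;
    *)
      echo "unknown backoff method: $method" >&2
      return 1
      ;;
  esac
}

# Example: with a 1000 ms base delay, the third retry would wait
#   backoff_delay 1000 3 constant      # prints 1000
#   backoff_delay 1000 3 linear        # prints 3000
#   backoff_delay 1000 3 exponential   # prints 4000
```

Under these assumptions, exponential backoff spaces retries out fastest, which tends to suit providers that need time to recover, while constant backoff keeps total latency the most predictable.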

Configuration

For a provider-specific endpoint, configure the retry settings by adding different header values:

  • cf-aig-max-attempts (number)
  • cf-aig-retry-delay (number)
  • cf-aig-backoff ("constant" | "linear" | "exponential")
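
As a sketch of how these headers might be combined, the function below sends the same Workers AI request as the timeout example with all three retry headers set. The wrapper name aig_request_with_retries is hypothetical, the {account_id}, {gateway_id}, and {cf_api_token} placeholders must be replaced with your own values, and passing the backoff method as a plain, unquoted header value is an assumption.

```shell
# Hypothetical wrapper around curl; not part of any Cloudflare tooling.
#   $1: number of attempts (documented maximum: 5)
#   $2: delay before retrying, in milliseconds (documented maximum: 5000)
#   $3: backoff method: constant, linear, or exponential
aig_request_with_retries() {
  curl "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct" \
    --header 'Authorization: Bearer {cf_api_token}' \
    --header 'Content-Type: application/json' \
    --header "cf-aig-max-attempts: $1" \
    --header "cf-aig-retry-delay: $2" \
    --header "cf-aig-backoff: $3" \
    --data '{"prompt": "What is Cloudflare?"}'
}

# Example: up to 3 attempts, starting from a 1000 ms delay, with
# exponential backoff.
#   aig_request_with_retries 3 1000 exponential
```

Because the settings travel as request headers, each request can carry its own retry policy without changing the gateway configuration.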