Request handling
Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable.
A request timeout allows you to return an error or trigger a retry if a provider takes too long to respond.
These timeouts help:
- Improve user experience, by preventing users from waiting too long for a response
- Proactively handle errors, by detecting unresponsive providers
A timeout is set in milliseconds and is measured against the arrival of the first part of the response. As long as the first part of the response returns within the specified timeframe (for example, when streaming a response), your gateway will wait for the rest of the response.
For a provider-specific endpoint, configure the timeout value by adding a cf-aig-request-timeout header.
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \
  --header 'Authorization: Bearer {cf_api_token}' \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-request-timeout: 5000' \
  --data '{"prompt": "What is Cloudflare?"}'

AI Gateway supports automatic retries for failed requests, with a maximum of five retry attempts.
This feature improves your application's resiliency, ensuring you can recover from temporary issues without manual intervention.
With request retries, you can adjust a combination of three properties:
- Number of attempts (maximum of 5 tries)
- How long before retrying (in milliseconds, maximum of 5 seconds)
- Backoff method (constant, linear, or exponential)
On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes.
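To make the three backoff methods concrete, here is a minimal sketch that prints the wait before each retry. The formulas are assumptions based on the common meaning of each term (constant: the same delay every time; linear: delay multiplied by the retry number; exponential: delay doubled on each retry), not a description of how AI Gateway computes its delays. The function name `retry_delays` is hypothetical, and the sketch also assumes the attempt count includes the initial request.

```shell
# Sketch only: prints the wait in milliseconds before each retry.
# ASSUMED formulas (common definitions, not confirmed for AI Gateway):
#   constant:    delay
#   linear:      delay * n
#   exponential: delay * 2^(n-1)
# where n is the retry number, starting at 1.
retry_delays() {
  attempts=$1   # total tries, assumed to include the first request
  delay=$2      # base delay in milliseconds
  method=$3     # constant | linear | exponential

  case $method in
    constant|linear|exponential) ;;
    *) echo "unknown backoff method: $method" >&2; return 1 ;;
  esac

  n=1
  wait_ms=$delay
  out=""
  # One wait per retry, so attempts - 1 waits in total.
  while [ "$n" -lt "$attempts" ]; do
    case $method in
      constant)    wait_ms=$delay ;;
      linear)      wait_ms=$((delay * n)) ;;
      exponential) if [ "$n" -gt 1 ]; then wait_ms=$((wait_ms * 2)); fi ;;
    esac
    out="$out${out:+ }$wait_ms"
    n=$((n + 1))
  done
  echo "$out"
}
```

Under these assumptions, `retry_delays 4 1000 exponential` prints `1000 2000 4000`, while `retry_delays 4 1000 linear` prints `1000 2000 3000`.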
For a provider-specific endpoint, configure the retry settings by adding the following headers:
- cf-aig-max-attempts (number)
- cf-aig-retry-delay (number)
- cf-aig-backoff ("constant" | "linear" | "exponential")
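As a rough illustration of how these headers might be assembled, the sketch below checks the values against the limits stated above and prints one header per line. The helper `aig_retry_headers` is hypothetical and not part of AI Gateway. It assumes that at least one attempt is required and that header values are passed as bare values, in the same way as the timeout header.

```shell
# Hypothetical helper, not part of AI Gateway: validates retry settings
# against the documented limits and prints one HTTP header per line.
aig_retry_headers() {
  attempts=$1   # number of attempts, 1 to 5 (lower bound is an assumption)
  delay_ms=$2   # delay before retrying, in milliseconds, at most 5000
  backoff=$3    # constant | linear | exponential

  case $attempts in
    [1-5]) ;;
    *) echo "attempts must be a number from 1 to 5" >&2; return 1 ;;
  esac
  case $delay_ms in
    ''|*[!0-9]*) echo "retry delay must be a whole number of milliseconds" >&2; return 1 ;;
  esac
  if [ "$delay_ms" -gt 5000 ]; then
    echo "retry delay must not exceed 5000 ms" >&2; return 1
  fi
  case $backoff in
    constant|linear|exponential) ;;
    *) echo "backoff must be constant, linear, or exponential" >&2; return 1 ;;
  esac

  printf 'cf-aig-max-attempts: %s\n' "$attempts"
  printf 'cf-aig-retry-delay: %s\n' "$delay_ms"
  printf 'cf-aig-backoff: %s\n' "$backoff"
}

# Example use (commented out because it makes a network request).
# curl 7.55.0 and later can read extra headers from stdin with "--header @-":
#
# aig_retry_headers 3 1000 exponential | curl \
#   https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \
#   --header 'Authorization: Bearer {cf_api_token}' \
#   --header 'Content-Type: application/json' \
#   --header @- \
#   --data '{"prompt": "What is Cloudflare?"}'
```

Validating on the client side is optional. It simply surfaces an out-of-range value before the request is sent.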