Cloudflare Docs
AI Gateway
Edit this page
Report an issue with this page
Log into the Cloudflare dashboard
Set theme to dark (⇧+D)

Rate limiting

Rate limiting controls the traffic that reaches your application, which prevents expensive bills and suspicious activity.

​​ Parameters

You can define rate limits as the number of requests that get sent in a specific time frame. For example, you can limit your application to 100 requests per 60 seconds.

You can also select if you would like a fixed or sliding rate limiting technique. With rate limiting, we allow a certain number of requests within a window of time. For example, if it is a fixed rate, the window is based on time, so there would be no more than x requests in a ten minute window. If it is a sliding rate, there would be no more than x requests in the last ten minutes.

To illustrate this, let us say you had a limit of ten requests per ten minutes, starting at 12:00. So the fixed window is 12:00-12:10, 12:10-12:20, and so on. If you sent ten requests at 12:09 and ten requests at 12:11, all 20 requests would be successful in a fixed window strategy. However, they would fail in a sliding window strategy since there were more than ten requests in the last ten minutes.

​​ Default configuration

To set the default rate limiting configuration in the dashboard:

  1. Log into the Cloudflare dashboard and select your account.
  2. Go to AI > AI Gateway.
  3. Go to Settings.
  4. Enable Rate-limiting.
  5. Adjust the rate, time period, and rate limiting method as desired.

To set the default rate limiting configuration using the API:

  1. Create an API token with the following permissions:
  • AI Gateway - Read
  • AI Gateway - Edit
  1. Get your Account ID.
  2. Using that API token and Account ID, send a POST request to create a new Gateway and include a value for the rate_limiting_interval, rate_limiting_limit, and rate_limiting_technique.

This rate limiting behavior will be uniformly applied to all requests for that gateway.