Use Pruna P-video through AI Gateway

This tutorial shows how to call Pruna's P-video model on Replicate through AI Gateway.

Prerequisites

A Cloudflare account
A Replicate account with an API token

1. Get a Replicate API token

Go to replicate.com and sign up for an account.
Once logged in, go to replicate.com/settings/api-tokens.
Select Create token and give it a name.
Copy the token and store it somewhere safe.

2. Create an AI Gateway

Log in to the Cloudflare dashboard and select your account.
Go to AI > AI Gateway.
Select Create Gateway.
Enter your gateway name. Note: gateway names have a 64-character limit.
Select Create.

To set up an AI Gateway using the API:

Create an API token with the following permissions:
- AI Gateway - Read
- AI Gateway - Edit
Get your Account ID.
Using that API token and Account ID, send a POST request to the Cloudflare API.
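The POST request in the last step could look like the sketch below. The endpoint path and the body fields are assumptions based on the Cloudflare API's gateway-creation endpoint, so confirm them against the current API reference before relying on them. The function names and the CLOUDFLARE_API_TOKEN variable are hypothetical, and the curl call is wrapped in a function so nothing is sent until you call it.

```shell
# Sketch: create an AI Gateway through the Cloudflare API.

# create_gateway_url ACCOUNT_ID
# Assumed endpoint for creating a gateway under an account.
create_gateway_url() {
  printf 'https://api.cloudflare.com/client/v4/accounts/%s/ai-gateway/gateways\n' "$1"
}

# create_gateway_body GATEWAY_NAME
# Field names other than "id" are assumptions; check them against the
# current API reference and adjust the values to your needs.
create_gateway_body() {
  printf '{"id":"%s","cache_invalidate_on_update":false,"cache_ttl":0,"collect_logs":true,"rate_limiting_interval":0,"rate_limiting_limit":0,"rate_limiting_technique":"fixed"}\n' "$1"
}

# create_gateway ACCOUNT_ID GATEWAY_NAME
# Sends the POST request. Expects CLOUDFLARE_API_TOKEN in the environment.
create_gateway() {
  curl --silent --show-error "$(create_gateway_url "$1")" \
    --request POST \
    --header "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
    --header "Content-Type: application/json" \
    --data "$(create_gateway_body "$2")"
}

# Usage (not run here): create_gateway abc123 my-gateway
```

The name passed as the second argument is the value that later appears as {gateway_id} in the gateway URL.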

Note your Account ID and gateway name for use in later steps, where they replace {account_id} and {gateway_id}.

To add authentication to your gateway, refer to Authenticated Gateway.

3. Construct the gateway URL

Replace the standard Replicate API base URL with the AI Gateway URL:

# Instead of:
https://api.replicate.com/v1

# Use:
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/replicate

For example, if your account ID is abc123 and your gateway is my-gateway:

https://gateway.ai.cloudflare.com/v1/abc123/my-gateway/replicate
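If you script your requests, the substitution above can be kept in one place. The following is a minimal sketch; gateway_base_url is a hypothetical helper name, not part of any Cloudflare tooling.

```shell
# gateway_base_url ACCOUNT_ID GATEWAY_ID
# Builds the AI Gateway base URL that stands in for https://api.replicate.com/v1.
gateway_base_url() {
  printf 'https://gateway.ai.cloudflare.com/v1/%s/%s/replicate\n' "$1" "$2"
}

gateway_base_url abc123 my-gateway
# prints: https://gateway.ai.cloudflare.com/v1/abc123/my-gateway/replicate
```

Appending /predictions to the result gives the endpoint used in the requests that follow.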

4. Generate a video

P-video predictions generally complete within 30 seconds. Because this is under Replicate's 60-second synchronous limit, you can use the Prefer: wait header to send a request and get the result in a single call:

curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/replicate/predictions \
  --header "Authorization: Bearer {replicate_api_token}" \
  --header "cf-aig-authorization: Bearer {cloudflare_api_token}" \
  --header "Content-Type: application/json" \
  --header "Prefer: wait" \
  --data '{
    "version": "prunaai/p-video",
    "input": {
      "prompt": "A cat walking through a field of flowers in slow motion",
      "duration": 5,
      "aspect_ratio": "16:9",
      "resolution": "720p",
      "fps": 24
    }
  }'

Authorization — your Replicate API token (authenticates with Replicate).
cf-aig-authorization — your Cloudflare API token (for authenticated gateways).
Prefer: wait — blocks until the prediction completes instead of returning immediately.

For a full list of available input parameters, see the prunaai/p-video model page on Replicate.

When the prediction completes, the response includes the output field with a URL to the generated video file.
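A script that consumes the synchronous response may want to confirm that the prediction finished before reading output. The sketch below assumes output is a single URL string, as described above, and extracts fields with grep and cut to avoid extra dependencies (a JSON parser such as jq would be more robust). The helper names are hypothetical, and the URL in the example is a placeholder.

```shell
# json_string_field NAME
# Reads JSON on stdin and prints the first string value stored under "NAME".
json_string_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 | cut -d '"' -f 4
}

# handle_sync_response
# Reads a prediction response on stdin. Prints the video URL if the
# prediction succeeded; otherwise reports the status on stderr and
# returns non-zero so the caller can fall back to polling.
handle_sync_response() {
  response="$(cat)"
  pred_status="$(printf '%s' "$response" | json_string_field status)"
  if [ "$pred_status" = "succeeded" ]; then
    printf '%s' "$response" | json_string_field output
  else
    echo "prediction not finished (status: ${pred_status:-unknown})" >&2
    return 1
  fi
}

# Example with a canned response:
echo '{"id":"xyz789","status":"succeeded","output":"https://example.com/video.mp4"}' \
  | handle_sync_response
# prints: https://example.com/video.mp4
```

In practice you would pipe the curl command with the Prefer: wait header into handle_sync_response, adding --silent so progress output stays out of the way.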

5. (Optional) Use async polling for longer requests

If your request may exceed 60 seconds (for example, with longer durations or higher resolutions), use async mode instead. Send the request without the Prefer: wait header:

curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/replicate/predictions \
  --header "Authorization: Bearer {replicate_api_token}" \
  --header "cf-aig-authorization: Bearer {cloudflare_api_token}" \
  --header "Content-Type: application/json" \
  --data '{
    "version": "prunaai/p-video",
    "input": {
      "prompt": "A cat walking through a field of flowers in slow motion",
      "duration": 5,
      "aspect_ratio": "16:9",
      "resolution": "720p",
      "fps": 24
    }
  }'

The response includes a prediction id:

{
  "id": "xyz789...",
  "status": "starting",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/xyz789...",
    "cancel": "https://api.replicate.com/v1/predictions/xyz789.../cancel"
  }
}
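Note that urls.get in this response points at api.replicate.com, so following it directly would bypass the gateway. A small sketch of taking the id from the response and building the gateway URL instead is shown here. The helper names are hypothetical, and the grep and cut extraction is a dependency-free stand-in for a real JSON parser.

```shell
# prediction_id
# Reads a create-prediction response on stdin and prints its "id".
prediction_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

# gateway_prediction_url ACCOUNT_ID GATEWAY_ID PREDICTION_ID
# Builds the status URL on the gateway rather than reusing urls.get.
gateway_prediction_url() {
  printf 'https://gateway.ai.cloudflare.com/v1/%s/%s/replicate/predictions/%s\n' "$1" "$2" "$3"
}

# Example with a canned response (the id is a placeholder):
id="$(echo '{"id": "xyz789", "status": "starting"}' | prediction_id)"
gateway_prediction_url abc123 my-gateway "$id"
# prints: https://gateway.ai.cloudflare.com/v1/abc123/my-gateway/replicate/predictions/xyz789
```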

Poll the prediction status until it completes:

curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/replicate/predictions/{prediction_id} \
  --header "Authorization: Bearer {replicate_api_token}" \
  --header "cf-aig-authorization: Bearer {cloudflare_api_token}"

Keep polling until status is succeeded, failed, or canceled. On success, the output field contains a URL to the generated video file.
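The polling step can be wrapped in a loop. The sketch below is one way to do it, under a few assumptions: ACCOUNT_ID, GATEWAY_ID, REPLICATE_API_TOKEN, and CLOUDFLARE_API_TOKEN are environment variables you set yourself, output is a single URL string, and canceled is treated as a terminal status alongside succeeded and failed. The function names, default interval, and attempt cap are all hypothetical choices.

```shell
# fetch_prediction PREDICTION_ID
# GETs the prediction through the gateway using the assumed environment variables.
fetch_prediction() {
  curl --silent --show-error \
    "https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}/replicate/predictions/$1" \
    --header "Authorization: Bearer ${REPLICATE_API_TOKEN}" \
    --header "cf-aig-authorization: Bearer ${CLOUDFLARE_API_TOKEN}"
}

# json_string_field NAME
# Prints the first string value stored under "NAME" in the JSON on stdin.
# grep/cut keeps the sketch dependency-free; jq would be sturdier.
json_string_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 | cut -d '"' -f 4
}

# wait_for_prediction PREDICTION_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Polls until the status is terminal. Prints the output URL on success.
wait_for_prediction() {
  interval="${2:-2}"
  max_attempts="${3:-60}"
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    response="$(fetch_prediction "$1")"
    pred_status="$(printf '%s' "$response" | json_string_field status)"
    case "$pred_status" in
      succeeded)
        printf '%s' "$response" | json_string_field output
        return 0 ;;
      failed|canceled)
        echo "prediction ended with status: $pred_status" >&2
        return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "gave up after $max_attempts attempts" >&2
  return 1
}

# Usage (not run here): wait_for_prediction "$prediction_id"
```

The attempt cap keeps a stuck prediction from looping forever; tune the interval and cap to the durations and resolutions you request.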

Next steps

From here you can:

Use logging to monitor requests and debug issues.
Set up rate limiting to control usage.
Use other models on Replicate or our other supported providers through AI Gateway.