
Crawl websites.

POST/accounts/{account_id}/browser-rendering/crawl

Starts a crawl job for the provided URL and its child pages. Use options such as gotoOptions and the waitFor* parameters to control page-load behaviour.
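
A hedged sketch of a request body using those options, in Python. Field values here are illustrative and the account ID and token handling are omitted; the parameter names match the body parameters documented below.

```python
import json

# Illustrative crawl request body: gotoOptions and waitForSelector control
# page-load behaviour; formats and limit bound the output and crawl size.
body = {
    "url": "https://example.com",
    "gotoOptions": {"waitUntil": "networkidle0", "timeout": 45000},
    "waitForSelector": {"selector": "main", "timeout": 10000},
    "formats": ["markdown"],
    "limit": 50,
}
payload = json.dumps(body)  # send as the JSON body of the POST request
```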

Security
API Token

The preferred authorization scheme for interacting with the Cloudflare API. Create a token.

Example: Authorization: Bearer Sn3lZJTBX6kkg7OdcBUAxOO963GEIyGQqnFTOFYY
API Email + API Key

The previous authorization scheme for interacting with the Cloudflare API, used in conjunction with a Global API key. When possible, use API tokens instead of Global API keys.

Example: X-Auth-Email: user@example.com
Example: X-Auth-Key: 144c9defac04969c7bfad8efaa8ea194
Accepted Permissions (at least one required)
Browser Rendering Write
Path Parameters
account_id: string

Account ID.

Query Parameters
cacheTTL: optional number

Cache TTL in seconds; the default is 5. Set to 0 to disable caching.

maximum: 86400
Body Parameters (JSON)
body: object { url, actionTimeout, addScriptTag, 25 more } or object { render, url, crawlPurposes, 8 more }
One of the following:
object { url, actionTimeout, addScriptTag, 25 more }
url: string

URL to navigate to, e.g. https://example.com.

format: uri
actionTimeout: optional number

The maximum duration allowed for the browser action to complete after the page has loaded (such as taking screenshots, extracting content, or generating PDFs). If this time limit is exceeded, the action stops and returns a timeout error.

maximum: 120000
addScriptTag: optional array of object { id, content, type, url }

Adds a <script> tag into the page with the desired URL or content.

id: optional string
content: optional string
type: optional string
url: optional string
addStyleTag: optional array of object { content, url }

Adds a <link rel="stylesheet"> tag into the page with the desired URL or a <style type="text/css"> tag with the content.

content: optional string
url: optional string
allowRequestPattern: optional array of string

Only allow requests that match the provided regex patterns, e.g. '/^.*.(css)'.

allowResourceTypes: optional array of "document" or "stylesheet" or "image" or 15 more

Only allow requests that match the provided resource types, e.g. 'image' or 'script'.

One of the following:
"document"
"stylesheet"
"image"
"media"
"font"
"script"
"texttrack"
"xhr"
"fetch"
"prefetch"
"eventsource"
"websocket"
"manifest"
"signedexchange"
"ping"
"cspviolationreport"
"preflight"
"other"
authenticate: optional object { password, username }

Provide credentials for HTTP authentication.

password: string
minLength: 1
username: string
minLength: 1
bestAttempt: optional boolean

Attempt to proceed when 'awaited' events fail or time out.

cookies: optional array of object { name, value, domain, 11 more }

Cookies to set on the page. Fields mirror Puppeteer's CookieParam options.

name: string

Cookie name.

value: string
domain: optional string
expires: optional number
httpOnly: optional boolean
partitionKey: optional string
path: optional string
priority: optional "Low" or "Medium" or "High"
One of the following:
"Low"
"Medium"
"High"
sameParty: optional boolean
sameSite: optional "Strict" or "Lax" or "None"
One of the following:
"Strict"
"Lax"
"None"
secure: optional boolean
sourcePort: optional number
sourceScheme: optional "Unset" or "NonSecure" or "Secure"
One of the following:
"Unset"
"NonSecure"
"Secure"
url: optional string
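
A hedged sketch of the cookies array in Python. Only name and value are required; the remaining fields are the optional ones listed above, with placeholder values.

```python
# Illustrative cookie entry for the crawl request body; value and domain
# are placeholders, not real credentials.
session_cookie = {
    "name": "session",
    "value": "abc123",
    "domain": "example.com",
    "path": "/",
    "secure": True,
    "sameSite": "Lax",  # one of "Strict", "Lax", "None"
}
body = {"url": "https://example.com", "cookies": [session_cookie]}
```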
crawlPurposes: optional array of "search" or "ai-input" or "ai-train"

List of crawl purposes to respect Content-Signal directives in robots.txt. Allowed values: 'search', 'ai-input', 'ai-train'. Learn more: https://contentsignals.org/. Default: ['search', 'ai-input', 'ai-train'].

One of the following:
"search"
"ai-input"
"ai-train"
depth: optional number

Maximum number of levels deep the crawler will traverse from the starting URL.

maximum: 100000
minimum: 1
emulateMediaType: optional string
formats: optional array of "html" or "markdown" or "json"

Formats to return. Default is html.

One of the following:
"html"
"markdown"
"json"
gotoOptions: optional object { referer, referrerPolicy, timeout, waitUntil }

Options applied when navigating to the URL, mirroring Puppeteer's page.goto options.

referer: optional string
referrerPolicy: optional string
timeout: optional number
maximum: 60000
waitUntil: optional "load" or "domcontentloaded" or "networkidle0" or "networkidle2", or an array of those values
One of the following:
"load"
"domcontentloaded"
"networkidle0"
"networkidle2"
jsonOptions: optional object { custom_ai, prompt, response_format }

Options for JSON extraction.

custom_ai: optional array of object { authorization, model }

Optional list of custom AI models to use for the request. Models are tried in the order provided; if one returns an error, the next is used as a fallback.

authorization: string

Authorization token for the AI model: Bearer <token>.

model: string

AI model to use for the request. Must be formed as <provider>/<model_name>, e.g. workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast.

prompt: optional string
response_format: optional object { type, json_schema }
type: string
json_schema: optional map[string or number or boolean or 2 more]

Schema for the response format. More information here: https://developers.cloudflare.com/workers-ai/json-mode/.

One of the following:
string
number
boolean
unknown
array of string
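
A hedged sketch of jsonOptions for structured extraction, in Python. The response_format `type` value and the schema layout are assumptions following the Workers AI JSON mode docs linked above; the property names are illustrative.

```python
# Illustrative jsonOptions: prompt plus a response schema. The "json_schema"
# type string and schema shape follow Workers AI JSON mode (assumption).
json_options = {
    "prompt": "Extract the page title and a one-sentence summary.",
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "summary": {"type": "string"},
            },
        },
    },
}
body = {"url": "https://example.com", "formats": ["json"], "jsonOptions": json_options}
```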
limit: optional number

Maximum number of URLs to crawl.

maximum: 100000
minimum: 1
maxAge: optional number

Maximum age of a resource that can be returned from cache in seconds. Default is 1 day.

maximum: 604800
minimum: 0
modifiedSince: optional number

Unix timestamp (seconds since epoch) indicating to only crawl pages that were modified since this time. For sitemap URLs with a lastmod field, this is compared directly. For other URLs, the crawler will use If-Modified-Since header when fetching. URLs without modification information (no lastmod in sitemap and no Last-Modified header support) will be crawled. Note: This works in conjunction with maxAge - both filters must pass for a cached resource to be used. Must be within the last year and not in the future.

minimum: 0 (exclusive)
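
A hedged sketch of computing a valid modifiedSince value in Python: a Unix timestamp in seconds, within the last year and not in the future. "Seven days ago" is an illustrative choice.

```python
import time

# modifiedSince is seconds since the epoch; compute "7 days ago" so only
# pages modified in the last week are crawled.
SEVEN_DAYS = 7 * 24 * 60 * 60
modified_since = int(time.time()) - SEVEN_DAYS
body = {"url": "https://example.com", "modifiedSince": modified_since}
```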
options: optional object { excludePatterns, includeExternalLinks, includePatterns, includeSubdomains }

Additional options for the crawler.

excludePatterns: optional array of string

Exclude links matching the provided wildcard patterns in the crawl job. Example: 'https://example.com/privacy/**'.

includePatterns: optional array of string

Include only links matching the provided wildcard patterns in the crawl job. Include patterns are evaluated before exclude patterns. URLs that match any of the specified include patterns will be included in the crawl job. Example: 'https://example.com/blog/**'.

includeSubdomains: optional boolean

Include links to subdomains in the crawl job. This option is ignored if includeExternalLinks is true.
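
A hedged sketch of the crawler options object in Python. Since include patterns are evaluated before exclude patterns, this crawls the blog while skipping its archive; the URLs are illustrative.

```python
# Illustrative crawler options: include the blog, but exclude its archive.
# Include patterns are evaluated before exclude patterns.
crawler_options = {
    "includePatterns": ["https://example.com/blog/**"],
    "excludePatterns": ["https://example.com/blog/archive/**"],
    "includeSubdomains": False,
}
body = {"url": "https://example.com", "options": crawler_options}
```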

rejectRequestPattern: optional array of string

Block undesired requests that match the provided regex patterns, e.g. '/^.*.(css)'.

rejectResourceTypes: optional array of "document" or "stylesheet" or "image" or 15 more

Block undesired requests that match the provided resource types, e.g. 'image' or 'script'.

One of the following:
"document"
"stylesheet"
"image"
"media"
"font"
"script"
"texttrack"
"xhr"
"fetch"
"prefetch"
"eventsource"
"websocket"
"manifest"
"signedexchange"
"ping"
"cspviolationreport"
"preflight"
"other"
render: optional true

Whether to render the page or fetch static content. True by default.

setExtraHTTPHeaders: optional map[string]
setJavaScriptEnabled: optional boolean
source: optional "sitemaps" or "links" or "all"

Source of links to crawl. 'sitemaps' - only crawl URLs from sitemaps, 'links' - only crawl URLs scraped from pages, 'all' - crawl both sitemap and scraped links (default).

One of the following:
"sitemaps"
"links"
"all"
viewport: optional object { height, width, deviceScaleFactor, 3 more }

Viewport dimensions and device settings, mirroring Puppeteer's Viewport options.

height: number
width: number
deviceScaleFactor: optional number
hasTouch: optional boolean
isLandscape: optional boolean
isMobile: optional boolean
waitForSelector: optional object { selector, hidden, timeout, visible }

Wait for the selector to appear on the page before continuing.

selector: string
hidden: optional true
timeout: optional number
maximum: 120000
visible: optional true
waitForTimeout: optional number

Waits for a specified timeout before continuing.

maximum: 120000
object { render, url, crawlPurposes, 8 more }
render: false

Whether to render the page or fetch static content. True by default.

url: string

URL to navigate to, e.g. https://example.com.

format: uri
crawlPurposes: optional array of "search" or "ai-input" or "ai-train"

List of crawl purposes to respect Content-Signal directives in robots.txt. Allowed values: 'search', 'ai-input', 'ai-train'. Learn more: https://contentsignals.org/. Default: ['search', 'ai-input', 'ai-train'].

One of the following:
"search"
"ai-input"
"ai-train"
depth: optional number

Maximum number of levels deep the crawler will traverse from the starting URL.

maximum: 100000
minimum: 1
formats: optional array of "html" or "markdown" or "json"

Formats to return. Default is html.

One of the following:
"html"
"markdown"
"json"
jsonOptions: optional object { custom_ai, prompt, response_format }

Options for JSON extraction.

custom_ai: optional array of object { authorization, model }

Optional list of custom AI models to use for the request. Models are tried in the order provided; if one returns an error, the next is used as a fallback.

authorization: string

Authorization token for the AI model: Bearer <token>.

model: string

AI model to use for the request. Must be formed as <provider>/<model_name>, e.g. workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast.

prompt: optional string
response_format: optional object { type, json_schema }
type: string
json_schema: optional map[string or number or boolean or 2 more]

Schema for the response format. More information here: https://developers.cloudflare.com/workers-ai/json-mode/.

One of the following:
string
number
boolean
unknown
array of string
limit: optional number

Maximum number of URLs to crawl.

maximum: 100000
minimum: 1
maxAge: optional number

Maximum age of a resource that can be returned from cache in seconds. Default is 1 day.

maximum: 604800
minimum: 0
modifiedSince: optional number

Unix timestamp (seconds since epoch) indicating to only crawl pages that were modified since this time. For sitemap URLs with a lastmod field, this is compared directly. For other URLs, the crawler will use If-Modified-Since header when fetching. URLs without modification information (no lastmod in sitemap and no Last-Modified header support) will be crawled. Note: This works in conjunction with maxAge - both filters must pass for a cached resource to be used. Must be within the last year and not in the future.

minimum: 0 (exclusive)
options: optional object { excludePatterns, includeExternalLinks, includePatterns, includeSubdomains }

Additional options for the crawler.

excludePatterns: optional array of string

Exclude links matching the provided wildcard patterns in the crawl job. Example: 'https://example.com/privacy/**'.

includePatterns: optional array of string

Include only links matching the provided wildcard patterns in the crawl job. Include patterns are evaluated before exclude patterns. URLs that match any of the specified include patterns will be included in the crawl job. Example: 'https://example.com/blog/**'.

includeSubdomains: optional boolean

Include links to subdomains in the crawl job. This option is ignored if includeExternalLinks is true.

source: optional "sitemaps" or "links" or "all"

Source of links to crawl. 'sitemaps' - only crawl URLs from sitemaps, 'links' - only crawl URLs scraped from pages, 'all' - crawl both sitemap and scraped links (default).

One of the following:
"sitemaps"
"links"
"all"
Returns
result: string

Crawl job ID.

success: boolean

Response status.

errors: optional array of object { code, message }
code: number

Error code.

message: string

Error message.

Example request

curl https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/browser-rendering/crawl \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
    -d '{
          "url": "https://example.com"
        }'
{
  "result": "result",
  "success": true,
  "errors": [
    {
      "code": 0,
      "message": "message"
    }
  ]
}
{
  "errors": [
    {
      "code": 2001,
      "message": "Rate limit exceeded"
    }
  ],
  "success": false
}
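A hedged sketch of handling the response in Python: on success, `result` carries the crawl job ID; on failure, `errors` lists code/message pairs. The `raw` string stands in for an HTTP response body like the examples above.

```python
import json

# Illustrative response handling; `raw` is a stand-in for the HTTP body.
raw = '{"result": "result", "success": true, "errors": []}'
resp = json.loads(raw)
if resp["success"]:
    job_id = resp["result"]  # the crawl job ID
else:
    for err in resp.get("errors", []):
        print(f"error {err['code']}: {err['message']}")
```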