
Crawl websites.

browser_rendering.crawl.create(**kwargs: CrawlCreateParams) -> CrawlCreateResponse
POST /accounts/{account_id}/browser-rendering/crawl

Starts a crawl job for the provided URL and its children. Check available options such as gotoOptions and waitFor* to control page-load behaviour.

Security
API Token

The preferred authorization scheme for interacting with the Cloudflare API. Create a token.

Example: Authorization: Bearer Sn3lZJTBX6kkg7OdcBUAxOO963GEIyGQqnFTOFYY
API Email + API Key

The previous authorization scheme for interacting with the Cloudflare API: the account email used in conjunction with a Global API key. When possible, use API tokens instead of Global API keys.

Example: X-Auth-Email: user@example.com

Example: X-Auth-Key: 144c9defac04969c7bfad8efaa8ea194
Accepted Permissions (at least one required)
Browser Rendering Write
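The two schemes above map to raw request headers as follows; a minimal sketch using the placeholder credentials from the examples above:

```python
# Raw request headers for each authentication scheme.
# The token and key values are the placeholders shown above, not real credentials.
token_headers = {
    "Authorization": "Bearer Sn3lZJTBX6kkg7OdcBUAxOO963GEIyGQqnFTOFYY",
}
legacy_headers = {
    "X-Auth-Email": "user@example.com",
    "X-Auth-Key": "144c9defac04969c7bfad8efaa8ea194",
}
```

The Python client below handles these headers for you; you only supply the token (or email and key) when constructing the client.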
Parameters
account_id: str

Account ID.

url: str

URL to navigate to, e.g. https://example.com.

Format: uri
cache_ttl: Optional[float]

Cache TTL in seconds. Default is 5; set to 0 to disable caching.

Maximum: 86400
action_timeout: Optional[float]

The maximum duration allowed for the browser action to complete after the page has loaded (such as taking screenshots, extracting content, or generating PDFs). If this time limit is exceeded, the action stops and returns a timeout error.

Maximum: 120000
add_script_tag: Optional[Iterable[Variant0AddScriptTag]]

Adds a <script> tag into the page with the desired URL or content.

id: Optional[str]
content: Optional[str]
type: Optional[str]
url: Optional[str]
add_style_tag: Optional[Iterable[Variant0AddStyleTag]]

Adds a <link rel="stylesheet"> tag into the page with the desired URL or a <style type="text/css"> tag with the content.

content: Optional[str]
url: Optional[str]
allow_request_pattern: Optional[SequenceNotStr[str]]

Only allow requests that match the provided regex patterns, e.g. '/^.*\.(css)'.

allow_resource_types: Optional[List[Literal["document", "stylesheet", "image", 15 more]]]

Only allow requests that match the provided resource types, e.g. 'image' or 'script'.

One of the following:
"document"
"stylesheet"
"image"
"media"
"font"
"script"
"texttrack"
"xhr"
"fetch"
"prefetch"
"eventsource"
"websocket"
"manifest"
"signedexchange"
"ping"
"cspviolationreport"
"preflight"
"other"
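As a sketch, a request that only fetches documents and scripts could pass allow_resource_types like this (the account ID and URL are placeholders; pass the dict to client.browser_rendering.crawl.create(**params)):

```python
# Hypothetical kwargs restricting the crawl to HTML documents and scripts.
params = {
    "account_id": "account_id",
    "url": "https://example.com",
    "allow_resource_types": ["document", "script"],
}

# Every entry must be one of the resource types listed above.
ALLOWED = {
    "document", "stylesheet", "image", "media", "font", "script",
    "texttrack", "xhr", "fetch", "prefetch", "eventsource", "websocket",
    "manifest", "signedexchange", "ping", "cspviolationreport",
    "preflight", "other",
}
assert set(params["allow_resource_types"]) <= ALLOWED
```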
authenticate: Optional[Variant0Authenticate]

Provide credentials for HTTP authentication.

password: str
Minimum length: 1
username: str
Minimum length: 1
best_attempt: Optional[bool]

Attempt to proceed when 'awaited' events fail or timeout.

cookies: Optional[Iterable[Variant0Cookie]]

Check options.

name: str

Cookie name.

value: str
domain: Optional[str]
expires: Optional[float]
http_only: Optional[bool]
partition_key: Optional[str]
path: Optional[str]
priority: Optional[Literal["Low", "Medium", "High"]]
One of the following:
"Low"
"Medium"
"High"
same_party: Optional[bool]
same_site: Optional[Literal["Strict", "Lax", "None"]]
One of the following:
"Strict"
"Lax"
"None"
secure: Optional[bool]
source_port: Optional[float]
source_scheme: Optional[Literal["Unset", "NonSecure", "Secure"]]
One of the following:
"Unset"
"NonSecure"
"Secure"
url: Optional[str]
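A minimal sketch of setting a session cookie for the crawl, using the snake_case field names listed above (the cookie name and value are hypothetical):

```python
# Hypothetical session cookie passed via the cookies parameter.
params = {
    "account_id": "account_id",
    "url": "https://example.com",
    "cookies": [
        {
            "name": "session_id",  # required
            "value": "abc123",     # required
            "domain": "example.com",
            "path": "/",
            "secure": True,
            "http_only": True,
            "same_site": "Lax",
        }
    ],
}
```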
crawl_purposes: Optional[List[Literal["search", "ai-input", "ai-train"]]]

List of crawl purposes to respect Content-Signal directives in robots.txt. Allowed values: 'search', 'ai-input', 'ai-train'. Learn more: https://contentsignals.org/. Default: ['search', 'ai-input', 'ai-train'].

One of the following:
"search"
"ai-input"
"ai-train"
depth: Optional[float]

Maximum number of levels deep the crawler will traverse from the starting URL.

Maximum: 100000
Minimum: 1
emulate_media_type: Optional[str]
formats: Optional[List[Literal["html", "markdown", "json"]]]

Formats to return. Default is html.

One of the following:
"html"
"markdown"
"json"
goto_options: Optional[Variant0GotoOptions]

Check options.

referer: Optional[str]
referrer_policy: Optional[str]
timeout: Optional[float]
Maximum: 60000
wait_until: Optional[Union[Literal["load", "domcontentloaded", "networkidle0", "networkidle2"], List[Literal["load", "domcontentloaded", "networkidle0", "networkidle2"]]]]
One of the following:
Literal["load", "domcontentloaded", "networkidle0", "networkidle2"]
One of the following:
"load"
"domcontentloaded"
"networkidle0"
"networkidle2"
List[Literal["load", "domcontentloaded", "networkidle0", "networkidle2"]]
One of the following:
"load"
"domcontentloaded"
"networkidle0"
"networkidle2"
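For example, waiting for network idle with a 30-second navigation timeout might look like this (the values are illustrative, not recommendations):

```python
# Illustrative goto_options: wait until there are at most two in-flight
# network connections, with a 30 s navigation timeout.
params = {
    "account_id": "account_id",
    "url": "https://example.com",
    "goto_options": {
        "wait_until": "networkidle2",
        "timeout": 30_000,  # milliseconds; must not exceed 60000
    },
}
```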
json_options: Optional[Variant0JsonOptions]

Options for JSON extraction.

custom_ai: Optional[Iterable[Variant0JsonOptionsCustomAI]]

Optional list of custom AI models to use for the request. The models will be tried in the order provided, and in case a model returns an error, the next one will be used as fallback.

authorization: str

Authorization token for the AI model: Bearer <token>.

model: str

AI model to use for the request. Must be formed as <provider>/<model_name>, e.g. workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast.

prompt: Optional[str]
response_format: Optional[Variant0JsonOptionsResponseFormat]
type: str
json_schema: Optional[Dict[str, Union[str, float, bool, 2 more]]]

Schema for the response format. More information here: https://developers.cloudflare.com/workers-ai/json-mode/.

One of the following:
str
float
bool
SequenceNotStr[str]
object
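A sketch of requesting JSON extraction with a schema. The prompt and schema contents are hypothetical; see the json-mode link above for the exact schema format Workers AI expects:

```python
# Hypothetical json_options asking the model to extract a page title and
# outbound link URLs, constrained by a JSON schema.
params = {
    "account_id": "account_id",
    "url": "https://example.com",
    "formats": ["json"],
    "json_options": {
        "prompt": "Extract the page title and all outbound link URLs.",
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "links": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["title"],
            },
        },
    },
}
```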
limit: Optional[float]

Maximum number of URLs to crawl.

Maximum: 100000
Minimum: 1
max_age: Optional[float]

Maximum age of a resource that can be returned from cache in seconds. Default is 1 day.

Maximum: 604800
Minimum: 0
modified_since: Optional[int]

Unix timestamp (seconds since epoch) indicating to only crawl pages that were modified since this time. For sitemap URLs with a lastmod field, this is compared directly. For other URLs, the crawler will use If-Modified-Since header when fetching. URLs without modification information (no lastmod in sitemap and no Last-Modified header support) will be crawled. Note: This works in conjunction with maxAge - both filters must pass for a cached resource to be used. Must be within the last year and not in the future.

Exclusive minimum: 0
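Since modified_since is a Unix timestamp in seconds, a value meaning "pages changed in the last 30 days" can be computed like this:

```python
import time

# Only crawl pages modified within the last 30 days.
# The value must be within the last year and not in the future.
thirty_days = 30 * 24 * 60 * 60
params = {"modified_since": int(time.time()) - thirty_days}
```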
options: Optional[Variant0Options]

Additional options for the crawler.

exclude_patterns: Optional[SequenceNotStr[str]]

Exclude links matching the provided wildcard patterns in the crawl job. Example: 'https://example.com/privacy/**'.

include_patterns: Optional[SequenceNotStr[str]]

Include only links matching the provided wildcard patterns in the crawl job. Include patterns are evaluated before exclude patterns. URLs that match any of the specified include patterns will be included in the crawl job. Example: 'https://example.com/blog/**'.

include_subdomains: Optional[bool]

Include links to subdomains in the crawl job. This option is ignored if includeExternalLinks is true.
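For instance, crawling only the blog while skipping its drafts could combine both pattern lists. The paths are hypothetical, and this sketch assumes the pattern fields nest under options as listed above:

```python
# Hypothetical wildcard patterns: include patterns are evaluated first,
# then matching URLs are filtered by the exclude patterns.
params = {
    "account_id": "account_id",
    "url": "https://example.com",
    "options": {
        "include_patterns": ["https://example.com/blog/**"],
        "exclude_patterns": ["https://example.com/blog/drafts/**"],
    },
}
```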

reject_request_pattern: Optional[SequenceNotStr[str]]

Block undesired requests that match the provided regex patterns, e.g. '/^.*\.(css)'.

reject_resource_types: Optional[List[Literal["document", "stylesheet", "image", 15 more]]]

Block undesired requests that match the provided resource types, e.g. 'image' or 'script'.

One of the following:
"document"
"stylesheet"
"image"
"media"
"font"
"script"
"texttrack"
"xhr"
"fetch"
"prefetch"
"eventsource"
"websocket"
"manifest"
"signedexchange"
"ping"
"cspviolationreport"
"preflight"
"other"
render: Optional[Literal[True]]

Whether to render the page or fetch static content. True by default.

set_extra_http_headers: Optional[Dict[str, str]]
set_java_script_enabled: Optional[bool]
source: Optional[Literal["sitemaps", "links", "all"]]

Source of links to crawl. 'sitemaps' - only crawl URLs from sitemaps, 'links' - only crawl URLs scraped from pages, 'all' - crawl both sitemap and scraped links (default).

One of the following:
"sitemaps"
"links"
"all"
viewport: Optional[Variant0Viewport]

Check options.

height: float
width: float
device_scale_factor: Optional[float]
has_touch: Optional[bool]
is_landscape: Optional[bool]
is_mobile: Optional[bool]
wait_for_selector: Optional[Variant0WaitForSelector]

Wait for the selector to appear in page. Check options.

selector: str
hidden: Optional[Literal[True]]
timeout: Optional[float]
Maximum: 120000
visible: Optional[Literal[True]]
wait_for_timeout: Optional[float]

Waits for a specified timeout before continuing.

Maximum: 120000
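A sketch of waiting for a hypothetical content container to become visible before the page is captured:

```python
# Hypothetical selector: wait up to 10 s for #content to become visible.
params = {
    "account_id": "account_id",
    "url": "https://example.com",
    "wait_for_selector": {
        "selector": "#content",
        "visible": True,
        "timeout": 10_000,  # milliseconds; must not exceed 120000
    },
}
```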
Returns
str

Crawl job ID.

Request Example

import os
from cloudflare import Cloudflare

client = Cloudflare(
    api_token=os.environ.get("CLOUDFLARE_API_TOKEN"),  # This is the default and can be omitted
)
crawl = client.browser_rendering.crawl.create(
    account_id="account_id",
    url="https://example.com",
)
print(crawl)
Returns Examples
{
  "result": "result",
  "success": true,
  "errors": [
    {
      "code": 0,
      "message": "message"
    }
  ]
}
{
  "errors": [
    {
      "code": 2001,
      "message": "Rate limit exceeded"
    }
  ],
  "success": false
}