Date: 2026-03-10

The /crawl endpoint scrapes content from a starting URL and follows links across the site, up to a configurable depth or page limit. Responses can be returned as HTML, Markdown, or JSON.

Endpoint

https://api.cloudflare.com/client/v4/accounts/<account_id>/browser-rendering/crawl

Required fields

url (string)

Refer to optional parameters for additional customization options.

Common use cases

Building knowledge bases or training AI systems (such as RAG applications) with up-to-date web content

Scraping and analyzing content across multiple pages for research, summarization, or monitoring

How it works

There are two steps to using the /crawl endpoint:

1. Initiate the crawl job: a POST request that starts the crawl and returns a job id.
2. Request results of the crawl job: a GET request that returns the status or results of the crawl.

Crawl jobs have a maximum run time of seven days. If a job does not finish within this time, it will be cancelled due to timeout. Job results are available for 14 days after the job completes, after which the job data is deleted.

Free plan limitations: Users on the Workers Free plan are subject to additional crawl-specific restrictions. Refer to crawl endpoint limits for details.

Initiate the crawl job

Send a POST request with a url to start a crawl job. The API responds immediately with a job id you will use to retrieve results. Refer to optional parameters for additional customization options.

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://developers.cloudflare.com/workers/"
  }'

Example response:

{
  "success": true,
  "result": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e"
}
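The same POST can be issued from JavaScript. The following is a minimal sketch, assuming a runtime with a global fetch (for example Node.js 18 or later). The names buildCrawlRequest and startCrawl are hypothetical helpers for illustration, not part of any Cloudflare SDK, and the error handling relies only on the success field shown in the example response.

```javascript
// Hypothetical helpers for illustration; not part of a Cloudflare SDK.
const API_BASE = "https://api.cloudflare.com/client/v4";

// Build the URL and fetch options for the POST that starts a crawl job.
function buildCrawlRequest(accountId, apiToken, body) {
  if (!body || typeof body.url !== "string") {
    throw new TypeError("body.url (string) is required");
  }
  return {
    url: `${API_BASE}/accounts/${accountId}/browser-rendering/crawl`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Send the request and return the job id found in the `result` field.
// `fetchImpl` can be swapped for a stub in tests.
async function startCrawl(accountId, apiToken, body, fetchImpl = fetch) {
  const { url, init } = buildCrawlRequest(accountId, apiToken, body);
  const response = await fetchImpl(url, init);
  const data = await response.json();
  if (!data.success) {
    throw new Error(`Crawl request failed with HTTP ${response.status}`);
  }
  return data.result;
}
```

Keeping request construction separate from the network call makes it possible to inspect the URL, headers, and body without contacting the API.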

Request results of the crawl job

To check the status or request the results of your crawl job, use the job id you received:

curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
  -H 'Authorization: Bearer YOUR_API_TOKEN'

The response includes a status field indicating the current state of the crawl job. The possible job statuses are:

running: The crawl job is currently in progress.
cancelled_due_to_timeout: The crawl job exceeded the maximum run time of seven days.
cancelled_due_to_limits: The crawl job was cancelled because it hit account limits.
cancelled_by_user: The crawl job was manually cancelled by the user.
errored: The crawl job encountered an error.
completed: The crawl job finished successfully.
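Client code usually needs to know whether a status is final and whether it indicates success. The sketch below encodes the statuses listed above. The name describeJobStatus is hypothetical, and throwing on an unrecognized status is a design choice of this example, not documented API behavior.

```javascript
// Every documented status except "running" is terminal.
const TERMINAL_STATUSES = new Set([
  "cancelled_due_to_timeout",
  "cancelled_due_to_limits",
  "cancelled_by_user",
  "errored",
  "completed",
]);

// Classify a job status: `done` means the job will not change further,
// `ok` means it finished successfully.
function describeJobStatus(status) {
  if (status === "running") {
    return { done: false, ok: false };
  }
  if (TERMINAL_STATUSES.has(status)) {
    return { done: true, ok: status === "completed" };
  }
  // Assumption of this sketch: surface unknown values instead of guessing.
  throw new Error(`Unknown crawl job status: ${status}`);
}
```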

Polling for completion

Because crawl jobs run asynchronously, poll the endpoint periodically to check whether the job has finished. Add ?limit=1 to the request URL to keep the response lightweight: you only need the job status, not the full set of crawled records.

async function waitForCrawl(accountId, jobId, apiToken) {
  const maxAttempts = 60; // 60 attempts x 5 seconds = up to 5 minutes of polling
  const delayMs = 5000;

  for (let i = 0; i < maxAttempts; i++) {
    // limit=1 keeps the response small; only the job status is needed here.
    const response = await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/crawl/${jobId}?limit=1`,
      {
        headers: {
          Authorization: `Bearer ${apiToken}`,
        },
      },
    );

    // Fail fast on HTTP errors instead of reading a missing result object.
    if (!response.ok) {
      throw new Error(`Status request failed with HTTP ${response.status}`);
    }

    const data = await response.json();
    const status = data.result.status;

    // Any status other than "running" is terminal.
    if (status !== "running") {
      return data.result;
    }

    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }

  throw new Error("Crawl job did not complete within timeout");
}

Once the job reaches a terminal status, fetch the full results without the limit parameter. You can also use the following query parameters to filter and paginate results:

cursor: Cursor for pagination. If the response exceeds 10 MB, a cursor value is included. Pass it as a query parameter to retrieve the next page of results.
limit: Maximum number of records to return.
status: Filter by URL status: queued, completed, disallowed, skipped, errored, or cancelled.

Example with query parameters:

curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e?cursor=10&limit=10&status=completed' \
  -H 'Authorization: Bearer YOUR_API_TOKEN'

Example response:

{
  "result": {
    "id": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e",
    "status": "completed",
    "browserSecondsUsed": 134.7,
    "total": 50,
    "finished": 50,
    "records": [
      {
        "url": "https://developers.cloudflare.com/workers/",
        "status": "completed",
        "markdown": "# Cloudflare Workers\n\nBuild and deploy serverless applications...",
        "metadata": {
          "status": 200,
          "title": "Cloudflare Workers · Cloudflare Workers docs",
          "url": "https://developers.cloudflare.com/workers/"
        }
      },
      {
        "url": "https://developers.cloudflare.com/workers/get-started/quickstarts/",
        "status": "completed",
        "markdown": "## Quickstarts\n\nGet up and running with a simple Hello World...",
        "metadata": {
          "status": 200,
          "title": "Quickstarts · Cloudflare Workers docs",
          "url": "https://developers.cloudflare.com/workers/get-started/quickstarts/"
        }
      }
      // ... 48 more entries omitted for brevity
    ],
    "cursor": 10
  },
  "success": true
}
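Large result sets are split across pages, so a client has to keep requesting until no cursor is returned. The sketch below shows one way to do this, assuming a global fetch, that the last page omits the cursor field, and that each page carries a records array as in the example response. The names buildResultsUrl and fetchAllRecords are hypothetical.

```javascript
const API_BASE = "https://api.cloudflare.com/client/v4";

// Build the results URL, adding only the query parameters that were provided.
function buildResultsUrl(accountId, jobId, { cursor, limit, status } = {}) {
  const url = new URL(
    `${API_BASE}/accounts/${accountId}/browser-rendering/crawl/${jobId}`,
  );
  if (cursor !== undefined) url.searchParams.set("cursor", String(cursor));
  if (limit !== undefined) url.searchParams.set("limit", String(limit));
  if (status !== undefined) url.searchParams.set("status", status);
  return url.toString();
}

// Follow the cursor page by page and collect every record.
// Assumption: the final page has no `cursor` field.
async function fetchAllRecords(accountId, jobId, apiToken, query = {}, fetchImpl = fetch) {
  const records = [];
  let cursor;
  do {
    const url = buildResultsUrl(accountId, jobId, { ...query, cursor });
    const response = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${apiToken}` },
    });
    const data = await response.json();
    if (!data.success) {
      throw new Error(`Results request failed with HTTP ${response.status}`);
    }
    const page = data.result.records ?? [];
    if (page.length === 0) break; // guard against looping on an empty page
    records.push(...page);
    cursor = data.result.cursor;
  } while (cursor !== undefined && cursor !== null);
  return records;
}
```

For example, passing `{ status: "completed" }` as the query would collect only the successfully crawled pages.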

Cancel a crawl job

To cancel a crawl job that is currently in progress, use the job id you received:

curl -X DELETE 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
  -H 'Authorization: Bearer YOUR_API_TOKEN'

A successful cancellation returns a 200 OK status code. The job status is updated to cancelled, and all URLs still queued for crawling are cancelled.

Optional parameters

The following optional parameters can be used in your crawl request, in addition to the required url parameter. For the full list, refer to the API docs.

| Optional parameter | Type | Description |
| --- | --- | --- |
| limit | Number | Maximum number of pages to crawl (default is 10, maximum is 100,000). |
| depth | Number | Maximum link depth to crawl from the starting URL (default is 100,000, maximum is 100,000). |
| source | String | Source for discovering URLs. Options are all, sitemaps, or links. Default is all. |
| formats | Array of strings | Response format (default is HTML, other options are Markdown and JSON). The JSON format leverages Workers AI by default for data extraction, which incurs usage on Workers AI. Refer to the /json endpoint to learn more, including how to use a custom model and fallbacks. |
| render | Boolean | If false, does a fast HTML fetch without executing JavaScript (default is true, learn more about render). |
| jsonOptions | Object | Only required if formats includes json. Contains prompt, response_format, and custom_ai properties (same types as the /json endpoint). |
| maxAge | Number | Maximum length of time in seconds the crawler can use a cached resource before it must re-fetch it from the origin server (default is 86,400, maximum is 604,800). Cache is served from R2 only if the URL and parameters exactly match. |
| modifiedSince | Number | Unix timestamp (in seconds) indicating to only crawl pages that were modified since this time. |
| options.includeExternalLinks | Boolean | If true, follows links to external domains (default is false). |
| options.includeSubdomains | Boolean | If true, follows links to subdomains of the starting URL (default is false). |
| options.includePatterns | Array of strings | Only visits URLs that match one of these wildcard patterns. Use * to match any characters except /, or ** to match any characters including /. |
| options.excludePatterns | Array of strings | Does not visit URLs that match any of these wildcard patterns. Use * to match any characters except /, or ** to match any characters including /. |
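Several of these parameters have documented maximums, and modifiedSince expects seconds, not the milliseconds that JavaScript dates use. The following sketch checks a request body against the limits in the table before sending it. The names toUnixSeconds and validateCrawlBody are hypothetical, and the checks cover only what the table states; the API may enforce further rules.

```javascript
// Maximums taken from the optional parameters table.
const MAX_LIMIT = 100000;
const MAX_DEPTH = 100000;
const MAX_MAX_AGE_SECONDS = 604800;
const SOURCES = ["all", "sitemaps", "links"];

// Convert a Date to a Unix timestamp in seconds, as modifiedSince expects.
function toUnixSeconds(date) {
  return Math.floor(date.getTime() / 1000);
}

// Return a list of problems found in a crawl request body (empty when none).
function validateCrawlBody(body) {
  const problems = [];
  const checkMax = (name, max) => {
    const value = body[name];
    if (value === undefined) return;
    if (typeof value !== "number" || Number.isNaN(value) || value > max) {
      problems.push(`${name} must be a number no greater than ${max}`);
    }
  };

  if (typeof body.url !== "string") problems.push("url must be a string");
  checkMax("limit", MAX_LIMIT);
  checkMax("depth", MAX_DEPTH);
  checkMax("maxAge", MAX_MAX_AGE_SECONDS);
  if (body.source !== undefined && !SOURCES.includes(body.source)) {
    problems.push(`source must be one of ${SOURCES.join(", ")}`);
  }
  const formats = body.formats ?? [];
  if (formats.includes("json") && body.jsonOptions === undefined) {
    problems.push("jsonOptions is required when formats includes json");
  }
  return problems;
}
```

As a usage example, `toUnixSeconds(new Date("2024-01-01T00:00:00Z"))` yields 1704067200, which could be passed as modifiedSince.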

Pattern behavior

excludePatterns has strictly higher priority. If a URL matches an exclude rule, it is skipped, regardless of whether it matches an include rule.

No rules: Everything is indexed.
Exclude only: Everything is indexed except items matching the exclude patterns.
Include only: Only items matching the include patterns are indexed; everything else is ignored.
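The precedence rules above can be modelled in a few lines. This is an illustrative sketch, not the crawler's actual matcher: it assumes patterns are matched against the full URL string and anchored at both ends, and the names patternToRegExp and shouldVisit are hypothetical.

```javascript
// Convert a wildcard pattern to a RegExp: `**` matches any characters
// (including `/`), `*` matches any characters except `/`.
function patternToRegExp(pattern) {
  const source = pattern
    .split("**")
    .map((part) =>
      part
        .split("*")
        // Escape regex metacharacters in the literal pieces.
        .map((literal) => literal.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join("[^/]*"),
    )
    .join(".*");
  return new RegExp(`^${source}$`);
}

// Decide whether a URL would be visited: exclude rules win over include rules.
function shouldVisit(url, { includePatterns = [], excludePatterns = [] } = {}) {
  const matches = (pattern) => patternToRegExp(pattern).test(url);
  if (excludePatterns.some(matches)) return false; // exclude has priority
  if (includePatterns.length === 0) return true; // no include rules: allow
  return includePatterns.some(matches);
}
```

Testing candidate patterns locally against a handful of known URLs can help catch filters that are too restrictive before a crawl is started.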

Viewing skipped URLs

To view URLs that were discovered but skipped, query the crawl job results with status=skipped. URLs can be skipped due to includeExternalLinks, includeSubdomains, includePatterns/excludePatterns, or the modifiedSince parameter. Skipped URLs will also be visible in the dashboard in a future release.

curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}?status=skipped' \
  -H 'Authorization: Bearer YOUR_API_TOKEN'

render parameter

If you use render: true, which is the default, the crawl endpoint spins up a headless browser and executes page JavaScript. If you use render: false, the crawl endpoint does a fast HTML fetch without executing JavaScript.

Use render: true when the page builds content in the browser. Use render: false when the content you need is already in the initial HTML response.

Crawls that use render: true use a headless browser and are billed under typical Browser Rendering pricing. Crawls that use render: false run on Workers instead of a headless browser. During the beta, render: false crawls are not billed. After the beta, they will be billed under Workers pricing.

Example with all optional parameters

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://www.exampledocs.com/docs/",
    "limit": 50,
    "depth": 2,
    "formats": ["markdown"],
    "render": false,
    "maxAge": 7200,
    "modifiedSince": 1704067200,
    "source": "all",
    "options": {
      "includeExternalLinks": true,
      "includeSubdomains": true,
      "includePatterns": [
        "**/api/v1/*"
      ],
      "excludePatterns": [
        "*/learning-paths/*"
      ]
    }
  }'

Advanced usage

Looking for more parameters? Visit the Browser Rendering API reference for all available parameters, such as setting HTTP credentials using authenticate, setting cookies, and customizing load behavior using gotoOptions.

Documentation site crawl

Crawl only documentation pages and exclude specific sections:

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/docs",
    "limit": 200,
    "depth": 5,
    "formats": ["markdown"],
    "options": {
      "includePatterns": [
        "https://example.com/docs/**"
      ],
      "excludePatterns": [
        "https://example.com/docs/changelog/**",
        "https://example.com/docs/archive/**"
      ]
    }
  }'

Product catalog extraction with AI

Extract structured product data using the json format. This leverages Workers AI by default.

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://shop.example.com/products",
    "limit": 50,
    "formats": ["json"],
    "jsonOptions": {
      "prompt": "Extract product name, price, description, and availability",
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "product",
          "properties": {
            "name": "string",
            "price": "number",
            "currency": "string",
            "description": "string",
            "inStock": "boolean"
          }
        }
      }
    },
    "options": {
      "includePatterns": [
        "https://shop.example.com/products/*"
      ]
    }
  }'

Fast static content fetch

Fetch static HTML without rendering for faster crawling of static sites:

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com",
    "limit": 100,
    "render": false,
    "formats": ["html", "markdown"]
  }'

Crawl with authentication

Crawl pages behind HTTP authentication or with custom headers:

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://secure.example.com",
    "limit": 50,
    "authenticate": {
      "username": "user",
      "password": "pass"
    }
  }'

You can also use cookies or custom headers for token-based authentication:

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://api.example.com/docs",
    "limit": 100,
    "setExtraHTTPHeaders": {
      "X-API-Key": "your-api-key"
    }
  }'

Wait for dynamic content

Crawl single-page applications that load content dynamically:

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://app.example.com",
    "limit": 50,
    "gotoOptions": {
      "waitUntil": "networkidle2",
      "timeout": 60000
    },
    "waitForSelector": {
      "selector": "[data-content-loaded]",
      "timeout": 30000,
      "visible": true
    }
  }'

Block unnecessary resources

Speed up crawling by blocking images and media:

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com",
    "limit": 100,
    "rejectResourceTypes": [
      "image",
      "media",
      "font",
      "stylesheet"
    ]
  }'

Crawler behavior

How the crawler discovers URLs

The crawler discovers and processes URLs in the following order (when using source: all, the default):

1. Starting URL: The URL specified in your request.
2. Sitemap links: URLs found in the site's sitemap.
3. Page links: Links scraped from pages, if not already found in the sitemap.

Use the source parameter to customize which sources the crawler uses. The available options are:

all: Uses both sitemaps and page links (default).
sitemaps: Only crawls URLs found in the site's sitemap.
links: Only crawls links found on pages, ignoring sitemaps.
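The discovery order and the effect of the source parameter can be pictured with a small model. This is only an illustration of the behavior described above, not the crawler's implementation; orderDiscoveredUrls is a hypothetical name, and the inputs are plain arrays of URL strings.

```javascript
// Illustrative model only: orders URLs as starting URL, then sitemap URLs,
// then page links that were not already found in the sitemap.
function orderDiscoveredUrls(startUrl, sitemapUrls, pageLinks, source = "all") {
  if (!["all", "sitemaps", "links"].includes(source)) {
    throw new RangeError(`Unsupported source: ${source}`);
  }
  // A Set keeps insertion order and silently drops duplicates.
  const ordered = new Set([startUrl]);
  if (source !== "links") {
    for (const url of sitemapUrls) ordered.add(url);
  }
  if (source !== "sitemaps") {
    for (const url of pageLinks) ordered.add(url);
  }
  return [...ordered];
}
```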

robots.txt and bot protection

The /crawl endpoint respects the directives of robots.txt files, including crawl-delay. All URLs that /crawl is directed not to crawl are listed in the response with "status": "disallowed". For guidance on configuring robots.txt and sitemaps for sites you plan to crawl, refer to robots.txt and sitemaps.

Bot protection may block crawling: If you use Cloudflare products that control or restrict bot traffic such as Bot Management, Web Application Firewall (WAF), or Turnstile, the same rules will apply to the Browser Rendering crawler. Refer to Will Browser Rendering bypass Cloudflare's Bot Protection? for instructions on creating a WAF skip rule.

Set a custom user agent

You can change the user agent at the page level by passing userAgent as a top-level parameter in the JSON body. This is useful if the target website serves different content based on the user agent.

Note: The userAgent parameter does not bypass bot protection. Requests from Browser Rendering will always be identified as a bot.

Troubleshooting

Crawl job returns no results or all URLs are skipped

If your crawl job completes but returns an empty records array, or all URLs show skipped or disallowed status:

robots.txt blocking: The crawler respects robots.txt rules. Check the target site's robots.txt file to verify your user agent is allowed. Blocked URLs appear with "status": "disallowed".
Pattern filters too restrictive: Your includePatterns may not match any URLs on the site. Try crawling without patterns first to confirm URLs are discoverable, then add patterns.
No links found: The starting URL may not contain links. Try using source: "sitemaps", increasing the depth parameter, or setting includeSubdomains or includeExternalLinks to true.

Crawl job takes too long

If a crawl job remains in running status for an extended period:

Slow page loads: Pages with heavy JavaScript take longer to render. Use render: false if the content you need is in the initial HTML.
Rate limiting: Sites with strict rate limits slow crawling. The crawler respects robots.txt Crawl-delay and implements backoff. Reduce limit and run multiple smaller crawls.
Unnecessary resources: Block resources that are not needed for content extraction using rejectResourceTypes (for example, image, media, font).

Crawl job cancelled due to limits

A cancelled_due_to_limits status means your account hit its browser time limit. Workers Free plan accounts are capped at 10 minutes of browser use per day. To resolve this:

Upgrade to a Workers Paid plan for higher limits.
Use render: false for static content to avoid consuming browser time.
Increase maxAge to use cached results where possible.
Reduce the limit parameter.
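To get a rough sense of how close an account is to the Workers Free plan cap, the browserSecondsUsed field from job results can be summed. The sketch below assumes that this field is what counts against the 10-minute daily limit and that all the jobs passed in ran on the same day; neither assumption is confirmed here, and remainingFreeBrowserSeconds is a hypothetical name.

```javascript
// Workers Free plan cap: 10 minutes of browser use per day.
const FREE_PLAN_DAILY_BROWSER_SECONDS = 10 * 60;

// Estimate the browser seconds left today from a list of job results.
// Assumption: browserSecondsUsed is the value counted against the cap.
function remainingFreeBrowserSeconds(jobResults) {
  const used = jobResults.reduce(
    (total, job) => total + (job.browserSecondsUsed ?? 0),
    0,
  );
  return Math.max(0, FREE_PLAN_DAILY_BROWSER_SECONDS - used);
}
```

Jobs run with render: false would contribute nothing under this model, since they do not use a headless browser.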

JSON extraction errors

If the json format returns null or empty results:

Provide a clear prompt: Be specific about what data to extract and where it appears on the page (for example, "Extract the product name, price, and description from the main product section").
Define a response schema: Use response_format with a JSON schema to enforce the expected output structure.
Use a custom model: If the default Workers AI model does not produce the desired results, use the custom_ai parameter to specify a different model. Refer to Using a custom model (BYO API Key) for details.

If you have questions or encounter other errors, refer to the Browser Rendering FAQ and troubleshooting guide.

