Crawl websites.
Starts a crawl job for the provided URL and its children. Check available options like gotoOptions and waitFor* to control page load behaviour.
Security
API Token
The preferred authorization scheme for interacting with the Cloudflare API. Create a token.
Authorization: Bearer Sn3lZJTBX6kkg7OdcBUAxOO963GEIyGQqnFTOFYYAPI Email + API Key
The previous authorization scheme for interacting with the Cloudflare API, used in conjunction with a Global API key.
X-Auth-Email: user@example.comThe previous authorization scheme for interacting with the Cloudflare API. When possible, use API tokens instead of Global API keys.
X-Auth-Key: 144c9defac04969c7bfad8efaa8ea194Accepted Permissions (at least one required)
Browser Rendering WriteParametersExpand Collapse
params CrawlNewParams
Query param: Cache TTL default is 5s. Set to 0 to disable.
Body param: The maximum duration allowed for the browser action to complete after the page has loaded (such as taking screenshots, extracting content, or generating PDFs). If this time limit is exceeded, the action stops and returns a timeout error.
AddScriptTag param.Field[[]CrawlNewParamsVariant0AddScriptTag]optionalBody param: Adds a <script> tag into the page with the desired URL or content.
Body param: Adds a <script> tag into the page with the desired URL or content.
AddStyleTag param.Field[[]CrawlNewParamsVariant0AddStyleTag]optionalBody param: Adds a <link rel="stylesheet"> tag into the page with the desired URL or a <style type="text/css"> tag with the content.
Body param: Adds a <link rel="stylesheet"> tag into the page with the desired URL or a <style type="text/css"> tag with the content.
Body param: Only allow requests that match the provided regex patterns, eg. '/^.*.(css)'.
AllowResourceTypes param.Field[[]CrawlNewParamsVariant0AllowResourceType]optionalBody param: Only allow requests that match the provided resource types, eg. 'image' or 'script'.
Body param: Only allow requests that match the provided resource types, eg. 'image' or 'script'.
Body param: Attempt to proceed when 'awaited' events fail or timeout.
Cookies param.Field[[]CrawlNewParamsVariant0Cookie]optionalBody param: Check options.
Body param: Check options.
CrawlPurposes param.Field[[]CrawlNewParamsVariant0CrawlPurpose]optionalBody param: List of crawl purposes to respect Content-Signal directives in robots.txt. Allowed values: 'search', 'ai-input', 'ai-train'. Learn more: https://contentsignals.org/. Default: ['search', 'ai-input', 'ai-train'].
Body param: List of crawl purposes to respect Content-Signal directives in robots.txt. Allowed values: 'search', 'ai-input', 'ai-train'. Learn more: https://contentsignals.org/. Default: ['search', 'ai-input', 'ai-train'].
Body param: Maximum number of levels deep the crawler will traverse from the starting URL.
Formats param.Field[[]CrawlNewParamsVariant0Format]optionalBody param: Formats to return. Default is html.
Body param: Formats to return. Default is html.
Body param: Check options.
Body param: Check options.
WaitUntil CrawlNewParamsVariant0GotoOptionsWaitUntilUnionoptional
CrawlNewParamsVariant0GotoOptionsWaitUntilString
CrawlNewParamsVariant0GotoOptionsWaitUntilArray
Body param: Options for JSON extraction.
Body param: Options for JSON extraction.
CustomAI []CrawlNewParamsVariant0JsonOptionsCustomAIoptionalOptional list of custom AI models to use for the request. The models will be tried in the order provided, and in case a model returns an error, the next one will be used as fallback.
Optional list of custom AI models to use for the request. The models will be tried in the order provided, and in case a model returns an error, the next one will be used as fallback.
ResponseFormat CrawlNewParamsVariant0JsonOptionsResponseFormatoptional
JsonSchema map[string, CrawlNewParamsVariant0JsonOptionsResponseFormatJsonSchemaUnion]optionalSchema for the response format. More information here: https://developers.cloudflare.com/workers-ai/json-mode/.
Schema for the response format. More information here: https://developers.cloudflare.com/workers-ai/json-mode/.
Body param: Maximum number of URLs to crawl.
Body param: Maximum age of a resource that can be returned from cache in seconds. Default is 1 day.
Body param: Unix timestamp (seconds since epoch) indicating to only crawl pages that were modified since this time. For sitemap URLs with a lastmod field, this is compared directly. For other URLs, the crawler will use If-Modified-Since header when fetching. URLs without modification information (no lastmod in sitemap and no Last-Modified header support) will be crawled. Note: This works in conjunction with maxAge - both filters must pass for a cached resource to be used. Must be within the last year and not in the future.
Body param: Additional options for the crawler.
Body param: Additional options for the crawler.
Exclude links matching the provided wildcard patterns in the crawl job. Example: 'https://example.com/privacy/**'.
Include external links in the crawl job. If set to true, includeSubdomains is ignored.
Include only links matching the provided wildcard patterns in the crawl job. Include patterns are evaluated before exclude patterns. URLs that match any of the specified include patterns will be included in the crawl job. Example: 'https://example.com/blog/**'.
Body param: Block undesired requests that match the provided regex patterns, eg. '/^.*.(css)'.
RejectResourceTypes param.Field[[]CrawlNewParamsVariant0RejectResourceType]optionalBody param: Block undesired requests that match the provided resource types, eg. 'image' or 'script'.
Body param: Block undesired requests that match the provided resource types, eg. 'image' or 'script'.
Body param: Source of links to crawl. 'sitemaps' - only crawl URLs from sitemaps, 'links' - only crawl URLs scraped from pages, 'all' - crawl both sitemap and scraped links (default).
Body param: Source of links to crawl. 'sitemaps' - only crawl URLs from sitemaps, 'links' - only crawl URLs scraped from pages, 'all' - crawl both sitemap and scraped links (default).
Body param: Check options.
Body param: Check options.
Body param: Wait for the selector to appear in page. Check options.
Body param: Wait for the selector to appear in page. Check options.
Crawl websites.
package main
import (
"context"
"fmt"
"github.com/cloudflare/cloudflare-go"
"github.com/cloudflare/cloudflare-go/browser_rendering"
"github.com/cloudflare/cloudflare-go/option"
)
func main() {
client := cloudflare.NewClient(
option.WithAPIToken("Sn3lZJTBX6kkg7OdcBUAxOO963GEIyGQqnFTOFYY"),
)
crawl, err := client.BrowserRendering.Crawl.New(context.TODO(), browser_rendering.CrawlNewParams{
AccountID: cloudflare.F("account_id"),
Body: browser_rendering.CrawlNewParamsBodyObject{
URL: cloudflare.F("https://example.com"),
},
})
if err != nil {
panic(err.Error())
}
fmt.Printf("%+v\n", crawl)
}
{
"result": "result",
"success": true,
"errors": [
{
"code": 0,
"message": "message"
}
]
}{
"errors": [
{
"code": 2001,
"message": "Rate limit exceeded"
}
],
"success": false
}Returns Examples
{
"result": "result",
"success": true,
"errors": [
{
"code": 0,
"message": "message"
}
]
}{
"errors": [
{
"code": 2001,
"message": "Rate limit exceeded"
}
],
"success": false
}