robots.txt and sitemaps
This page provides general guidance on configuring robots.txt and sitemaps for websites you plan to access with Browser Rendering.
Requests can be identified by the automatic headers that Cloudflare attaches:
- `cf-brapi-request-id`: Unique identifier for REST API requests
- `Signature-agent`: Pointer to Cloudflare's bot verification keys
Browser Rendering has a bot detection ID of 128292352. Use this to create WAF rules that allow or block Browser Rendering traffic. For the default user agent and other identification details, refer to Automatic request headers.
A well-configured robots.txt helps crawlers understand which parts of your site they can access.
Include a reference to your sitemap in robots.txt so crawlers can discover your URLs:

    User-agent: *
    Allow: /

    Sitemap: https://example.com/sitemap.xml

You can list multiple sitemaps:

    User-agent: *
    Allow: /

    Sitemap: https://example.com/sitemap.xml
    Sitemap: https://example.com/blog-sitemap.xml

Use crawl-delay to control how frequently crawlers request pages from your server:

    User-agent: *
    Crawl-delay: 2
    Allow: /

    Sitemap: https://example.com/sitemap.xml

The value is in seconds. A crawl-delay of 2 means the crawler waits two seconds between requests.
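To sanity-check a robots.txt before publishing it, one option is to run it through a parser and confirm that the crawl delay and sitemap references are read as intended. The sketch below uses Python's standard `urllib.robotparser`. It is an assumption that this parser approximates how a crawler reads the file: it is not Browser Rendering's own parser, and `ExampleBot` is a placeholder user agent chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content, combining the directives shown above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml
"""


def summarize_robots(robots_txt: str, user_agent: str, url: str) -> dict:
    """Parse robots.txt text and report what it means for one user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "can_fetch": parser.can_fetch(user_agent, url),
        "crawl_delay": parser.crawl_delay(user_agent),  # seconds, or None
        "sitemaps": parser.site_maps(),  # list of URLs, or None (Python 3.8+)
    }


# "ExampleBot" is a hypothetical user agent, not the Browser Rendering one.
summary = summarize_robots(ROBOTS_TXT, "ExampleBot", "https://example.com/docs/")
print(summary)
```

If `crawl_delay` comes back as `None`, or a sitemap URL is missing from the list, the file probably has a formatting problem such as two directives sharing one line.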
Structure your sitemap to help crawlers process your site efficiently:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/important-page</loc>
        <lastmod>2025-01-15T00:00:00+00:00</lastmod>
        <priority>1.0</priority>
      </url>
      <url>
        <loc>https://example.com/other-page</loc>
        <lastmod>2025-01-10T00:00:00+00:00</lastmod>
        <priority>0.5</priority>
      </url>
    </urlset>

| Element | Purpose | Recommendation |
|---|---|---|
| `<loc>` | URL of the page | Required. Use full URLs. |
| `<lastmod>` | Last modification date | Include to help the crawler identify updated content. Use ISO 8601 format. |
| `<priority>` | Relative importance (0.0 to 1.0) | Set higher values for important pages. The crawler will process pages in priority order. |
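To preview how the sitemap fields above might be read, the following sketch parses a sitemap with Python's standard library, checks that each `<lastmod>` value is valid ISO 8601, and sorts the entries by `<priority>`. The descending sort only illustrates the priority ordering described in the table and is not Browser Rendering's actual implementation. The function name `read_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/other-page</loc>
    <lastmod>2025-01-10T00:00:00+00:00</lastmod>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2025-01-15T00:00:00+00:00</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
"""


def read_sitemap(xml_bytes: bytes) -> list:
    """Return sitemap entries sorted from highest to lowest priority."""
    entries = []
    for url in ET.fromstring(xml_bytes).findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        if loc is None:
            continue  # <loc> is required; skip malformed entries
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        priority = url.findtext("sm:priority", namespaces=NS)
        entries.append({
            "loc": loc.strip(),
            # fromisoformat raises ValueError on values it cannot parse,
            # which is a convenient way to catch badly formatted dates.
            "lastmod": datetime.fromisoformat(lastmod.strip()) if lastmod else None,
            # The sitemap protocol's default priority is 0.5.
            "priority": float(priority) if priority else 0.5,
        })
    return sorted(entries, key=lambda e: e["priority"], reverse=True)


ordered = read_sitemap(SITEMAP_XML)
for entry in ordered:
    print(entry["priority"], entry["loc"], entry["lastmod"])
```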
For large sites with multiple sitemaps, use a sitemap index file. Browser Rendering uses the depth parameter to control how many levels of nested sitemaps are crawled:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://www.example.com/sitemap-products.xml</loc>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap-blog.xml</loc>
      </sitemap>
    </sitemapindex>

Browser Rendering periodically refetches sitemaps to keep content fresh. Serve your sitemap with Last-Modified or ETag response headers so the crawler can detect whether the sitemap has changed since the last fetch.
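As a rough illustration of the Last-Modified and ETag advice, the sketch below shows how an origin server could build those headers for a sitemap and answer a repeat request with `304 Not Modified`. It assumes conventional HTTP conditional requests; exactly how Browser Rendering uses these headers is not specified here. The function name `sitemap_response` and its parameters are hypothetical.

```python
import hashlib
from email.utils import formatdate
from typing import Dict, Optional, Tuple


def sitemap_response(
    body: bytes,
    modified_ts: float,
    if_none_match: Optional[str] = None,
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a sitemap request."""
    # ETag derived from the content, so it changes whenever the sitemap does.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "Content-Type": "application/xml",
        "ETag": etag,
        # HTTP dates are always expressed in GMT.
        "Last-Modified": formatdate(modified_ts, usegmt=True),
    }
    # Simplification: exact match on a single ETag. Real If-None-Match
    # headers may list several values or use weak validators (W/"...").
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body


# First fetch returns the full body; a repeat fetch that echoes the ETag
# gets an empty 304 response.
status, headers, _ = sitemap_response(b"<urlset/>", 0.0)
status_again, _, payload = sitemap_response(b"<urlset/>", 0.0, headers["ETag"])
print(status, status_again, headers["Last-Modified"])
```

Most web servers and CDNs generate these headers automatically for static files, so hand-written logic like this is mainly relevant when the sitemap is generated dynamically.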
- Include `<lastmod>` on all URLs to help identify which pages have changed. Use ISO 8601 format (for example, `2025-01-15T00:00:00+00:00`).
- Use sitemap index files for large sites with multiple sitemaps.
- Compress large sitemaps using `.gz` format to reduce bandwidth.
- Keep sitemaps under 50 MB and 50,000 URLs per file (standard sitemap limits).
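The size limits and the compression step can be checked together before a sitemap is published. The sketch below is one possible approach, assuming the sitemap is already available as bytes. The function name `compress_sitemap` is hypothetical, and the 50 MB limit is applied to the uncompressed file, as in the sitemap protocol.

```python
import gzip
import xml.etree.ElementTree as ET

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file
URL_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}url"


def compress_sitemap(xml_bytes: bytes) -> bytes:
    """Check the standard sitemap limits, then return gzip-compressed bytes."""
    url_count = sum(1 for _ in ET.fromstring(xml_bytes).iter(URL_TAG))
    if url_count > MAX_URLS:
        raise ValueError(
            f"{url_count} URLs exceeds the {MAX_URLS} limit; split the sitemap"
        )
    if len(xml_bytes) > MAX_BYTES:
        raise ValueError("sitemap exceeds 50 MB uncompressed; split the sitemap")
    return gzip.compress(xml_bytes)


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></url>"
    b"</urlset>"
)
compressed = compress_sitemap(sample)
print(len(sample), "bytes uncompressed,", len(compressed), "bytes compressed")
```

A sitemap that fails either check is a candidate for splitting into several files referenced from a sitemap index.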
- FAQ: Will Browser Rendering bypass Cloudflare's Bot Protection? — Instructions for creating a WAF skip rule