robots.txt and sitemaps
This page provides general guidance on configuring robots.txt and sitemaps for websites you plan to access with Browser Rendering.
Requests can be identified by the automatic headers that Cloudflare attaches:
- `User-Agent` — Each Browser Rendering method has a different default User-Agent, which you can use to write targeted robots.txt rules
- `cf-brapi-request-id` — Unique identifier for REST API requests
- `Signature-agent` — Pointer to Cloudflare's bot verification keys
To allow or block Browser Rendering traffic using WAF rules instead of robots.txt, use the bot detection IDs on the automatic request headers page.
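If you want to observe this traffic at your origin, you can inspect these headers in code. Below is a minimal Cloudflare Worker sketch, not an official integration; the pass-through behavior and log format are illustrative assumptions:

```ts
// Sketch: identify Browser Rendering traffic from the headers described above.
// The pass-through proxying and log format are illustrative assumptions.
export default {
  async fetch(request: Request): Promise<Response> {
    const userAgent = request.headers.get("User-Agent") ?? "";
    // Present on Browser Rendering REST API requests.
    const brapiRequestId = request.headers.get("cf-brapi-request-id");

    const fromCrawlEndpoint = userAgent.includes("CloudflareBrowserRenderingCrawler");
    if (fromCrawlEndpoint || brapiRequestId !== null) {
      console.log(`Browser Rendering request (id: ${brapiRequestId ?? "n/a"}): ${request.url}`);
    }

    // Forward the request to the origin unchanged.
    return fetch(request);
  },
};
```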
A well-configured robots.txt helps crawlers understand which parts of your site they can access.
Include a reference to your sitemap in robots.txt so crawlers can discover your URLs:
```
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```

You can list multiple sitemaps:
```
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml
```

Use crawl-delay to control how frequently crawlers request pages from your server:
```
User-agent: *
Crawl-delay: 2
Allow: /

Sitemap: https://example.com/sitemap.xml
```

The value is in seconds. A crawl-delay of 2 means the crawler waits two seconds between requests.
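For crawlers you operate yourself, honoring crawl-delay amounts to pausing between requests. A minimal TypeScript sketch, assuming the two-second delay from the example above and a hypothetical `crawlPolitely` helper:

```ts
// Sketch: a compliant crawler waits Crawl-delay seconds between requests.
const CRAWL_DELAY_MS = 2 * 1000; // "Crawl-delay: 2" means 2 seconds

async function crawlPolitely(urls: string[]): Promise<void> {
  for (const url of urls) {
    const response = await fetch(url);
    console.log(`${response.status} ${url}`);
    // Wait before the next request, as the robots.txt directive asks.
    await new Promise((resolve) => setTimeout(resolve, CRAWL_DELAY_MS));
  }
}

await crawlPolitely(["https://example.com/", "https://example.com/other-page"]);
```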
If you want to prevent Browser Rendering (or other crawlers) from accessing your site, you can configure your robots.txt to restrict access.
To prevent all crawlers from accessing any page on your site:
```
User-agent: *
Disallow: /
```

This is the most restrictive configuration and blocks all compliant bots, not just Browser Rendering.
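To sanity-check a robots.txt configuration before deploying it, you can apply the matching rules by hand. The sketch below follows the longest-match precedence from RFC 9309; the `Rule` shape and `isAllowed` helper are hypothetical, not a real library API:

```ts
// Sketch: decide whether a path is allowed under a set of Allow/Disallow rules,
// using RFC 9309 precedence (the most specific, longest matching path wins).
interface Rule {
  allow: boolean; // true for Allow:, false for Disallow:
  path: string;   // the rule's path prefix
}

function isAllowed(rules: Rule[], path: string): boolean {
  let winner: Rule | undefined;
  for (const rule of rules) {
    if (path.startsWith(rule.path) && (!winner || rule.path.length > winner.path.length)) {
      winner = rule;
    }
  }
  return winner ? winner.allow : true; // no matching rule means the path is allowed
}

// Under "Disallow: /admin/" plus "Allow: /", the longer /admin/ rule wins beneath it.
console.log(isAllowed([{ allow: false, path: "/admin/" }, { allow: true, path: "/" }], "/admin/users")); // false
console.log(isAllowed([{ allow: false, path: "/admin/" }, { allow: true, path: "/" }], "/docs")); // true
```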
The /crawl endpoint identifies itself with the User-Agent CloudflareBrowserRenderingCrawler/1.0. To block the /crawl endpoint while allowing all other traffic (including other Browser Rendering REST API endpoints, which use a different User-Agent):
```
User-agent: CloudflareBrowserRenderingCrawler
Disallow: /

User-agent: *
Allow: /
```

To allow the /crawl endpoint to access your site but block specific sections:
```
User-agent: CloudflareBrowserRenderingCrawler
Disallow: /admin/
Disallow: /private/
Allow: /

User-agent: *
Allow: /
```

Structure your sitemap to help crawlers process your site efficiently:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2025-01-15T00:00:00+00:00</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/other-page</loc>
    <lastmod>2025-01-10T00:00:00+00:00</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>
```

| Attribute | Purpose | Recommendation |
|---|---|---|
| `<loc>` | URL of the page | Required. Use full URLs. |
| `<lastmod>` | Last modification date | Include to help the crawler identify updated content. Use ISO 8601 format. |
| `<priority>` | Relative importance (0.0 to 1.0) | Set higher values for important pages. The crawler will process pages in priority order. |
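If your page list lives in a database or CMS, you can generate this file rather than maintain it by hand. A minimal sketch; the `Page` shape and `buildSitemap` helper are illustrative assumptions:

```ts
// Sketch: render a list of pages as a sitemap urlset.
interface Page {
  loc: string;      // full URL, required
  lastmod: string;  // ISO 8601 timestamp
  priority: number; // 0.0 to 1.0
}

function buildSitemap(pages: Page[]): string {
  // Real code should XML-escape these values.
  const entries = pages
    .map(
      (p) =>
        `  <url>\n` +
        `    <loc>${p.loc}</loc>\n` +
        `    <lastmod>${p.lastmod}</lastmod>\n` +
        `    <priority>${p.priority.toFixed(1)}</priority>\n` +
        `  </url>`
    )
    .join("\n");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${entries}\n` +
    `</urlset>`
  );
}
```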
For large sites with multiple sitemaps, use a sitemap index file. Browser Rendering uses the `depth` parameter to control how many levels of nested sitemaps are crawled:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
  </sitemap>
</sitemapindex>
```

Browser Rendering periodically refetches sitemaps to keep content fresh. Serve your sitemap with Last-Modified or ETag response headers so the crawler can detect whether the sitemap has changed since the last fetch.
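One way to provide those validators is to attach them wherever you serve the file. A minimal Worker-style sketch, with a hypothetical version tag and placeholder dates:

```ts
// Sketch: serve a sitemap with ETag and Last-Modified so crawlers can
// revalidate cheaply with conditional requests. Values here are placeholders.
const SITEMAP_XML = `<?xml version="1.0" encoding="UTF-8"?>...`; // built elsewhere
const LAST_MODIFIED = "Wed, 15 Jan 2025 00:00:00 GMT"; // HTTP-date of last change
const ETAG = '"sitemap-v42"'; // hypothetical content version

export default {
  async fetch(request: Request): Promise<Response> {
    const headers = {
      "Content-Type": "application/xml",
      "ETag": ETAG,
      "Last-Modified": LAST_MODIFIED,
    };
    // If the crawler already has this version, answer 304 with no body.
    if (request.headers.get("If-None-Match") === ETAG) {
      return new Response(null, { status: 304, headers });
    }
    return new Response(SITEMAP_XML, { headers });
  },
};
```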
- Include `<lastmod>` on all URLs to help identify which pages have changed. Use ISO 8601 format (for example, `2025-01-15T00:00:00+00:00`).
- Use sitemap index files for large sites with multiple sitemaps.
- Compress large sitemaps using `.gz` format to reduce bandwidth (a compression sketch follows this list).
- Keep sitemaps under 50 MB and 50,000 URLs per file (standard sitemap limits).
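As a sketch of the compression step, using Node's built-in zlib (file names are examples):

```ts
// Sketch: gzip an existing sitemap so it can be served as sitemap.xml.gz.
import { readFileSync, writeFileSync } from "node:fs";
import { gzipSync } from "node:zlib";

const xml = readFileSync("sitemap.xml");
writeFileSync("sitemap.xml.gz", gzipSync(xml));
```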
- FAQ: Will Browser Rendering be detected by Bot Management? — How Browser Rendering interacts with bot protection and how to create a WAF skip rule
- Automatic request headers — User-Agent strings and non-configurable headers used by Browser Rendering