Cloudflare Docs

robots.txt and sitemaps

This page provides general guidance on configuring robots.txt and sitemaps for websites you plan to access with Browser Rendering.

Identifying Browser Rendering requests

Requests can be identified by the automatic headers that Cloudflare attaches:

  • User-Agent — Each Browser Rendering method has a different default User-Agent, which you can use to write targeted robots.txt rules
  • cf-brapi-request-id — Unique identifier for REST API requests
  • Signature-agent — Pointer to Cloudflare's bot verification keys

To allow or block Browser Rendering traffic with WAF rules instead of robots.txt, use the bot detection IDs listed on the automatic request headers page.
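As a rough illustration of how an origin server might inspect these headers, here is a minimal Python sketch. The function name `describe_request` and the sample header values are hypothetical. The header names and the `CloudflareBrowserRenderingCrawler` token come from this page. Request headers can be spoofed, so treat this as a logging aid rather than an access control.

```python
# Token taken from the /crawl User-Agent described on this page.
CRAWL_UA_TOKEN = "CloudflareBrowserRenderingCrawler"


def describe_request(headers):
    """Summarize Browser Rendering signals found in a dict of request headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "")
    return {
        "is_crawl_endpoint": CRAWL_UA_TOKEN.lower() in user_agent.lower(),
        "rest_api_request_id": lowered.get("cf-brapi-request-id"),
        "signature_agent": lowered.get("signature-agent"),
    }


# Hypothetical header values, for illustration only.
sample = {
    "User-Agent": "CloudflareBrowserRenderingCrawler/1.0",
    "cf-brapi-request-id": "abc123",
}
print(describe_request(sample))
```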

Best practices for robots.txt

A well-configured robots.txt helps crawlers understand which parts of your site they can access.

Reference your sitemap

Include a reference to your sitemap in robots.txt so crawlers can discover your URLs:

robots.txt
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

You can list multiple sitemaps:

robots.txt
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml
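
To check that your Sitemap lines are discoverable, you can parse the file with Python's standard `urllib.robotparser` module (the `site_maps()` method requires Python 3.8 or later). This is only a local sanity check with a generic parser. It does not show how Browser Rendering itself reads the file, and the helper name `list_sitemaps` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml
"""


def list_sitemaps(robots_txt):
    """Return the sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


for sitemap_url in list_sitemaps(ROBOTS_TXT):
    print(sitemap_url)
```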

Set a crawl delay

Use crawl-delay to control how frequently crawlers request pages from your server:

robots.txt
User-agent: *
Crawl-delay: 2
Allow: /
Sitemap: https://example.com/sitemap.xml

The value is in seconds. A crawl-delay of 2 means the crawler waits two seconds between requests.
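
The sketch below reads the crawl-delay value the way a generic client could, using Python's standard `urllib.robotparser`. The helper name `crawl_delay_for` and the user agent `ExampleBot` are hypothetical, and the code says nothing about how any particular crawler schedules its requests.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the crawl delay in seconds for user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


delay = crawl_delay_for(ROBOTS_TXT, "ExampleBot")
print(delay)
```

With a delay of 2, a client that honors the directive would start its third request no earlier than four seconds after its first.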

Blocking crawlers with robots.txt

If you want to prevent Browser Rendering (or other crawlers) from accessing your site, you can configure your robots.txt to restrict access.

Block all bots from your entire site

To prevent all crawlers from accessing any page on your site:

robots.txt
User-agent: *
Disallow: /

This is the most restrictive configuration and blocks all compliant bots, not just Browser Rendering.

Block only the /crawl endpoint

The /crawl endpoint identifies itself with the User-Agent CloudflareBrowserRenderingCrawler/1.0. To block the /crawl endpoint while allowing all other traffic (including other Browser Rendering REST API endpoints, which use a different User-Agent):

robots.txt
User-agent: CloudflareBrowserRenderingCrawler
Disallow: /

User-agent: *
Allow: /
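
Before deploying a rule set like this, you can test it locally. The following sketch uses Python's standard `urllib.robotparser` to confirm that the /crawl User-Agent is blocked while another agent is not. `OtherBot/2.0` and the helper name `is_fetch_allowed` are hypothetical, and a generic parser may not match every crawler's interpretation exactly.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: CloudflareBrowserRenderingCrawler
Disallow: /

User-agent: *
Allow: /
"""


def is_fetch_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/some-page"
print(is_fetch_allowed(ROBOTS_TXT, "CloudflareBrowserRenderingCrawler/1.0", page))
print(is_fetch_allowed(ROBOTS_TXT, "OtherBot/2.0", page))
```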

Block the /crawl endpoint on specific paths

To allow the /crawl endpoint to access your site but block specific sections:

robots.txt
User-agent: CloudflareBrowserRenderingCrawler
Disallow: /admin/
Disallow: /private/
Allow: /

User-agent: *
Allow: /
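
When Allow and Disallow rules overlap, as `/admin/` and `/` do here, RFC 9309 says the most specific (longest) matching rule applies, and Allow wins a tie. The sketch below illustrates that rule for plain path prefixes only. It ignores the `*` and `$` wildcards, the function name `is_allowed` is made up, and whether a given crawler follows RFC 9309 precedence exactly is an assumption.

```python
def is_allowed(rules, path):
    """Apply longest-match precedence to a list of (directive, prefix) rules.

    A path that matches no rule is allowed. Ties between an Allow and a
    Disallow of equal length go to Allow.
    """
    best_length = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed


# Rules from the CloudflareBrowserRenderingCrawler group above.
RULES = [
    ("Disallow", "/admin/"),
    ("Disallow", "/private/"),
    ("Allow", "/"),
]

for path in ("/admin/users", "/private/report.pdf", "/blog/post"):
    print(path, is_allowed(RULES, path))
```

Under this rule the order of the lines within a group does not matter: `/admin/users` stays blocked even if `Allow: /` is listed first.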

Best practices for sitemaps

Structure your sitemap to help crawlers process your site efficiently:

sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2025-01-15T00:00:00+00:00</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/other-page</loc>
    <lastmod>2025-01-10T00:00:00+00:00</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>
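
| Attribute  | Purpose                       | Recommendation                                                                            |
|------------|-------------------------------|-------------------------------------------------------------------------------------------|
| <loc>      | URL of the page               | Required. Use full URLs.                                                                  |
| <lastmod>  | Last modification date        | Include to help the crawler identify updated content. Use ISO 8601 format.                |
| <priority> | Relative importance (0.0-1.0) | Set higher values for important pages. The crawler will process pages in priority order. |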
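
To preview the order that priority-based processing implies for your own sitemap, a sketch like the following may help. It parses a sitemap with Python's standard `xml.etree.ElementTree` and sorts URLs from highest to lowest priority, using the sitemap protocol's default of 0.5 when `<priority>` is absent. The function name `urls_by_priority` is hypothetical, and the crawler's actual tie-breaking is not documented here, so document order is assumed for ties.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/other-page</loc>
    <lastmod>2025-01-10T00:00:00+00:00</lastmod>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2025-01-15T00:00:00+00:00</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>"""


def urls_by_priority(sitemap_xml):
    """Return the <loc> values of a sitemap, highest priority first."""
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS).strip()
        # The sitemap protocol defines 0.5 as the default priority.
        priority = float(url.findtext("sm:priority", default="0.5", namespaces=NS))
        entries.append((priority, loc))
    # sorted() is stable, so equal priorities keep their document order.
    ordered = sorted(entries, key=lambda entry: entry[0], reverse=True)
    return [loc for _, loc in ordered]


for loc in urls_by_priority(SITEMAP):
    print(loc)
```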

Sitemap index files

For large sites with multiple sitemaps, use a sitemap index file. Browser Rendering uses the depth parameter to control how many levels of nested sitemaps are crawled:

sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
  </sitemap>
</sitemapindex>
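
The following sketch shows one way a depth limit on nested sitemaps could behave. It is an illustration under assumptions, not a description of Browser Rendering's implementation: the function `collect_urls`, the `fetch` callback, and the reading of depth as "number of index levels to follow" are all hypothetical. Documents are served from an in-memory dictionary so the example runs without network access.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'

# In-memory stand-ins for fetched documents (hypothetical content).
DOCS = {
    "https://www.example.com/sitemap.xml": (
        f"<sitemapindex {XMLNS}>"
        "<sitemap><loc>https://www.example.com/sitemap-products.xml</loc></sitemap>"
        "<sitemap><loc>https://www.example.com/sitemap-blog.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://www.example.com/sitemap-products.xml": (
        f"<urlset {XMLNS}><url><loc>https://www.example.com/products/1</loc></url></urlset>"
    ),
    "https://www.example.com/sitemap-blog.xml": (
        f"<urlset {XMLNS}><url><loc>https://www.example.com/blog/hello</loc></url></urlset>"
    ),
}


def collect_urls(sitemap_url, fetch, depth):
    """Return page URLs, following at most `depth` levels of sitemap index files."""
    root = ET.fromstring(fetch(sitemap_url))
    if root.tag == NS + "urlset":
        return [loc.text.strip() for loc in root.iter(NS + "loc")]
    if root.tag == NS + "sitemapindex":
        if depth <= 0:
            return []  # Depth budget exhausted: do not open child sitemaps.
        urls = []
        for loc in root.iter(NS + "loc"):
            urls.extend(collect_urls(loc.text.strip(), fetch, depth - 1))
        return urls
    raise ValueError(f"unexpected root element: {root.tag}")


print(collect_urls("https://www.example.com/sitemap.xml", DOCS.__getitem__, depth=1))
```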

Caching headers

Browser Rendering periodically refetches sitemaps to keep content fresh. Serve your sitemap with Last-Modified or ETag response headers so the crawler can detect whether the sitemap has changed since the last fetch.
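
As a minimal sketch of serving these headers, the function below builds a sitemap response with `ETag` and `Last-Modified`, and answers `304 Not Modified` when the client presents a matching `If-None-Match` value. The function name `sitemap_response` and the hash-based ETag scheme are illustrative choices. Whether Browser Rendering sends conditional requests or simply compares header values between fetches is not stated on this page, so the 304 branch is an assumption about standard HTTP clients.

```python
import hashlib
from email.utils import formatdate


def sitemap_response(body, modified_ts, request_headers):
    """Return (status, headers, body) for a sitemap request.

    body is the sitemap as bytes, modified_ts is a Unix timestamp, and
    request_headers is a dict of incoming request headers.
    """
    # Derive a strong ETag from the content so it changes whenever the body does.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "Content-Type": "application/xml",
        "ETag": etag,
        # usegmt=True produces the HTTP date format, ending in "GMT".
        "Last-Modified": formatdate(modified_ts, usegmt=True),
    }
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    return 200, headers, body


status, headers, _ = sitemap_response(b"<urlset/>", 0, {})
print(status, headers["ETag"], headers["Last-Modified"])
```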

Recommendations

  • Include <lastmod> on all URLs to help identify which pages have changed. Use ISO 8601 format (for example, 2025-01-15T00:00:00+00:00).
  • Use sitemap index files for large sites with multiple sitemaps.
  • Compress large sitemaps with gzip (.gz) to reduce bandwidth.
  • Keep each sitemap file under 50 MB uncompressed and 50,000 URLs (the standard sitemap limits).
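
The recommendations above can be combined in a small generator. This sketch builds a sitemap with `<lastmod>` on every URL, enforces the standard limits, and gzips the result. The function name `build_sitemap` and the sample entry are hypothetical, and a production generator would also split oversized inputs across several files listed in a sitemap index.

```python
import gzip
from datetime import datetime, timezone
from xml.sax.saxutils import escape

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured before compression


def build_sitemap(entries):
    """Build sitemap XML (bytes) from an iterable of (url, datetime) pairs."""
    entries = list(entries)
    if len(entries) > MAX_URLS:
        raise ValueError("too many URLs: split into several sitemaps and add an index")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, modified in entries:
        # Timezone-aware ISO 8601 timestamp, for example 2025-01-15T00:00:00+00:00.
        lastmod = modified.astimezone(timezone.utc).isoformat(timespec="seconds")
        lines.append(f"  <url><loc>{escape(url)}</loc><lastmod>{lastmod}</lastmod></url>")
    lines.append("</urlset>")
    xml = "\n".join(lines).encode("utf-8")
    if len(xml) > MAX_BYTES:
        raise ValueError("sitemap exceeds 50 MB uncompressed")
    return xml


xml = build_sitemap([
    ("https://example.com/important-page", datetime(2025, 1, 15, tzinfo=timezone.utc)),
])
compressed = gzip.compress(xml)  # Serve or store as sitemap.xml.gz
print(len(xml), len(compressed))
```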