Cloudflare Docs

robots.txt and sitemaps

This page provides general guidance on configuring robots.txt and sitemaps for websites you plan to access with Browser Rendering.

Identifying Browser Rendering requests

Requests can be identified by the automatic headers that Cloudflare attaches:

  • cf-brapi-request-id — Unique identifier for REST API requests
  • Signature-agent — Pointer to Cloudflare's bot verification keys

Browser Rendering has a bot detection ID of 128292352. Use this to create WAF rules that allow or block Browser Rendering traffic. For the default user agent and other identification details, refer to Automatic request headers.
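
As a minimal sketch of header-based identification at an origin server, the Python helper below looks for the two header names listed above in an incoming request. The function names and the plain-dict input are assumptions for illustration. Header presence is only a heuristic, because any client can send these header names, so rely on a WAF rule or signature verification for security-sensitive decisions.

```python
# Header names taken from the list above; HTTP header names are case-insensitive.
BROWSER_RENDERING_HEADERS = ("cf-brapi-request-id", "signature-agent")


def browser_rendering_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return the identifying headers found in a request, keyed by lowercased name."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in BROWSER_RENDERING_HEADERS if name in lowered}


def looks_like_browser_rendering(headers: dict[str, str]) -> bool:
    """Heuristic only: True if at least one identifying header is present."""
    return bool(browser_rendering_headers(headers))
```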

Best practices for robots.txt

A well-configured robots.txt helps crawlers understand which parts of your site they can access.

Reference your sitemap

Include a reference to your sitemap in robots.txt so crawlers can discover your URLs:

robots.txt
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

You can list multiple sitemaps:

robots.txt
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml
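
To check that your Sitemap lines are discoverable, you can parse the file the way a generic crawler might. The sketch below uses Python's standard `urllib.robotparser` module. It is not Browser Rendering's own parser, and the helper name is an assumption.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml
"""


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return every Sitemap URL declared in a robots.txt body, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []


print(sitemaps_from_robots(ROBOTS_TXT))
```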

Set a crawl delay

Use crawl-delay to control how frequently crawlers request pages from your server:

robots.txt
User-agent: *
Crawl-delay: 2
Allow: /
Sitemap: https://example.com/sitemap.xml

The value is in seconds. A crawl-delay of 2 means the crawler waits two seconds between requests.
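
The sketch below shows one way a crawler could read and apply this value, again using Python's standard `urllib.robotparser`. The helper names are assumptions, and the schedule is an illustration of the two-second spacing rather than a description of Browser Rendering's internals.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def crawl_delay_seconds(robots_txt: str, user_agent: str = "*") -> float:
    """Return the crawl delay that applies to user_agent, or 0.0 if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else 0.0


def request_offsets(page_count: int, delay: float) -> list[float]:
    """Seconds after the first request at which each request may be sent."""
    return [i * delay for i in range(page_count)]


delay = crawl_delay_seconds(ROBOTS_TXT)
print(request_offsets(3, delay))  # three requests spaced `delay` seconds apart
```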

Best practices for sitemaps

Structure your sitemap to help crawlers process your site efficiently:

sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2025-01-15T00:00:00+00:00</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/other-page</loc>
    <lastmod>2025-01-10T00:00:00+00:00</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>

Attribute | Purpose | Recommendation
<loc> | URL of the page | Required. Use full URLs.
<lastmod> | Last modification date | Include to help the crawler identify updated content. Use ISO 8601 format.
<priority> | Relative importance (0.0-1.0) | Set higher values for important pages. The crawler will process pages in priority order.
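
To preview the order that the priority values imply, you can parse the sitemap and sort its entries. This is a sketch using Python's standard `xml.etree.ElementTree`. The function name is an assumption, and entries without a `<priority>` fall back to 0.5, the default defined by the sitemap protocol.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2025-01-15T00:00:00+00:00</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/other-page</loc>
    <lastmod>2025-01-10T00:00:00+00:00</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>
"""


def urls_by_priority(sitemap_xml: bytes) -> list[tuple[str, float]]:
    """Return (loc, priority) pairs sorted from highest to lowest priority."""
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        if not loc:
            continue  # <loc> is required; skip malformed entries
        priority = float(url.findtext("sm:priority", default="0.5", namespaces=NS))
        entries.append((loc.strip(), priority))
    # sorted() is stable, so entries with equal priority keep their file order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


print(urls_by_priority(SITEMAP_XML))
```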

Sitemap index files

For large sites with multiple sitemaps, use a sitemap index file. Browser Rendering uses the depth parameter to control how many levels of nested sitemaps are crawled:

sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <loc> points to a regular sitemap file with a <urlset> root. -->
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
  </sitemap>
</sitemapindex>
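
The sketch below illustrates how a depth limit can bound the traversal of nested sitemap indexes. It assumes that each index followed consumes one level of depth, which may not match how Browser Rendering counts levels. The `fetch` callable, the in-memory `DOCS` table, and the page URLs inside the child sitemaps are hypothetical stand-ins so the example runs offline.

```python
import xml.etree.ElementTree as ET
from typing import Callable

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'

# Hypothetical in-memory documents standing in for HTTP responses.
DOCS = {
    "https://www.example.com/sitemap.xml": (
        f"<sitemapindex {XMLNS}>"
        "<sitemap><loc>https://www.example.com/sitemap-products.xml</loc></sitemap>"
        "<sitemap><loc>https://www.example.com/sitemap-blog.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://www.example.com/sitemap-products.xml": (
        f"<urlset {XMLNS}><url><loc>https://www.example.com/products/1</loc></url></urlset>"
    ),
    "https://www.example.com/sitemap-blog.xml": (
        f"<urlset {XMLNS}><url><loc>https://www.example.com/blog/hello</loc></url></urlset>"
    ),
}


def fetch(url: str) -> bytes:
    """Return the stored document for url (a real crawler would issue a GET)."""
    return DOCS[url].encode("utf-8")


def collect_urls(sitemap_url: str, fetch: Callable[[str], bytes], depth: int) -> list[str]:
    """Collect page URLs, following at most `depth` levels of sitemap indexes."""
    root = ET.fromstring(fetch(sitemap_url))
    if root.tag == NS + "urlset":
        return [loc.text.strip() for loc in root.iterfind(f"{NS}url/{NS}loc") if loc.text]
    if root.tag != NS + "sitemapindex" or depth <= 0:
        return []  # unknown document type, or nesting budget exhausted
    urls: list[str] = []
    for loc in root.iterfind(f"{NS}sitemap/{NS}loc"):
        if loc.text:
            urls.extend(collect_urls(loc.text.strip(), fetch, depth - 1))
    return urls


print(collect_urls("https://www.example.com/sitemap.xml", fetch, depth=1))
```

With `depth=0` the index itself is not followed, so the same call returns an empty list.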

Caching headers

Browser Rendering periodically refetches sitemaps to keep content fresh. Serve your sitemap with Last-Modified or ETag response headers so the crawler can detect whether the sitemap has changed since the last fetch.
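
On the origin side, supporting these validators means returning `304 Not Modified` when the client's copy is still current. The function below is a framework-independent sketch of that logic. The function names, the hash-based ETag scheme, and the tuple return shape are assumptions, and most web servers and CDNs already handle this for static files.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the sitemap content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def sitemap_response(body: bytes, mtime: float, request_headers: dict[str, str]):
    """Return (status, headers, body) for a sitemap request, honoring validators."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Last-Modified": formatdate(mtime, usegmt=True)}
    req = {name.lower(): value for name, value in request_headers.items()}

    if "if-none-match" in req:
        # If-None-Match takes precedence over If-Modified-Since (RFC 9110).
        candidates = [tag.strip() for tag in req["if-none-match"].split(",")]
        unchanged = etag in candidates
    elif "if-modified-since" in req:
        try:
            since = parsedate_to_datetime(req["if-modified-since"]).timestamp()
            unchanged = int(mtime) <= since  # HTTP dates have one-second resolution
        except (TypeError, ValueError):
            unchanged = False  # unparseable date: send the full response
    else:
        unchanged = False

    if unchanged:
        return 304, headers, b""
    headers["Content-Type"] = "application/xml"
    return 200, headers, body
```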

Recommendations

  • Include <lastmod> on all URLs to help identify which pages have changed. Use ISO 8601 format (for example, 2025-01-15T00:00:00+00:00).
  • Use sitemap index files for large sites with multiple sitemaps.
  • Compress large sitemaps with gzip (.gz) to reduce bandwidth.
  • Keep each sitemap file to at most 50 MB (uncompressed) and 50,000 URLs (standard sitemap limits).
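
The recommendations above can be combined in a small generator. This is a sketch under stated assumptions: the function name and the `(url, last_modified)` input shape are hypothetical, and the 50 MB limit is checked against the uncompressed bytes, as the sitemap protocol defines it.

```python
import gzip
from datetime import datetime, timezone
from xml.sax.saxutils import escape

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def build_sitemaps(pages: list[tuple[str, datetime]]) -> list[bytes]:
    """Split (url, last_modified) pairs into gzip-compressed sitemap files."""
    files = []
    for start in range(0, len(pages), MAX_URLS):
        rows = "".join(
            # isoformat() on a timezone-aware datetime yields ISO 8601 with an offset.
            f"<url><loc>{escape(url)}</loc><lastmod>{modified.isoformat()}</lastmod></url>"
            for url, modified in pages[start:start + MAX_URLS]
        )
        xml = (
            '<?xml version="1.0" encoding="UTF-8"?>'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{rows}</urlset>"
        ).encode("utf-8")
        if len(xml) > MAX_BYTES:
            raise ValueError("sitemap exceeds 50 MB uncompressed; use smaller chunks")
        files.append(gzip.compress(xml))
    return files


pages = [("https://example.com/important-page", datetime(2025, 1, 15, tzinfo=timezone.utc))]
print(gzip.decompress(build_sitemaps(pages)[0]).decode("utf-8"))
```

When this produces more than one file, list the files in a sitemap index so crawlers can find each of them.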