Instruct AI crawlers with managed robots.txt
Protect your website or application from AI crawlers by implementing a robots.txt file on your domain to direct AI bot operators on what content they can and cannot scrape for AI model training.
AI bots are expected to follow the robots.txt directives.
robots.txt files express your preferences. They do not prevent crawler operators from crawling your content at a technical level. Some crawler operators may disregard your robots.txt preferences and crawl your content regardless of what your robots.txt file says.
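To illustrate what following the directives means in practice, here is a minimal sketch of how a cooperating crawler might consult a robots.txt file before fetching a page. It uses Python's standard `urllib.robotparser` module. The sample rules, the `is_allowed` helper, the `ExampleBrowser` agent name, and the `example.com` URL are assumptions made for illustration. Nothing here enforces the rules; a crawler that skips this check can still request the page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample rules: everyone is allowed, except GPTBot.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these sample rules, `is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/page")` returns `False`, while the same call with the hypothetical agent `"ExampleBrowser"` returns `True`.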
Cloudflare will independently check whether your website has an existing robots.txt file and adjust the behavior of this feature accordingly.
If your website already has a robots.txt file (verified by an HTTP 200 response), Cloudflare will prepend our managed robots.txt before your existing robots.txt, combining both into a single response.
For example, without this feature enabled, the robots.txt content of crawlstop.com would be:
User-agent: *
Disallow: /lp
Disallow: /feedback
Disallow: /langtest

Sitemap: https://www.crawlstop.com/sitemap.xml
With the managed robots.txt enabled, Cloudflare will prepend our managed content before your original content, resulting in the following, which you can also view at https://www.crawlstop.com/robots.txt:
# As a condition of accessing this website, you agree to abide by the
# following content-signals:

# (a) If a content-signal = yes, you may collect content for the
# corresponding use.
# (b) If a content-signal = no, you may not collect content for the
# corresponding use.
# (c) If the website operator does not include a content signal for a
# corresponding use, the website operator neither grants nor restricts
# permission via content signal with respect to the corresponding use.

# The content signals and their meanings are:

# search: building a search index and providing search results (e.g., returning
# hyperlinks and short excerpts from your website's contents). Search
# does not include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval
# augmented generation, grounding, or other real-time taking of
# content for generative AI search answers).
# ai-train: training or fine-tuning AI models.

# ANY RESTRICTIONS EXPRESSED VIA CONTENT-SIGNALS ARE EXPRESS RESERVATIONS OF
# RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
# AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.

# BEGIN Cloudflare Managed content

User-Agent: *
Content-signal: search=yes,ai-train=no
Allow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

# END Cloudflare Managed Content
User-agent: *
Disallow: /lp
Disallow: /feedback
Disallow: /langtest

Sitemap: https://www.crawlstop.com/sitemap.xml
If your website does not have a robots.txt file, Cloudflare creates a new file with our managed block directives and serves it for you.
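The two cases described above (an existing file versus no file) can be sketched as a small function. This is only an illustration of the described behavior, not Cloudflare's implementation: the `serve_robots_txt` name, the shortened `MANAGED` block, and the use of `None` to represent a missing file are all assumptions.

```python
from typing import Optional

# Shortened, illustrative stand-in for the managed block shown earlier.
MANAGED = """\
# BEGIN Cloudflare Managed content
User-agent: GPTBot
Disallow: /
# END Cloudflare Managed Content
"""

# The site's own rules, as in the crawlstop.com example.
EXISTING = """\
User-agent: *
Disallow: /lp
"""


def serve_robots_txt(managed: str, existing: Optional[str]) -> str:
    """Sketch of the response body: managed content first, then the original."""
    if existing is None:
        # No robots.txt on the origin: serve only the managed directives.
        return managed
    # Existing file found: prepend the managed block to the original content.
    return managed.rstrip("\n") + "\n" + existing
```

Note that the combined file can contain more than one `User-agent: *` group, as in the crawlstop.com example. How those groups are reconciled depends on the parser each crawler uses.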
To implement a robots.txt file on your domain:
- Log in to the Cloudflare dashboard, and select your account and domain.
- Go to Security > Bots.
- Select Configure Bot Fight Mode.
- Turn Instruct bot traffic with robots.txt on.

Alternatively, use the Security Settings page:

- In the Cloudflare dashboard, go to the Security Settings page.
- Filter by Bot traffic.
- Go to Instruct AI bot traffic with robots.txt.
- Turn Instruct AI bot traffic with robots.txt on.
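After turning the setting on, you may want to confirm that the managed block is being served. The sketch below looks for the BEGIN and END marker comments that appear in the managed example earlier on this page. The helper names are hypothetical, and treating the markers as case-insensitive is an assumption made because the two markers are capitalized differently in that example.

```python
import urllib.request

# Marker comments copied from the managed example, lowercased for comparison.
BEGIN_MARKER = "# begin cloudflare managed content"
END_MARKER = "# end cloudflare managed content"


def has_managed_block(robots_txt: str) -> bool:
    """Return True if both markers are present, with BEGIN before END."""
    lines = [line.strip().lower() for line in robots_txt.splitlines()]
    try:
        begin = lines.index(BEGIN_MARKER)
        end = lines.index(END_MARKER)
    except ValueError:
        # At least one marker is missing.
        return False
    return begin < end


def fetch_robots_txt(domain: str, timeout: float = 10.0) -> str:
    """Download https://<domain>/robots.txt and return it as text."""
    url = f"https://{domain}/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For a domain you control, `has_managed_block(fetch_robots_txt("example.com"))` would run the check against the live file (`example.com` is a placeholder). `urlopen` raises `urllib.error.HTTPError` if the server responds with an error status such as 404.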
Free zones that do not have their own robots.txt file and do not use the managed robots.txt feature will display the Content Signals Policy when a crawler requests the robots.txt file for your zone.
This file only outlines the Content Signals framework. It does not express your preferences or rights associated with your content.
# As a condition of accessing this website, you agree to abide by the
# following content-signals:

# (a) If a content-signal = yes, you may collect content for the
# corresponding use.
# (b) If a content-signal = no, you may not collect content for the
# corresponding use.
# (c) If the website operator does not include a content signal for a
# corresponding use, the website operator neither grants nor restricts
# permission via content signal with respect to the corresponding use.

# The content signals and their meanings are:

# search: building a search index and providing search results (e.g., returning
# hyperlinks and short excerpts from your website's contents). Search
# does not include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval
# augmented generation, grounding, or other real-time taking of
# content for generative AI search answers).
# ai-train: training or fine-tuning AI models.

# ANY RESTRICTIONS EXPRESSED VIA CONTENT-SIGNALS ARE EXPRESS RESERVATIONS OF
# RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
# AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
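The policy text above only defines the signals. An actual preference is expressed by a `Content-signal` line such as `Content-signal: search=yes,ai-train=no`. As a rough sketch of how a crawler might read such a line, the hypothetical `parse_content_signal` helper below turns it into a mapping. The handling of malformed pairs (they are skipped) is an assumption, not part of the policy.

```python
from typing import Dict


def parse_content_signal(line: str) -> Dict[str, bool]:
    """Parse 'Content-signal: search=yes,ai-train=no' into {signal: allowed}."""
    name, _, value = line.partition(":")
    if name.strip().lower() != "content-signal":
        raise ValueError(f"not a Content-signal line: {line!r}")

    signals: Dict[str, bool] = {}
    for pair in value.split(","):
        key, sep, flag = pair.partition("=")
        key, flag = key.strip().lower(), flag.strip().lower()
        # Assumption: skip anything that is not 'name=yes' or 'name=no'.
        if not sep or not key or flag not in ("yes", "no"):
            continue
        signals[key] = flag == "yes"
    return signals
```

Parsing `Content-signal: search=yes,ai-train=no` yields `{"search": True, "ai-train": False}`. The `ai-input` signal is absent from the result, which under clause (c) of the policy means permission for that use is neither granted nor restricted.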
Cloudflare's Content Signals Policy is included by default in the robots.txt file when you turn on Instruct AI bot traffic with robots.txt.
To opt out of displaying the policy in your robots.txt file, uncheck Display Content Signals Policy under Control AI Crawlers in your zone's overview.
Alternatively, you can use Security Settings.
Managed robots.txt for AI crawlers is available on all plans.