Troubleshooting crawl errors
Cloudflare allows search engine crawlers and bots. If you observe crawl issues or Cloudflare challenges presented to the search engine crawler or bot, contact Cloudflare support with the information you gather when troubleshooting the crawl errors via the methods outlined in this guide.
Search engine crawlers' requests, when proxied through Cloudflare, can be blocked by anti-bot modules installed on your origin server. Try disabling any anti-bot modules to prevent your origin from blocking these requests.
To optimize CDN performance, Google and Bing assign special crawl rates to websites that use CDN services in order. Special crawl rates do not negatively affect Search Engine Optimization (SEO) and Search Engine Results Pages (SERPs). To change your crawl rates for Bing and Google, follow the guides below:
- Change the Google crawl rate by reviewing Google’s documentation ↗.
- Change your Bing crawl rate via guidance from Bing’s documentation:
Review the following recommendations to prevent crawler errors:
-
Monitor the performance and availability of your website using a third-party tool:
-
Do not block Google crawler IP addresses via custom rules or IP Access rules within the Security app. If you are using rate limiting rules, make sure they do not apply to the Google crawler.
Confirm an IP address belongs to Google by consulting Google’s documentation on verifying googlebot IP addresses ↗.
- Do not block the United States via custom rules or IP Access rules within the Security app.
- Do not block or User-Agents in your .htaccess, server configuration, robots.txt ↗, or web application.
Google uses a variety of User-Agents ↗ to crawl your website. You can test your robots.txt via Google ↗.
- Do not allow crawling of files in the /cdn-cgi/ directory. This path is used internally by Cloudflare and Google encounters errors when crawling it. Disallow crawls of cdn-cgi via robots.txt:
Disallow: /cdn-cgi/
- Ensure your robots.txt file allows the AdSense crawler ↗.
- Restore original visitor IP addresses ↗ in your server logs.
Troubleshooting steps for the most commonly reported crawl errors are mentioned below.
HTTP 4XX errors ↗ are the most common type of crawl error. Cloudflare delivers these errors from your web server to Google. These errors are caused for various reasons such as a missing page on your web server or a malformed link in your HTML. The solution depends upon the problem encountered.
HTTP 5XX errors ↗ indicate that either Cloudflare or your origin web server experienced an internal error. To correlate occurrences of crawl errors with site outages, monitor your origin web server’s health. Monitoring your website health both through Cloudflare and directly to your origin web server IPs determines whether errors occurred due to Cloudflare or your origin web server.
Troubleshooting steps vary depending on whether your domain is on Cloudflare via a Full or CNAME setup. To verify which setup your domain uses, open a terminal and execute the following command (replace www.example.com ↗ with your Cloudflare domain):
dig +short SOA
_www.example.com_
For domains on a CNAME setup, the result response contains cdn.cloudflare.net. For example:
example.com.cdn.cloudflare.net.
For domains on a Full setup, the result response contains the cloudflare.com domain in the nameservers listed. For example:
josh.ns.cloudflare.com. dns.cloudflare.com. 2013050901 10000 2400 604800 3600
Once you’ve confirmed how your domain was setup with Cloudflare, proceed with the troubleshooting steps appropriate to your domain setup.
CNAME
Contact your hosting provider to investigate DNS errors and provide the date Google encountered DNS errors. Additionally, review the Cloudflare System Status ↗ page for any network outages on the date the errors were encountered by Google.
Full
Contact Cloudflare support and provide the date and time that Google observed the errors.
If the above troubleshooting steps do not resolve your crawl errors, follow the steps below to export crawler errors as a .csv file from your Google Webmaster Tools Dashboard. Include this .csv file when contacting Cloudflare Support.
- Log in to your Google Webmaster Tools account and navigate to the Health section of the affected domain.
- Click Crawl Errors in the left hand navigation.
- Click Download to export the list of errors as a .csv file.
- Provide the downloaded .csv file to Cloudflare support.
Google’s documentation on crawl errors and troubleshooting ↗