Playbook Jun 7, 2026 5 min read

Is Cloudflare hiding your site from AI? How to check and fix it.

Cloudflare now blocks AI crawlers by default — including the ones that read your site for ChatGPT, Claude, Perplexity, and Google AI Overviews. Here's the 60-second check and the fix.

[Image: Three AI crawler robots blocked by a red shield from reaching a website, illustrating Cloudflare's default AI bot blocking.]

Last week I ran a routine check on our own robots.txt and found this sitting at the top of the file:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

I never wrote those lines. Cloudflare added them. And they were quietly telling the exact AI engines we help brands appear in — ChatGPT, Claude, Google’s AI answers — to ignore our entire site.

If you’re on Cloudflare and you care about showing up in AI answers, you are very likely doing the same thing right now without knowing it. Here’s how to check in 60 seconds, what the setting actually is, and how to turn it off.

Why this is happening

In 2024–2025 Cloudflare rolled out AI crawler blocking by default for new zones, plus a managed robots.txt feature and its Content Signals Policy. The intent is reasonable: give site owners an easy switch to stop AI companies scraping their content for free.

The problem is that “AI crawler” is a blunt category. It lumps together two very different things:

  • Training crawlers — bots that hoover up your content to train or fine-tune models (GPTBot, Google-Extended, CCBot, Applebot-Extended, Bytespider). Blocking these is a legitimate IP decision.
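  • Answer-engine crawlers: the bots that fetch your pages to ground an answer a user just asked, then cite you (OAI-SearchBot, PerplexityBot, Anthropic’s Claude-SearchBot and Claude-User, and Googlebot, whose index feeds AI Overviews). The names do not always map cleanly onto this split: Anthropic documents ClaudeBot as its training crawler, and Google-Extended governs Gemini rather than inclusion in Search, so check each vendor’s current crawler list before deciding what to allow.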

Block the first group and you protect your content. Block the second group and you make yourself invisible in AI search — you opt out of the citations that send you traffic and build authority. Cloudflare’s default blocks across the board.

For a brand trying to win at GEO (generative engine optimization), that default is working directly against you.

The 60-second check

You don’t need to log into anything to find out. Your robots.txt is public. Open a terminal and run:

curl -A "GPTBot" https://yourdomain.com/robots.txt

(The -A "GPTBot" flag sets the request’s User-Agent header to GPTBot, the token OpenAI’s crawler identifies itself with. That matters because Cloudflare sometimes serves bots a different file than humans.)
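If you would rather script the check than type it, here is a rough Python equivalent of that curl call, using only the standard library. The helper names `build_robots_request` and `fetch_robots_txt` are made up for this sketch, and `yourdomain.com` is the same placeholder as above.

```python
import urllib.request


def build_robots_request(domain, user_agent="GPTBot"):
    """Build a GET request for https://<domain>/robots.txt with a custom User-Agent."""
    url = f"https://{domain}/robots.txt"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_robots_txt(domain, user_agent="GPTBot", timeout=10.0):
    """Download robots.txt as text.

    Raises urllib.error.URLError (or HTTPError) if the request fails.
    """
    request = build_robots_request(domain, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example (performs a real network request, so it is left commented out):
# print(fetch_robots_txt("yourdomain.com"))
```

Sending a bare `GPTBot` token is a simplification, just as it is in the curl command: the real crawler sends a longer User-Agent string that contains that token.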

If you see this, you’re being blocked:

# BEGIN Cloudflare Managed content

User-agent: *
Content-Signal: search=yes,ai-train=no
Allow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: meta-externalagent
Disallow: /

# END Cloudflare Managed Content

That # BEGIN Cloudflare Managed content block is the tell. You didn’t write it — Cloudflare injects it on top of your real robots.txt. Every Disallow: / under an AI user-agent is a door closed on AI visibility.

If you see a clean file like this, you’re fine — only your own rules are present:

User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap-index.xml

Prefer not to use the terminal? Just visit https://yourdomain.com/robots.txt in your browser. You will usually get the same file, but a browser sends its own User-Agent, so if the two results differ, trust the curl one.
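The eyeball test above can also be automated. The sketch below takes the text of a robots.txt and reports whether the Cloudflare marker comment is present and which user-agents are fully disallowed. Both function names are hypothetical, and the parsing is deliberately simplified (it only looks for a bare `Disallow: /`), so treat it as a quick audit aid rather than a full robots.txt parser.

```python
def has_cloudflare_managed_block(robots_txt):
    """True if the file contains the '# BEGIN Cloudflare Managed content' marker."""
    return any(
        line.strip().lower().startswith("# begin cloudflare managed content")
        for line in robots_txt.splitlines()
    )


def fully_blocked_agents(robots_txt):
    """Return user-agent tokens whose group contains a bare 'Disallow: /'."""
    blocked = []
    current_agents = []
    in_rules = False  # True once the current group has seen an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value)
        elif key in ("allow", "disallow"):
            in_rules = True
            if key == "disallow" and value == "/":
                for agent in current_agents:
                    if agent not in blocked:
                        blocked.append(agent)
    return blocked


SAMPLE = """\
# BEGIN Cloudflare Managed content

User-agent: *
Content-Signal: search=yes,ai-train=no
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# END Cloudflare Managed Content
"""

print(has_cloudflare_managed_block(SAMPLE))  # True
print(fully_blocked_agents(SAMPLE))          # ['GPTBot', 'ClaudeBot']
```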

What “Content-Signal” means

While you’re in there, you’ll likely see this line:

Content-Signal: search=yes,ai-train=no

This is Cloudflare’s Content Signals Policy — a newer, machine-readable way to express intent rather than just access. Three signals:

  • search — may you index this for traditional search results? (yes)
  • ai-input — may you feed this into a model in real time to generate an answer? (often unset)
  • ai-train — may you use this to train or fine-tune a model? (no)
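To make the format concrete, here is a minimal sketch of how a tool might read that line into key/value pairs. `parse_content_signal` is a hypothetical helper written for this post, not part of any Cloudflare library, and it assumes the simple comma-separated `name=value` layout shown above.

```python
def parse_content_signal(line):
    """Parse 'Content-Signal: search=yes,ai-train=no' into a dict of signals."""
    name, _, value = line.partition(":")
    if name.strip().lower() != "content-signal":
        raise ValueError(f"not a Content-Signal line: {line!r}")
    signals = {}
    for pair in value.split(","):
        key, sep, setting = pair.partition("=")
        if sep:  # skip fragments that have no '='
            signals[key.strip().lower()] = setting.strip().lower()
    return signals


signals = parse_content_signal("Content-Signal: search=yes,ai-train=no")
print(signals)                   # {'search': 'yes', 'ai-train': 'no'}
print(signals.get("ai-input"))   # None: the signal is unset, neither granted nor denied
```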

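Set correctly, Content Signals are the nuanced control you want: ai-train=no asks not to be used as training data, while still allowing answer engines to read and cite you. The trouble is that the per-bot Disallow: / lines override that nuance with a hard no. A crawler obeys the group that names it and falls back to User-agent: * only when no group does, so a bot with its own Disallow: / group never reaches the group that carries the signal. The signals say “cite me, don’t train on me”; the bot blocks say “go away entirely.” The bot blocks win.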
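You can watch the bot blocks win using Python's built-in `urllib.robotparser`. This is only an illustration of standard robots.txt group matching on a shortened copy of the managed block: the standard-library parser ignores the `Content-Signal` line entirely, and real crawlers may differ in detail. `is_allowed` is a helper name invented for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the managed block shown earlier.
MANAGED_SAMPLE = """\
User-agent: *
Content-Signal: search=yes,ai-train=no
Allow: /

User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path="/"):
    """Ask the standard-library parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# GPTBot has its own group, so the permissive '*' group never applies to it.
print(is_allowed(MANAGED_SAMPLE, "GPTBot"))         # False
# A bot with no group of its own falls through to 'User-agent: *'.
print(is_allowed(MANAGED_SAMPLE, "OAI-SearchBot"))  # True
```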

How to fix it in Cloudflare

This is a dashboard setting, not something in your codebase — you can’t fix it by editing files in your repo, because Cloudflare prepends its block at the edge.

  1. Log into the Cloudflare dashboard and select your domain.
  2. In the left sidebar, open AI Crawl Control (on older accounts this lives under Security → Bots, sometimes labelled “AI Audit” or “Block AI bots”).
  3. Find the toggle for blocking AI bots / the managed robots.txt feature.
  4. Turn it off or, better, switch to a custom policy that blocks only the training crawlers and allows the answer-engine ones.
  5. Save.
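For the custom-policy route in step 4, one option is to express the same split in a robots.txt you serve yourself once the managed block is off. The sketch below generates such a file and sanity-checks it with the standard-library parser. It does not call any Cloudflare API, `render_policy` is a hypothetical helper, and the two lists are assumptions taken from the grouping earlier in this post, so confirm each token against the vendor's current documentation before using them.

```python
from urllib.robotparser import RobotFileParser

# Assumed lists, copied from the grouping earlier in this post.
TRAINING_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "Applebot-Extended", "Bytespider"]
ANSWER_CRAWLERS = ["OAI-SearchBot", "PerplexityBot"]


def render_policy(blocked, allowed, sitemap=None):
    """Render a robots.txt that allows `allowed`, disallows `blocked`, allows everyone else."""
    lines = []
    for agent in allowed:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    for agent in blocked:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /"]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


policy = render_policy(
    TRAINING_CRAWLERS,
    ANSWER_CRAWLERS,
    sitemap="https://yourdomain.com/sitemap-index.xml",
)

# Sanity-check the generated file before deploying it.
parser = RobotFileParser()
parser.parse(policy.splitlines())
print(parser.can_fetch("GPTBot", "/"))         # False
print(parser.can_fetch("OAI-SearchBot", "/"))  # True
```

Keeping the lists in one place makes the policy easy to revisit as vendors add or rename crawlers.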

The change propagates across Cloudflare’s edge in seconds. It is not a deploy.

Verify the fix

Don’t trust the dashboard — confirm at the edge. Run the check again with a cache-buster so you’re not looking at a stale copy:

curl -A "GPTBot" "https://yourdomain.com/robots.txt?v=$(date +%s)"

The # BEGIN Cloudflare Managed content block should be gone, leaving only your own rules. If it’s still there, the toggle didn’t save — go back and try again. If you have caching rules on robots.txt, purge the cache in the Cloudflare dashboard too.
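The same verification can be sketched in Python. The snippet builds the cache-busted URL the way the curl command does (a `v` query parameter holding the current Unix timestamp) and checks fetched text for the marker. Both function names are hypothetical, and the actual download is left out so the snippet runs offline.

```python
import time
from urllib.parse import urlencode


def cache_busted_robots_url(domain, now=None):
    """Return the robots.txt URL with a ?v=<unix timestamp> cache-buster."""
    stamp = int(time.time() if now is None else now)
    return f"https://{domain}/robots.txt?{urlencode({'v': stamp})}"


def managed_block_removed(robots_txt):
    """True if no Cloudflare managed-content marker remains in the file."""
    return "cloudflare managed content" not in robots_txt.lower()


print(cache_busted_robots_url("yourdomain.com", now=1700000000))
# https://yourdomain.com/robots.txt?v=1700000000

print(managed_block_removed("User-agent: *\nAllow: /\n"))  # True
```

Fetch the URL with a `GPTBot` User-Agent, as in the curl command, and pass the response body to `managed_block_removed`.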

One caveat: crawlers cache robots.txt

Here’s the part people miss. Your file updates instantly, but the bots that read it don’t re-fetch it instantly. Crawlers cache robots.txt and only re-check periodically — Google roughly every 24 hours, others from hours to several days.

So the file is correct the moment you flip the switch, but a given AI crawler won’t act on the new permission until its next refresh. Fix it today; expect the doors to actually open over the following day or two. The sooner you flip it, the sooner that clock starts.
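As a back-of-the-envelope illustration of that clock: if a crawler cached the old file just before you made the fix, the worst case is one full cache period later. The 24-hour default below is the rough Google figure quoted above, not a guarantee for any particular bot, and `latest_expected_refresh` is a name made up for this sketch.

```python
from datetime import datetime, timedelta, timezone


def latest_expected_refresh(fixed_at, cache_hours=24.0):
    """Worst case: the crawler cached the old file right before the fix."""
    return fixed_at + timedelta(hours=cache_hours)


fixed_at = datetime(2026, 6, 7, 9, 30, tzinfo=timezone.utc)
print(latest_expected_refresh(fixed_at))                  # 2026-06-08 09:30:00+00:00
print(latest_expected_refresh(fixed_at, cache_hours=72))  # 2026-06-10 09:30:00+00:00
```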

When you might actually want to block

To be fair: blocking AI crawlers isn’t always wrong. If your business is your content — a paywalled publisher, a research firm, a media company whose archive is the product — then keeping your work out of training sets and even out of free AI answers can be a deliberate, defensible strategy.

But for almost every brand using AI search as a growth channel, the math is the opposite. You want to be the source ChatGPT quotes, the site Perplexity links, the brand Google’s AI Overview names. You can’t be any of those if you’ve blocked the crawler at the door. The default setting optimizes for a problem most brands don’t have, at the cost of the visibility most brands want.

The takeaway

Go run the check right now:

curl -A "GPTBot" https://yourdomain.com/robots.txt

If you see Cloudflare’s managed block disallowing AI bots, you’ve been invisible to AI search, possibly for months and possibly without anyone on your team knowing. It’s a two-minute fix, followed by a day or two of waiting for crawlers to catch up.

Then make sure the fix actually moved the needle. Try BrandAxis free in early access — we track how your brand shows up across ChatGPT, Google AI Overviews, Perplexity, Claude, Gemini, and Grok, every day, so you can see your visibility climb once the door is open again.
