Robots.txt AI Bot Checker

Paste your robots.txt and we'll map every rule against the full list of known AI crawlers - GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, and more. See at a glance who you're letting in and who you're shutting out. Zero API calls.

Paste your robots.txt

Grab the contents of yourdomain.com/robots.txt and drop it in. Nothing is uploaded - parsing happens entirely in your browser.

Every known AI crawler.
Checked in one pass.

We parse your robots.txt the same way Google and OpenAI do, then map every rule against the full directory of AI user-agents. No data leaves your browser.
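The precedence logic described here is the same one Python's standard-library `urllib.robotparser` implements, so the check can be sketched in a few lines. This is an illustrative sketch, not the tool's actual implementation (the tool runs in the browser); the sample rules and bot list below are made up for the example:

```python
# Minimal sketch of the check, using Python's stdlib robots.txt parser.
# The sample rules and bot list are illustrative only.
from urllib.robotparser import RobotFileParser

SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

for bot in AI_BOTS:
    # can_fetch() applies the bot-specific group if one matches,
    # otherwise falls back to the * group.
    status = "allowed" if parser.can_fetch(bot, "/") else "blocked"
    print(f"{bot}: {status}")
```

With these sample rules, GPTBot is blocked by its own group, while the other three bots fall back to the * group and may fetch / (but not /private/).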

01

Full AI bot directory

GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, PerplexityBot, Perplexity-User, Google-Extended, Bytespider, CCBot, Applebot-Extended, and more.

02

Training vs retrieval

Training crawlers feed future models. Retrieval crawlers drive citations and referral traffic. We label every bot so you can block one without losing the other.

03

Specific vs wildcard

The Robots Exclusion Protocol spec (RFC 9309) says a bot-specific group overrides the * fallback - a crawler follows the group that names it and ignores the wildcard entirely. We apply the same precedence so the result matches what the crawler will actually see.
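For example, in a hypothetical file like this, GPTBot follows only its own group - it may crawl /blog/ despite the wildcard rule, while every other bot falls back to the * group and is blocked from it:

```
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow: /blog/
Disallow: /private/
```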

04

Full-site blocks

A Disallow: / inside a bot's group means the entire site is off-limits. We flag every bot currently caught by that rule - whether it's intentional or added by a plugin.
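A full-site block looks like this - a single Disallow: / line shuts out every bot named in the group. Multiple User-agent lines can share one group, a pattern security plugins often emit:

```
User-agent: GPTBot
User-agent: CCBot
Disallow: /
```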

05

Partial blocks

If a bot is restricted from specific paths but not the whole site, we show you every Disallow it's subject to so you can confirm it's intentional.
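A partial block restricts only certain paths; the directories in this sketch are hypothetical stand-ins for whatever your rules actually list. Here ClaudeBot may crawl the whole site except the two listed paths:

```
User-agent: ClaudeBot
Disallow: /drafts/
Disallow: /internal/
```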

06

AEO health score

We grade your configuration based on which retrieval crawlers are allowed - the ones that actually drive AI citations and referral traffic from answer engines.

07

100% client-side

Parsing and matching happen entirely in your browser. Nothing is uploaded, logged, or stored. Your robots.txt never leaves the page.

Bot Checker FAQs.

What does the Robots.txt AI Bot Checker do?
The Robots.txt AI Bot Checker analyzes your robots.txt file and tells you exactly which AI crawlers and bots you're currently allowing or blocking. Paste in your robots.txt content and the tool maps every rule against a comprehensive list of known AI bots - including GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, and others - so you can see at a glance how your site is configured for AI crawling.
Why does my robots.txt matter for AEO?
Your robots.txt file is the front door for AI crawlers. If you're blocking bots like GPTBot or PerplexityBot - whether intentionally or because a plugin, hosting provider, or security setting added those rules without your knowledge - those AI engines can't crawl your content. Content that can't be crawled can't be cited. Many site owners are unknowingly blocking the exact AI crawlers they want to let in, which silently kills their AEO visibility.
Which AI bots should I know about?
The major AI crawlers active right now include GPTBot (OpenAI's training crawler), ChatGPT-User (OpenAI's live retrieval crawler for ChatGPT search), ClaudeBot (Anthropic's crawler), PerplexityBot (Perplexity AI's crawler), Google-Extended (Google's AI training crawler, separate from Googlebot), and Bytespider (ByteDance's crawler). Each serves a different purpose, and the distinction between training crawlers and retrieval crawlers is important for deciding what to allow.
What's the difference between a training crawler and a retrieval crawler?
Training crawlers scrape your content to include in the datasets used to train AI models. Your content becomes part of the model's knowledge, but you don't get attribution or a link back. Retrieval crawlers fetch your content in real time to cite it in AI-generated answers - this is the one that drives referral traffic and visibility. GPTBot is primarily a training crawler, while ChatGPT-User is a retrieval crawler. The distinction matters because you might want to block one but allow the other.
Can I block AI training but still get cited in AI answers?
In some cases, yes. OpenAI separates its training crawler (GPTBot) from its retrieval crawler (ChatGPT-User), so you can block GPTBot to keep your content out of future training data while allowing ChatGPT-User so your content can still appear in ChatGPT search results with attribution. Not every AI company makes this distinction yet, though, so the level of control varies by platform. The tool shows you which bots support this separation and how your current rules handle each one.
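With OpenAI's crawlers, that split looks like this - GPTBot is blocked from the whole site, while ChatGPT-User gets an empty Disallow, which per the spec means no restriction:

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow:
```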
How do AI bot blocks end up in my robots.txt without me adding them?
This happens more often than most site owners realize. WordPress security plugins, managed hosting providers, CDN configurations, and even some SEO plugins add AI bot blocks as part of their default or recommended settings. Some hosting companies started blocking AI crawlers across all customer sites as a blanket policy. If you didn't explicitly configure your robots.txt and you're seeing AI bot blocks, one of these sources likely added them.
Will blocking AI bots protect my content from being used in AI training?
Blocking training crawlers through robots.txt signals that you don't want your content included in future training data, and the major AI companies have publicly committed to respecting these directives. However, robots.txt is a voluntary protocol - there's no technical enforcement mechanism that prevents a crawler from ignoring it. For the major players like OpenAI and Anthropic, the robots.txt block is respected. For less established crawlers, compliance varies.
Does blocking Googlebot also block AI features?
No - Googlebot and Google-Extended are separate user agents with separate jobs. Googlebot handles traditional search crawling, while Google-Extended is the token that controls whether Google can use your content for AI training and Gemini-related features. Blocking Google-Extended does not affect your traditional Google search rankings or Googlebot's crawling. However, it may affect whether your content appears in Google's AI Overviews, which is a significant AEO consideration. The tool flags this distinction so you can make an informed decision.
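Because the two tokens are separate, a group like this opts out of Google's AI training without touching Googlebot's access - Googlebot simply isn't named here, so it follows your * rules (or crawls freely if there are none):

```
User-agent: Google-Extended
Disallow: /
```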
What's the recommended robots.txt setup for AEO?
There's no single right answer because it depends on your priorities, but the most AEO-friendly configuration allows all retrieval crawlers (ChatGPT-User, PerplexityBot, ClaudeBot) so your content can be cited in AI-generated answers. Whether you also allow training crawlers is a separate decision that comes down to how you feel about your content being used in model training without direct attribution. The tool highlights which of your current rules affect citation visibility versus training access so you can adjust based on your own priorities.
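One sketch of an AEO-leaning configuration along those lines - retrieval crawlers explicitly allowed, training crawlers blocked, everything else left to your normal * rules. The bots listed are examples, not a complete directory; adjust to your own priorities:

```
# Retrieval crawlers: allowed (empty Disallow = no restriction)
User-agent: ChatGPT-User
Disallow:

User-agent: PerplexityBot
Disallow:

# Training crawlers: blocked
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```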
How often should I check my robots.txt for AI bot rules?
Check it any time you update a security plugin, change hosting providers, add a CDN, or install a new SEO or performance plugin - these are the most common triggers for unexpected robots.txt changes. Beyond that, the AI bot landscape is evolving quickly. New crawlers appear regularly, and existing companies sometimes introduce new user agents. Running a check every few months ensures your configuration still reflects your intentions and hasn't been overridden by an automated update.

We make sure every AI engine can cite you.

Our writers and editors audit robots.txt, schema, and fifty other AEO signals on every post we ship. First month is on us.

Claim Your Free Month →