AI Crawler Database

80 AI crawlers tracked

Every entry includes the exact user-agent token, parent company, purpose, robots.txt directive, and the date it was first spotted. Last updated June 2026.

Bot	Company	User-Agent	Purpose	Respects robots.txt	First Spotted
GPTBot	OpenAI	`GPTBot`	Training	Yes	2023-08
OAI-SearchBot	OpenAI	`OAI-SearchBot`	AI Search	Yes	2024-05
ChatGPT-User	OpenAI	`ChatGPT-User`	User-Triggered	Partial	2023-06
ChatGPT Atlas	OpenAI	`(Chrome UA)`	Agentic Browser	No	2025-10
Operator (Agent Mode)	OpenAI	`(Chrome UA)`	Agentic Browser	No	2025-01
ClaudeBot	Anthropic	`ClaudeBot`	Training	Yes	2024-04
Claude-SearchBot	Anthropic	`Claude-SearchBot`	AI Search	Yes	2026-02
Claude-User	Anthropic	`Claude-User`	User-Triggered	Yes	2024-08
anthropic-ai	Anthropic	`anthropic-ai`	Training (Legacy)	Yes	2023-06
claude-web	Anthropic	`claude-web`	Browsing (Legacy)	Yes	2023-07
Google-Extended	Google	`Google-Extended`	Training Control	Yes	2023-09
Gemini-AI	Google	`Gemini`	AI Search	Yes	2024-02
Google-CloudVertexBot	Google	`Google-CloudVertexBot`	AI Platform	Yes	2024-04
GoogleAgent-Mariner	Google	`GoogleAgent-Mariner`	Agentic Browser	Yes	2025-05
Google-NotebookLM	Google	`Google-NotebookLM`	User-Triggered	Yes	2024-06
Gemini-Deep-Research	Google	`Gemini-Deep-Research`	AI Search	Yes	2025-03
GoogleAgent-URLContext	Google	`GoogleAgent-URLContext`	User-Triggered	Yes	2025-04
PerplexityBot	Perplexity	`PerplexityBot`	AI Search	Disputed	2022-12
Perplexity-User	Perplexity	`Perplexity-User`	User-Triggered	No	2024-06
Bingbot	Microsoft	`Bingbot`	Search + AI	Yes	2010-10
Applebot	Apple	`Applebot`	Search + AI	Yes	2015-05
Applebot-Extended	Apple	`Applebot-Extended`	Training Control	Yes	2024-06
FacebookBot	Meta	`FacebookBot`	Social + AI	Yes	2019-04
meta-externalagent	Meta	`meta-externalagent`	AI Training	Yes	2024-07
Meta-ExternalFetcher	Meta	`Meta-ExternalFetcher`	AI Fetcher	Yes	2024-09
Bytespider	ByteDance	`Bytespider`	Training + AI	Disputed	2021-07
TikTokSpider	ByteDance	`TikTokSpider`	Social + AI	Unclear	2023-11
GrokBot	xAI	`GrokBot`	AI Search / Training	Unclear	2024-02
xAI-Grok	xAI	`xAI-Grok`	AI Search	Unclear	2024-03
Grok-DeepSearch	xAI	`Grok-DeepSearch`	AI Search	Unclear	2024-11
xAI-Bot	xAI	`xAI-Bot`	Training	Unclear	2024-02
DeepSeekBot	DeepSeek	`DeepSeekBot`	Training	Unclear	2024-01
MistralAI-User	Mistral	`MistralAI-User`	User-Triggered	Yes	2024-10
MistralBot	Mistral	`MistralBot`	Training	Yes	2024-05
Amazonbot	Amazon	`Amazonbot`	AI Assistant	Yes	2018-11
bedrockbot	Amazon	`bedrockbot`	AI Platform	Yes	2024-07
CCBot	Common Crawl	`CCBot`	Open Data / Training	Yes	2008-01
cohere-ai	Cohere	`cohere-ai`	Training	Yes	2023-05
DuckAssistBot	DuckDuckGo	`DuckAssistBot`	AI Search	Yes	2023-03
Bravebot	Brave	`Bravebot`	AI Search	Yes	2021-06
YouBot	You.com	`YouBot`	AI Search	Yes	2022-09
Diffbot	Diffbot	`Diffbot`	Data Extraction	Yes	2015-03
LinkedInBot	LinkedIn	`LinkedInBot`	Social + AI	Yes	2015-01
AI2Bot	Allen Institute	`AI2Bot`	Research / Training	Yes	2024-07
AI2Bot-Dolma	Allen Institute	`AI2Bot-Dolma`	Training Data	Yes	2024-07
HuggingFaceBot	Hugging Face	`HuggingFaceBot`	Training	Yes	2024-08
PetalBot	Huawei	`PetalBot`	Search + AI	Yes	2020-02
ChatGLM-Spider	Zhipu AI	`ChatGLM-Spider`	Training	Unclear	2023-10
Baidu-Spider-AI	Baidu	`Baidu-Spider-AI`	Training + AI	Yes	2023-03
TencentBot	Tencent	`TencentBot`	AI Training	Unclear	2023-06
360Spider	Qihoo 360	`360Spider`	Search + AI	Unclear	2013-04
Sogou	Sogou	`Sogou`	Search + AI	Yes	2010-06
WRTNBot	WRTN	`WRTNBot`	AI Search	Unclear	2024-05
SBIntuitionsBot	SB Intuitions	`SBIntuitionsBot`	AI Training	Unclear	2024-04
Cloudflare-AI-Search	Cloudflare	`Cloudflare-AI-Search`	AI Search	Yes	2024-09
Cloudflare-AutoRAG	Cloudflare	`Cloudflare-AutoRAG`	AI RAG	Yes	2025-02
PhindBot	Phind	`PhindBot`	AI Search	Yes	2023-02
ExaBot	Exa	`ExaBot`	AI Search	Yes	2023-08
TavilyBot	Tavily	`TavilyBot`	AI Search API	Yes	2024-01
iaskspider	iAsk	`iaskspider`	AI Search	Yes	2023-09
AndiBot	Andi	`AndiBot`	AI Search	Yes	2022-04
kagi-fetcher	Kagi	`kagi-fetcher`	AI Search	Yes	2023-10
LinerBot	Liner	`LinerBot`	AI Search	Unclear	2024-03
Anomura	Anomura	`Anomura`	AI Search	Unclear	2024-08
Timpibot	Timpi	`Timpibot`	Decentralized Search	Yes	2023-07
Devin	Cognition	`Devin`	AI Agent	Unclear	2024-03
FirecrawlAgent	Firecrawl	`FirecrawlAgent`	Scraping API	Partial	2024-04
Crawl4AI	Crawl4AI	`Crawl4AI`	Scraping Tool	Partial	2024-06
ApifyBot	Apify	`ApifyBot`	Scraping Platform	Partial	2017-03
omgili	Omgili	`omgili`	Forum Indexing	Yes	2008-04
webzio-extended	Webz.io	`webzio-extended`	Data Broker	Yes	2023-11
ImagesiftBot	The Hive	`ImagesiftBot`	Image AI	Unclear	2023-05
Kangaroo Bot	Kangaroo	`Kangaroo Bot`	AI Search	Unclear	2024-02
Brightbot	Bright Data	`Brightbot`	Data Collection	Unclear	2023-12
SemrushBot-OCOB	Semrush	`SemrushBot-OCOB`	SEO + AI	Yes	2024-05
DataForSeoBot	DataForSEO	`DataForSeoBot`	SEO Data + AI	Yes	2021-08
TurnitinBot	Turnitin	`TurnitinBot`	AI Detection	Yes	2002-09
PanguBot	Huawei	`PanguBot`	Training	Unclear	2023-08
Sentibot	Sentibot	`Sentibot`	Sentiment Analysis	Unclear	2023-04
VelenPublicWebCrawler	Velen	`VelenPublicWebCrawler`	Data Collection	Unclear	2023-09

Want to know which of these bots your site is actually blocking right now?

Run your robots.txt through our free AI Bot Checker - it maps every rule against this database and grades your configuration.

Check your robots.txt →

Questions

AI Crawler Database FAQs.

What is the AI Crawler Database?

The AI Crawler Database is a living reference table of every known AI bot and crawler user agent string currently active on the web. Each entry includes the exact user agent string, the company behind it, whether it's used for training or real-time retrieval, how to block or allow it in your robots.txt, and when it was first documented. The database is updated monthly to keep pace with new bots as they appear.

How many AI crawlers are there?

The database currently tracks 80, which is more than most site owners realize. Beyond the well-known bots from OpenAI, Google, Anthropic, and Perplexity, dozens more are operated by companies building AI products, search tools, data aggregation services, and large language models across the globe. The number grows every few months as new AI products launch and existing companies spin up additional crawlers for different purposes. The database tracks them in one place so you don't have to piece the list together from scattered blog posts and changelog entries.

What information is included for each bot?

Every entry in the database includes the exact user agent string (what you'd reference in your robots.txt), the parent company or organization, the bot's primary purpose (model training, live retrieval, research indexing, etc.), the specific robots.txt directive to block or allow it, the date it was first spotted in the wild, and any known behavioral notes - like whether the company has publicly committed to respecting robots.txt directives.

Why does it matter whether a bot is for training or retrieval?

This is the single most important distinction in the database. Training crawlers collect your content for AI model training datasets: what they fetch can end up shaping the model's weights, but you receive no attribution or traffic in return. Retrieval crawlers fetch your content in real time when a user asks a question, and the AI engine can then cite your page as a source with a link. Blocking a training crawler protects your content from uncompensated use. Blocking a retrieval crawler cuts you off from AI-driven referral traffic and citations. These are very different decisions, and the database labels each bot's purpose so you can make them separately.
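
To make the split concrete, here is a minimal Python sketch that turns purpose labels into robots.txt groups: training crawlers get a Disallow, everything else gets an Allow. The six entries are a small subset copied from the table above. The helper names (`is_training`, `build_robots_txt`) and the rule that any label mentioning "training" counts as a training crawler are assumptions made for illustration, not how the database itself generates its snippets.

```python
# Illustrative subset of the table above: (user-agent token, purpose label).
BOTS = [
    ("GPTBot", "Training"),
    ("ClaudeBot", "Training"),
    ("CCBot", "Open Data / Training"),
    ("OAI-SearchBot", "AI Search"),
    ("Claude-SearchBot", "AI Search"),
    ("PerplexityBot", "AI Search"),
]


def is_training(purpose: str) -> bool:
    """Assumption: any purpose label that mentions training is a training crawler."""
    return "training" in purpose.lower()


def build_robots_txt(bots) -> str:
    """Emit one robots.txt group per bot: block training, allow everything else."""
    groups = []
    for token, purpose in bots:
        rule = "Disallow: /" if is_training(purpose) else "Allow: /"
        groups.append(f"User-agent: {token}\n{rule}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(BOTS))
```

Mixed labels such as "Training + AI" or "AI Search / Training" land on the blocked side under this rule, so review those entries by hand if you want the retrieval traffic.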

How does this pair with the Robots.txt AI Bot Checker?

The two tools are designed to work together. The Robots.txt AI Bot Checker analyzes your current robots.txt file and shows you which AI bots you're blocking or allowing. The AI Crawler Database is the reference you use to understand what each of those bots actually does and make informed decisions about which ones you want to allow. Check your current configuration with the Robots.txt tool, then look up any unfamiliar bots in the database to decide whether your rules match your intentions.
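
If you want a rough local version of that check, Python's standard-library `urllib.robotparser` can evaluate a robots.txt file against a list of user-agent tokens. This is a sketch, not the Robots.txt AI Bot Checker's actual logic: the sample robots.txt, the `audit` helper, and the `https://example.com/` URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the text of your own file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

# Tokens to look up, taken from the User-Agent column of the table.
TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot"]


def audit(robots_txt: str, tokens, url: str = "https://example.com/"):
    """Report, per token, whether the rules allow fetching the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        token: "allowed" if parser.can_fetch(token, url) else "blocked"
        for token in tokens
    }


if __name__ == "__main__":
    for token, status in audit(ROBOTS_TXT, TOKENS).items():
        print(f"{token}: {status}")
```

A token with no matching group and no catch-all group, like ClaudeBot here, comes back as allowed. Python's parser matches tokens by substring and applies the first matching rule in a group, so a real crawler may read an unusual file differently.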

How often is the database updated?

Monthly, at minimum. The AI crawler landscape shifts frequently - new bots appear, existing bots change user agent strings, companies launch separate crawlers for different functions, and some bots get deprecated entirely. Each monthly update includes new bots that have been documented, changes to existing entries, and removal of bots that are no longer active. Major changes between scheduled updates (like a major AI company launching a new crawler) are added as they're confirmed.

How do you identify and verify new AI bots?

New bots are identified through a combination of server log analysis, industry documentation, official company announcements, community reports, and monitoring of robots.txt changes across major websites. Each bot is verified against its parent company's public documentation where available. If a bot appears in logs but has no official documentation, it's included in the database with a note indicating its unverified status so you can make your own call on how to handle it.
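
The log-analysis step can be sketched in a few lines of Python. The example below assumes access logs in the common "combined" format, where the user agent is the last quoted field, and counts hits per known token. The log lines are made-up samples with simplified user-agent strings and documentation-range IP addresses, and `count_ai_hits` is a hypothetical helper, not part of any tool described here.

```python
import re
from collections import Counter

# Tokens to look for, taken from the User-Agent column of the table.
KNOWN_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider"]

# Made-up sample lines; real user-agent strings are longer and vary by bot.
LOG_LINES = [
    '203.0.113.7 - - [01/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.4 - - [01/Jan/2025:10:00:05 +0000] "GET /blog HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/Jan/2025:10:00:09 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.9 - - [01/Jan/2025:10:00:12 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

# The user agent is the last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_ai_hits(lines, tokens):
    """Count requests per known token; everything else goes to '(unmatched)'."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end with a quoted field
        user_agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in user_agent:
                counts[token] += 1
                break
        else:
            counts["(unmatched)"] += 1
    return counts


if __name__ == "__main__":
    for name, hits in count_ai_hits(LOG_LINES, KNOWN_TOKENS).most_common():
        print(f"{name}: {hits}")
```

The "(unmatched)" bucket is where undocumented bots would show up alongside ordinary browsers, so it is a starting point for review rather than a list of AI crawlers.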

Are all AI crawlers required to identify themselves?

No. User agent strings are self-reported, and there's no technical requirement that forces a crawler to accurately identify itself. Reputable companies use identifiable user agent strings and document them publicly. Less transparent operators may use generic or misleading user agent strings, making them harder to identify and block individually. The database notes where transparency concerns exist so you can factor that into your decisions.

Can I block all AI bots at once?

You can add an individual Disallow rule for each known AI bot user agent string, and the database provides the exact directives for every entry. Some site owners take a blanket approach instead, such as a catch-all User-agent: * group that disallows everything, but this carries risk: overly aggressive blocking also shuts out legitimate crawlers, including conventional search engines and any future bot you might want to allow. The more targeted approach is to use the specific user agent strings listed in the database so you control exactly what's blocked and what isn't. Either way, robots.txt only stops crawlers that honor it, which is why the table flags each bot's compliance.
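
The difference between the two approaches is easy to see with Python's standard-library `urllib.robotparser`. In this sketch, both robots.txt samples, the `blocked_tokens` helper, and the `https://example.com/` URL are hypothetical, and the token list is a small subset of the table.

```python
from urllib.robotparser import RobotFileParser

# Blanket approach: one catch-all group that disallows everything.
BLANKET = """\
User-agent: *
Disallow: /
"""

# Targeted approach: one group per bot you actually want to block.
TARGETED = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

TOKENS = ["GPTBot", "CCBot", "Bingbot", "OAI-SearchBot"]


def blocked_tokens(robots_txt: str, tokens, url: str = "https://example.com/"):
    """Return the tokens that the given rules stop from fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in tokens if not parser.can_fetch(token, url)]


if __name__ == "__main__":
    print("blanket: ", blocked_tokens(BLANKET, TOKENS))
    print("targeted:", blocked_tokens(TARGETED, TOKENS))
```

The blanket file blocks all four tokens, Bingbot included, while the targeted file blocks only GPTBot and CCBot. Both results describe what a compliant crawler should do, not what every crawler will do.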

Should I block AI crawlers I don't recognize?

It depends on your default posture. If your priority is protecting your content and you'd rather opt in to specific bots, blocking unfamiliar crawlers is the safer approach. If your priority is maximizing AI visibility and you'd rather opt out of specific bots, leaving unfamiliar crawlers unblocked gives you broader coverage. The database helps either way - it gives you the information to move any unfamiliar bot from the "unknown" column into an informed decision.

Every AI crawler.
One canonical table.

Every known bot

Training vs retrieval

Exact block snippets

Search, sort, filter

Compliance flags

First-spotted dates

Updated monthly

We make sure every AI engine can cite you.
