
AI Crawler Database

A living reference table of every known AI bot user agent string on the web. Search, sort, and filter 80 crawlers across training, retrieval, search, and agentic categories - with the exact robots.txt directive to block or allow each one. Pairs perfectly with the Robots.txt AI Bot Checker.

80 AI crawlers tracked

Every entry includes the exact user-agent token, parent company, purpose, robots.txt directive, and the date it was first spotted. Last updated April 2026.

| Bot | Company | User-Agent | Purpose | Respects robots.txt | First Spotted |
| --- | --- | --- | --- | --- | --- |
| GPTBot | OpenAI | GPTBot | Training | Yes | 2023-08 |
| OAI-SearchBot | OpenAI | OAI-SearchBot | AI Search | Yes | 2024-05 |
| ChatGPT-User | OpenAI | ChatGPT-User | User-Triggered | Partial | 2023-06 |
| ChatGPT Atlas | OpenAI | (Chrome UA) | Agentic Browser | No | 2024-10 |
| Operator (Agent Mode) | OpenAI | (Chrome UA) | Agentic Browser | No | 2025-01 |
| ClaudeBot | Anthropic | ClaudeBot | Training | Yes | 2024-04 |
| Claude-SearchBot | Anthropic | Claude-SearchBot | AI Search | Yes | 2026-02 |
| Claude-User | Anthropic | Claude-User | User-Triggered | Yes | 2024-08 |
| anthropic-ai | Anthropic | anthropic-ai | Training (Legacy) | Yes | 2023-06 |
| claude-web | Anthropic | claude-web | Browsing (Legacy) | Yes | 2023-07 |
| Google-Extended | Google | Google-Extended | Training Control | Yes | 2023-09 |
| Gemini-AI | Google | Gemini | AI Search | Yes | 2024-02 |
| Google-CloudVertexBot | Google | Google-CloudVertexBot | AI Platform | Yes | 2024-04 |
| GoogleAgent-Mariner | Google | GoogleAgent-Mariner | Agentic Browser | Yes | 2025-05 |
| Google-NotebookLM | Google | Google-NotebookLM | User-Triggered | Yes | 2024-06 |
| Gemini-Deep-Research | Google | Gemini-Deep-Research | AI Search | Yes | 2025-03 |
| GoogleAgent-URLContext | Google | GoogleAgent-URLContext | User-Triggered | Yes | 2025-04 |
| PerplexityBot | Perplexity | PerplexityBot | AI Search | Disputed | 2022-12 |
| Perplexity-User | Perplexity | Perplexity-User | User-Triggered | No | 2024-06 |
| Bingbot | Microsoft | Bingbot | Search + AI | Yes | 2010-10 |
| Applebot | Apple | Applebot | Search + AI | Yes | 2015-05 |
| Applebot-Extended | Apple | Applebot-Extended | Training Control | Yes | 2024-06 |
| FacebookBot | Meta | FacebookBot | Social + AI | Yes | 2019-04 |
| meta-externalagent | Meta | meta-externalagent | AI Training | Yes | 2024-07 |
| Meta-ExternalFetcher | Meta | Meta-ExternalFetcher | AI Fetcher | Yes | 2024-09 |
| Bytespider | ByteDance | Bytespider | Training + AI | Disputed | 2021-07 |
| TikTokSpider | ByteDance | TikTokSpider | Social + AI | Unclear | 2023-11 |
| GrokBot | xAI | GrokBot | AI Search / Training | Unclear | 2024-02 |
| xAI-Grok | xAI | xAI-Grok | AI Search | Unclear | 2024-03 |
| Grok-DeepSearch | xAI | Grok-DeepSearch | AI Search | Unclear | 2024-11 |
| xAI-Bot | xAI | xAI-Bot | Training | Unclear | 2024-02 |
| DeepSeekBot | DeepSeek | DeepSeekBot | Training | Unclear | 2024-01 |
| MistralAI-User | Mistral | MistralAI-User | User-Triggered | Yes | 2024-10 |
| MistralBot | Mistral | MistralBot | Training | Yes | 2024-05 |
| Amazonbot | Amazon | Amazonbot | AI Assistant | Yes | 2018-11 |
| bedrockbot | Amazon | bedrockbot | AI Platform | Yes | 2024-07 |
| CCBot | Common Crawl | CCBot | Open Data / Training | Yes | 2008-01 |
| cohere-ai | Cohere | cohere-ai | Training | Yes | 2023-05 |
| DuckAssistBot | DuckDuckGo | DuckAssistBot | AI Search | Yes | 2023-03 |
| Bravebot | Brave | Bravebot | AI Search | Yes | 2021-06 |
| YouBot | You.com | YouBot | AI Search | Yes | 2022-09 |
| Diffbot | Diffbot | Diffbot | Data Extraction | Yes | 2015-03 |
| LinkedInBot | LinkedIn | LinkedInBot | Social + AI | Yes | 2015-01 |
| AI2Bot | Allen Institute | AI2Bot | Research / Training | Yes | 2024-07 |
| AI2Bot-Dolma | Allen Institute | AI2Bot-Dolma | Training Data | Yes | 2024-07 |
| HuggingFaceBot | Hugging Face | HuggingFaceBot | Training | Yes | 2024-08 |
| PetalBot | Huawei | PetalBot | Search + AI | Yes | 2020-02 |
| ChatGLM-Spider | Zhipu AI | ChatGLM-Spider | Training | Unclear | 2023-10 |
| Baidu-Spider-AI | Baidu | Baidu-Spider-AI | Training + AI | Yes | 2023-03 |
| TencentBot | Tencent | TencentBot | AI Training | Unclear | 2023-06 |
| 360Spider | Qihoo 360 | 360Spider | Search + AI | Unclear | 2013-04 |
| Sogou | Sogou | Sogou | Search + AI | Yes | 2010-06 |
| WRTNBot | WRTN | WRTNBot | AI Search | Unclear | 2024-05 |
| SBIntuitionsBot | SB Intuitions | SBIntuitionsBot | AI Training | Unclear | 2024-04 |
| Cloudflare-AI-Search | Cloudflare | Cloudflare-AI-Search | AI Search | Yes | 2024-09 |
| Cloudflare-AutoRAG | Cloudflare | Cloudflare-AutoRAG | AI RAG | Yes | 2025-02 |
| PhindBot | Phind | PhindBot | AI Search | Yes | 2023-02 |
| ExaBot | Exa | ExaBot | AI Search | Yes | 2023-08 |
| TavilyBot | Tavily | TavilyBot | AI Search API | Yes | 2024-01 |
| iaskspider | iAsk | iaskspider | AI Search | Yes | 2023-09 |
| AndiBot | Andi | AndiBot | AI Search | Yes | 2022-04 |
| kagi-fetcher | Kagi | kagi-fetcher | AI Search | Yes | 2023-10 |
| LinerBot | Liner | LinerBot | AI Search | Unclear | 2024-03 |
| Anomura | Anomura | Anomura | AI Search | Unclear | 2024-08 |
| Timpibot | Timpi | Timpibot | Decentralized Search | Yes | 2023-07 |
| Devin | Cognition | Devin | AI Agent | Unclear | 2024-03 |
| FirecrawlAgent | Firecrawl | FirecrawlAgent | Scraping API | Partial | 2024-04 |
| Crawl4AI | Crawl4AI | Crawl4AI | Scraping Tool | Partial | 2024-06 |
| ApifyBot | Apify | ApifyBot | Scraping Platform | Partial | 2017-03 |
| omgili | Omgili | omgili | Forum Indexing | Yes | 2008-04 |
| webzio-extended | Webz.io | webzio-extended | Data Broker | Yes | 2023-11 |
| ImagesiftBot | The Hive | ImagesiftBot | Image AI | Unclear | 2023-05 |
| Kangaroo Bot | Kangaroo | Kangaroo Bot | AI Search | Unclear | 2024-02 |
| Brightbot | Bright Data | Brightbot | Data Collection | Unclear | 2023-12 |
| SemrushBot-OCOB | Semrush | SemrushBot-OCOB | SEO + AI | Yes | 2024-05 |
| DataForSeoBot | DataForSEO | DataForSeoBot | SEO Data + AI | Yes | 2021-08 |
| TurnitinBot | Turnitin | TurnitinBot | AI Detection | Yes | 2002-09 |
| PanguBot | Huawei | PanguBot | Training | Unclear | 2023-08 |
| Sentibot | Sentibot | Sentibot | Sentiment Analysis | Unclear | 2023-04 |
| VelenPublicWebCrawler | Velen | VelenPublicWebCrawler | Data Collection | Unclear | 2023-09 |

Want to know which of these bots your site is actually blocking right now?

Run your robots.txt through our free AI Bot Checker - it maps every rule against this database and grades your configuration.

Check your robots.txt →

Every AI crawler.
One canonical table.

The AI crawler landscape changes every month. We track new bots, deprecate old ones, and keep the user-agent strings accurate so you don't have to piece them together from scattered blog posts.

01

Every known bot

80 AI crawlers across OpenAI, Anthropic, Google, Perplexity, Meta, ByteDance, xAI, Baidu, Huawei, and dozens more - all in one searchable table.

02

Training vs retrieval

Every entry is clearly labeled. Training crawlers feed future models. Retrieval crawlers drive citations and referral traffic. The two call for very different decisions.

03

Exact block snippets

Expand any row to see the exact User-agent / Disallow directive for that bot. Copy it straight into your robots.txt with one click.

04

Search, sort, filter

Full-text search across all columns. Sort by company, purpose, or first-spotted date. Filter by purpose (training, retrieval, search, agentic) or robots.txt compliance.

05

Compliance flags

Every bot is tagged by whether it respects robots.txt - yes, partial, disputed, unclear, or no. Stealth crawlers and residential-IP scrapers are called out explicitly.

06

First-spotted dates

Every entry includes the month we first documented the user-agent in the wild, so you can tell brand-new crawlers from established ones at a glance.

07

Updated monthly

New bots appear, existing bots change user-agent strings, companies deprecate old crawlers. We refresh the database monthly so your reference stays current.

AI Crawler Database FAQs.

What is the AI Crawler Database?
The AI Crawler Database is a living reference table of every known AI bot and crawler user agent string currently active on the web. Each entry includes the exact user agent string, the company behind it, whether it's used for training or real-time retrieval, how to block or allow it in your robots.txt, and when it was first documented. The database is updated monthly to keep pace with new bots as they appear.
How many AI crawlers are there?
More than most site owners realize: this database currently tracks 80. Beyond the well-known bots from OpenAI, Google, Anthropic, and Perplexity, there are dozens of AI crawlers operated by companies around the world that build AI products, search tools, data aggregation services, and large language models. The number grows every few months as new AI products launch and existing companies spin up additional crawlers for different purposes. The database tracks all of them in one place so you don't have to piece the list together from scattered blog posts and changelog entries.
What information is included for each bot?
Every entry in the database includes the exact user agent string (what you'd reference in your robots.txt), the parent company or organization, the bot's primary purpose (model training, live retrieval, research indexing, etc.), the specific robots.txt directive to block or allow it, the date it was first spotted in the wild, and any known behavioral notes - like whether the company has publicly committed to respecting robots.txt directives.
Why does it matter whether a bot is for training or retrieval?
This is the single most important distinction in the database. Training crawlers scrape your content to build AI model training datasets - your content shapes the model's weights, but you receive no attribution or traffic in return. Retrieval crawlers fetch your content in real time when a user asks a question, and the AI engine typically cites your page as a source with a link. Blocking a training crawler protects your content from uncompensated use. Blocking a retrieval crawler cuts you off from AI-driven referral traffic and citations. These are very different decisions, and the database clearly labels each bot so you can make them separately.
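
As a rough illustration of making the two decisions separately, the Python sketch below turns a few rows from the table into robots.txt groups that disallow training crawlers and explicitly allow the rest. The `Crawler` class, the `is_training` rule, and `build_robots_txt` are hypothetical names made up for this example, and treating any purpose label that contains "Training" as a training crawler is an assumption, not the database's own logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crawler:
    token: str    # user-agent token as listed in the table
    purpose: str  # purpose label as listed in the table


# A handful of rows copied from the table; extend as needed.
CRAWLERS = [
    Crawler("GPTBot", "Training"),
    Crawler("OAI-SearchBot", "AI Search"),
    Crawler("ClaudeBot", "Training"),
    Crawler("Claude-SearchBot", "AI Search"),
    Crawler("PerplexityBot", "AI Search"),
    Crawler("CCBot", "Open Data / Training"),
]


def is_training(crawler):
    # Assumption: a purpose label containing "Training" marks a training crawler.
    return "training" in crawler.purpose.lower()


def build_robots_txt(crawlers):
    """Emit one group per crawler: Disallow for training, Allow otherwise."""
    groups = []
    for crawler in crawlers:
        rule = "Disallow: /" if is_training(crawler) else "Allow: /"
        groups.append(f"User-agent: {crawler.token}\n{rule}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(CRAWLERS))
```

Rules like these only affect crawlers that honor robots.txt, so entries flagged as disputed, unclear, or no in the compliance column may ignore them.
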
How does this pair with the Robots.txt AI Bot Checker?
The two tools are designed to work together. The Robots.txt AI Bot Checker analyzes your current robots.txt file and shows you which AI bots you're blocking or allowing. The AI Crawler Database is the reference you use to understand what each of those bots actually does and make informed decisions about which ones you want to allow. Check your current configuration with the Robots.txt tool, then look up any unfamiliar bots in the database to decide whether your rules match your intentions.
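
For readers who want a quick local check, the sketch below uses Python's standard-library `urllib.robotparser` to report which user-agent tokens a robots.txt file allows. It is not how the Robots.txt AI Bot Checker itself works; the sample file, the `check_bots` helper, and the example.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Allow: /
"""


def check_bots(robots_txt, tokens, url="https://example.com/"):
    """Return {token: True if that token may fetch url, else False}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


if __name__ == "__main__":
    report = check_bots(SAMPLE_ROBOTS_TXT, ["GPTBot", "ClaudeBot", "PerplexityBot"])
    for token, allowed in report.items():
        print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

The standard-library parser's matching may differ in edge cases from how individual crawlers interpret the same file, so treat the output as a sanity check rather than a guarantee.
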
How often is the database updated?
Monthly, at minimum. The AI crawler landscape shifts frequently - new bots appear, existing bots change user agent strings, companies launch separate crawlers for different functions, and some bots get deprecated entirely. Each monthly update adds newly documented bots, revises existing entries, and removes bots that are no longer active. Significant changes between scheduled updates (such as a major AI company launching a new crawler) are added as soon as they're confirmed.
How do you identify and verify new AI bots?
New bots are identified through a combination of server log analysis, industry documentation, official company announcements, community reports, and monitoring of robots.txt changes across major websites. Each bot is verified against its parent company's public documentation where available. If a bot appears in logs but has no official documentation, it's included in the database with a note indicating its unverified status so you can make your own call on how to handle it.
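
The server log analysis step can be sketched in a few lines of Python. This is a minimal first pass, not the method used to build the database: it assumes logs in the common "combined" format, where the user agent is the last quoted field, and the sample log lines and user-agent strings are simplified, hypothetical values.

```python
from collections import Counter

# A few user-agent tokens copied from the table; extend as needed.
KNOWN_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider", "CCBot"]

# Hypothetical log lines; real user-agent strings are longer than these.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Apr/2026:10:00:02 +0000] "GET /blog HTTP/1.1" 200 904 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Apr/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [01/Apr/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def user_agent_of(line):
    """Return the last double-quoted field of a log line, or '' if absent."""
    parts = line.rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def count_ai_hits(log_lines, tokens=KNOWN_TOKENS):
    """Count requests per known token using case-insensitive substring matching."""
    hits = Counter()
    for line in log_lines:
        agent = user_agent_of(line).lower()
        for token in tokens:
            if token.lower() in agent:
                hits[token] += 1
                break  # attribute each request to at most one token
    return hits


if __name__ == "__main__":
    for token, count in count_ai_hits(SAMPLE_LOG).most_common():
        print(f"{token}: {count}")
```

Because user agent strings are self-reported, a match here shows only what a client claimed to be, which is why entries still need checking against the operator's own documentation.
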
Are all AI crawlers required to identify themselves?
No. User agent strings are self-reported, and there's no technical requirement that forces a crawler to accurately identify itself. Reputable companies use identifiable user agent strings and document them publicly. Less transparent operators may use generic or misleading user agent strings, making them harder to identify and block individually. The database notes where transparency concerns exist so you can factor that into your decisions.
Can I block all AI bots at once?
Not with a single robots.txt rule. robots.txt matches crawlers by user agent token, and its only wildcard, User-agent: *, applies to every crawler that has no more specific group of its own, including conventional search engine crawlers. Some site owners take a blanket approach anyway, or block by user agent pattern at the server or CDN level, but this carries risk - overly aggressive blocking can inadvertently affect legitimate crawlers or future bots that use similar naming patterns. The more targeted approach is to add a disallow rule for each known AI bot user agent string, and the database provides the exact directive for every entry, so you control exactly what's blocked and what isn't.
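
To make the trade-off concrete, the Python sketch below compares a blanket rule with a targeted group, using the standard-library `urllib.robotparser`. The two robots.txt bodies, the `allowed` helper, and the example.com URL are assumptions chosen for illustration; the tokens come from the table.

```python
from urllib.robotparser import RobotFileParser

# Blanket approach: one catch-all group for every crawler.
BLANKET = """\
User-agent: *
Disallow: /
"""

# Targeted approach: several user-agent lines sharing one rule.
TARGETED = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
"""


def allowed(robots_txt, token, url="https://example.com/"):
    """Return True if the given user-agent token may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    for name, body in (("blanket", BLANKET), ("targeted", TARGETED)):
        for token in ("GPTBot", "Bingbot"):
            verdict = "allowed" if allowed(body, token) else "blocked"
            print(f"{name}: {token} is {verdict}")
```

Under the blanket rule Bingbot is blocked along with GPTBot, while the targeted group blocks only the crawlers it names.
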
Should I block AI crawlers I don't recognize?
It depends on your default posture. If your priority is protecting your content and you'd rather opt in to specific bots, blocking unfamiliar crawlers is the safer approach. If your priority is maximizing AI visibility and you'd rather opt out of specific bots, leaving unfamiliar crawlers unblocked gives you broader coverage. The database helps either way - it gives you the information to move any unfamiliar bot from the "unknown" column into an informed decision.
