AI Crawlers: The Complete Guide
AI companies are crawling the web to train and power their models. Understanding these crawlers helps you control how your content is used.
Known AI Crawlers
| Crawler | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training & live retrieval |
| ChatGPT-User | OpenAI | ChatGPT browsing feature |
| ClaudeBot | Anthropic | Claude training |
| Anthropic-AI | Anthropic | Claude training |
| PerplexityBot | Perplexity | Search answers |
| Cohere-AI | Cohere | Model training |
| Google-Extended | Google | Gemini training |
| CCBot | Common Crawl | Open dataset (used by many) |
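To see which of these crawlers already visit a site, one option is to match the User-Agent strings in the server logs against the tokens in the table. The following is a minimal sketch, not a definitive detector: the function name `identify_ai_crawler` and the sample User-Agent strings are illustrative assumptions, and simple substring matching will miss crawlers that do not identify themselves.

```python
from collections import Counter

# Crawler tokens from the table above, mapped to the company behind them.
# Google-Extended is left out on purpose: as far as we know it is a
# robots.txt control token and does not appear as its own User-Agent string.
AI_CRAWLERS = {
    "GPTBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "ClaudeBot": "Anthropic",
    "Anthropic-AI": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Cohere-AI": "Cohere",
    "CCBot": "Common Crawl",
}


def identify_ai_crawler(user_agent):
    """Return (token, company) for the first known token found, else None."""
    ua = user_agent.lower()
    for token, company in AI_CRAWLERS.items():
        if token.lower() in ua:
            return token, company
    return None


def count_ai_hits(user_agents):
    """Count requests per AI crawler token in an iterable of User-Agent strings."""
    hits = Counter()
    for ua in user_agents:
        match = identify_ai_crawler(ua)
        if match:
            hits[match[0]] += 1
    return hits


if __name__ == "__main__":
    # Illustrative User-Agent strings, not copied from real logs.
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
    ]
    for token, n in count_ai_hits(sample).items():
        print(token, n)
```

User-Agent strings can be spoofed, so counts like these are an estimate of self-identified crawler traffic rather than proof of who is crawling.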
Should You Allow or Block?
The Trade-off
Allow: Your content can be included in AI responses → more visibility
Block: Your content is kept out of AI training and answers → less AI visibility
Allow If You Want:
- AI to cite and recommend your product
- Higher GEO/AEO (generative engine optimization / answer engine optimization) scores
- Visibility in ChatGPT, Perplexity, etc.
- To be part of the AI knowledge base
Block If You Want:
- To keep content exclusive (paywalls)
- To prevent training on your data
- Control over content usage
- Privacy for sensitive information
robots.txt Configuration
Allow All AI Crawlers (Recommended for AEO)
User-agent: *
Allow: /
# AI Crawlers Welcome
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Anthropic-AI
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
Block All AI Crawlers
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Anthropic-AI
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Cohere-AI
Disallow: /
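Before deploying a blocklist like this, it can help to check how a standards-following parser reads it. The sketch below uses Python's standard-library `urllib.robotparser` on a shortened copy of the policy. The helper name `can_crawl`, the `/pricing` path, and the agent `ExampleNewAIBot` are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the blocklist above (three crawlers instead of all of them).
BLOCK_POLICY = """\
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
"""


def can_crawl(robots_txt, user_agent, url="https://yoursite.com/pricing"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleNewAIBot is a made-up name standing in for any unlisted crawler.
    for agent in ("GPTBot", "ClaudeBot", "CCBot", "Googlebot", "ExampleNewAIBot"):
        verdict = "allowed" if can_crawl(BLOCK_POLICY, agent) else "blocked"
        print(f"{agent}: {verdict}")
```

The three listed crawlers come back blocked, while Googlebot and the unlisted agent stay allowed. That is the main limitation of a per-crawler blocklist: it only covers the names written into it, so it needs updating whenever a new crawler appears.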
Selective Access
# Allow browsing, block training
User-agent: ChatGPT-User
Allow: /
User-agent: GPTBot
Disallow: /
# Allow Perplexity (good for visibility)
User-agent: PerplexityBot
Allow: /
# Block other training crawlers (repeat this group for each one)
User-agent: ClaudeBot
Disallow: /
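When the allow/block decisions change often, it may be easier to keep them in one place and generate the file. The following is a minimal sketch under that assumption: `build_robots_txt` and the `SELECTIVE_POLICY` mapping are hypothetical names, and the generator only handles whole-site rules (`/`), not per-path rules.

```python
def build_robots_txt(policy, sitemap_url=None):
    """Render a robots.txt string from a {user_agent: allowed} mapping.

    Each crawler gets its own group with a site-wide Allow or Disallow rule.
    Groups are separated by a blank line for readability.
    """
    groups = [
        f"User-agent: {agent}\n{'Allow' if allowed else 'Disallow'}: /"
        for agent, allowed in policy.items()
    ]
    text = "\n\n".join(groups) + "\n"
    if sitemap_url:
        text += f"\nSitemap: {sitemap_url}\n"
    return text


# Mirrors the selective example above: True = allow, False = block.
SELECTIVE_POLICY = {
    "ChatGPT-User": True,
    "GPTBot": False,
    "PerplexityBot": True,
    "ClaudeBot": False,
}

if __name__ == "__main__":
    print(build_robots_txt(SELECTIVE_POLICY, "https://yoursite.com/sitemap.xml"), end="")
```

Crawlers that are not in the mapping get no group at all, which robots.txt treats as allowed unless a `User-agent: *` group says otherwise.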
Beyond robots.txt
robots.txt is advisory: crawlers can ignore it. For stronger control:
- X-Robots-Tag headers: Server-level control
- Meta robots tags: Page-level control
- AI.txt: Emerging standard for AI-specific permissions
- Legal terms: TOS restrictions
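One way to enforce a block at the server, rather than asking crawlers to comply, is to refuse requests whose User-Agent matches a blocked token. The sketch below does this as a small WSGI middleware using only the standard interface. The names `block_ai_crawlers`, `demo_app`, and `BLOCKED_TOKENS` are assumptions for illustration, and the approach only stops crawlers that identify themselves honestly.

```python
# Lowercase substrings to look for in the User-Agent header (illustrative subset).
BLOCKED_TOKENS = ("gptbot", "claudebot", "ccbot")


def block_ai_crawlers(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app so that matching User-Agents receive 403 Forbidden."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Not a blocked crawler: hand the request to the wrapped application.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in application that always answers 200 OK."""
    body = b"Hello\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


# The object a WSGI server would be pointed at.
application = block_ai_crawlers(demo_app)
```

Because User-Agent strings are easy to fake, a filter like this is best treated as one layer alongside the options listed above, not as a guarantee.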
Our Recommendation
For most businesses, allowing AI crawlers is beneficial:
- More visibility in AI-powered search
- Higher chance of being recommended
- Part of the AI knowledge base
- Better GEO and AEO scores
Unless you have specific privacy or licensing concerns, welcome the bots.