# AI Bot Permissions in robots.txt: The Complete Guide
Your robots.txt file controls which bots can access your website. If the major AI crawlers are blocked - even accidentally - you are invisible to AI search. This guide shows you exactly how to fix that.
## Why this is the most important GEO fix
Of all the changes you can make to improve AI search visibility, fixing your robots.txt is the most urgent - because if AI crawlers are blocked, no other GEO work matters. A website with perfect schema markup and brilliant content is still completely invisible to AI search if GPTBot cannot get in the door.
Our analysis of 12,000 websites found that 73% block at least one major AI crawler. The vast majority do this accidentally - through legacy robots.txt rules written before AI search existed.
## The most common accidental AI block
The most frequent culprit is a blanket disallow rule:
```
User-agent: *
Disallow: /
```
This tells every bot - including all AI crawlers - that they cannot access any page on your site. It is often added to staging or development sites and accidentally left in place, or added to "protect" a site from spam bots without realising it blocks everything.
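You can see the effect programmatically with Python's standard-library robots.txt parser, which evaluates rules the way compliant crawlers do. A quick sketch (`example.com` is a placeholder domain):

```python
from urllib import robotparser

# The blanket disallow rule described above.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(BLANKET_BLOCK.splitlines())

# With no bot-specific group, every AI crawler falls under the `*` rule.
for bot in ("GPTBot", "PerplexityBot", "ClaudeBot"):
    print(f"{bot} allowed: {parser.can_fetch(bot, 'https://example.com/')}")
```

Every crawler reports `allowed: False`, which is exactly the accidental invisibility this guide is about.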
## All major AI crawler user-agents
| Crawler name | AI engine | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training-data collection for OpenAI models |
| OAI-SearchBot | OpenAI / ChatGPT search | Search retrieval and citations for ChatGPT search |
| PerplexityBot | Perplexity AI | Web retrieval for Perplexity search |
| ClaudeBot | Anthropic | Web crawling for Anthropic's models |
| anthropic-ai | Anthropic | Legacy Anthropic crawler user-agent |
| cohere-ai | Cohere | Cohere retrieval |
| Googlebot | Google AI Overviews | Used for AI Overviews (same as standard Google search) |
## The recommended robots.txt configuration
To allow all major AI crawlers while maintaining any existing rules for other bots:
```
# Allow all major AI search crawlers
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: cohere-ai
Allow: /

# Your existing rules below
User-agent: *
Disallow: /wp-admin/
Disallow: /private/
```
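As a sanity check, a condensed version of this configuration can be parsed with Python's standard-library robots.txt parser to confirm each group behaves as intended. A sketch, using the example paths from the rules above:

```python
from urllib import robotparser

RECOMMENDED = """\
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /wp-admin/
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(RECOMMENDED.splitlines())

# Named AI crawlers match their own group, not the `*` group.
print(parser.can_fetch("GPTBot", "https://example.com/pricing"))          # True
# Other bots still inherit the existing restrictions.
print(parser.can_fetch("SomeOtherBot", "https://example.com/wp-admin/"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/pricing"))    # True
```

The key behaviour is that a bot obeys the most specific matching group, so the explicit `Allow: /` groups take effect for the AI crawlers while everything else keeps following your existing wildcard rules.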
## Separating search retrieval from training data
Some website owners want to allow AI search retrieval (citations) while blocking training-data collection. The rules differ by provider. For OpenAI specifically, OAI-SearchBot handles search retrieval and citations, while GPTBot collects data for model training - so blocking GPTBot alone does not remove you from ChatGPT search results. Check each provider's published crawler documentation for the most current guidance, as policies evolve frequently.
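As one illustration of that split, here is a sketch of a robots.txt that permits one provider's search crawler while opting out of its training crawler, using OpenAI's published agent names as the example (verify the names against the provider's current documentation before relying on them):

```
# Allow ChatGPT search retrieval and citations
User-agent: OAI-SearchBot
Allow: /

# Opt out of training-data collection
User-agent: GPTBot
Disallow: /
```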
## How to test your configuration
- Visit `yoursite.com/robots.txt` and review the rules
- Use Google Search Console's robots.txt report to confirm Google's crawlers can access your site (note that it only covers Google user-agents, so test the AI crawlers separately)
- After making changes, wait 24 to 48 hours before re-testing, as crawlers cache robots.txt files
- Run a SearchScore audit to verify AI citability signals are now passing
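The manual checks above can also be scripted. A small sketch that parses a robots.txt file and reports which of the crawlers listed earlier are blocked (the `sample` robots.txt text and `blocked_ai_crawlers` helper are hypothetical illustrations):

```python
from urllib import robotparser

# The AI crawler user-agents from the table above.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "PerplexityBot",
               "ClaudeBot", "anthropic-ai", "cohere-ai"]

def blocked_ai_crawlers(robots_txt: str,
                        url: str = "https://example.com/") -> list:
    """Return the AI crawlers that cannot fetch `url` under `robots_txt`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]

# Example: a file that blocks only GPTBot.
sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(blocked_ai_crawlers(sample))  # ['GPTBot']
```

Paste in the contents of your live robots.txt and an empty list means all of the listed AI crawlers can get in.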