8 min read

AI Bot Permissions in robots.txt: The Complete Guide

Your robots.txt file controls which bots can access your website. If the major AI crawlers are blocked - even accidentally - you are invisible to AI search. This guide shows you exactly how to fix that.

Key Takeaway

Correctly configuring robots.txt for AI crawlers requires explicitly allowing each major crawler user-agent - GPTBot, PerplexityBot, ClaudeBot, and Google-Extended - since the common 'Disallow: /' pattern blocks all bots including AI search engines.

Why this is the most important GEO fix

Of all the changes you can make to improve AI search visibility, fixing your robots.txt is the most urgent - because if AI crawlers are blocked, no other GEO work matters. A website with perfect schema markup and brilliant content is still completely invisible to AI search if GPTBot cannot get in the door.

Our analysis of 12,000 websites found that 73% block at least one major AI crawler. The vast majority do this accidentally - through legacy robots.txt rules written before AI search existed.

The most common accidental AI block

The most frequent culprit is a blanket disallow rule:

User-agent: *
Disallow: /

This tells every bot - including all AI crawlers - that they cannot access any page on your site. It is often added to staging or development sites and accidentally left in place, or added to "protect" a site from spam bots without realising it blocks everything.
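You can confirm this behaviour locally with Python's standard urllib.robotparser module. A minimal sketch (the example.com URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The blanket-disallow robots.txt described above
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Every AI crawler inherits the wildcard rule, so all are blocked
for agent in ("GPTBot", "PerplexityBot", "ClaudeBot"):
    print(agent, parser.can_fetch(agent, "https://example.com/any-page"))
```

Each line prints False: with only a wildcard group present, GPTBot and the rest fall through to the blanket disallow.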

All major AI crawler user-agents

| Crawler name | AI engine | Purpose |
|---|---|---|
| GPTBot | OpenAI / ChatGPT | Training-data crawler for OpenAI models |
| OAI-SearchBot | OpenAI | Search indexing and citations for ChatGPT search |
| PerplexityBot | Perplexity AI | Web retrieval for Perplexity search |
| ClaudeBot | Anthropic | Web retrieval for Claude |
| anthropic-ai | Anthropic | Older Anthropic crawler token |
| cohere-ai | Cohere | Cohere retrieval |
| Googlebot | Google AI Overviews | Used for AI Overviews (same as standard Google search) |
| Google-Extended | Google Gemini | robots.txt control token for Gemini training and grounding; not a separate crawler |

The recommended robots.txt configuration

To allow all major AI crawlers while maintaining any existing rules for other bots:

# Allow all major AI search crawlers
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: cohere-ai
Allow: /

# Your existing rules below
User-agent: *
Disallow: /wp-admin/
Disallow: /private/
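As a sanity check, a condensed version of this layout can be parsed with Python's standard urllib.robotparser to confirm that a named crawler group overrides the wildcard rules (the bot name SomeOtherBot and the paths are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Condensed version of the recommended configuration
config = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(config.splitlines())

# GPTBot matches its own group, so the wildcard restrictions do not apply to it
print(parser.can_fetch("GPTBot", "/wp-admin/"))        # True
print(parser.can_fetch("SomeOtherBot", "/wp-admin/"))  # False
print(parser.can_fetch("SomeOtherBot", "/blog/"))      # True
```

Note the design consequence: once a bot has its own User-agent group, it ignores the wildcard group entirely, so any paths you want hidden from AI crawlers must be repeated inside each named group.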

Separating search retrieval from training data

Some website owners want to allow AI search retrieval (citations) while blocking training data collection. The rules differ by provider. For OpenAI specifically, OAI-SearchBot handles search indexing and ChatGPT-User handles live, user-initiated browsing, while GPTBot is the crawler used to gather training data - so blocking GPTBot alone does not remove you from ChatGPT search. Check each provider's published crawler documentation for the most current guidance, as policies evolve frequently.
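Based on OpenAI's published split between its agents, a robots.txt sketch that keeps search citations while opting out of training might look like the following. Verify the current agent names against OpenAI's crawler documentation before deploying, as they change over time:

```
# Allow ChatGPT search and browsing, block training data collection (OpenAI example)
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /
```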

How to test your configuration

  1. Visit yoursite.com/robots.txt and review the rules
  2. Use a robots.txt testing tool to check specific user-agents (Google Search Console's robots.txt report shows what Google's crawlers see, but not third-party bots)
  3. After making changes, wait 24 to 48 hours before re-testing, as crawlers cache robots.txt files
  4. Run a SearchScore audit to verify AI citability signals are now passing
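The manual review in step 1 can be scripted. The helper below (blocked_ai_crawlers is a hypothetical name, built on Python's standard urllib.robotparser) reports which AI crawler user-agents a given robots.txt file blocks:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "PerplexityBot",
               "ClaudeBot", "anthropic-ai", "cohere-ai"]

def blocked_ai_crawlers(robots_txt: str, url: str = "/") -> list[str]:
    """Return the AI crawler user-agents that cannot fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS
            if not parser.can_fetch(agent, url)]

# Sample robots.txt: wildcard block, with GPTBot explicitly allowed
sample = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /\n"
print(blocked_ai_crawlers(sample))  # every listed crawler except GPTBot
```

Feed it the text of yoursite.com/robots.txt and an empty list means all the listed AI crawlers can reach your pages.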

Check your AI search visibility

Free audit. Instant results. No sign-up required.

Get My Free GEO Score →


GEO Research & Analysis

The SearchScore editorial team researches and writes about generative engine optimisation, AI search visibility and the signals that determine whether your website gets cited by ChatGPT, Perplexity and Google AI Overviews.

Frequently Asked Questions

What are the main AI crawler user-agent names?

The main AI crawler user-agents are: GPTBot (OpenAI/ChatGPT), PerplexityBot (Perplexity AI), ClaudeBot (Anthropic/Claude), anthropic-ai (Anthropic), Googlebot (Google AI Overviews), cohere-ai (Cohere), and OAI-SearchBot (also used by OpenAI).

How do I check if my robots.txt is blocking AI crawlers?

Visit yoursite.com/robots.txt and look for any User-agent: * rule with Disallow: / - this blocks all crawlers, including AI bots. Also check for explicit blocks on GPTBot, PerplexityBot or ClaudeBot. Google Search Console's robots.txt report can confirm what Google's crawlers see; for third-party bots, use a standalone robots.txt validator.

Should I allow AI crawlers for training data collection?

This is your choice. You can allow AI search retrieval (so the bots can cite you in answers) while blocking training data collection. OpenAI provides separate user-agents for these two purposes. Check each AI company's documentation for their specific crawler names and policies.

Check your AI visibility

Enter your URL at SearchScore for a free AI visibility score out of 100. See how ChatGPT, Perplexity and Google AI see your site - and exactly what to fix.