8 min read

AI Bot Permissions in robots.txt: The Complete Guide

Your robots.txt file controls which bots can access your website. If the major AI crawlers are blocked - even accidentally - you are invisible to AI search. This guide shows you exactly how to fix that.

Why this is the most important GEO fix

Of all the changes you can make to improve AI search visibility, fixing your robots.txt is the most urgent - because if AI crawlers are blocked, no other GEO work matters. A website with perfect schema markup and brilliant content is still completely invisible to AI search if GPTBot cannot get in the door.

Our analysis of 12,000 websites found that 73% block at least one major AI crawler. The vast majority do this accidentally - through legacy robots.txt rules written before AI search existed.

The most common accidental AI block

The most frequent culprit is a blanket disallow rule:

User-agent: *
Disallow: /

This tells every bot - including all AI crawlers - that they cannot access any page on your site. It is often added to staging or development sites and accidentally left in place, or added to "protect" a site from spam bots without realising it blocks everything.
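The effect is easy to verify with Python's standard-library robots.txt parser - a minimal sketch, with example.com as a placeholder domain:

```python
# Show that a blanket disallow blocks AI crawlers such as GPTBot.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Every user-agent, including GPTBot, is denied every URL.
print(rp.can_fetch("GPTBot", "https://example.com/any-page"))  # False
```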

All major AI crawler user-agents

Crawler name   | AI engine            | Purpose
GPTBot         | OpenAI               | Crawling for training OpenAI's models
OAI-SearchBot  | OpenAI / ChatGPT     | Search indexing and citations for ChatGPT search
ChatGPT-User   | OpenAI / ChatGPT     | Live page fetches on behalf of ChatGPT users
PerplexityBot  | Perplexity AI        | Web retrieval for Perplexity search
ClaudeBot      | Anthropic            | Web retrieval for Claude
anthropic-ai   | Anthropic            | Older Anthropic crawler user-agent
cohere-ai      | Cohere               | Cohere retrieval
Googlebot      | Google AI Overviews  | Used for AI Overviews (same as standard Google)

The recommended robots.txt configuration

To allow all major AI crawlers while maintaining any existing rules for other bots:

# Allow all major AI search crawlers
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: cohere-ai
Allow: /

# Your existing rules below
User-agent: *
Disallow: /wp-admin/
Disallow: /private/
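You can sanity-check a configuration like this before deploying it. A minimal sketch using Python's standard-library parser, assuming the rules above and example.com as a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# The AI-crawler section plus the existing catch-all rules from above.
rules = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /wp-admin/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# GPTBot matches its own group and may fetch anything.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))        # True
# Other bots fall through to the * group and are kept out of /wp-admin/.
print(rp.can_fetch("SomeOtherBot", "https://example.com/wp-admin/"))  # False
```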

Separating search retrieval from training data

Some website owners want to allow AI search retrieval (citations) while blocking training data collection. The rules differ by provider. For OpenAI specifically, OAI-SearchBot handles search indexing and ChatGPT-User handles live retrieval on behalf of users, while GPTBot collects data that may be used for training. Check each provider's published crawler documentation for the most current guidance, as policies evolve frequently.
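As an illustration only - verify against OpenAI's current crawler documentation before relying on it - a robots.txt that invites ChatGPT search citations while opting out of training data collection might look like:

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /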

How to test your configuration

  1. Visit yoursite.com/robots.txt and review the rules
  2. Test individual user-agents with a robots.txt checker such as Google's open-source robots.txt parser (Search Console's robots.txt report only covers Google's own crawlers)
  3. After making changes, wait 24 to 48 hours before re-testing, as crawlers cache robots.txt files
  4. Run a SearchScore audit to verify AI citability signals are now passing
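Steps 1 and 2 can also be scripted. A minimal sketch that batch-checks every AI user-agent from the table above - the rules are inlined here for illustration; in practice, fetch your live /robots.txt and pass its lines to parse():

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot",
             "ClaudeBot", "anthropic-ai", "cohere-ai"]

# Example rules: GPTBot is explicitly allowed, everything else hits the catch-all.
rules = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

for agent in AI_AGENTS:
    allowed = rp.can_fetch(agent, "https://example.com/")
    print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

This catches the partial-block case: GPTBot reports allowed while the other five crawlers still hit the blanket disallow.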

Check your AI search visibility

Free audit. Instant results. No sign-up required.

Get My Free GEO Score →


GEO Research & Analysis

The SearchScore editorial team researches and writes about generative engine optimisation, AI search visibility and the signals that determine whether your website gets cited by ChatGPT, Perplexity and Google AI Overviews.
