Technical GEO: How to Optimise Your Website for AI Search
Most GEO improvements are not about rewriting your content. They are technical changes - how your site is configured, what signals it sends to AI crawlers, and how well machines can understand its structure. This guide covers the technical changes that materially improve AI search visibility.
What to fix first
Not all technical GEO changes are equal. Our analysis of 12,000 websites found that three issues account for the majority of AI search invisibility - and two of them take under an hour to fix.
AI crawler permissions in robots.txt
Your robots.txt file tells web crawlers which parts of your site they can access. The problem is that most robots.txt files were written before AI search engines existed - and many contain blanket rules that accidentally block AI crawlers alongside spam bots.
The major AI crawlers and their user-agent names:
- GPTBot - OpenAI / ChatGPT
- PerplexityBot - Perplexity AI
- ClaudeBot - Anthropic / Claude
- Googlebot - Google AI Overviews (no separate crawler; the standard Googlebot is used)
- anthropic-ai - Anthropic web crawler
- cohere-ai - Cohere
Common mistake: Using User-agent: * with Disallow: / to block all bots will also block every AI crawler. This is the single most damaging GEO error - we see it on 73% of the websites we audit.
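You can check the effect of your rules before deploying them with Python's built-in robots.txt parser - a minimal sketch with the rules inlined as strings:

```python
import urllib.robotparser

def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A blanket block catches AI crawlers too
blanket = "User-agent: *\nDisallow: /"
print(crawler_allowed(blanket, "GPTBot", "https://example.com/guide"))  # False

# A specific group for GPTBot overrides the blanket rule
fixed = blanket + "\n\nUser-agent: GPTBot\nAllow: /"
print(crawler_allowed(fixed, "GPTBot", "https://example.com/guide"))  # True
```

The same function works for PerplexityBot, ClaudeBot or any other user agent - swap in the name and the URL you want to test.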
To allow all major AI crawlers, add these lines to your robots.txt:
# Allow major AI crawlers
User-agent: GPTBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: cohere-ai
Allow: /
If you need to block AI training data collection while allowing search, use more specific directives. OpenAI, Anthropic and others honour different bot names for training versus live retrieval.
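As of writing, OpenAI documents GPTBot for model training and OAI-SearchBot for search indexing (with ChatGPT-User for live browsing on a user's behalf); other vendors publish similar splits. A sketch of a robots.txt that blocks training but keeps search access - confirm the current bot names in each vendor's documentation before relying on it:

```txt
# Block model training
User-agent: GPTBot
Disallow: /

# Allow search indexing and live retrieval
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
```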
Creating your llms.txt file
llms.txt is a plain text file placed at the root of your website (e.g. yoursite.com/llms.txt) that gives AI language models structured guidance about your site. Think of it as a sitemap for AI - not just where pages are, but what your site is, what your most important content covers, and how an AI should understand your brand.
The format is simple Markdown. A basic llms.txt looks like this:
# YourBrand
> One-line description of what your website/business does.
## About
[Brief description of who you are, what you do, and who you serve]
## Key pages
- [Home](https://yoursite.com/): Main landing page
- [About](https://yoursite.com/about/): Company background and team
- [Blog](https://yoursite.com/blog/): Articles and guides
## Key topics
This site covers [your main topic areas].
Our content is written by [credentials].
## Contact
[[email protected]]
Beyond the basics, you can also include a detailed llms-full.txt that contains the complete text of your most important pages - making it trivial for AI models to ingest your content without crawling your full site.
Quick win: Our data shows that 92% of websites have no llms.txt file at all. Simply creating one puts you ahead of almost all of your competitors from an AI search perspective.
Schema markup for AI citation
Schema.org markup is structured data embedded in your HTML that tells machines what your content means. Google has relied on it for rich results for years - but for GEO, it is even more important. AI engines use schema to verify facts, understand entities, attribute authorship and decide whether to cite your content.
Organisation schema
Organisation schema establishes your brand as a known entity. It should include your official name, URL, logo, contact details, social media profiles and, where applicable, your Wikipedia or Wikidata URL. This is the foundation of brand authority for AI citation.
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company Name",
  "url": "https://yoursite.com",
  "logo": "https://yoursite.com/logo.png",
  "sameAs": [
    "https://twitter.com/yourhandle",
    "https://linkedin.com/company/yourcompany",
    "https://en.wikipedia.org/wiki/YourCompany"
  ]
}
Article schema
Every blog post and article should have Article schema with a named author, a datePublished, and a publisher reference. This is how AI engines attribute content to real, verified people - a critical E-E-A-T signal.
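A minimal Article block with those three elements might look like this (all values are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Article Title",
  "author": {
    "@type": "Person",
    "name": "Jane Author",
    "url": "https://yoursite.com/about/jane-author/"
  },
  "datePublished": "2025-01-15",
  "publisher": {
    "@type": "Organization",
    "name": "Your Company Name",
    "logo": {
      "@type": "ImageObject",
      "url": "https://yoursite.com/logo.png"
    }
  }
}
```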
FAQPage schema
FAQPage schema is one of the most powerful GEO signals available. AI engines that synthesise answers frequently pull from structured Q&A content - and FAQ schema makes your Q&A pairs directly machine-readable. Add it to any page with questions and answers.
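A sketch of the markup for a single Q&A pair - question and answer text are placeholders, and each additional pair is another entry in the mainEntity array:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is GEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Generative Engine Optimisation: making your content easy for AI search engines to find, parse and cite."
      }
    }
  ]
}
```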
Structured data: the bigger picture
Beyond the core three schema types, consider adding structured data relevant to your business type:
- LocalBusiness - for businesses with a physical location
- Product - for ecommerce and software products
- Person - for author pages and personal brands
- HowTo - for instructional content (AI engines love step-by-step guides)
- BreadcrumbList - helps AI understand your site hierarchy
- WebSite with SearchAction - signals your site as a navigable entity
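As an illustration of the last item, a WebSite block with a SearchAction might look like this - the /search URL pattern is a placeholder and must match your site's real search endpoint:

```json
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "YourBrand",
  "url": "https://yoursite.com",
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "https://yoursite.com/search?q={search_term_string}"
    },
    "query-input": "required name=search_term_string"
  }
}
```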
Implement schema as JSON-LD in the <head> of your pages. It is easier to maintain than inline microdata and is the format preferred by both Google and AI crawlers.
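Placement is one script tag per schema block, anywhere in the head - for example:

```html
<head>
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Your Company Name",
    "url": "https://yoursite.com"
  }
  </script>
</head>
```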
Platform and performance signals
AI crawlers face the same technical barriers as other bots. Slow load times, JavaScript-heavy rendering, broken pagination and inconsistent canonical URLs all reduce how effectively AI engines can parse your content.
- Core Web Vitals - fast load times (LCP) and stable layout (CLS) make pages cheaper to fetch and parse
- Semantic HTML - use proper heading hierarchy (H1 > H2 > H3), not divs styled to look like headings
- Alt text - all images labelled, helping AI understand visual content context
- Canonical tags - prevent AI engines from indexing duplicate content versions
- XML sitemap - ensure all important pages are discoverable
- HTTPS - a basic trust signal for all search engines, including AI
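Several of these signals are visible together in a minimal page skeleton (URLs and filenames are placeholders):

```html
<!doctype html>
<html lang="en">
<head>
  <link rel="canonical" href="https://yoursite.com/guides/technical-geo/">
</head>
<body>
  <h1>Technical GEO Guide</h1>      <!-- one H1 per page -->
  <h2>Robots.txt</h2>               <!-- H2s nest under the H1 -->
  <h3>AI crawler user agents</h3>   <!-- H3s nest under an H2 -->
  <img src="chart.png" alt="Chart of AI crawler traffic by user agent">
</body>
</html>
```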
Technical GEO checklist
- ☐ GPTBot, ClaudeBot and PerplexityBot are not blocked in robots.txt
- ☐ llms.txt file exists at domain root
- ☐ llms.txt includes accurate site description, key pages and key topics
- ☐ Organisation schema implemented on homepage
- ☐ Article schema on all blog posts with named author
- ☐ FAQPage schema on key pages
- ☐ Person schema on author bio pages
- ☐ All schema validated with Google Rich Results Test
- ☐ Canonical tags on all pages
- ☐ XML sitemap submitted to Google Search Console
- ☐ HTTPS active across entire site
- ☐ No JavaScript rendering required to access main content
- ☐ H1 > H2 > H3 heading hierarchy consistent on all pages
- ☐ Image alt text complete
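A quick self-check for the schema items above: a sketch that pulls every JSON-LD block out of a page and confirms it parses as valid JSON. It does not replace Google's Rich Results Test, which validates against the schema.org vocabulary.

```python
import json
import re

def extract_json_ld(html: str) -> list:
    """Return every parsed JSON-LD block found in the page's script tags."""
    pattern = re.compile(
        r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    # json.loads raises ValueError on any malformed block
    return [json.loads(block) for block in pattern.findall(html)]

page = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Your Company Name"}
</script>
"""
blocks = extract_json_ld(page)
print([b["@type"] for b in blocks])  # ['Organization']
```

Run it against your rendered HTML (not your templates) so you validate what crawlers actually receive.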
Run a free technical GEO audit
Find out which technical issues are hurting your AI search visibility - in seconds, for free.
Get My GEO Score →