How ChatGPT Decides Which Websites to Cite

When ChatGPT answers a question, it cites a handful of sources from across the web. Getting into that shortlist is not random. Here is exactly how the selection process works - and what your website needs to qualify.

How ChatGPT retrieves web content

ChatGPT has two modes. The base model generates responses from its training data alone. The Browse-enabled version (available in ChatGPT Plus and the API) actively retrieves live content from the web when a query benefits from current information.

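When Browse is active, ChatGPT fetches relevant pages through OpenAI's web agents. OpenAI documents separate user-agent tokens for these: GPTBot crawls content for model training, OAI-SearchBot indexes pages for search, and ChatGPT-User fetches pages in response to a user's request. ChatGPT then reads those pages, extracts the most useful information, synthesises an answer, and cites the sources it used.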

The selection of which pages to retrieve and which to cite involves multiple layers of evaluation.

The five factors ChatGPT uses to select citations

1. Accessibility - can GPTBot actually read your page?

The first filter is purely technical. If GPTBot and OpenAI's other user agents (OAI-SearchBot and ChatGPT-User) are blocked in your robots.txt file, ChatGPT cannot access your content at all. Our analysis of 12,000 websites found that 73% block at least one major AI crawler - often by accident, through legacy User-agent: * rules that predate AI search.

Check your robots.txt: visit yoursite.com/robots.txt and look for a rule group that disallows GPTBot, or a User-agent: * group with Disallow: / and no more specific group for OpenAI's agents. Either will prevent ChatGPT from reading your site.
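The manual check above can also be scripted. The following is a minimal sketch using Python's standard urllib.robotparser. The function name, the example rules and the example.com URL are illustrative assumptions, and the agent list includes OAI-SearchBot and ChatGPT-User, two further tokens from OpenAI's crawler documentation.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens to test. GPTBot is the crawler named in this article;
# OAI-SearchBot and ChatGPT-User are further tokens from OpenAI's crawler docs.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]


def blocked_agents(robots_txt, url="https://example.com/"):
    """Return the agents in AI_AGENTS that these robots.txt rules block for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# A legacy catch-all rule: every crawler is disallowed everywhere.
LEGACY_RULES = """User-agent: *
Disallow: /
"""

# The same catch-all, plus a more specific group that lets GPTBot in.
FIXED_RULES = """User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

print(blocked_agents(LEGACY_RULES))  # all three agents are blocked
print(blocked_agents(FIXED_RULES))   # GPTBot is no longer in the list
```

To test a live site, fetch its robots.txt as text and pass it to blocked_agents. A more specific User-agent group takes precedence over the * group, which is why the second example unblocks only GPTBot.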

2. Relevance - does your page directly answer the query?

ChatGPT evaluates how well a page's content matches the user's specific question. Pages that directly address the query - with the answer stated clearly near the top - perform significantly better than pages that cover the topic tangentially.

This is why heading structure matters so much for generative engine optimisation (GEO). A page with an H2 that reads "How does X work?" followed immediately by a clear answer is far more citable than a page where the same information is buried in the fifth paragraph of a section titled "Overview."
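One way to audit a page for this pattern is to list each H2 next to the first paragraph that follows it, then read the pairs and judge whether the paragraph answers the heading. The sketch below uses Python's standard html.parser. The class and function names are hypothetical, and it is only an editing aid, not a model of how ChatGPT evaluates pages.

```python
from html.parser import HTMLParser


class HeadingAnswerParser(HTMLParser):
    """Collect each <h2> heading with the text of the first <p> after it."""

    def __init__(self):
        super().__init__()
        self.pairs = []       # one [heading_text, first_paragraph_text] per <h2>
        self._capture = None  # "h2", "p", or None when not collecting text

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.pairs.append(["", ""])
            self._capture = "h2"
        elif tag == "p" and self.pairs and not self.pairs[-1][1]:
            # Only the first paragraph after a heading is captured.
            self._capture = "p"

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture == "h2":
            self.pairs[-1][0] += data
        elif self._capture == "p":
            self.pairs[-1][1] += data


def heading_answers(html):
    """Return (heading, first paragraph) tuples for every <h2> in html."""
    parser = HeadingAnswerParser()
    parser.feed(html)
    parser.close()
    return [(h.strip(), p.strip()) for h, p in parser.pairs]


SAMPLE = (
    "<h2>How does X work?</h2><p>X works by doing Y.</p><p>More detail.</p>"
    "<h2>Overview</h2><div>Intro</div><p>Background first.</p>"
)

for heading, answer in heading_answers(SAMPLE):
    print(f"{heading} -> {answer}")
```

In the sample, the first heading is followed directly by its answer, while the "Overview" section opens with background. That second kind of section is the one to restructure.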

3. Content quality and structure

ChatGPT's retrieval system favours content that is factually accurate, well-structured and easy to parse.

4. Credibility signals

ChatGPT evaluates the trustworthiness of sources. Websites with strong brand authority - consistent presence across the web, mentions in reputable publications, verified entity information - are treated as more credible sources than anonymous or thin websites.

Organisation schema, named authors with credentials, and external brand mentions all contribute to the credibility profile that influences citation likelihood.

5. Structured data and llms.txt

Schema markup gives ChatGPT explicit, machine-readable information about your content. A page with FAQPage, Article and Organisation schema (the schema.org type itself is spelled Organization) provides clear structured signals that support citation. An llms.txt file at your domain root, a proposed convention rather than a formal standard, can further describe what your site covers and which content is most authoritative.
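As an illustration of what this markup looks like, here is a minimal sketch that builds FAQPage JSON-LD from question and answer pairs and wraps it in the script tag that pages use to embed it. The helper names and the sample question are assumptions made for the example. The property names (mainEntity, acceptedAnswer, text) come from the schema.org vocabulary.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialise data as a JSON-LD script element for a page's HTML."""
    # Escape "<" so answer text can never close the script element early.
    body = json.dumps(data, indent=2).replace("<", "\\u003c")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical content, for illustration only.
faq = faq_jsonld([
    ("How does X work?", "X works by doing Y."),
])
print(as_script_tag(faq))
```

Article and Organization objects follow the same pattern with their own properties. Generating the markup from the same data that renders the visible page helps keep the two in sync.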

What ChatGPT does not use

Several traditional SEO factors that matter for Google rankings have little or no direct effect on ChatGPT citation.

This means a newer website with excellent GEO signals can outperform an established site that has never optimised for AI search.

A practical checklist to get cited by ChatGPT

GEO Research & Analysis

The SearchScore editorial team researches and writes about generative engine optimisation, AI search visibility and the signals that determine whether your website gets cited by ChatGPT, Perplexity and Google AI Overviews.