What AI Engines Actually Evaluate When Deciding Whether to Cite Your Site
AI does not rank sites. It selects sources. The criteria are fundamentally different from Google's ranking algorithm. There are no positions on a results page. There is only cited or not cited. Understanding the six evaluation criteria AI engines use, and how they are weighted, is the difference between being a source and being invisible.
The 6 Evaluation Criteria: An Overview
AI engines evaluate every candidate source across six criteria before deciding whether to cite it. These criteria are not equally weighted. Discoverability is table stakes. Extractability is the differentiator. Authority is the tiebreaker. Here is how they break down:
- Discoverability - Can the AI find your content at all?
- Extractability - Can the AI extract a coherent, accurate passage?
- Authority - Does the AI trust your content enough to cite it?
- Recency - Is your content current enough to be relevant?
- Entity clarity - Does the AI know who you are and what you do?
- Uniqueness - Does your content add something other sources do not?
Each criterion acts as a filter. Fail discoverability and nothing else matters. Pass discoverability but fail extractability and you are in the knowledge base but never surfaced. Pass both but fail authority and you get retrieved but not cited. The cascade is ruthless.
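The cascade can be sketched as a sequence of gates, each discarding candidates before the next gate runs. This is a simplified mental model, not an engine's actual pipeline; the field names and pass thresholds are invented for illustration:

```python
# Toy model of the evaluation cascade: each gate filters candidates
# before the next gate runs. Thresholds are illustrative, not real values.

def evaluation_cascade(candidates):
    """Apply the hard gates in order and return the surviving sources."""
    gates = [
        ("discoverable", lambda c: c["discoverable"]),           # found at all?
        ("extractable", lambda c: c["extractability"] >= 0.5),   # coherent chunk?
        ("authoritative", lambda c: c["authority"] >= 0.3),      # trusted enough?
    ]
    survivors = list(candidates)
    for _name, passes in gates:
        survivors = [c for c in survivors if passes(c)]
    return survivors

sources = [
    {"url": "a.com", "discoverable": False, "extractability": 0.9, "authority": 0.9},
    {"url": "b.com", "discoverable": True,  "extractability": 0.2, "authority": 0.9},
    {"url": "c.com", "discoverable": True,  "extractability": 0.8, "authority": 0.6},
]
cited = evaluation_cascade(sources)
# a.com fails discoverability, b.com fails extractability; only c.com survives.
```

Note that a.com's strong extractability and authority never even get evaluated: once a gate fails, the candidate is gone.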
1. Discoverability: Can the AI Find Your Content?
What it means: Your pages are accessible to AI crawlers, indexed in the AI's knowledge base and findable when a relevant query is processed.
How AI measures it: Crawler access logs, robots.txt compliance, HTTP response codes and presence in the AI's content index. If GPTBot gets a 403 or the page returns no content, discoverability is zero.
What most sites get wrong: Blocking AI crawlers accidentally through broad robots.txt rules, Cloudflare bot protection or WordPress security plugins. Over 40% of audited sites have at least one AI crawler blocked unintentionally.
How to fix it: Check robots.txt for GPTBot, PerplexityBot, ClaudeBot and Bytespider. Remove or override any Disallow rules that apply to these agents. Verify key pages return HTTP 200 to crawler user agents. Create and submit an XML sitemap. Add an llms.txt file to your domain root to accelerate discovery.
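The robots.txt portion of that check can be automated with Python's standard-library robot parser. This is a minimal sketch: it tests a robots.txt body against the four crawler names listed above, nothing more.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Bytespider"]

def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawler user agents that robots.txt blocks for a path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_CRAWLERS if not parser.can_fetch(ua, path)]

# A rule like this blocks GPTBot outright, even when the wildcard
# rule below it allows everyone else:
robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_crawlers(robots))  # ['GPTBot']
```

Run this against your live robots.txt before and after any change; an empty list means all four crawlers can fetch the path.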
2. Extractability: Can the AI Extract a Useful Passage?
What it means: Your content can be split into self-contained chunks that directly answer specific questions. This is the most important criterion and the one most sites fail.
How AI measures it: Chunk coherence (does the passage make sense on its own?), answer directness (does the first sentence answer the implied question?), semantic density (is the passage information-rich or padded with filler?) and heading alignment (do headings describe the chunk content accurately?).
What most sites get wrong: Opening sections with brand positioning instead of answers. Writing 300-word paragraphs that cover three topics. Using vague, abstract headings like "Our Approach" instead of question-format headings like "How Does AI Evaluate Sources?"
How to fix it: Restructure your top pages: put direct answers at the top of every section, use question-format headings, keep paragraphs to one idea each and cut preamble. This is the single highest-impact change you can make. Learn more about the retrieval pipeline that depends on extractability.
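One way to see your content the way a retrieval system does is to split it into heading-anchored chunks and read each chunk in isolation. The sketch below assumes markdown-style headings; real pipelines use more sophisticated splitters, but the unit of retrieval is the same.

```python
import re

def chunk_by_heading(markdown: str) -> dict[str, str]:
    """Split content into heading-anchored chunks, the unit retrieval
    typically works with. Each chunk should answer its heading on its own."""
    chunks: dict[str, str] = {}
    heading = "(intro)"
    body: list[str] = []
    for line in markdown.splitlines():
        m = re.match(r"#+\s+(.*)", line)
        if m:
            chunks[heading] = "\n".join(body).strip()
            heading, body = m.group(1), []
        else:
            body.append(line)
    chunks[heading] = "\n".join(body).strip()
    return {h: b for h, b in chunks.items() if b}

doc = """## How Does AI Evaluate Sources?
AI engines score candidate passages on coherence and directness.

## Our Approach
We believe in excellence."""
chunks = chunk_by_heading(doc)
```

Read each chunk aloud without its surrounding page. "How Does AI Evaluate Sources?" followed by a direct answer stands alone; "Our Approach" followed by brand copy answers nothing.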
3. Authority: Does the AI Trust Your Content?
What it means: Your site demonstrates expertise, credibility and trustworthiness. AI engines use authority signals to decide which sources to cite when multiple candidates are equally extractable.
How AI measures it: Author attribution (real names with credentials), cited sources within content, external validation (backlinks, mentions on authoritative domains), professional presentation and consistent E-E-A-T signals (experience, expertise, authoritativeness, trustworthiness).
What most sites get wrong: Publishing content without bylines, making unsupported claims ("studies show" without citing the study), inconsistent branding across the web and no external validation from trusted sources.
How to fix it: Add author bios to every article. Cite specific sources with links. Build consistent entity signals across your web presence. Get mentioned or linked to by authoritative publications in your niche. Authority compounds over time, so start now.
4. Recency: Is Your Content Current?
What it means: Your content is recent enough to be relevant for the query. AI engines have a strong recency bias for topics where timeliness matters (technology, news, pricing) and a weaker one for evergreen topics.
How AI measures it: DatePublished in Article schema, last-modified headers, visible dates on the page and content freshness (does the page reference current tools, prices, regulations?).
What most sites get wrong: Publishing a blog post once and never touching it again. Pages from 2023 discussing "the latest" tools of 2023 get deprioritised in favour of pages updated in 2026. Missing or incorrect datePublished schema is also common.
How to fix it: Update your highest-traffic pages regularly. Add or correct datePublished in Article schema. For time-sensitive topics, include the current year in headings and content. A page that says "updated May 2026" outperforms an identical page dated 2024.
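A minimal sketch of the schema side of that fix, emitting Article JSON-LD with both date fields (schema.org spells the type `Article` and uses `dateModified` for the updated date; the headline and dates below are placeholders):

```python
import json
from datetime import date

def article_schema(headline: str, published: date, modified: date) -> str:
    """Emit Article JSON-LD with the date fields AI engines read.
    schema.org uses dateModified for the 'updated' date."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return f'<script type="application/ld+json">{json.dumps(data)}</script>'

snippet = article_schema(
    "How AI Engines Evaluate Sources",
    published=date(2024, 3, 1),
    modified=date(2026, 5, 12),
)
```

Keep `dateModified` honest: bump it when you actually revise the content, and make sure the visible date on the page matches the schema.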
5. Entity Clarity: Does the AI Know Who You Are?
What it means: The AI can unambiguously identify your organisation, what it does and what makes it distinct. Entity clarity is how the AI knows that "SearchScore" refers to the AI visibility audit tool, not a music search app.
How AI measures it: Homepage description clarity, Organisation schema completeness, consistent NAP (name, address, phone) across the web, consistent branding on social profiles and directory listings, and a clear about page.
What most sites get wrong: Vague homepage copy ("Empowering tomorrow's solutions today"), missing Organisation schema, inconsistent company descriptions across LinkedIn, Google Business Profile and their own site, and no about page or a generic one.
How to fix it: Write a one-sentence description of your business that a stranger could understand. Put it on your homepage, in your Organisation schema, on your about page and on every public profile. Consistency is the key variable: the same clear description everywhere builds entity confidence. For a structured approach, see our AI visibility audit guide.
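Consistency is easy to check mechanically. This hypothetical helper compares every public surface against one canonical description; the profile data and surface names are illustrative:

```python
# Hypothetical consistency check: verify one canonical business
# description is used verbatim across every public surface.

CANONICAL = "SearchScore is an AI visibility audit tool for websites."

profiles = {
    "homepage": "SearchScore is an AI visibility audit tool for websites.",
    "linkedin": "SearchScore is an AI visibility audit tool for websites.",
    "google_business": "Empowering tomorrow's solutions today.",
}

def inconsistent_profiles(profiles: dict[str, str], canonical: str) -> list[str]:
    """Return the surfaces whose description drifts from the canonical one."""
    return [name for name, text in profiles.items() if text != canonical]

print(inconsistent_profiles(profiles, CANONICAL))  # ['google_business']
```

Every surface this flags is a place where the AI may resolve your entity differently, or fail to resolve it at all.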
6. Uniqueness: Does Your Content Add Something New?
What it means: Your content offers information, perspective or data that other sources do not. AI engines prefer to cite diverse sources rather than repeating the same information from similar pages.
How AI measures it: Semantic distinctiveness (does this content say something different from other retrieved chunks?), proprietary data (original research, statistics or findings) and unique perspective (expert opinion, case studies, firsthand experience).
What most sites get wrong: Publishing generic content that paraphrases what is already available elsewhere. "10 tips for better SEO" that are identical to every other "10 tips for better SEO" article. No original data, no unique perspective, no reason for the AI to prefer it over the dozens of identical alternatives.
How to fix it: Include proprietary data (even small surveys or internal analytics count). Write from direct experience. Offer a specific, differentiated take. Content that says "we tested this across 875,000 sites and found X" is uniquely citable. Content that says "experts agree that X is important" is not.
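A crude way to feel the distinctiveness signal: compare a candidate passage's word overlap with passages already retrieved. Real engines compare embeddings, not word sets; the Jaccard overlap below is only a stand-in for the idea.

```python
def distinctiveness(candidate: str, retrieved: list[str]) -> float:
    """Crude proxy for semantic distinctiveness: 1 minus the highest
    word-overlap (Jaccard) similarity with already-retrieved passages.
    Real engines use embeddings; word overlap just illustrates the idea."""
    cand = set(candidate.lower().split())
    overlaps = []
    for passage in retrieved:
        other = set(passage.lower().split())
        overlaps.append(len(cand & other) / len(cand | other))
    return 1.0 - max(overlaps, default=0.0)

pool = ["experts agree that seo is important for visibility and growth"]
generic = "experts agree that seo is important for visibility"
original = "we tested 875,000 sites and found structure predicts citations"
# The generic passage scores near zero; the original scores near one.
```

A passage that restates the pool adds nothing to a diverse answer, so the engine has no reason to add your URL to the citations it already has.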
The Weight Hierarchy: Which Criteria Matter Most?
Not all six criteria carry equal weight. The hierarchy works like this:
- Discoverability is table stakes. If you fail here, you are out. No amount of authority or extractability compensates for being invisible to crawlers.
- Extractability is the differentiator. Among discoverable sites, extractability is the strongest predictor of citation. Content that is chunked, direct and structured for extraction wins over content that is vague, long-winded or badly organised, regardless of domain authority.
- Authority is the tiebreaker. When multiple sources have equally extractable content on the same topic, authority determines which gets cited. A smaller site with better extractability beats a larger site with worse extractability. But when extractability is equal, authority tips the scale.
- Recency, entity clarity and uniqueness are modifiers. They strengthen or weaken your position but rarely determine it alone. A site with excellent extractability and modest authority but strong recency and uniqueness can beat a site with stronger authority but weaker performance on those modifiers.
The Citation Threshold: Why Mediocre on All Criteria Loses
There is a citation threshold. AI engines do not cite the "best" source. They cite any source that exceeds the threshold for a given query. A source that is excellent on extractability and authority but mediocre on recency and entity clarity will exceed the threshold. A source that is mediocre on all six criteria will not.
This is why breadth of optimisation is less effective than depth on the right criteria. Spending 10 hours improving all six criteria a little bit is less effective than spending 10 hours making extractability excellent. The threshold model rewards depth over uniformity.
The practical implication: fix discoverability first (it is binary and quick), then go deep on extractability (it is the strongest citation predictor), then strengthen authority (it is the tiebreaker). Recency, entity clarity and uniqueness are important but secondary.
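The threshold model can be sketched as a hard discoverability gate followed by a weighted sum. The weights and threshold below are invented for the sketch, not measured values; they only encode the hierarchy described above (extractability heaviest, authority next, the rest as modifiers):

```python
# Illustrative weighted-threshold model of citation. Weights and the
# threshold are invented to encode the hierarchy, not measured values.

WEIGHTS = {
    "extractability": 0.40,  # the differentiator
    "authority": 0.25,       # the tiebreaker
    "recency": 0.15,
    "entity_clarity": 0.10,
    "uniqueness": 0.10,
}
THRESHOLD = 0.55

def is_cited(scores: dict[str, float]) -> bool:
    """Discoverability is a hard gate; everything else is weighted."""
    if not scores.get("discoverable", False):
        return False
    total = sum(WEIGHTS[k] * scores.get(k, 0.0) for k in WEIGHTS)
    return total >= THRESHOLD

deep = {"discoverable": True, "extractability": 0.95, "authority": 0.5,
        "recency": 0.5, "entity_clarity": 0.5, "uniqueness": 0.5}
mediocre = {"discoverable": True, "extractability": 0.5, "authority": 0.5,
            "recency": 0.5, "entity_clarity": 0.5, "uniqueness": 0.5}
```

With these numbers, `deep` scores 0.68 and clears the bar while `mediocre` scores 0.50 and misses it: excellence on the heaviest criterion beats mediocrity everywhere.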
Frequently Asked Questions
How does ChatGPT choose which sources to cite?
ChatGPT evaluates sources across six criteria: discoverability, extractability, authority, recency, entity clarity and uniqueness. Extractability is the most important differentiator. If your content is not structured for extraction, it will not be cited regardless of authority.
The evaluation is not a single score. It is a cascade. Discoverability is checked first. Content that is not discoverable is discarded. Then extractability is evaluated: can the AI extract a coherent, accurate passage? Then authority, recency, entity clarity and uniqueness are applied as modifiers. The sources that exceed the citation threshold on the weighted combination of these criteria get cited.
Does domain authority matter for AI citation?
Partially. Authority acts as a tiebreaker when multiple sources have equally extractable content. But a niche site with clean structure regularly gets cited over high-authority sites with poor extractability. Structure beats authority at the citation stage.
SearchScore's data shows near-zero correlation between domain authority and AI visibility for sites with excellent extractability. Authority matters most when extractability is equal. The priority order should be: fix extractability first, then build authority. Authority compounds over months. Extractability can be fixed in hours.
What is the single most important factor for getting cited by AI?
Extractability. Content structured with question headings, direct answers at the top of sections and clear one-idea-per-paragraph formatting is dramatically easier for AI systems to extract and cite. Fix extractability before anything else.
Extractability is the criterion most sites fail on and the one with the highest impact when fixed. Unlike authority (which takes months to build) or entity clarity (which requires cross-web consistency), extractability is a structural change you can make to existing content in a few hours. The return on that investment is immediate and measurable in your next AI visibility audit.
Next step: See how your site scores on all six criteria. Run a free audit at searchscore.io and get a prioritised fix list based on your actual evaluation profile.
Check your AI visibility
Free audit. Instant results. No sign-up required.