How to Get Your Website Into AI Search Results

AI search traffic converts at 5x the rate of traditional search, but most websites are invisible to it. This guide covers how ChatGPT, Gemini, and Google AI Overviews find websites, why yours probably isn't showing up, and three practical steps to fix it.

The shift is already here

ChatGPT now has over 900 million weekly active users. Google AI Overviews appear in roughly half of all search queries. Perplexity processes hundreds of millions of searches per month. AI isn't coming for traditional search. It's already absorbing it. For a breakdown of how these tools compare as consumer subscriptions, see which AI subscription is best for the average user.

For website owners, this creates a new problem. Your site might rank well on Google and still be completely invisible to the AI systems that are answering an increasing share of user questions. When someone asks ChatGPT "Who provides web design in Kettering?" or asks Gemini "What's the best CRM for small businesses?", the answer isn't a list of links. It's a direct, synthesised response that either includes your business or doesn't. We covered the small-business angle in detail in AI Visibility for Small Businesses: What It Actually Delivers, which uses the Lockerfella case as the worked example.

The numbers back this up. AI referral traffic grew 357% year-over-year in 2025. That traffic converts at five times the rate of traditional organic search. Visitors arriving from AI platforms spend 68% more time on site. The traffic is smaller than Google's, but it's higher quality and growing fast.

This guide covers what AI search engines actually need from your website, why most sites fail to provide it, and what you can do about it today.

How AI search engines find your website

Diagram showing how retrieval-augmented generation works: a user asks a question, the AI searches the web for relevant pages, retrieves and reads them, then generates an answer with citations
AI search engines don't just recall training data. They actively search the web, retrieve relevant pages, and synthesise answers with citations.

AI search engines don't work like Google. Understanding the difference is the first step to appearing in their results.

Traditional search engines crawl your site, index its pages, and rank them against competing pages for specific queries. The output is a list of links. You compete for position.

AI search engines use a process called retrieval-augmented generation (RAG). When a user asks a question, the AI system searches for relevant web pages in real time, retrieves and reads them, then generates a synthesised answer. The output isn't a list. It's a paragraph (or several) with citations linking back to the sources used. You don't compete for position. You compete for inclusion. We break this down stage-by-stage in the full retrieval pipeline explainer.

ChatGPT triggers a web search on an estimated 20-35% of its prompts. Assuming roughly 2.5 billion prompts a day, that works out to 500-875 million web-retrieval queries per day. That's about one-eleventh of Google's daily search volume, and it's growing.
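The arithmetic behind that estimate can be reproduced in a few lines. This is a sketch only: the 2.5 billion daily prompt total is the base implied by the figures above (20-35% of it gives 500-875 million), so treat it as an assumption rather than a measured value.

```python
# Assumed total ChatGPT prompts per day. Inferred from the percentages and
# query counts quoted in the text; treat it as an assumption.
DAILY_PROMPTS = 2_500_000_000


def web_search_queries(daily_prompts: int, share: float) -> int:
    """Estimated number of prompts per day that trigger a web search."""
    return round(daily_prompts * share)


low = web_search_queries(DAILY_PROMPTS, 0.20)
high = web_search_queries(DAILY_PROMPTS, 0.35)
print(f"{low:,} to {high:,} web-retrieval queries per day")
```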

What matters in this model isn't keyword density or backlink profiles. What matters is whether your content is accessible, clearly structured, and machine-readable. The AI needs to be able to crawl your page, understand what your business does, and trust the information enough to cite it.

Three things determine whether your website makes the cut: access, clarity, and trust.

Why most websites are invisible to AI

Most websites aren't blocked from AI deliberately. They're invisible by default because the infrastructure that AI systems rely on simply isn't there.

Around 21% of the top 1,000 websites actively block GPTBot through robots.txt. Cloudflare, which sits in front of roughly 20% of the public web, now blocks AI crawlers by default on all new domains. Many websites block AI crawlers without realising it, through CDN rules, WAF settings, or overly broad robots.txt directives.

Even among sites that allow access, few provide the signals AI systems need. Our quarterly research shows that fewer than 7% of top websites have any AI Discovery Files. Only 41% of web pages include JSON-LD structured data. The gap between what AI systems need and what most websites provide is enormous.

The result? AI systems either skip your website entirely, or they piece together information from scattered, inconsistent sources and risk getting your business wrong. Both outcomes are preventable.

Step 1: Make sure AI can access your content

A clear, bright illustration of a robots.txt file on a screen with green checkmarks next to AI crawler names like GPTBot and ClaudeBot, showing they are allowed access
Before AI systems can cite your website, their crawlers need permission to read it.

Before any AI system can cite your website, its crawler needs to be able to read your pages. This is the most basic requirement and the one most often broken without anyone noticing.

Check your robots.txt. Open yourdomain.com/robots.txt and look for directives that mention AI user agents: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot. If any are set to Disallow: /, you are telling that crawler to stay out of your entire site, and well-behaved crawlers will comply. You can allow AI crawlers for search while still blocking them for training using the robots-ai.txt specification and ai.txt permission declarations.
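The robots.txt check can be scripted with Python's standard library. The sketch below parses a hypothetical robots.txt (the file contents and example.com URL are placeholders, not taken from any real site) and reports which of the AI user agents named above are allowed to fetch the homepage.

```python
from urllib.robotparser import RobotFileParser

AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /private/

User-agent: *
Allow: /
"""


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each AI user agent to whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


report = ai_crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/")
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To check a live site instead, call parser.set_url("https://yourdomain.com/robots.txt") followed by parser.read() in place of parser.parse(...). Python's parser applies rules in file order, which can differ from how an individual crawler resolves conflicting Allow and Disallow lines, so treat the result as a first pass.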

Check your CDN and hosting settings. Cloudflare, Sucuri, and other CDN providers may block AI crawlers at the network level, before your robots.txt is even read. Review your firewall rules and bot management settings. Our technical visibility checklist walks through every common barrier and how to fix each one.
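One rough way to spot network-level blocking is to request the same page with a browser-like User-Agent and a crawler-like one, then compare the status codes. The sketch below assumes simplified User-Agent strings (real crawlers send longer ones), and the helper names are illustrative. Many CDNs verify crawlers by IP address as well as User-Agent, so a request sent from your own machine may be blocked even when the real crawler would be allowed. Treat a mismatch as a prompt to review your firewall rules, not as proof.

```python
import urllib.error
import urllib.request
from typing import Callable

# Simplified, illustrative User-Agent strings; real crawlers send longer ones.
BROWSER_UA = "Mozilla/5.0"
BOT_UA = "GPTBot"


def fetch_status(url: str, user_agent: str) -> int:
    """Return the HTTP status code the server gives this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 403, 429, 503 and similar responses land here


def looks_blocked(url: str, fetch: Callable[[str, str], int] = fetch_status) -> bool:
    """True when a browser-like request succeeds but a bot-like one does not."""
    return fetch(url, BROWSER_UA) == 200 and fetch(url, BOT_UA) != 200
```

Calling looks_blocked("https://yourdomain.com/") performs two live requests. The fetch parameter exists so the comparison logic can be exercised without touching the network.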

Check your server response. AI crawlers expect fast, clean HTML responses. Pages that rely heavily on client-side JavaScript rendering, return soft 404s, or require authentication will be skipped. Ensure your key pages return server-rendered HTML with proper status codes.
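A quick sanity check for this is to look at the raw HTML your server returns, before any JavaScript runs, and see how much readable text it contains. The sketch below is a heuristic under stated assumptions: the 200-character threshold is arbitrary, the sample pages are made up, and it does not detect soft 404s.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


MIN_VISIBLE_CHARS = 200  # hypothetical threshold; tune it for your own pages


def crawl_readiness(status: int, html: str) -> str:
    """Classify a raw server response the way a non-rendering crawler might see it."""
    if status in (401, 403):
        return "requires authentication or is forbidden"
    if status != 200:
        return f"unexpected status {status}"
    parser = VisibleText()
    parser.feed(html)
    if len(" ".join(parser.chunks)) < MIN_VISIBLE_CHARS:
        return "little server-rendered text; content may depend on JavaScript"
    return "ok"


# Made-up examples: an empty JavaScript shell and a server-rendered page.
JS_SHELL = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
SERVER_RENDERED = (
    "<html><body><h1>Web design in Kettering</h1><p>"
    + "We build websites for small businesses. " * 10
    + "</p></body></html>"
)

print(crawl_readiness(200, JS_SHELL))
print(crawl_readiness(200, SERVER_RENDERED))
```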

The cost of blocking AI crawlers is measurable. Publishers who block them have seen a 23% decline in monthly visits overall, not just from AI platforms. As AI-powered search features become embedded in traditional search results, blocking AI crawlers can hurt your visibility across the board.

Step 2: Tell AI systems who you are

Access alone isn't enough. Once an AI crawler can read your pages, it needs to understand who you are, what you do, and how to represent you accurately. Without clear signals, you're leaving that interpretation to chance.

"Write for humans, not for ranking systems, whether those systems are traditional search or LLM-powered experiences."

Danny Sullivan
Public Search Liaison, Google

Sullivan's advice sounds simple, and that's what makes it easy to underestimate. "Write for humans" doesn't mean you can ignore machines. It means the best content for AI search is the same content that serves human readers well: clear, well-structured, and honest about who wrote it and why. What tripped me up the first time I heard this was the implication hiding underneath. If your content is good enough for humans but your site doesn't tell machines who you are, you're still invisible. The "writing" part is table stakes. The identity part is the gap most people miss.

Add Schema.org structured data. At minimum, your site should have Organization or LocalBusiness schema with your name, description, address, contact details, and sameAs links to your social profiles and directory listings. Research suggests that sites with proper schema markup are 2-3 times more likely to appear in AI-generated answers. Entity disambiguation schema (sameAs, knowsAbout) is particularly valuable because it helps AI systems confirm your identity against external sources.
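As an illustration, the minimum markup described above might be generated like this. Every business detail is a placeholder, and the choice of LocalBusiness over Organization is an assumption that suits a business with a physical location. The property names (name, description, address, telephone, sameAs, knowsAbout) are standard Schema.org vocabulary.

```python
import json

# Placeholder business details: replace every value with your own.
organization = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Web Design",
    "description": "Web design for small businesses in Kettering.",
    "url": "https://www.example.com/",
    "telephone": "+44 0000 000000",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Kettering",
        "addressCountry": "GB",
    },
    "knowsAbout": ["web design", "small business websites"],
    "sameAs": [
        "https://www.linkedin.com/company/example-web-design",
        "https://www.facebook.com/examplewebdesign",
    ],
}


def json_ld_script(data: dict) -> str:
    """Wrap a Schema.org dictionary in the script tag that goes in the page head."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = json_ld_script(organization)
print(snippet)
```

Paste the printed snippet into the head of your homepage, then confirm that the values match what appears in your visible page content.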

Create AI Discovery Files. Schema.org describes page-level content. AI Discovery Files go further by declaring site-level identity, permissions, and brand terminology in formats designed specifically for AI consumption. Start with llms.txt, which gives AI systems a plain-text summary of your business. Then add identity.json for structured identity data and brand.txt for naming and terminology rules. The Quick Start guide prioritises which files to create first.
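For a sense of what an llms.txt file looks like, here is a minimal sketch that builds one. It follows the layout of the public llms.txt proposal (a title line, a one-line summary in a blockquote, then sections of annotated links). The business name, URLs, and section names are placeholders. identity.json and brand.txt are not shown because their fields are defined by the AI Discovery Files specification and are not guessed at here.

```python
def build_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Build llms.txt text: title, blockquote summary, then sections of links."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content: replace with your own business details and pages.
llms_txt = build_llms_txt(
    name="Example Web Design",
    summary="Web design agency in Kettering building websites for small businesses.",
    sections={
        "Services": [
            ("Web design", "https://www.example.com/web-design", "What we build and who for"),
        ],
        "Company": [
            ("About", "https://www.example.com/about", "Who we are and where we operate"),
            ("Contact", "https://www.example.com/contact", "Address and contact details"),
        ],
    },
)
print(llms_txt)
```

Save the output as llms.txt in the root of your site so it is served at yourdomain.com/llms.txt.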

Review your existing content. AI systems that use RAG don't match keywords; they match meaning. A page about "affordable web design for small businesses" can be retrieved for a query like "who builds websites for startups on a budget?" even without those exact words. What matters is that your content clearly and completely describes what you offer, who you serve, and where you operate. If that information is vague, buried in marketing copy, or spread across dozens of pages without a clear summary, AI systems will struggle to extract it.

Step 3: Make your information trustworthy

A bright infographic showing trust signals for AI: consistent identity across website, structured data, and external directories all pointing to a central verified business profile
AI systems cross-reference multiple sources. Consistent identity across your website, structured data, and external profiles builds the trust needed for citation.

AI systems don't just find information. They evaluate whether to trust it. A page might answer the question perfectly, but if the AI can't verify the source, it may choose a less complete answer from a more trustworthy site instead.

Trust, in the context of AI search, comes from consistency and corroboration.

Internal consistency. Your business name, services, address, and contact details should say the same thing everywhere on your site: in the header, the footer, the About page, the Schema.org markup, and your AI Discovery Files. Contradictions between these sources create ambiguity, and AI systems handle ambiguity by reducing confidence in the source. The interoperability specification explains how AI Discovery Files relate to each other and how to avoid conflicts.

External corroboration. AI systems cross-reference your claims against third-party sources. If your website says you're a web design agency in Kettering, but Google Business Profile says you're in London, and your LinkedIn says you do "digital marketing," the AI can't resolve the contradiction. Keep your identity consistent across your website, Google Business Profile, LinkedIn company page, industry directories, and any other public profile. This is entity disambiguation in practice.
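Both consistency checks come down to comparing the same fields across several sources. The sketch below uses made-up profile data that mirrors the Kettering example above. In practice you would fill the dictionaries in by hand or from your own records, and the field names are illustrative.

```python
def normalise(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def find_conflicts(profiles: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return each field whose normalised value differs between sources."""
    conflicts = {}
    fields = {field for profile in profiles.values() for field in profile}
    for field in sorted(fields):
        seen = {
            source: profile[field]
            for source, profile in profiles.items()
            if field in profile
        }
        if len({normalise(value) for value in seen.values()}) > 1:
            conflicts[field] = seen
    return conflicts


# Hypothetical data mirroring the example in the text.
profiles = {
    "website": {
        "name": "Example Web Design",
        "location": "Kettering",
        "category": "Web design agency",
    },
    "google_business_profile": {
        "name": "Example Web Design",
        "location": "London",
        "category": "Web design agency",
    },
    "linkedin": {
        "name": "Example Web Design",
        "location": "Kettering",
        "category": "Digital marketing",
    },
}

conflicts = find_conflicts(profiles)
for field, values in conflicts.items():
    print(field, values)
```

Here the name matches everywhere, while location and category are flagged because one source disagrees with the others.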

"Fewer than 1 in 100 runs produced the same list of brands, and fewer than 1 in 1,000 produced the same list in the same order."

Rand Fishkin's research ran 2,961 prompts across ChatGPT, Claude, and Google AI, asking for brand recommendations across 12 categories. The lists changed almost every time. When I first saw these numbers, my gut reaction was discouragement: if AI recommendations are that volatile, what's the point of optimising? But the same research found that visibility percentage (how often a brand appears across repeated runs) is statistically meaningful. Some brands showed up almost every time. Others barely appeared at all. The randomness is in the exact make-up and order of any single list, not in which brands keep turning up across many runs. Clear identity signals, external corroboration, and structured data are what separate the brands that reliably appear from those that don't.
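Visibility percentage is simple to compute once you have the brand lists from repeated runs of the same prompt. The sketch below uses invented brand names and four invented runs purely to show the calculation. It is not data from the research.

```python
from collections import Counter


def visibility(runs: list[list[str]]) -> dict[str, float]:
    """Percentage of runs in which each brand appears at least once."""
    counts = Counter(brand for run in runs for brand in set(run))
    return {brand: 100 * n / len(runs) for brand, n in counts.items()}


# Invented example: no two runs return the same list, yet some brands recur.
runs = [
    ["Brand A", "Brand B", "Brand C"],
    ["Brand B", "Brand A", "Brand D"],
    ["Brand A", "Brand E", "Brand B"],
    ["Brand C", "Brand A", "Brand F"],
]

scores = visibility(runs)
for brand, pct in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{brand}: {pct:.0f}%")
```

Every list differs, but Brand A appears in all four runs and Brand F in only one. That gap is the signal a single prompt test cannot show.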

Build external signals. Submit your site to relevant directories, including the AI Discovery Files Directory. Maintain accurate profiles on Google Business Profile, LinkedIn, and industry-specific platforms. Earn mentions and citations from credible third-party sources. These external references give AI systems additional data points to verify your identity against.

What doesn't work

A bright illustration showing common AI search myths crossed out with red X marks: keyword stuffing, prompt manipulation, and creating AI-only content, contrasted with green checkmarks for real solutions
Many popular "AI SEO" tactics are recycled from traditional SEO and don't address how AI systems actually work.

As AI search has grown, so has the volume of advice about how to "optimise" for it. Much of that advice is recycled from traditional SEO tactics that don't map to how AI systems actually work.

"Optimise your content for AI prompts." Some guides suggest writing content specifically designed to match common AI prompts. This is the AI equivalent of keyword stuffing. AI systems using RAG don't match prompts to pages; they match meaning. Write clearly about what you do and cover it thoroughly. That's it.

"Ask ChatGPT about your brand to see if you're visible." Testing prompts gives you a snapshot, not a strategy. Fishkin's research showed that AI recommendations are wildly inconsistent between runs. A single prompt test tells you almost nothing about your actual visibility. The definition of AI Visibility Checking draws a clear line between checking (validating your infrastructure) and tracking (monitoring AI outputs). Infrastructure is what you can control.

"Create AI-specific content pages." Building separate pages or Markdown versions of your content specifically for AI crawlers is fragile and unsustainable. These pages tend to go stale quickly and diverge from your actual content. The better approach is to make your existing pages machine-readable through structured data and AI Discovery Files, which sit alongside your content rather than duplicating it.

"Just focus on getting more backlinks." Backlinks matter for traditional search rankings, but AI systems using RAG prioritise content relevance and source trustworthiness over link authority. A page with zero backlinks but clear, accurate, well-structured information about a specific topic can be cited ahead of a high-authority page with vague or outdated content.

Check your AI visibility now

The steps in this guide are practical and measurable. You can verify each one. Check your robots.txt for AI crawler directives. Review your Schema.org markup with Google's Rich Results Test. Audit your AI Discovery Files for completeness and consistency. For a worked example of all three steps applied to a brand new site, see our case study of a three-week-old site that topped AI search.
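As a complement to the Rich Results Test, you can pull the JSON-LD out of a page yourself and check it against the fields listed in Step 2. This is a sketch, not a validator. The required-field list is an assumption drawn from this guide, the sample page is made up, and it only handles a single top-level Organization or LocalBusiness object (no @graph arrays or multiple types).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON-LD blocks from script tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped in this sketch


# Assumed minimum fields, taken from Step 2 of this guide.
REQUIRED = ("name", "description", "address", "sameAs")


def missing_fields(html: str) -> list[str]:
    """List required fields absent from the first organisation-type block found."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    for block in extractor.blocks:
        if isinstance(block, dict) and block.get("@type") in ("Organization", "LocalBusiness"):
            return [field for field in REQUIRED if field not in block]
    return list(REQUIRED)  # no suitable block at all


# Made-up page with incomplete markup.
PAGE = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness",
 "name": "Example Web Design", "description": "Web design in Kettering."}
</script></head><body></body></html>"""

print(missing_fields(PAGE))
```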

Or, run a single check that covers all of it. The AI Visibility Checker analyses your site's AI Discovery Files, crawler access, identity consistency, and structural readiness, then gives you a deterministic score with specific recommendations. It takes under a minute and tells you exactly where you stand.

AI search isn't replacing traditional search overnight. But it is absorbing a growing share of how people find, evaluate, and choose businesses. The websites that prepare for both channels now will be the ones that appear in both sets of results later. AI Visibility and SEO aren't competing priorities. They're parallel investments in being found accurately, everywhere that matters.

Frequently asked questions

How does ChatGPT decide which websites to cite?

ChatGPT uses retrieval-augmented generation (RAG) to search the web in real time. When a user asks a question, the system searches for relevant pages, evaluates their content quality and structure, then synthesises an answer with citations. Pages that are accessible to AI crawlers, well-structured, and contain clear identity signals are more likely to be cited.

Do I need to create separate content for AI search engines?

No. Google's Danny Sullivan has confirmed that "SEO for AI is still SEO." Write for humans, structure your content clearly, and add machine-readable identity signals like AI Discovery Files and Schema.org markup. Creating separate bot-only content often leads to neglected or outdated material.

Will AI Discovery Files help my website appear in ChatGPT?

AI Discovery Files like llms.txt and identity.json give AI systems a clear, machine-readable source of truth about your business. While no single file guarantees inclusion, they reduce the chance of hallucination and increase the accuracy of how your business is represented. The full specification documents all 10 file types.

Is Schema.org markup enough for AI visibility?

Schema.org markup helps, but it's not enough on its own. Research suggests structured data can increase your chances of appearing in AI answers by 2-3x. But Schema.org describes page-level content. AI Discovery Files declare site-level identity, permissions, and brand terminology that Schema.org doesn't cover. You need both.

Does blocking AI crawlers protect or hurt my website?

Blocking AI crawlers has costs. Publishers who block AI crawlers have seen a 23% decline in monthly visits overall. Blocking prevents your business from appearing in AI answers, which now convert at 5x the rate of traditional search. Our blocking checklist helps you make an informed decision.

How do Google AI Overviews affect my website traffic?

AI Overviews now appear in roughly half of all Google searches. They correlate with a 58-61% drop in organic click-through rates. But brands that are cited inside AI Overviews earn 35% more organic clicks than those that aren't. The question isn't whether AI Overviews exist; it's whether your site is among the sources they cite.

How long does it take to become visible to AI search engines?

The foundational work can be done in a single afternoon. Creating an llms.txt file takes under an hour. Adding Schema.org markup and reviewing your crawler access settings takes a few hours more. If your site runs WordPress, the AI Discovery Files plugin automates most of the process.

Sources