9.4% adoption
AI Discovery File Adoption Research
Measuring how the world's top websites prepare for AI systems. Based on a crawl of 1,995 domains.
Summary
This Q3 2026 report analyses AI Discovery File adoption across the web's most prominent domains: 1,744 were crawled successfully out of the 1,995 in the sample. Of those, 9.4% have at least one AI Discovery File, 84.2% have no AI-specific crawler policy in their robots.txt, and the average AI readiness tier is 2.2 out of 5.0. Data is collected quarterly; the approach is described in our full methodology documentation.
AI Discovery Files are a set of 10 standardised root-level files — including llms.txt, ai.txt, ai.json, identity.json, and brand.txt — that help AI systems such as ChatGPT, Claude, and Gemini discover, interpret, and correctly represent a website. This research tracks their real-world adoption among the domains most likely to be referenced by AI systems when answering user questions.
Key Findings
The Q3 2026 crawl continues the upward trend in AI Discovery File adoption, which reached 9.4% of successfully crawled domains, up from 7.2% in Q2. The gain is driven almost entirely by llms.txt, now on 122 domains with 82 valid, complete implementations. The number of AI-Ready sites (Tier 4) grew from 33 to 44, and more organisations are explicitly permitting AI crawlers rather than staying silent. One caveat matters for reading these figures: 251 domains errored this quarter versus 90 in Q2, so the successfully crawled base fell from 1,905 to 1,744. Percentages are computed on that smaller sample, so quarter-over-quarter changes in percentages partly reflect the changed denominator rather than changed behaviour.
- llms.txt adoption keeps climbing and now carries the whole category. llms.txt grew from 93 to 122 domains (+31%), reaching 7% of the crawled sample. Valid, complete files rose from 61 to 82, and the wider quality overview shows complete files across all file types up from 64 to 85. This one file now accounts for the overwhelming majority of all AI Discovery File adoption. Apart from llms.html (46 domains, only 9 of them valid), every other file type stayed at or near zero, so the growth story and the concentration risk are the same story.
- AI-Ready sites grow by a third as Cloudflare and Adobe join the list. Tier 4 (AI-Ready) domains rose from 33 to 44 (+33%). The top adopters now include Cloudflare (global rank 7) and Adobe (rank 68), alongside Bluehost, Dyson, Fox News, Klaviyo, and Groupon. These are high-traffic, infrastructure-scale properties, which matters because their choices tend to set defaults that smaller sites copy. Tier 5 (AI-Optimised) still stands at zero: no domain yet pairs three or more valid files with explicit crawler permission and Schema.org markup.
- More sites explicitly allow AI crawlers instead of staying silent. Domains that explicitly permit AI crawlers rose from 23 to 31, lifting the explicit-allow share from 1.2% to 1.8%. Sites with no AI policy at all eased from 85% to 84.2%. Outright blocking barely moved (blocks-all 22 to 23). The shift is small in absolute terms but it points one way: where organisations are making a deliberate choice, more of them are choosing to opt in to AI access rather than block it.
- Crawl coverage dipped this quarter, which distorts some declines. Only 1,744 of 1,995 domains were successfully crawled this quarter, down from 1,905 in Q2, because 251 domains errored versus 90. That matters when reading softer numbers: Schema.org presence fell from 29.7% to 26.3% and Partially Ready (Tier 3) dropped from 21.7% to 18.5%, but both are partly a function of the smaller, slightly different sample rather than a genuine retreat. The average readiness score held flat at 2.2 out of 5.0. We flag this openly rather than presenting the declines as behavioural change we cannot prove.
- Adoption is still one file deep. Of the ten AI Discovery File types we track, only llms.txt shows meaningful adoption. ai.json, identity.json, brand.txt, faq-ai.txt, and developer-ai.txt sit at zero to two domains each, and llms.html slipped from 47 to 46 domains found (its share still edges up from 2.5% to 2.6%, but only because the crawled base shrank). Publishers are adopting the single best-known file and stopping there, which leaves identity, brand, and FAQ signals largely unaddressed across the web.
- Zero sites reference a formal specification. All 164 domains with an AI Discovery File published it without any reference to a formal specification, exactly as in Q2. Adoption continues to run ahead of standardisation: publishers are writing these files by hand or from ad hoc templates, with no shared schema or version to validate against. That gap is the clearest opening for a canonical, machine-checkable standard to become the reference point.
— AI Visibility Research, July 2026
Changes from Q2 2026
Quarter-over-quarter changes in key metrics between Q2 2026 and Q3 2026.
| File | Q2 2026 | Q3 2026 | Change (pp) |
|---|---|---|---|
| llms.txt | 4.9% | 7.0% | +2.1 |
| llms.html | 2.5% | 2.6% | +0.1 |
| ai.txt | 0.1% | 0.1% | 0.0 |
| ai.json | 0.1% | 0.1% | 0.0 |
| identity.json | 0.0% | 0.0% | 0.0 |
| brand.txt | 0.1% | 0.0% | -0.1 |
| faq-ai.txt | 0.1% | 0.0% | -0.1 |
| developer-ai.txt | 0.0% | 0.0% | 0.0 |
| robots-ai.txt | 0.1% | 0.1% | 0.0 |
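The change column is a difference in percentage points, and each quarter's share uses that quarter's own denominator. As a rough check, the sketch below recomputes two rows from the counts quoted in the findings (93 to 122 domains for llms.txt, 47 to 46 for llms.html) and the crawled bases (1,905 in Q2, 1,744 in Q3). The function names are illustrative and are not taken from the published methodology.

```python
Q2_BASE = 1_905  # domains successfully crawled in Q2 2026
Q3_BASE = 1_744  # domains successfully crawled in Q3 2026


def share(count: int, base: int) -> float:
    """Percentage of crawled domains, rounded to one decimal as in the table."""
    return round(100 * count / base, 1)


def change_pp(q2_count: int, q3_count: int) -> float:
    """Quarter-over-quarter change in percentage points (not percent)."""
    return round(share(q3_count, Q3_BASE) - share(q2_count, Q2_BASE), 1)


def relative_growth(q2_count: int, q3_count: int) -> float:
    """Growth in the raw domain count, in percent."""
    return round(100 * (q3_count - q2_count) / q2_count, 1)


if __name__ == "__main__":
    # llms.txt: the count and the share both rise.
    print(share(93, Q2_BASE), share(122, Q3_BASE), change_pp(93, 122))
    # llms.html: the count falls by one domain, yet the share rises
    # because the Q3 denominator is smaller.
    print(relative_growth(47, 46), change_pp(47, 46))
```

The llms.html row is the clearest illustration of the denominator caveat: a file found on fewer domains still shows a positive change in share.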
New Top Adopters
- adobe.com
- bluehost.com
- bunkbedsstore.uk
- cloudflare.com
- dyson.co.uk
- foxnews.com
- frontiersin.org
- groupon.co.uk
- gumgum.com
- klaviyo.com
- life360.com
No Longer in Top 20
- dell.com
- english-heritage.org.uk
- greenpeace.org.uk
- hobbycraft.co.uk
- mailchimp.com
- netgear.com
- nvidia.com
- onetrust.com
- opera.com
- optimizely.com
- plesk.com
ADF Adoption by File Type
How many of the top websites have each AI Discovery File — and how many of those files pass structural validation. Files are checked at their canonical root-level URL (e.g., example.com/llms.txt) and validated against the ADF specification.
| File | Found | Valid | Complete |
|---|---|---|---|
| llms.txt | 122 | 82 | 82 |
| llms.html | 46 | 9 | 3 |
| ai.txt | 1 | 0 | 0 |
| ai.json | 2 | 0 | 0 |
| identity.json | 0 | 0 | 0 |
| brand.txt | 0 | 0 | 0 |
| faq-ai.txt | 0 | 0 | 0 |
| developer-ai.txt | 0 | 0 | 0 |
| robots-ai.txt | 1 | 1 | 0 |
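To make "canonical root-level URL" concrete, here is a minimal sketch that builds the URL to check for each file on a domain. It lists only the nine file names that appear in this report's tables, and the `https` default and the `canonical_urls` helper are assumptions for illustration, not a description of the actual crawler.

```python
# File names as they appear in the tables of this report.
ADF_FILES = (
    "llms.txt",
    "llms.html",
    "ai.txt",
    "ai.json",
    "identity.json",
    "brand.txt",
    "faq-ai.txt",
    "developer-ai.txt",
    "robots-ai.txt",
)


def canonical_urls(domain: str, scheme: str = "https") -> dict[str, str]:
    """Map each file name to its root-level URL on the given domain.

    Assumption: the bare domain is used as the host and https is tried first.
    """
    host = domain.strip().lower().rstrip("/")
    return {name: f"{scheme}://{host}/{name}" for name in ADF_FILES}


if __name__ == "__main__":
    for name, url in canonical_urls("example.com").items():
        print(f"{name:18} {url}")
```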
AI Crawler Access Policies
How websites use robots.txt to manage access for 15 known AI user agents — from OpenAI's GPTBot to Anthropic's ClaudeBot. Each domain is classified into one of five access policies based on its aggregate behaviour across all agents. The per-agent table below shows which AI crawlers are most frequently blocked.
| AI Crawler | Company | Purpose | Blocked | Blocked % | Allowed | Allowed % |
|---|---|---|---|---|---|---|
| CCBot | Common Crawl | Training | 186 | 10.7% | 14 | 0.8% |
| ClaudeBot | Anthropic | Training | 174 | 10.0% | 20 | 1.1% |
| GPTBot | OpenAI | Training | 173 | 9.9% | 29 | 1.7% |
| Bytespider | ByteDance | Training | 167 | 9.6% | 6 | 0.3% |
| meta-externalagent | Meta | Training | 151 | 8.7% | 4 | 0.2% |
| Applebot-Extended | Apple | Training | 146 | 8.4% | 10 | 0.6% |
| PerplexityBot | Perplexity | Search | 134 | 7.7% | 30 | 1.7% |
| Google-Extended | Google | Training | 125 | 7.2% | 22 | 1.3% |
| Diffbot | Diffbot | Extraction | 114 | 6.5% | 2 | 0.1% |
| cohere-ai | Cohere | Training | 108 | 6.2% | 3 | 0.2% |
| Amazonbot | Amazon | Training | 105 | 6.0% | 10 | 0.6% |
| OAI-SearchBot | OpenAI | Search | 100 | 5.7% | 29 | 1.7% |
| ChatGPT-User | OpenAI | Retrieval | 83 | 4.8% | 31 | 1.8% |
| Claude-User | Anthropic | Retrieval | 75 | 4.3% | 12 | 0.7% |
| FacebookBot | Meta | Preview | 75 | 4.3% | 3 | 0.2% |
Domain-level access policies:
| Policy | Domains | Percentage |
|---|---|---|
| Blocks All AI | 23 | 1.3% |
| Blocks Selectively | 217 | 12.4% |
| Rate-Limits AI | 4 | 0.2% |
| Explicitly Allows | 31 | 1.8% |
| No AI Policy | 1,469 | 84.2% |
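The report does not spell out the exact rules behind the five policy labels, so the following is only a sketch under stated assumptions: a domain has an AI policy only if its robots.txt names a tracked agent explicitly, `Disallow: /` counts as a block, and a `Crawl-delay` directive counts as rate limiting. The agent list is a five-agent subset of the 15 in the table above, and the precedence between labels is hypothetical.

```python
# Subset of the agents in the table above; the real crawl tracks 15.
AI_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Google-Extended")
TRACKED = {agent.lower() for agent in AI_AGENTS}


def explicit_rules(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Collect the rules of every group that names a tracked AI agent."""
    rules: dict[str, list[tuple[str, str]]] = {}
    group: list[str] = []  # agents named by the group being read
    in_rules = False       # True once the group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group, in_rules = [], False
            agent = value.lower()
            group.append(agent)
            if agent in TRACKED:
                rules.setdefault(agent, [])
        elif key in ("allow", "disallow", "crawl-delay"):
            in_rules = True
            for agent in group:
                if agent in TRACKED:
                    rules[agent].append((key, value))
    return rules


def classify(robots_txt: str) -> str:
    """Assign one of the five policy labels (assumed precedence)."""
    rules = explicit_rules(robots_txt)
    if not rules:
        return "No AI Policy"
    blocked = {a for a, rs in rules.items() if ("disallow", "/") in rs}
    if len(blocked) == len(AI_AGENTS):
        return "Blocks All AI"
    if blocked:
        return "Blocks Selectively"
    if any(key == "crawl-delay" for rs in rules.values() for key, _ in rs):
        return "Rate-Limits AI"
    return "Explicitly Allows"


if __name__ == "__main__":
    print(classify("User-agent: *\nDisallow: /admin\n"))
    print(classify("User-agent: GPTBot\nDisallow: /\n"))
```

Under these assumptions a wildcard `User-agent: *` group never counts as an AI policy, which matches the report's wording of "no AI-specific crawler policy" but may differ from the published rules in edge cases.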
File Quality Distribution
Among the ADF files that were found, how many meet the full specification versus providing only minimal content or containing errors. Quality is assessed using per-file structural checks — required fields must pass for a file to be considered valid; recommended fields distinguish "complete" from "minimal" implementations.
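The distinction between valid, complete, and minimal can be sketched as a small function over per-file check results. The check names below are hypothetical placeholders, the label "invalid" is our own shorthand for files containing errors, and treating "complete" as "every recommended check passes" is an assumption; the published validation rules are the authority.

```python
def file_quality(required: dict[str, bool], recommended: dict[str, bool]) -> str:
    """Grade one file from its structural check results.

    required:    check name -> passed, all must pass for the file to be valid
    recommended: check name -> passed, separates "complete" from "minimal"
    """
    if not all(required.values()):
        return "invalid"
    return "complete" if all(recommended.values()) else "minimal"


if __name__ == "__main__":
    # Hypothetical check names, for illustration only.
    print(file_quality({"has_title": True}, {"has_summary": True, "has_links": False}))
```

In the adoption table, "Valid" would then count files graded minimal or complete, and "Complete" only the latter.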
AI Readiness Tiers
Each domain receives a readiness tier from 0 (Unaware) to 5 (AI-Optimised) based on three inputs: valid ADF file count, AI crawler policy in robots.txt, and Schema.org presence on the homepage. The tier model is deterministic with no opaque weights — the full calculation logic is published.
| Tier | Domains | Percentage |
|---|---|---|
| Tier 5: AI-Optimised | 0 | 0.0% |
| Tier 4: AI-Ready | 44 | 2.5% |
| Tier 3: Partially Ready | 322 | 18.5% |
| Tier 2: Passive | 1,316 | 75.5% |
| Tier 1: Actively Blocking | 23 | 1.3% |
| Tier 0: Unaware | 39 | 2.2% |
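The headline average of 2.2 out of 5.0 can be reproduced from this table as a count-weighted mean of the tier numbers. The sketch below uses only the figures shown above; the variable names are illustrative.

```python
# Tier number -> number of domains, copied from the table above.
TIER_COUNTS = {5: 0, 4: 44, 3: 322, 2: 1_316, 1: 23, 0: 39}

total_domains = sum(TIER_COUNTS.values())
weighted_sum = sum(tier * count for tier, count in TIER_COUNTS.items())
average_tier = weighted_sum / total_domains

if __name__ == "__main__":
    print(f"{total_domains} domains, average tier {average_tier:.1f}")
    for tier, count in TIER_COUNTS.items():
        print(f"Tier {tier}: {100 * count / total_domains:.1f}%")
```

Because roughly three quarters of domains sit in Tier 2, the average moves very little even when Tier 4 grows by a third.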
ADF vs Other Web Standards
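Comparing AI Discovery File adoption against established web conventions. This shows where ADF adoption sits relative to robots.txt (RFC 9309), ads.txt (IAB Tech Lab), security.txt (RFC 9116), and humans.txt, which are likewise served from fixed, well-known locations on the domain (the root, or the /.well-known/ path in the case of security.txt), and relative to Schema.org markup, which is embedded in pages rather than published as a separate file.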
| Standard | Adoption |
|---|---|
| robots.txt | 50.2% |
| ads.txt | 16.1% |
| Schema.org | 26.3% |
| security.txt | 14.6% |
| humans.txt | 2.3% |
| Any ADF file | 9.4% |
Notable Adopters
The top 20 domains by AI readiness tier, showing which high-profile websites are leading ADF adoption. Readiness tiers are calculated using the combinatorial scoring model.
| Domain | Rank | Category | Files Found | Files Valid | Readiness |
|---|---|---|---|---|---|
| adobe.com | 68 | Global Top 1,000 | 1 | 1 | AI-Ready |
| asus.com | 710 | Global Top 1,000 | 1 | 1 | AI-Ready |
| bluehost.com | 666 | Global Top 1,000 | 2 | 1 | AI-Ready |
| bmmagazine.co.uk | 739 | UK Top 1,000 | 1 | 1 | AI-Ready |
| bunkbedsstore.uk | 989 | UK Top 1,000 | 1 | 1 | AI-Ready |
| classlink.com | 854 | Global Top 1,000 | 1 | 1 | AI-Ready |
| cloudflare.com | 7 | Global Top 1,000 | 1 | 1 | AI-Ready |
| cloudinary.com | 725 | Global Top 1,000 | 1 | 1 | AI-Ready |
| datadoghq.com | 788 | Global Top 1,000 | 1 | 1 | AI-Ready |
| dynatrace.com | 546 | Global Top 1,000 | 1 | 1 | AI-Ready |
| dyson.co.uk | 462 | UK Top 1,000 | 1 | 1 | AI-Ready |
| energysavingtrust.org.uk | 489 | UK Top 1,000 | 1 | 1 | AI-Ready |
| foxnews.com | 450 | Global Top 1,000 | 1 | 1 | AI-Ready |
| frontiersin.org | 917 | Global Top 1,000 | 1 | 1 | AI-Ready |
| groupon.co.uk | 553 | UK Top 1,000 | 1 | 1 | AI-Ready |
| gumgum.com | 722 | Global Top 1,000 | 1 | 1 | AI-Ready |
| hostgator.com.br | 842 | Global Top 1,000 | 1 | 1 | AI-Ready |
| kingsfund.org.uk | 905 | UK Top 1,000 | 1 | 1 | AI-Ready |
| klaviyo.com | 675 | Global Top 1,000 | 1 | 1 | AI-Ready |
| life360.com | 753 | Global Top 1,000 | 1 | 1 | AI-Ready |
Download the Data
Raw datasets from this quarter's crawl, licensed under CC BY 4.0. Use them for your own research, analysis, or reporting. When citing, please reference the quarter (e.g., "Q3 2026") and link to the methodology.
Methodology
How We Collect This Data
Our crawler checks the top 1,000 global and top 1,000 UK domains (1,995 after deduplication) for all 10 AI Discovery Files, validates each against the specification, analyses robots.txt AI crawler policies across 15 known agents, and scores each domain's overall AI readiness using a deterministic tier model. The full methodology, including validation rules, soft 404 detection, redirect classification, and scoring logic, is published for transparency.
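Two lists of 1,000 domains that reduce to 1,995 imply five domains appearing in both. A minimal sketch of that merge step follows, assuming duplicates are detected by exact, case-insensitive hostname match and that the first occurrence wins; the `merge_ranked` helper is hypothetical and the real pipeline may normalise domains differently.

```python
def merge_ranked(global_top: list[str], uk_top: list[str]) -> list[str]:
    """Concatenate two ranked lists and drop repeats, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for domain in [*global_top, *uk_top]:
        key = domain.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


if __name__ == "__main__":
    sample = merge_ranked(["cloudflare.com", "Adobe.com"], ["adobe.com", "dyson.co.uk"])
    print(sample)
```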