AI Discovery File Adoption Research
Measuring how the world's top websites prepare for AI systems. Based on a crawl of 1,995 domains.
Summary
This Q1 2026 report analyses AI Discovery File adoption across 1,460 of the web's most prominent domains. 6.5% of domains have at least one AI Discovery File, 87.5% have no AI-specific crawler policy in their robots.txt, and the average AI readiness tier is 2.2 out of 5.0. Data is collected quarterly, following our published methodology documentation.
AI Discovery Files are a set of 10 standardised root-level files — including llms.txt, ai.txt, ai.json, identity.json, and brand.txt — that help AI systems such as ChatGPT, Claude, and Gemini discover, interpret, and correctly represent a website. This research tracks their real-world adoption among the domains most likely to be referenced by AI systems when answering user questions.
Key Findings
The first quarterly crawl establishes a baseline for AI Discovery File adoption across the web's most prominent domains. At 6.5%, adoption is nascent but already outpacing several established web standards at equivalent points in their lifecycle. The more pressing finding is not how few files exist, but how many of those that do exist are broken: over half fail validation. The web is not ignoring AI — it simply has no infrastructure for it yet.
- Quality is the real gap, not awareness. Of the 100 AI Discovery Files found across all domains, 56% are invalid, containing URL dumps, placeholder content, or malformed syntax. Only 36 files are complete and specification-compliant. Previous industry studies that concluded llms.txt had "zero measurable impact" never examined whether the files they found actually worked. Having a file and having it work are two different things.
- llms.txt leads adoption but carries a high error rate. 55 domains serve an llms.txt file (3.8% of the 1,460 domains analysed), making it the most adopted AI Discovery File by a wide margin. However, 20 of those files are invalid, a 36% failure rate. The llms.html companion file fares worse: 41 found, but only 7 valid and just 1 complete. Early adoption is concentrating around the files AI systems are most likely to request, but without validation, many of these files are actively misleading.
- 87.5% of top websites have no AI crawler policy whatsoever. The vast majority of websites have not yet addressed AI crawlers in their robots.txt. Of the 12.5% that have, most block selectively, targeting specific crawlers rather than setting a blanket policy. CCBot is the most frequently blocked agent (9.6%), followed by ClaudeBot (8.6%) and GPTBot (8.5%). Only 0.8% of sites explicitly allow AI access. The web has not decided how to handle AI crawlers; it has mostly not considered the question at all.
- Nobody has reached Tier 5; the ceiling is still open. Zero domains in the sample achieved "AI-Optimised" status. The highest tier reached is Tier 4 (AI-Ready), occupied by just 22 domains, or 1.5% of the sample. Meanwhile, 76.5% sit at Tier 2 (Passive), meaning they have a robots.txt but no AI-specific signals. Reaching Tier 4 today places a business ahead of 98.5% of the web's top domains. The competitive advantage for early movers is large precisely because the bar is still so low.
- Adoption is already outpacing comparable web standards. At 6.5%, AI Discovery File adoption has already surpassed humans.txt (2.5%), a standard proposed in 2011. For context, robots.txt took roughly 15 years to reach 25% adoption; Schema.org took 8 years. ADF adoption appears to be on a significantly compressed timeline, likely driven by the commercial urgency of AI integration. If security.txt (12.7%, RFC published 2022) is any guide, ADF adoption could reach double digits within the next two to three quarters.
- Early adopters span industries and geographies. The 22 Tier 4 domains are not clustered in a single sector. They include global technology platforms (Shopify, Stripe, Opera), developer infrastructure (SourceForge, Dynatrace, Optimizely), UK public services (ScotRail, English Heritage, Energy Saving Trust), recruitment (Reed), and SaaS platforms (Mailchimp, Qualtrics, OneTrust). AI readiness is not only a tech-sector concern; it is a cross-industry infrastructure decision.
— AI Visibility Research, February 2026
ADF Adoption by File Type
How many of the top websites have each AI Discovery File — and how many of those files pass structural validation. Files are checked at their canonical root-level URL (e.g., example.com/llms.txt) and validated against the ADF specification.
| File | Found | Valid | Complete |
|---|---|---|---|
| llms.txt | 55 | 35 | 35 |
| llms.html | 41 | 7 | 1 |
| ai.txt | 1 | 0 | 0 |
| ai.json | 1 | 0 | 0 |
| identity.json | 0 | 0 | 0 |
| brand.txt | 0 | 0 | 0 |
| faq-ai.txt | 0 | 0 | 0 |
| developer-ai.txt | 0 | 0 | 0 |
| robots-ai.txt | 1 | 1 | 0 |
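The failure rates quoted in the findings can be re-derived from the rows above. The following is a minimal sketch, assuming the table is transcribed by hand into a dictionary; the names `FILES` and `invalid_rate` are illustrative and not part of the published dataset, and only the rows shown in the table are covered.

```python
from typing import Optional

# (found, valid, complete) per file, copied from the table above.
FILES = {
    "llms.txt": (55, 35, 35),
    "llms.html": (41, 7, 1),
    "ai.txt": (1, 0, 0),
    "ai.json": (1, 0, 0),
    "identity.json": (0, 0, 0),
    "brand.txt": (0, 0, 0),
    "faq-ai.txt": (0, 0, 0),
    "developer-ai.txt": (0, 0, 0),
    "robots-ai.txt": (1, 1, 0),
}


def invalid_rate(name: str) -> Optional[float]:
    """Percentage of found files that fail validation, or None if none were found."""
    found, valid, _complete = FILES[name]
    if found == 0:
        return None  # nothing found, so there is no rate to report
    return round(100 * (found - valid) / found, 1)


for name in FILES:
    rate = invalid_rate(name)
    if rate is not None:
        print(f"{name}: {rate}% invalid")
```

For llms.txt this gives 20 invalid files out of 55, the roughly 36% failure rate cited in the key findings.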
AI Crawler Access Policies
How websites use robots.txt to manage access for 15 known AI user agents — from OpenAI's GPTBot to Anthropic's ClaudeBot. Each domain is classified into one of five access policies based on its aggregate behaviour across all agents. The per-agent table below shows which AI crawlers are most frequently blocked.
| AI Crawler | Company | Purpose | Blocked | Blocked % | Allowed | Allowed % |
|---|---|---|---|---|---|---|
| CCBot | Common Crawl | Training | 140 | 9.6% | 5 | 0.3% |
| ClaudeBot | Anthropic | Training | 126 | 8.6% | 8 | 0.5% |
| GPTBot | OpenAI | Training | 124 | 8.5% | 13 | 0.9% |
| Bytespider | ByteDance | Training | 118 | 8.1% | 2 | 0.1% |
| Applebot-Extended | Apple | Training | 101 | 6.9% | 4 | 0.3% |
| meta-externalagent | Meta | Training | 98 | 6.7% | 1 | 0.1% |
| Google-Extended | Google | Training | 96 | 6.6% | 10 | 0.7% |
| PerplexityBot | Perplexity | Search | 96 | 6.6% | 13 | 0.9% |
| Diffbot | Diffbot | Extraction | 89 | 6.1% | 2 | 0.1% |
| cohere-ai | Cohere | Training | 87 | 6.0% | 0 | 0.0% |
| Amazonbot | Amazon | Training | 72 | 4.9% | 2 | 0.1% |
| ChatGPT-User | OpenAI | Retrieval | 70 | 4.8% | 13 | 0.9% |
| OAI-SearchBot | OpenAI | Search | 70 | 4.8% | 14 | 1.0% |
| FacebookBot | Meta | Preview | 67 | 4.6% | 3 | 0.2% |
| Claude-User | Anthropic | Retrieval | 48 | 3.3% | 3 | 0.2% |
Access policy by domain:
| Policy | Domains | Percentage |
|---|---|---|
| Blocks All AI | 19 | 1.3% |
| Blocks Selectively | 151 | 10.3% |
| Rate-Limits AI | 2 | 0.1% |
| Explicitly Allows | 11 | 0.8% |
| No AI Policy | 1,277 | 87.5% |
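To make the five policy labels concrete, here is a rough sketch of how a robots.txt file might be mapped to one of them. The rules are assumptions for illustration only: the report's actual classification logic lives in the methodology documentation and is not reproduced here. In this sketch, only groups that name an AI agent explicitly count as an AI policy, and only a bare `Disallow: /` counts as a block. The helper names `parse_groups` and `classify` are hypothetical.

```python
# The 15 user agents listed in the per-agent table.
AI_AGENTS = [
    "CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Applebot-Extended",
    "meta-externalagent", "Google-Extended", "PerplexityBot", "Diffbot",
    "cohere-ai", "Amazonbot", "ChatGPT-User", "OAI-SearchBot",
    "FacebookBot", "Claude-User",
]


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased user-agent token to its (directive, value) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow", "crawl-delay"):
            in_rules = True
            for agent in current:
                groups[agent].append((field, value))
    return groups


def classify(robots_txt: str) -> str:
    """Assumed simplification of the five access policies."""
    groups = parse_groups(robots_txt)
    named = [a for a in AI_AGENTS if a.lower() in groups]
    if not named:
        return "No AI Policy"
    blocked = [a for a in named if ("disallow", "/") in groups[a.lower()]]
    if len(blocked) == len(AI_AGENTS):
        return "Blocks All AI"
    if blocked:
        return "Blocks Selectively"
    if any(f == "crawl-delay" for a in named for f, _ in groups[a.lower()]):
        return "Rate-Limits AI"
    return "Explicitly Allows"


print(classify("User-agent: GPTBot\nDisallow: /\n"))
```

A wildcard group such as `User-agent: *` is deliberately ignored here, since it says nothing specific about AI crawlers; a production classifier would also need to handle partial-path disallows and conflicting rules.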
File Quality Distribution
Among the ADF files that were found, how many meet the full specification versus providing only minimal content or containing errors. Quality is assessed using per-file structural checks — required fields must pass for a file to be considered valid; recommended fields distinguish "complete" from "minimal" implementations.
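The three-way split described above (invalid, minimal, complete) can be sketched as a small function. This assumes each file's checks have already been run and reduced to pass/fail flags; the check names in the example are hypothetical and are not taken from the ADF specification.

```python
from typing import Mapping


def classify_quality(required: Mapping[str, bool],
                     recommended: Mapping[str, bool]) -> str:
    """Return 'invalid', 'minimal', or 'complete' from per-check results."""
    if not all(required.values()):
        return "invalid"   # any failed required check invalidates the file
    if all(recommended.values()):
        return "complete"  # valid, and every recommended check also passes
    return "minimal"       # valid, but some recommended content is missing


# Hypothetical check names, for illustration only.
result = classify_quality(
    required={"has_title": True, "parses_cleanly": True},
    recommended={"has_summary": True, "has_section_links": False},
)
print(result)
```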
AI Readiness Tiers
Each domain receives a readiness tier from 0 (Unaware) to 5 (AI-Optimised) based on three inputs: valid ADF file count, AI crawler policy in robots.txt, and Schema.org presence on the homepage. The tier model is deterministic with no opaque weights — the full calculation logic is published.
| Tier | Domains | Percentage |
|---|---|---|
| Tier 5: AI-Optimised | 0 | 0.0% |
| Tier 4: AI-Ready | 22 | 1.5% |
| Tier 3: Partially Ready | 284 | 19.5% |
| Tier 2: Passive | 1,117 | 76.5% |
| Tier 1: Actively Blocking | 19 | 1.3% |
| Tier 0: Unaware | 18 | 1.2% |
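The 2.2 average tier quoted in the summary follows from this table as a count-weighted mean. A short check, with the counts copied from the rows above (the variable names are illustrative):

```python
# Domains per readiness tier, copied from the table above.
TIER_COUNTS = {5: 0, 4: 22, 3: 284, 2: 1117, 1: 19, 0: 18}

total = sum(TIER_COUNTS.values())

# Weighted mean: each tier number weighted by how many domains sit in it.
average_tier = sum(tier * n for tier, n in TIER_COUNTS.items()) / total

# Share of domains below Tier 4, i.e. what a Tier 4 site is "ahead of".
below_tier_4 = 100 * sum(n for tier, n in TIER_COUNTS.items() if tier < 4) / total

print(total, round(average_tier, 1), round(below_tier_4, 1))
```

The counts sum to the 1,460 analysed domains, and the two derived figures round to the 2.2 average and the 98.5% cited in the key findings.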
ADF vs Other Web Standards
Comparing AI Discovery File adoption against established web standards. This contextualises where ADF adoption sits relative to conventions like robots.txt (RFC 9309), ads.txt (IAB Tech Lab), security.txt (RFC 9116), and humans.txt, all of which are also served from a fixed, well-known path on the domain (the root for most; /.well-known/ for security.txt).
| Standard | Adoption |
|---|---|
| robots.txt | 45.3% |
| ads.txt | 15.3% |
| Schema.org | 25.6% |
| security.txt | 12.7% |
| humans.txt | 2.5% |
| Any ADF file | 6.5% |
Notable Adopters
The top 20 domains by AI readiness tier, showing which high-profile websites are leading ADF adoption. Readiness tiers are calculated using the combinatorial scoring model.
| Domain | Rank | Category | Files Found | Files Valid | Readiness |
|---|---|---|---|---|---|
| bmmagazine.co.uk | 739 | UK Top 1,000 | 1 | 1 | AI-Ready |
| classlink.com | 854 | Global Top 1,000 | 1 | 1 | AI-Ready |
| dynatrace.com | 546 | Global Top 1,000 | 1 | 1 | AI-Ready |
| energysavingtrust.org.uk | 489 | UK Top 1,000 | 1 | 1 | AI-Ready |
| english-heritage.org.uk | 227 | UK Top 1,000 | 1 | 1 | AI-Ready |
| kingsfund.org.uk | 905 | UK Top 1,000 | 1 | 1 | AI-Ready |
| mailchimp.com | 694 | Global Top 1,000 | 1 | 1 | AI-Ready |
| mainlinemenswear.co.uk | 951 | UK Top 1,000 | 1 | 1 | AI-Ready |
| netgear.com | 891 | Global Top 1,000 | 1 | 1 | AI-Ready |
| onetrust.com | 558 | Global Top 1,000 | 1 | 1 | AI-Ready |
| opera.com | 87 | Global Top 1,000 | 1 | 1 | AI-Ready |
| optimizely.com | 566 | Global Top 1,000 | 1 | 1 | AI-Ready |
| qualtrics.com | 855 | Global Top 1,000 | 1 | 1 | AI-Ready |
| reed.co.uk | 291 | UK Top 1,000 | 1 | 1 | AI-Ready |
| scotrail.co.uk | 957 | UK Top 1,000 | 1 | 1 | AI-Ready |
| shopify.com | 164 | Global Top 1,000 | 1 | 1 | AI-Ready |
| singular.net | 751 | Global Top 1,000 | 1 | 1 | AI-Ready |
| smartsurvey.co.uk | 625 | UK Top 1,000 | 1 | 1 | AI-Ready |
| sourceforge.net | 214 | Global Top 1,000 | 1 | 1 | AI-Ready |
| stripe.com | 261 | Global Top 1,000 | 1 | 1 | AI-Ready |
Download the Data
Raw datasets from this quarter's crawl, licensed under CC BY 4.0. Use them for your own research, analysis, or reporting. When citing, please reference the quarter (e.g., "Q1 2026") and link to the methodology.
Methodology
How We Collect This Data
Our crawler checks the top 1,000 global and top 1,000 UK domains (deduplicated to ~1,995) for all 10 AI Discovery Files, validates each against the specification, analyses robots.txt AI crawler policies across 15 known agents, and scores each domain's overall AI readiness using a deterministic tier model. The full methodology — including validation rules, soft 404 detection, redirect classification, and scoring logic — is published for transparency.
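As an illustration of the first two steps, the sketch below builds the root-level URLs to probe for a domain and applies a simple soft 404 heuristic to a response. Both functions are assumptions made for this example, not the crawler's actual code: the file list covers only the names that appear in the adoption table, and the heuristic (an empty body, or an HTML page returned for a non-HTML file) is one plausible rule among many.

```python
# File names taken from the adoption table in this report.
ADF_FILES = [
    "llms.txt", "llms.html", "ai.txt", "ai.json", "identity.json",
    "brand.txt", "faq-ai.txt", "developer-ai.txt", "robots-ai.txt",
]


def candidate_urls(domain: str) -> list[str]:
    """Root-level URLs to check for one domain."""
    host = domain.strip().lower().rstrip("/")
    return [f"https://{host}/{name}" for name in ADF_FILES]


def looks_like_soft_404(name: str, status: int, content_type: str, body: str) -> bool:
    """Heuristic: a 200 response that does not really contain the file."""
    if status != 200:
        return False  # a real error status is not a *soft* 404
    if not body.strip():
        return True   # empty body served with 200
    if name.endswith(".html"):
        return False  # HTML is the expected format for this file
    head = body.lstrip()[:100].lower()
    return "text/html" in content_type.lower() or head.startswith(
        ("<!doctype html", "<html")
    )


print(candidate_urls("example.com")[0])
print(looks_like_soft_404("llms.txt", 200, "text/html", "<html>Not found</html>"))
```

Keeping the heuristic separate from the fetching code means it can be tested against saved responses without touching the network.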