AI Discovery File Adoption Research
Measuring how the world's top websites prepare for AI systems. Based on a crawl of 1,995 domains.
Summary
This Q2 2026 report analyses AI Discovery File adoption across the 1,905 successfully crawled domains out of a sample of 1,995 of the web's most prominent domains. 7.2% of domains have at least one AI Discovery File, 85.0% have no AI-specific crawler policy in their robots.txt, and the average AI readiness tier is 2.2 out of 5.0. Data is collected quarterly using our published methodology.
AI Discovery Files are a set of 10 standardised root-level files — including llms.txt, ai.txt, ai.json, identity.json, and brand.txt — that help AI systems such as ChatGPT, Claude, and Gemini discover, interpret, and correctly represent a website. This research tracks their real-world adoption among the domains most likely to be referenced by AI systems when answering user questions.
Key Findings
The Q2 2026 crawl shows growth across most dimensions of AI readiness. ADF adoption rose from 6.5% to 7.2%, but the larger change is in crawl coverage: 445 fewer domains errored this quarter, so the sample now covers 1,905 of 1,995 domains, a far more complete picture. Selective AI crawler blocking rose 2 percentage points as more organisations make deliberate decisions about AI access, and the share of domains with any AI policy in robots.txt is growing faster than ADF file adoption itself.
- llms.txt adoption accelerating, now the clear leader. llms.txt grew from 55 to 93 domains (+69%), reaching 4.9% adoption. Valid files nearly doubled from 35 to 61, and complete implementations rose from 35 to 61. While the invalid count also grew (20 to 32), the ratio of valid to found files improved from 64% to 66%: quality is keeping pace with volume for the first time.
- Selective AI crawler blocking surges past 12%. Domains blocking AI crawlers selectively jumped from 151 to 235 (+56%), now reaching 12.3% of crawled domains. Meanwhile, "no AI policy" dropped 2.5 percentage points to 85%. This suggests organisations are actively choosing which AI systems can access their content rather than ignoring the question. PerplexityBot saw the steepest blocking increase (+1.5pp), possibly reflecting growing awareness of AI search agents.
- Crawl reliability transformed: error rate dropped from 27% to 5%. Only 90 domains errored in Q2 versus 535 in Q1, bringing the success rate from 73% to 95%. This is an infrastructure improvement in the crawler itself, but it has a significant analytical impact: many previously unmeasured domains are now included, making the adoption and blocking figures more representative of actual web behaviour.
- Tier 3 (Partially Ready) grows while Passive shrinks. The share of Passive domains (Tier 2) dropped from 76.5% to 73.7%, with most movement going to Partially Ready (Tier 3, up from 19.5% to 21.7%). AI-Ready (Tier 4) edged up from 22 to 33 domains. Tier 5 remains at zero: no domain yet combines 3+ valid ADFs, explicit AI crawler permission, and Schema.org markup.
- Major brands join the top adopters: NVIDIA, Dell, and ASUS enter the list. Nine new domains entered the top 20 adopters, including NVIDIA, Dell, ASUS, Datadog, and Cloudinary. Nine dropped out, including Stripe, Shopify, and SourceForge. The new entrants are predominantly enterprise technology companies, suggesting that AI readiness is becoming a priority in corporate web strategy rather than remaining a niche concern.
- Schema.org adoption jumps to nearly 30%. Schema.org presence on homepages rose from 25.6% to 29.7% (+4.1pp, 374 to 566 domains). This is the largest percentage-point increase among the adoption metrics tracked and reflects the broader trend of websites investing in machine-readable signals. Since Schema.org is a prerequisite for the highest readiness tiers, this growth creates a foundation for future ADF tier upgrades.
— AI Visibility Research, April 2026
Changes from Q1 2026
Quarter-over-quarter changes in key metrics between Q1 2026 and Q2 2026.
| File | Q1 2026 | Q2 2026 | Change (pp) |
|---|---|---|---|
| llms.txt | 3.8% | 4.9% | +1.1 |
| llms.html | 2.8% | 2.5% | -0.3 |
| ai.txt | 0.1% | 0.1% | 0.0 |
| ai.json | 0.1% | 0.1% | 0.0 |
| identity.json | 0.0% | 0.0% | 0.0 |
| brand.txt | 0.0% | 0.1% | +0.1 |
| faq-ai.txt | 0.0% | 0.1% | +0.1 |
| developer-ai.txt | 0.0% | 0.0% | 0.0 |
| robots-ai.txt | 0.1% | 0.1% | 0.0 |
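The llms.txt row can be reproduced from the raw counts quoted in the Key Findings. The sketch below assumes each quarter's percentages use the number of domains crawled without error as the denominator (1,995 minus 535 in Q1, 1,995 minus 90 in Q2). That denominator is an inference from the reported figures, not a documented rule, and the helper names `pct` and `pp_change` are illustrative.

```python
# Counts quoted in the Key Findings. The denominators are an assumption:
# domains crawled without error in each quarter.
TOTAL_DOMAINS = 1995
Q1_CRAWLED = TOTAL_DOMAINS - 535   # 1,460 domains (535 errored in Q1)
Q2_CRAWLED = TOTAL_DOMAINS - 90    # 1,905 domains (90 errored in Q2)


def pct(count: int, total: int) -> float:
    """Share of `total` as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


def pp_change(q1_pct: float, q2_pct: float) -> float:
    """Difference between two rounded percentages, in percentage points."""
    return round(q2_pct - q1_pct, 1)


llms_q1 = pct(55, Q1_CRAWLED)
llms_q2 = pct(93, Q2_CRAWLED)
print(llms_q1, llms_q2, pp_change(llms_q1, llms_q2))  # 3.8 4.9 1.1

print(round(100 * (93 - 55) / 55))  # relative growth in llms.txt domains: 69
print(round(100 * 61 / 93))         # Q2 valid-to-found ratio: 66
```

Under the same assumption, the selective-blocking (235 domains) and Schema.org (566 domains) counts give the 12.3% and 29.7% reported for Q2.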
New Top Adopters
- asus.com
- cloudinary.com
- datadoghq.com
- dell.com
- greenpeace.org.uk
- hobbycraft.co.uk
- hostgator.com.br
- nvidia.com
- plesk.com
No Longer in Top 20
- mainlinemenswear.co.uk
- qualtrics.com
- reed.co.uk
- scotrail.co.uk
- shopify.com
- singular.net
- smartsurvey.co.uk
- sourceforge.net
- stripe.com
ADF Adoption by File Type
How many of the top websites have each AI Discovery File — and how many of those files pass structural validation. Files are checked at their canonical root-level URL (e.g., example.com/llms.txt) and validated against the ADF specification.
| File | Found | Valid | Complete |
|---|---|---|---|
| llms.txt | 93 | 61 | 61 |
| llms.html | 47 | 10 | 3 |
| ai.txt | 2 | 0 | 0 |
| ai.json | 2 | 0 | 0 |
| identity.json | 0 | 0 | 0 |
| brand.txt | 1 | 0 | 0 |
| faq-ai.txt | 1 | 0 | 0 |
| developer-ai.txt | 0 | 0 | 0 |
| robots-ai.txt | 1 | 1 | 0 |
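As a minimal sketch of what "canonical root-level URL" means in practice, the snippet below builds the URL to check for each file named in the table. The function name `canonical_urls` and the choice of HTTPS are assumptions for illustration; the crawler's actual request logic is not described here.

```python
from urllib.parse import urlunsplit

# File names taken from the adoption table above (nine of the ten ADF files).
ADF_FILES = (
    "llms.txt", "llms.html", "ai.txt", "ai.json", "identity.json",
    "brand.txt", "faq-ai.txt", "developer-ai.txt", "robots-ai.txt",
)


def canonical_urls(domain: str, scheme: str = "https") -> dict[str, str]:
    """Map each ADF file name to its root-level URL on `domain`."""
    host = domain.strip().lower().rstrip("/")
    return {
        name: urlunsplit((scheme, host, f"/{name}", "", ""))
        for name in ADF_FILES
    }


urls = canonical_urls("Example.com")
print(urls["llms.txt"])  # https://example.com/llms.txt
```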
AI Crawler Access Policies
How websites use robots.txt to manage access for 15 known AI user agents — from OpenAI's GPTBot to Anthropic's ClaudeBot. Each domain is classified into one of five access policies based on its aggregate behaviour across all agents. The per-agent table below shows which AI crawlers are most frequently blocked.
| AI Crawler | Company | Purpose | Blocked | Blocked % | Allowed | Allowed % |
|---|---|---|---|---|---|---|
| CCBot | Common Crawl | Training | 201 | 10.6% | 8 | 0.4% |
| GPTBot | OpenAI | Training | 190 | 10.0% | 23 | 1.2% |
| ClaudeBot | Anthropic | Training | 191 | 10.0% | 13 | 0.7% |
| Bytespider | ByteDance | Training | 172 | 9.0% | 4 | 0.2% |
| meta-externalagent | Meta | Training | 165 | 8.7% | 1 | 0.1% |
| Applebot-Extended | Apple | Training | 159 | 8.3% | 5 | 0.3% |
| PerplexityBot | Perplexity | Search | 154 | 8.1% | 23 | 1.2% |
| Google-Extended | Google | Training | 134 | 7.0% | 16 | 0.8% |
| Diffbot | Diffbot | Extraction | 127 | 6.7% | 2 | 0.1% |
| cohere-ai | Cohere | Training | 120 | 6.3% | 1 | 0.1% |
| OAI-SearchBot | OpenAI | Search | 115 | 6.0% | 27 | 1.4% |
| Amazonbot | Amazon | Training | 113 | 5.9% | 6 | 0.3% |
| ChatGPT-User | OpenAI | Retrieval | 97 | 5.1% | 25 | 1.3% |
| Claude-User | Anthropic | Retrieval | 89 | 4.7% | 8 | 0.4% |
| FacebookBot | Meta | Preview | 84 | 4.4% | 4 | 0.2% |
| Policy | Domains | Percentage |
|---|---|---|
| Blocks All AI | 22 | 1.2% |
| Blocks Selectively | 235 | 12.3% |
| Rate-Limits AI | 5 | 0.3% |
| Explicitly Allows | 23 | 1.2% |
| No AI Policy | 1,620 | 85.0% |
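The exact classification rules live in the methodology, so the following is only a hedged sketch of how a robots.txt file might be mapped to one of the five labels above. Everything about the decision order is an assumption: an agent counts as "blocked" only when a group that names it contains `Disallow: /`, as "limited" when it has a `Crawl-delay`, and wildcard (`*`) groups are ignored because they are not AI-specific. The agent list is a four-name subset of the 15 tracked agents.

```python
# Hypothetical subset of the 15 tracked agents, for illustration only.
AI_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot")


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Return {lowercased user-agent: [(directive, value), ...]}."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []   # agents named by the group being read
    in_rules = False          # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:      # a user-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow", "crawl-delay") and current:
            in_rules = True
            for agent in current:
                groups[agent].append((field, value))
    return groups


def agent_status(rules: list[tuple[str, str]]) -> str:
    """Classify one agent's explicit rules (assumed precedence)."""
    if ("disallow", "/") in rules:
        return "blocked"
    if any(field == "crawl-delay" for field, _ in rules):
        return "limited"
    return "allowed"


def classify_policy(robots_txt: str, agents: tuple[str, ...] = AI_AGENTS) -> str:
    """Aggregate per-agent statuses into one of the five policy labels."""
    groups = parse_groups(robots_txt)
    statuses = [agent_status(groups[a.lower()]) for a in agents if a.lower() in groups]
    if not statuses:
        return "No AI Policy"
    blocked = statuses.count("blocked")
    if blocked == len(agents):
        return "Blocks All AI"
    if blocked:
        return "Blocks Selectively"
    if "limited" in statuses:
        return "Rate-Limits AI"
    return "Explicitly Allows"


sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""
print(classify_policy(sample))  # Blocks Selectively
```

Note that under these assumptions a site with only `User-agent: *` rules is labelled "No AI Policy" even if it disallows everything, since nothing in the file targets an AI agent by name.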
File Quality Distribution
Among the ADF files that were found, how many meet the full specification versus providing only minimal content or containing errors. Quality is assessed using per-file structural checks — required fields must pass for a file to be considered valid; recommended fields distinguish "complete" from "minimal" implementations.
AI Readiness Tiers
Each domain receives a readiness tier from 0 (Unaware) to 5 (AI-Optimised) based on three inputs: valid ADF file count, AI crawler policy in robots.txt, and Schema.org presence on the homepage. The tier model is deterministic with no opaque weights — the full calculation logic is published.
| Tier | Domains | Percentage |
|---|---|---|
| Tier 5: AI-Optimised | 0 | 0.0% |
| Tier 4: AI-Ready | 33 | 1.7% |
| Tier 3: Partially Ready | 413 | 21.7% |
| Tier 2: Passive | 1,404 | 73.7% |
| Tier 1: Actively Blocking | 22 | 1.2% |
| Tier 0: Unaware | 33 | 1.7% |
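As a quick consistency check, the average tier quoted in the Summary can be recomputed from the table above. The sketch also encodes the Tier 5 criteria stated in the Key Findings (3+ valid ADFs, explicit AI crawler permission, Schema.org markup). The function names are illustrative, and the rules for the lower tiers are not reproduced because they are not spelled out in this report.

```python
# Domain counts per tier, copied from the table above.
TIER_COUNTS = {5: 0, 4: 33, 3: 413, 2: 1404, 1: 22, 0: 33}


def average_tier(counts: dict[int, int]) -> float:
    """Domain-weighted mean readiness tier."""
    total = sum(counts.values())
    return sum(tier * n for tier, n in counts.items()) / total


def meets_tier5(valid_adf_files: int, explicitly_allows_ai: bool, has_schema_org: bool) -> bool:
    """Tier 5 criteria as stated in the Key Findings."""
    return valid_adf_files >= 3 and explicitly_allows_ai and has_schema_org


print(sum(TIER_COUNTS.values()))            # 1905
print(round(average_tier(TIER_COUNTS), 1))  # 2.2
print(meets_tier5(1, True, True))           # False: one valid file is not enough
```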
ADF vs Other Web Standards
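Comparing AI Discovery File adoption against established web standards. This contextualises where ADF adoption sits relative to conventions like robots.txt (RFC 9309), ads.txt (IAB Tech Lab), security.txt (RFC 9116), and humans.txt, which are likewise published at a fixed location: the domain root or, for security.txt, the /.well-known/ path. Schema.org is included for comparison although it is markup embedded in pages rather than a standalone file.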
| Standard | Adoption |
|---|---|
| robots.txt | 52.1% |
| ads.txt | 16.7% |
| Schema.org | 29.7% |
| security.txt | 14.3% |
| humans.txt | 2.5% |
| Any ADF file | 7.2% |
Notable Adopters
The top 20 domains by AI readiness tier, showing which high-profile websites are leading ADF adoption. Readiness tiers are calculated using the combinatorial scoring model.
| Domain | Rank | Category | Files Found | Files Valid | Readiness |
|---|---|---|---|---|---|
| asus.com | 710 | Global Top 1,000 | 1 | 1 | AI-Ready |
| bmmagazine.co.uk | 739 | UK Top 1,000 | 1 | 1 | AI-Ready |
| classlink.com | 854 | Global Top 1,000 | 1 | 1 | AI-Ready |
| cloudinary.com | 725 | Global Top 1,000 | 1 | 1 | AI-Ready |
| datadoghq.com | 788 | Global Top 1,000 | 1 | 1 | AI-Ready |
| dell.com | 368 | Global Top 1,000 | 1 | 1 | AI-Ready |
| dynatrace.com | 546 | Global Top 1,000 | 1 | 1 | AI-Ready |
| energysavingtrust.org.uk | 489 | UK Top 1,000 | 1 | 1 | AI-Ready |
| english-heritage.org.uk | 227 | UK Top 1,000 | 1 | 1 | AI-Ready |
| greenpeace.org.uk | 724 | UK Top 1,000 | 1 | 1 | AI-Ready |
| hobbycraft.co.uk | 502 | UK Top 1,000 | 1 | 1 | AI-Ready |
| hostgator.com.br | 842 | Global Top 1,000 | 1 | 1 | AI-Ready |
| kingsfund.org.uk | 905 | UK Top 1,000 | 1 | 1 | AI-Ready |
| mailchimp.com | 694 | Global Top 1,000 | 1 | 1 | AI-Ready |
| netgear.com | 891 | Global Top 1,000 | 1 | 1 | AI-Ready |
| nvidia.com | 371 | Global Top 1,000 | 1 | 1 | AI-Ready |
| onetrust.com | 558 | Global Top 1,000 | 1 | 1 | AI-Ready |
| opera.com | 87 | Global Top 1,000 | 1 | 1 | AI-Ready |
| optimizely.com | 566 | Global Top 1,000 | 1 | 1 | AI-Ready |
| plesk.com | 390 | Global Top 1,000 | 1 | 1 | AI-Ready |
Download the Data
Raw datasets from this quarter's crawl, licensed under CC BY 4.0. Use them for your own research, analysis, or reporting. When citing, please reference the quarter (e.g., "Q2 2026") and link to the methodology.
Methodology
How We Collect This Data
Our crawler checks the top 1,000 global and top 1,000 UK domains (deduplicated to ~1,995) for all 10 AI Discovery Files, validates each against the specification, analyses robots.txt AI crawler policies across 15 known agents, and scores each domain's overall AI readiness using a deterministic tier model. The full methodology — including validation rules, soft 404 detection, redirect classification, and scoring logic — is published for transparency.
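The published methodology defines the actual soft 404 rules. Purely as an illustration of the idea, the sketch below flags a "successful" response that is probably an HTML fallback page served in place of a missing text or JSON file. The function name, parameters, and heuristics are all assumptions and are not taken from the crawler.

```python
def looks_like_soft_404(status: int, content_type: str, body: str, expected_ext: str) -> bool:
    """Heuristic: a 200 response that is really an error or fallback page.

    Assumed rules (not the crawler's documented logic):
    - only 200 responses are considered; real error codes are handled elsewhere
    - an empty body counts as a soft 404
    - HTML returned for a .txt or .json file counts as a soft 404
    """
    if status != 200:
        return False
    if not body.strip():
        return True
    head = body.lstrip().lower()
    is_html = "text/html" in content_type.lower() or head.startswith(("<!doctype html", "<html"))
    return is_html and expected_ext in (".txt", ".json")


# An HTML "not found" page served with status 200 for /llms.txt
print(looks_like_soft_404(200, "text/html; charset=utf-8", "<!DOCTYPE html><html>Not found</html>", ".txt"))  # True
# A genuine plain-text file
print(looks_like_soft_404(200, "text/plain", "# Example\n", ".txt"))  # False
```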