Q2 2026 Report


AI Discovery File Adoption Research

Measuring how the world's top websites prepare for AI systems. Based on a crawl of 1,995 domains, 1,905 of which were crawled successfully.

Domains Crawled: 1,905 of 1,995 total
ADF Adoption: 7.2% (137 domains)
Avg Readiness: 2.2 out of 5.0
No AI Policy in robots.txt: 85.0%

Summary

This Q2 2026 report analyses AI Discovery File adoption across 1,905 of the web's most prominent domains. 7.2% of domains have at least one AI Discovery File, 85.0% have no AI-specific crawler policy in their robots.txt, and the average AI readiness tier is 2.2 out of 5.0. Data is collected quarterly using the methodology described in our full methodology documentation.

AI Discovery Files are a set of 10 standardised root-level files — including llms.txt, ai.txt, ai.json, identity.json, and brand.txt — that help AI systems such as ChatGPT, Claude, and Gemini discover, interpret, and correctly represent a website. This research tracks their real-world adoption among the domains most likely to be referenced by AI systems when answering user questions.

Key Findings

The Q2 2026 crawl shows growth on most measures of AI readiness, although the average readiness score held at 2.2 and llms.html adoption slipped from 2.8% to 2.5%. ADF adoption rose from 6.5% to 7.2%, but the bigger change is in the infrastructure: 445 fewer domains errored this quarter, so the sample now covers 1,905 of 1,995 domains, a far more complete picture. Selective AI crawler blocking rose by 2 percentage points as more organisations make deliberate decisions about AI access, and the share of domains with any AI policy in robots.txt (up 2.5 points to 15.0%) is growing faster than ADF adoption (up 0.7 points).

  1. llms.txt adoption accelerating, now the clear leader. llms.txt grew from 55 to 93 domains (+69%), reaching 4.9% adoption. Valid files rose from 35 to 61 (+74%), and complete implementations rose from 35 to 61, so every valid llms.txt file is also complete. While the invalid count also grew (20 to 32), the ratio of valid to found files improved from 64% to 66%: quality is keeping pace with volume for the first time.
  2. Selective AI crawler blocking surges past 12%. Domains blocking AI crawlers selectively jumped from 151 to 235 (+56%), now reaching 12.3% of crawled domains. Meanwhile, "no AI policy" dropped 2.5 percentage points to 85.0%. This suggests organisations are actively choosing which AI systems can access their content rather than ignoring the question. PerplexityBot saw the steepest blocking increase (+1.5pp), possibly reflecting growing awareness of AI search agents.
  3. Crawl reliability transformed: error rate dropped from 27% to 5%. Only 90 domains errored in Q2, versus 535 in Q1, raising the success rate from 73% to 95%. The change comes from improvements to the crawler itself rather than from website behaviour, but it has a significant analytical impact: many previously unmeasured domains are now included, making the adoption and blocking figures more representative of actual web behaviour. It also means that part of each quarter-over-quarter change may reflect the larger sample rather than sites changing their files or policies.
  4. Tier 3 (Partially Ready) grows while Passive shrinks. The share of Passive domains (Tier 2) dropped from 76.5% to 73.7%, with most of the movement going to Partially Ready (Tier 3, up from 19.5% to 21.7%). AI-Ready (Tier 4) edged up from 22 to 33 domains. Tier 5 remains at zero: no domain yet combines 3+ valid ADFs, explicit AI crawler permission, and Schema.org markup.
  5. Major brands join the top adopters: NVIDIA, Dell, and ASUS enter the list. Nine new domains entered the top 20 adopters, including NVIDIA, Dell, ASUS, Datadog, and Cloudinary, while nine dropped out, including Stripe, Shopify, and SourceForge. Seven of the nine new entrants are technology companies, suggesting that AI readiness is becoming a priority in corporate web strategy rather than remaining a niche concern.
  6. Schema.org adoption jumps to nearly 30%. Schema.org presence on homepages rose from 25.6% to 29.7% (+4.1pp; 374 to 566 domains). This is the largest single-metric jump in the dataset and reflects the broader trend of websites investing in machine-readable signals. Since Schema.org is a prerequisite for the highest readiness tiers, this growth creates a foundation for future ADF tier upgrades.

— AI Visibility Research, April 2026
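
The percentages in the Key Findings can be re-derived from the raw counts the report quotes. The short Python sketch below does that arithmetic. The helper names `pct_change` and `share` are illustrative; the counts are copied from the findings above, and the denominators are the 1,460 and 1,905 domains crawled in Q1 and Q2.

```python
def pct_change(old: int, new: int) -> float:
    """Relative change in percent between two counts."""
    return (new - old) / old * 100


def share(count: int, total: int) -> float:
    """Count expressed as a percentage of total."""
    return count / total * 100


# Domains successfully crawled in each quarter.
Q1_CRAWLED, Q2_CRAWLED = 1_460, 1_905

llms_growth = pct_change(55, 93)           # llms.txt domains found
selective_growth = pct_change(151, 235)    # domains blocking selectively
valid_ratio_q1 = share(35, 55)             # valid llms.txt files / files found
valid_ratio_q2 = share(61, 93)
selective_pp = share(235, Q2_CRAWLED) - share(151, Q1_CRAWLED)
schema_pp = share(566, Q2_CRAWLED) - share(374, Q1_CRAWLED)

print(f"llms.txt domains:     {llms_growth:+.0f}%")
print(f"selective blocking:   {selective_growth:+.0f}% ({selective_pp:+.1f}pp)")
print(f"valid llms.txt share: {valid_ratio_q1:.0f}% -> {valid_ratio_q2:.0f}%")
print(f"Schema.org:           {schema_pp:+.1f}pp")
```

Relative changes (+69%) and percentage-point changes (+2.0pp) answer different questions, which is why the findings quote both.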

Changes from Q1 2026

Quarter-over-quarter changes in key metrics between Q1 2026 and Q2 2026.

Headline metrics: Q1 2026 to Q2 2026
Metric Q1 2026 Q2 2026 Change
ADF Adoption 6.5% 7.2% +0.7pp
Avg Readiness Score 2.2 2.2 0.0
Domains Crawled 1,460 1,905 +445
No AI Policy 87.5% 85.0% -2.5pp
Per-file adoption change: Q1 2026 to Q2 2026
File Q1 2026 Q2 2026 Change (pp)
llms.txt 3.8% 4.9% +1.1
llms.html 2.8% 2.5% -0.3
ai.txt 0.1% 0.1% 0.0
ai.json 0.1% 0.1% 0.0
identity.json 0.0% 0.0% 0.0
brand.txt 0.0% 0.1% +0.1
faq-ai.txt 0.0% 0.1% +0.1
developer-ai.txt 0.0% 0.0% 0.0
robots-ai.txt 0.1% 0.1% 0.0
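
The jump in Domains Crawled follows from the fixed list of 1,995 domains and the error counts given in finding 3 (535 in Q1, 90 in Q2). This sketch recomputes domains crawled, error rate, and success rate for both quarters; the function name `coverage` is hypothetical.

```python
TOTAL_DOMAINS = 1_995  # deduplicated global + UK list

# Domains that returned a crawl error, per finding 3.
ERRORS = {"Q1 2026": 535, "Q2 2026": 90}


def coverage(total: int, errored: int) -> tuple[int, float, float]:
    """Return (domains crawled, error rate %, success rate %)."""
    crawled = total - errored
    return crawled, errored / total * 100, crawled / total * 100


results = {q: coverage(TOTAL_DOMAINS, e) for q, e in ERRORS.items()}
for quarter, (crawled, err, ok) in results.items():
    print(f"{quarter}: {crawled:,} crawled, {err:.0f}% errors, {ok:.0f}% success")

gain = results["Q2 2026"][0] - results["Q1 2026"][0]
print(f"Additional domains measured: {gain}")
```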

New Top Adopters

  • asus.com
  • cloudinary.com
  • datadoghq.com
  • dell.com
  • greenpeace.org.uk
  • hobbycraft.co.uk
  • hostgator.com.br
  • nvidia.com
  • plesk.com

No Longer in Top 20

  • mainlinemenswear.co.uk
  • qualtrics.com
  • reed.co.uk
  • scotrail.co.uk
  • shopify.com
  • singular.net
  • smartsurvey.co.uk
  • sourceforge.net
  • stripe.com

ADF Adoption by File Type

How many of the top websites have each AI Discovery File — and how many of those files pass structural validation. Files are checked at their canonical root-level URL (e.g., example.com/llms.txt) and validated against the ADF specification.
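
Checking a file at its canonical location starts with building the root-level URL for each file name. The sketch below does that for the nine file names listed in this report's tables and adds a deliberately simple soft-404 test. The heuristic (treating an HTML page served with status 200 for a non-HTML file as a soft 404) is an assumption for illustration, not the crawler's published rule, and both function names are hypothetical.

```python
from urllib.parse import urlunsplit

# The nine ADF file names that appear in this report's tables.
ADF_FILES = [
    "llms.txt", "llms.html", "ai.txt", "ai.json", "identity.json",
    "brand.txt", "faq-ai.txt", "developer-ai.txt", "robots-ai.txt",
]


def canonical_urls(domain: str) -> dict[str, str]:
    """Map each ADF file name to its root-level HTTPS URL."""
    host = domain.strip().lower().rstrip("/")
    return {name: urlunsplit(("https", host, "/" + name, "", "")) for name in ADF_FILES}


def looks_like_soft_404(file_name: str, status: int, content_type: str, body: str) -> bool:
    """Hypothetical heuristic: a 200 response that serves an HTML page
    for a file that should not be HTML is treated as a soft 404."""
    if status != 200:
        return False  # a real error status is a plain "not found", not a soft 404
    expects_html = file_name.endswith(".html")
    served_html = "text/html" in content_type.lower() or body.lstrip().lower().startswith(
        ("<!doctype html", "<html")
    )
    return served_html and not expects_html


print(canonical_urls("example.com")["llms.txt"])
```

Because llms.html is legitimately HTML, this heuristic cannot flag soft 404s for it; a real check would need content-level signals for that file.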

AI Discovery File adoption across 1,905 domains
File Found Valid Complete
llms.txt 93 61 61
llms.html 47 10 3
ai.txt 2 0 0
ai.json 2 0 0
identity.json 0 0 0
brand.txt 1 0 0
faq-ai.txt 1 0 0
developer-ai.txt 0 0 0
robots-ai.txt 1 1 0
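
Adoption for each file is the number of domains where it was found divided by the 1,905 domains crawled. Recomputing it from the Found column reproduces the Q2 2026 column of the per-file change table. This is a sketch; `adoption_pct` is an illustrative name and the counts are copied from the table above.

```python
CRAWLED = 1_905

# (found, valid, complete) per file, copied from the table above.
COUNTS = {
    "llms.txt": (93, 61, 61),
    "llms.html": (47, 10, 3),
    "ai.txt": (2, 0, 0),
    "ai.json": (2, 0, 0),
    "identity.json": (0, 0, 0),
    "brand.txt": (1, 0, 0),
    "faq-ai.txt": (1, 0, 0),
    "developer-ai.txt": (0, 0, 0),
    "robots-ai.txt": (1, 1, 0),
}


def adoption_pct(found: int, total: int = CRAWLED) -> float:
    """Share of crawled domains where the file was found, to one decimal."""
    return round(found / total * 100, 1)


for name, (found, valid, complete) in COUNTS.items():
    valid_rate = f"{valid / found:.0%}" if found else "n/a"  # avoid dividing by zero
    print(f"{name:<17} {adoption_pct(found):>4}%  valid: {valid_rate}")
```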

AI Crawler Access Policies

How websites use robots.txt to manage access for 15 known AI user agents — from OpenAI's GPTBot to Anthropic's ClaudeBot. Each domain is classified into one of five access policies based on its aggregate behaviour across all agents. The per-agent table below shows which AI crawlers are most frequently blocked.

AI Crawler Company Purpose Blocked Blocked % Allowed Allowed %
CCBot Common Crawl Training 201 10.6% 8 0.4%
GPTBot OpenAI Training 190 10.0% 23 1.2%
ClaudeBot Anthropic Training 191 10.0% 13 0.7%
Bytespider ByteDance Training 172 9.0% 4 0.2%
meta-externalagent Meta Training 165 8.7% 1 0.1%
Applebot-Extended Apple Training 159 8.3% 5 0.3%
PerplexityBot Perplexity Search 154 8.1% 23 1.2%
Google-Extended Google Training 134 7.0% 16 0.8%
Diffbot Diffbot Extraction 127 6.7% 2 0.1%
cohere-ai Cohere Training 120 6.3% 1 0.1%
OAI-SearchBot OpenAI Search 115 6.0% 27 1.4%
Amazonbot Amazon Training 113 5.9% 6 0.3%
ChatGPT-User OpenAI Retrieval 97 5.1% 25 1.3%
Claude-User Anthropic Retrieval 89 4.7% 8 0.4%
FacebookBot Meta Preview 84 4.4% 4 0.2%
AI crawler access policy distribution in robots.txt
Policy Domains Percentage
Blocks All AI 22 1.2%
Blocks Selectively 235 12.3%
Rate-Limits AI 5 0.3%
Explicitly Allows 23 1.2%
No AI Policy 1,620 85.0%
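
The report does not publish the exact rules that map robots.txt directives onto its five policy labels, so the classifier below is a hypothetical sketch. It assumes that an agent counts as blocked when a group naming it contains `Disallow: /`, as allowed when it gets `Allow: /` or an empty `Disallow:`, and as rate-limited when it gets a `Crawl-delay`, and that wildcard (`*`) groups are not an AI-specific policy. It uses a minimal line parser rather than Python's `urllib.robotparser`, because the classification needs to know which agents are named explicitly.

```python
# The 15 user agents tracked in the per-agent table above.
AI_AGENTS = [
    "CCBot", "GPTBot", "ClaudeBot", "Bytespider", "meta-externalagent",
    "Applebot-Extended", "PerplexityBot", "Google-Extended", "Diffbot",
    "cohere-ai", "OAI-SearchBot", "Amazonbot", "ChatGPT-User",
    "Claude-User", "FacebookBot",
]


def agent_rules(robots_txt: str) -> dict[str, set[str]]:
    """Map each explicitly named AI agent to the rule kinds it receives:
    'block' (Disallow: /), 'allow' (Allow: / or empty Disallow) or
    'delay' (Crawl-delay). Wildcard (*) entries are ignored."""
    wanted = {a.lower(): a for a in AI_AGENTS}  # user-agent matching is case-insensitive
    rules: dict[str, set[str]] = {}
    group: list[str] = []   # AI agents named in the current group
    in_rules = False        # becomes True once the group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                group, in_rules = [], False
            if value.lower() in wanted:
                group.append(wanted[value.lower()])
        elif key in ("disallow", "allow", "crawl-delay"):
            in_rules = True
            for agent in group:
                kinds = rules.setdefault(agent, set())
                if key == "disallow" and value == "/":
                    kinds.add("block")
                elif (key == "allow" and value == "/") or (key == "disallow" and not value):
                    kinds.add("allow")
                elif key == "crawl-delay":
                    kinds.add("delay")
    return rules


def classify_policy(robots_txt: str) -> str:
    """Hypothetical mapping onto the report's five policy labels."""
    rules = agent_rules(robots_txt)
    blocked = [a for a, kinds in rules.items() if "block" in kinds]
    if len(blocked) == len(AI_AGENTS):
        return "Blocks All AI"
    if blocked:
        return "Blocks Selectively"
    if any("delay" in kinds for kinds in rules.values()):
        return "Rate-Limits AI"
    if any("allow" in kinds for kinds in rules.values()):
        return "Explicitly Allows"
    return "No AI Policy"


print(classify_policy("User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"))
```

Under these assumptions, a file that blocks only GPTBot while allowing everything for `*` is classified as "Blocks Selectively", and a file with only a generic `User-agent: *` group counts as "No AI Policy".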

File Quality Distribution

Among the ADF files that were found, how many meet the full specification versus providing only minimal content or containing errors. Quality is assessed using per-file structural checks — required fields must pass for a file to be considered valid; recommended fields distinguish "complete" from "minimal" implementations.

File quality distribution (share of found ADF files)
Quality Share
Complete 43.0%
Minimal 6.0%
Invalid 51.0%
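
The quality rule described above can be expressed as a small function: failed required checks make a file invalid, and among valid files, passing every recommended check separates complete from minimal. The check names in the example are hypothetical placeholders; the real per-file checks are defined in the published methodology.

```python
from dataclasses import dataclass, field


@dataclass
class FileCheck:
    """Results of the structural checks run on one found ADF file.
    Keys are check names; values say whether the check passed."""
    required: dict[str, bool] = field(default_factory=dict)
    recommended: dict[str, bool] = field(default_factory=dict)


def quality(check: FileCheck) -> str:
    """Classify a found file as Invalid, Minimal or Complete."""
    if not all(check.required.values()):
        return "Invalid"    # any failed required check invalidates the file
    if all(check.recommended.values()):
        return "Complete"   # valid and every recommended check passed
    return "Minimal"        # valid, but some recommended content is missing


# Hypothetical check names for an llms.txt file.
example = FileCheck(
    required={"has_h1_title": True},
    recommended={"has_summary": True, "has_link_sections": False},
)
print(quality(example))
```

Note that a file with no checks defined counts as complete here, because `all()` of an empty collection is `True`; a production validator would want at least one required check per file type.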

AI Readiness Tiers

Each domain receives a readiness tier from 0 (Unaware) to 5 (AI-Optimised) based on three inputs: valid ADF file count, AI crawler policy in robots.txt, and Schema.org presence on the homepage. The tier model is deterministic with no opaque weights — the full calculation logic is published.

AI readiness tier distribution (average score: 2.2 / 5.0)
Tier Domains Percentage
Tier 5: AI-Optimised 0 0.0%
Tier 4: AI-Ready 33 1.7%
Tier 3: Partially Ready 413 21.7%
Tier 2: Passive 1,404 73.7%
Tier 1: Actively Blocking 22 1.2%
Tier 0: Unaware 33 1.7%
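
The report does not spell out how the 2.2 average is computed, but a domain-weighted mean of the tier numbers in the table reproduces it: (4 × 33 + 3 × 413 + 2 × 1,404 + 1 × 22) / 1,905 ≈ 2.21. The sketch below assumes that definition; `average_tier` is an illustrative name.

```python
# Domains per readiness tier, copied from the table above.
TIER_COUNTS = {5: 0, 4: 33, 3: 413, 2: 1_404, 1: 22, 0: 33}


def average_tier(counts: dict[int, int]) -> float:
    """Mean tier across all domains, weighting each tier by its domain count."""
    total = sum(counts.values())
    return sum(tier * n for tier, n in counts.items()) / total


avg = average_tier(TIER_COUNTS)
print(f"{sum(TIER_COUNTS.values()):,} domains, average tier {avg:.1f} / 5.0")
```

Because 73.7% of domains sit at Tier 2, the average stays close to 2 even as Tier 3 and Tier 4 grow, which is consistent with the unchanged 2.2 score quarter over quarter.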

ADF vs Other Web Standards

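Comparing AI Discovery File adoption against established web standards. This contextualises where ADF adoption sits relative to conventions like robots.txt (RFC 9309), ads.txt (IAB Tech Lab), security.txt (RFC 9116), and humans.txt, all of which rely on files at fixed, well-known locations: robots.txt, ads.txt, and humans.txt at the domain root, and security.txt under /.well-known/ (with the root path permitted only as a legacy fallback). Schema.org homepage markup is included for comparison.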

AI Discovery File adoption compared with established web standards
Standard Adoption
robots.txt 52.1%
ads.txt 16.7%
Schema.org 29.7%
security.txt 14.3%
humans.txt 2.5%
Any ADF file 7.2%

Notable Adopters

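The top 20 domains by AI readiness tier, showing which high-profile websites are leading ADF adoption. Readiness tiers are calculated using the combinatorial scoring model. All 20 domains shown have one file found, one valid file, and an AI-Ready rating, and they appear in alphabetical order.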

Domain Rank Category Files Found Files Valid Readiness
asus.com 710 Global Top 1,000 1 1 AI-Ready
bmmagazine.co.uk 739 UK Top 1,000 1 1 AI-Ready
classlink.com 854 Global Top 1,000 1 1 AI-Ready
cloudinary.com 725 Global Top 1,000 1 1 AI-Ready
datadoghq.com 788 Global Top 1,000 1 1 AI-Ready
dell.com 368 Global Top 1,000 1 1 AI-Ready
dynatrace.com 546 Global Top 1,000 1 1 AI-Ready
energysavingtrust.org.uk 489 UK Top 1,000 1 1 AI-Ready
english-heritage.org.uk 227 UK Top 1,000 1 1 AI-Ready
greenpeace.org.uk 724 UK Top 1,000 1 1 AI-Ready
hobbycraft.co.uk 502 UK Top 1,000 1 1 AI-Ready
hostgator.com.br 842 Global Top 1,000 1 1 AI-Ready
kingsfund.org.uk 905 UK Top 1,000 1 1 AI-Ready
mailchimp.com 694 Global Top 1,000 1 1 AI-Ready
netgear.com 891 Global Top 1,000 1 1 AI-Ready
nvidia.com 371 Global Top 1,000 1 1 AI-Ready
onetrust.com 558 Global Top 1,000 1 1 AI-Ready
opera.com 87 Global Top 1,000 1 1 AI-Ready
optimizely.com 566 Global Top 1,000 1 1 AI-Ready
plesk.com 390 Global Top 1,000 1 1 AI-Ready

Download the Data

Raw datasets from this quarter's crawl, licensed under CC BY 4.0. Use them for your own research, analysis, or reporting. When citing, please reference the quarter (e.g., "Q2 2026") and link to the methodology.

Methodology

How We Collect This Data

Our crawler checks the top 1,000 global and top 1,000 UK domains (deduplicated to ~1,995) for all 10 AI Discovery Files, validates each against the specification, analyses robots.txt AI crawler policies across 15 known agents, and scores each domain's overall AI readiness using a deterministic tier model. The full methodology — including validation rules, soft 404 detection, redirect classification, and scoring logic — is published for transparency.
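
Merging the global and UK lists into a single crawl list means normalising domain names and dropping duplicates while keeping rank order. The sketch below assumes a simple normalisation (trim, lowercase, drop a trailing dot and a leading `www.`); the methodology may normalise differently, and the sample lists are made up. Two lists of 1,000 domains that share five entries merge to 1,995, the size of this quarter's list.

```python
def normalise(domain: str) -> str:
    """Assumed normalisation: trim, lowercase, drop a trailing dot and a leading 'www.'."""
    d = domain.strip().lower().rstrip(".")
    return d[4:] if d.startswith("www.") else d


def merge_lists(*ranked_lists: list[str]) -> list[str]:
    """Union of ranked domain lists, keeping the order in which domains first appear."""
    seen: dict[str, None] = {}  # dicts preserve insertion order (Python 3.7+)
    for ranked in ranked_lists:
        for domain in ranked:
            seen.setdefault(normalise(domain), None)
    return list(seen)


global_top = ["example.com", "www.Example.co.uk", "example.org"]
uk_top = ["example.co.uk", "example.net", "example.com"]
print(merge_lists(global_top, uk_top))
```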
