Q2 2026 Report

7.2% adoption

AI Discovery File Adoption Research

Measuring how the world's top websites prepare for AI systems. Based on a crawl of 1,995 domains, 1,905 of which were crawled successfully.

Metric	Value	Detail
Domains Crawled	1,905	of 1,995 total
ADF Adoption	7.2%	137 domains
Avg Readiness	2.2	out of 5.0
No AI Policy	85.0%	in robots.txt

You are viewing the archived Q2 2026 report.

Summary

This Q2 2026 report analyses AI Discovery File adoption across 1,905 of the web's most prominent domains. 7.2% of domains have at least one AI Discovery File, 85.0% have no AI-specific crawler policy in their robots.txt, and the average AI readiness tier is 2.2 out of 5.0. Data is collected quarterly using the methodology described in our full methodology documentation.

AI Discovery Files are a set of 10 standardised root-level files — including llms.txt, ai.txt, ai.json, identity.json, and brand.txt — that help AI systems such as ChatGPT, Claude, and Gemini discover, interpret, and correctly represent a website. This research tracks their real-world adoption among the domains most likely to be referenced by AI systems when answering user questions.

Key Findings

The Q2 2026 crawl shows growth across most measures of AI readiness, although the average readiness score held at 2.2. ADF adoption rose from 6.5% to 7.2%, but the real story is in the infrastructure: 445 fewer domains errored this quarter, so our sample now covers 1,905 of 1,995 domains, a far more complete picture. Selective AI crawler blocking rose 2 percentage points as more organisations make deliberate decisions about AI access, and the share of domains with any AI policy in robots.txt (up 2.5pp) is growing faster than ADF adoption itself (up 0.7pp).

llms.txt adoption accelerating: now the clear leader
llms.txt grew from 55 to 93 domains (+69%), reaching 4.9% adoption. Valid files nearly doubled from 35 to 61, and complete implementations rose from 35 to 64. While the invalid count also grew (20 to 32), the ratio of valid-to-found improved from 64% to 66%, so quality is keeping pace with volume for the first time.

Selective AI crawler blocking surges past 12%
Domains blocking AI crawlers selectively jumped from 151 to 235 (+56%), now reaching 12.3% of crawled domains. Meanwhile, "no AI policy" dropped 2.5 percentage points to 85%. This suggests organisations are actively choosing which AI systems can access their content rather than ignoring the question. PerplexityBot saw the steepest blocking increase (+1.5pp), possibly reflecting growing awareness of AI search agents.

Crawl reliability transformed: error rate dropped from 27% to 5%
Only 90 domains errored in Q2 versus 535 in Q1, bringing the success rate from 73% to 95%. This is an infrastructure improvement in the crawler itself, but it has a significant analytical impact: many previously unmeasured domains are now included, making the adoption and blocking figures more representative of actual web behaviour.

Tier 3 (Partially Ready) grows while Passive shrinks
The share of Passive domains (Tier 2) dropped from 76.5% to 73.7%, with most movement going to Partially Ready (Tier 3, up from 19.5% to 21.7%). AI-Ready (Tier 4) edged up from 22 to 33 domains. Tier 5 remains at zero: no domain yet combines 3+ valid ADFs, explicit AI crawler permission, and Schema.org markup.

Major brands join the top adopters: NVIDIA, Dell, ASUS enter the list
Nine new domains entered the top 20 adopters, including NVIDIA, Dell, ASUS, Datadog, and Cloudinary. Nine dropped out, including Stripe, Shopify, and SourceForge. The new entrants are predominantly enterprise technology companies, suggesting that AI readiness is becoming a priority in corporate web strategy rather than remaining a niche concern.

Schema.org adoption jumps to nearly 30%
Schema.org presence on homepages rose from 25.6% to 29.7% (+4.1pp, 374 to 566 domains). This is the largest single-metric jump in the dataset and reflects the broader trend of websites investing in machine-readable signals. Since Schema.org is a prerequisite for the highest readiness tiers, this growth creates a foundation for future ADF tier upgrades.

— AI Visibility Research, April 2026
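The findings above mix two kinds of change: relative growth in counts (for example +69%) and percentage-point differences between shares (for example +4.1pp). The following is a minimal sketch of that arithmetic, using only figures quoted in this report. The helper names are illustrative and are not part of the published methodology.

```python
def relative_change(before: float, after: float) -> float:
    """Relative growth of a count, in percent."""
    return (after - before) / before * 100


def point_change(before_pct: float, after_pct: float) -> float:
    """Difference between two shares, in percentage points (pp)."""
    return after_pct - before_pct


def share(count: int, total: int) -> float:
    """Share of `count` in `total`, in percent."""
    return count / total * 100


# Figures quoted in the Key Findings (Q1 2026 -> Q2 2026).
llms_growth = relative_change(55, 93)        # llms.txt domains
blocking_growth = relative_change(151, 235)  # selectively blocking domains
llms_share = share(93, 1905)                 # llms.txt adoption in Q2
schema_shift = point_change(25.6, 29.7)      # Schema.org presence
valid_ratio_q1 = share(35, 55)               # valid-to-found, Q1
valid_ratio_q2 = share(61, 93)               # valid-to-found, Q2

print(f"llms.txt domains: {llms_growth:+.0f}%")
print(f"selective blocking: {blocking_growth:+.0f}%")
print(f"llms.txt adoption: {llms_share:.1f}%")
print(f"Schema.org: {schema_shift:+.1f}pp")
print(f"valid-to-found: {valid_ratio_q1:.0f}% -> {valid_ratio_q2:.0f}%")
```

Keeping the two measures separate matters here because the denominator changed between quarters (1,460 to 1,905 domains), so a count can grow quickly while its share moves only modestly.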

Changes from Q1 2026

Quarter-over-quarter changes in key metrics between Q1 2026 and Q2 2026.

Metric	Q1 2026	Q2 2026	Change
ADF Adoption	6.5%	7.2%	+0.7pp
Avg Readiness Score	2.2	2.2	0.0
Domains Crawled	1,460	1,905	+445
No AI Policy	87.5%	85.0%	-2.5pp

Per-file adoption change: Q1 2026 to Q2 2026
File	Q1 2026	Q2 2026	Change (pp)
llms.txt	3.8%	4.9%	+1.1
llms.html	2.8%	2.5%	-0.3
ai.txt	0.1%	0.1%	0.0
ai.json	0.1%	0.1%	0.0
identity.json	0.0%	0.0%	0.0
brand.txt	0.0%	0.1%	+0.1
faq-ai.txt	0.0%	0.1%	+0.1
developer-ai.txt	0.0%	0.0%	0.0
robots-ai.txt	0.1%	0.1%	0.0

New Top Adopters

asus.com
cloudinary.com
datadoghq.com
dell.com
greenpeace.org.uk
hobbycraft.co.uk
hostgator.com.br
nvidia.com
plesk.com

No Longer in Top 20

mainlinemenswear.co.uk
qualtrics.com
reed.co.uk
scotrail.co.uk
shopify.com
singular.net
smartsurvey.co.uk
sourceforge.net
stripe.com

ADF Adoption by File Type

How many of the top websites have each AI Discovery File — and how many of those files pass structural validation. Files are checked at their canonical root-level URL (e.g., example.com/llms.txt) and validated against the ADF specification.

AI Discovery File adoption across 1,905 domains
File	Found	Valid	Complete
llms.txt	93	61	61
llms.html	47	10	3
ai.txt	2	0	0
ai.json	2	0	0
identity.json	0	0	0
brand.txt	1	0	0
faq-ai.txt	1	0	0
developer-ai.txt	0	0	0
robots-ai.txt	1	1	0

AI Crawler Access Policies

How websites use robots.txt to manage access for 15 known AI user agents — from OpenAI's GPTBot to Anthropic's ClaudeBot. Each domain is classified into one of five access policies based on its aggregate behaviour across all agents. The per-agent table below shows which AI crawlers are most frequently blocked.

AI Crawler	Company	Purpose	Blocked	Blocked %	Allowed	Allowed %
CCBot	Common Crawl	Training	201	10.6%	8	0.4%
GPTBot	OpenAI	Training	190	10.0%	23	1.2%
ClaudeBot	Anthropic	Training	191	10.0%	13	0.7%
Bytespider	ByteDance	Training	172	9.0%	4	0.2%
meta-externalagent	Meta	Training	165	8.7%	1	0.1%
Applebot-Extended	Apple	Training	159	8.3%	5	0.3%
PerplexityBot	Perplexity	Search	154	8.1%	23	1.2%
Google-Extended	Google	Training	134	7.0%	16	0.8%
Diffbot	Diffbot	Extraction	127	6.7%	2	0.1%
cohere-ai	Cohere	Training	120	6.3%	1	0.1%
OAI-SearchBot	OpenAI	Search	115	6.0%	27	1.4%
Amazonbot	Amazon	Training	113	5.9%	6	0.3%
ChatGPT-User	OpenAI	Retrieval	97	5.1%	25	1.3%
Claude-User	Anthropic	Retrieval	89	4.7%	8	0.4%
FacebookBot	Meta	Preview	84	4.4%	4	0.2%

AI crawler access policy distribution in robots.txt
Policy	Domains	Percentage
Blocks All AI	22	1.2%
Blocks Selectively	235	12.3%
Rate-Limits AI	5	0.3%
Explicitly Allows	23	1.2%
No AI Policy	1,620	85.0%
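As a rough illustration of how a robots.txt file could be mapped to the five policies above, here is a minimal sketch. The classification rules are assumptions made for this example (the published methodology may differ), only four of the 15 agents are listed, and a blanket `User-agent: *` group is deliberately not counted as an AI-specific policy.

```python
# Subset of the 15 tracked agents, for illustration only.
AI_AGENTS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot"]


def agent_rules(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each named user agent (lowercased) to its (directive, value) rules."""
    rules: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []  # agents named by the group being read
    in_rules = False         # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            rules.setdefault(value.lower(), [])
        elif field in ("allow", "disallow", "crawl-delay"):
            in_rules = True
            for agent in current:
                rules[agent].append((field, value))
    return rules


def classify(robots_txt: str, agents: list[str] = AI_AGENTS) -> str:
    """Assign one of the five policy labels (assumed rules, not the official ones)."""
    rules = agent_rules(robots_txt)
    named = [a for a in agents if a.lower() in rules]
    if not named:
        return "No AI Policy"
    blocked = [a for a in named if ("disallow", "/") in rules[a.lower()]]
    if len(blocked) == len(agents):
        return "Blocks All AI"
    if blocked:
        return "Blocks Selectively"
    if any(d == "crawl-delay" for a in named for d, _ in rules[a.lower()]):
        return "Rate-Limits AI"
    return "Explicitly Allows"


print(classify("User-agent: GPTBot\nDisallow: /\n"))
```

The sketch treats only a site-wide `Disallow: /` as a block; partial path restrictions would need a further rule, which is left out here.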

File Quality Distribution

Among the ADF files that were found, how many meet the full specification versus providing only minimal content or containing errors. Quality is assessed using per-file structural checks — required fields must pass for a file to be considered valid; recommended fields distinguish "complete" from "minimal" implementations.

File quality distribution among ADF files found
Quality	Share
Complete	43.0%
Minimal	6.0%
Invalid	51.0%
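The three-way split described above (required checks decide validity, recommended checks separate "complete" from "minimal") can be sketched as follows. The specific checks in `check_llms_txt` are hypothetical stand-ins chosen for this example, not the validation rules of the ADF specification.

```python
def classify_quality(required: list[bool], recommended: list[bool]) -> str:
    """Invalid if any required check fails; otherwise complete or minimal."""
    if not all(required):
        return "invalid"
    return "complete" if all(recommended) else "minimal"


def check_llms_txt(text: str) -> str:
    """Toy checks for an llms.txt-style file (assumed rules, for illustration)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    required = [
        bool(lines) and lines[0].startswith("# "),   # assumed: starts with a title
    ]
    recommended = [
        any(ln.startswith("> ") for ln in lines),    # assumed: summary blockquote
        any(ln.startswith("- [") for ln in lines),   # assumed: at least one link item
    ]
    return classify_quality(required, recommended)


sample = "# Example\n> Short summary\n- [Docs](https://example.com/docs)\n"
print(check_llms_txt(sample))
```

Separating the generic `classify_quality` rule from the per-file checks mirrors the report's description of "per-file structural checks": each file type would supply its own lists of required and recommended results.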

AI Readiness Tiers

Each domain receives a readiness tier from 0 (Unaware) to 5 (AI-Optimised) based on three inputs: valid ADF file count, AI crawler policy in robots.txt, and Schema.org presence on the homepage. The tier model is deterministic with no opaque weights — the full calculation logic is published.

AI readiness tier distribution (average score: 2.2 / 5.0)
Tier	Domains	Percentage
Tier 5: AI-Optimised	0	0.0%
Tier 4: AI-Ready	33	1.7%
Tier 3: Partially Ready	413	21.7%
Tier 2: Passive	1,404	73.7%
Tier 1: Actively Blocking	22	1.2%
Tier 0: Unaware	33	1.7%
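The headline average can be checked against the tier table. The sketch below assumes the average readiness score is a simple domain-weighted mean of tier numbers; that assumption reproduces the reported 2.2, but the report does not state the formula explicitly.

```python
# Domain counts per tier, copied from the table above.
TIER_COUNTS = {5: 0, 4: 33, 3: 413, 2: 1404, 1: 22, 0: 33}


def average_tier(counts: dict[int, int]) -> float:
    """Domain-weighted mean tier (assumed definition of the average score)."""
    total = sum(counts.values())
    return sum(tier * n for tier, n in counts.items()) / total


total_domains = sum(TIER_COUNTS.values())
avg = average_tier(TIER_COUNTS)
print(total_domains, round(avg, 1))  # prints: 1905 2.2
```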

ADF vs Other Web Standards

Comparing AI Discovery File adoption against established web standards. This contextualises where ADF adoption sits relative to conventions like robots.txt (RFC 9309), ads.txt (IAB Tech Lab), security.txt (RFC 9116), and humans.txt, which are likewise published as files at a fixed, well-known location on the domain (the root, or /.well-known/ in the case of security.txt).

AI Discovery File adoption compared with established web standards
Standard	Adoption
robots.txt	52.1%
ads.txt	16.7%
Schema.org	29.7%
security.txt	14.3%
humans.txt	2.5%
Any ADF file	7.2%

Notable Adopters

The top 20 domains by AI readiness tier, showing which high-profile websites are leading ADF adoption. Readiness tiers are calculated using the combinatorial scoring model.

Domain	Rank	Category	Files Found	Files Valid	Readiness
asus.com	710	Global Top 1,000	1	1	AI-Ready
bmmagazine.co.uk	739	UK Top 1,000	1	1	AI-Ready
classlink.com	854	Global Top 1,000	1	1	AI-Ready
cloudinary.com	725	Global Top 1,000	1	1	AI-Ready
datadoghq.com	788	Global Top 1,000	1	1	AI-Ready
dell.com	368	Global Top 1,000	1	1	AI-Ready
dynatrace.com	546	Global Top 1,000	1	1	AI-Ready
energysavingtrust.org.uk	489	UK Top 1,000	1	1	AI-Ready
english-heritage.org.uk	227	UK Top 1,000	1	1	AI-Ready
greenpeace.org.uk	724	UK Top 1,000	1	1	AI-Ready
hobbycraft.co.uk	502	UK Top 1,000	1	1	AI-Ready
hostgator.com.br	842	Global Top 1,000	1	1	AI-Ready
kingsfund.org.uk	905	UK Top 1,000	1	1	AI-Ready
mailchimp.com	694	Global Top 1,000	1	1	AI-Ready
netgear.com	891	Global Top 1,000	1	1	AI-Ready
nvidia.com	371	Global Top 1,000	1	1	AI-Ready
onetrust.com	558	Global Top 1,000	1	1	AI-Ready
opera.com	87	Global Top 1,000	1	1	AI-Ready
optimizely.com	566	Global Top 1,000	1	1	AI-Ready
plesk.com	390	Global Top 1,000	1	1	AI-Ready

Download the Data

Raw datasets from this quarter's crawl, licensed under CC BY 4.0. Use them for your own research, analysis, or reporting. When citing, please reference the quarter (e.g., "Q2 2026") and link to the methodology.

Summary Statistics (CSV): Headline adoption rates, readiness scores, and crawler access breakdowns.
Readiness Scores (CSV): Per-domain readiness tiers, ADF file quality scores, and Schema.org presence.
Crawler Blocking (CSV): Per-domain AI crawler blocking data for 15 major AI user agents.

Methodology

How We Collect This Data

Our crawler checks the top 1,000 global and top 1,000 UK domains (deduplicated to ~1,995) for all 10 AI Discovery Files, validates each against the specification, analyses robots.txt AI crawler policies across 15 known agents, and scores each domain's overall AI readiness using a deterministic tier model. The full methodology — including validation rules, soft 404 detection, redirect classification, and scoring logic — is published for transparency.
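To make the first step of that process concrete, here is a minimal sketch of how the root-level URLs for one domain might be generated, together with one possible soft 404 heuristic. The file list contains the nine files named in this report's tables, and `looks_like_soft_404` is a hypothetical rule invented for this example, not the crawler's documented logic.

```python
from urllib.parse import urlunsplit

# The nine ADF file names that appear in this report's tables.
ADF_FILES = [
    "llms.txt", "llms.html", "ai.txt", "ai.json", "identity.json",
    "brand.txt", "faq-ai.txt", "developer-ai.txt", "robots-ai.txt",
]


def adf_urls(domain: str) -> list[str]:
    """Canonical root-level URL for each ADF file on `domain`."""
    return [urlunsplit(("https", domain, "/" + name, "", "")) for name in ADF_FILES]


def looks_like_soft_404(status: int, content_type: str, filename: str) -> bool:
    """Assumed heuristic: a 200 response serving HTML for a non-HTML file."""
    if status != 200:
        return False
    expects_html = filename.endswith(".html")
    return "text/html" in content_type.lower() and not expects_html


for url in adf_urls("example.com")[:2]:
    print(url)
```

A real crawler would also need to follow and classify redirects and inspect response bodies, which this sketch leaves out.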

Full methodology

Other Reports

Q1 2026 Report