Q1 2026 Report

AI Discovery File Adoption Research

Measuring how the world's top websites prepare for AI systems. Based on a crawl of 1,460 domains from a target list of 1,995.

Metric	Value	Detail
Domains Crawled	1,460	of 1,995 total
ADF Adoption	6.5%	95 domains
Avg Readiness	2.2	out of 5.0
No AI Policy	87.5%	in robots.txt

Summary

This Q1 2026 report analyses AI Discovery File adoption across 1,460 of the web's most prominent domains. 6.5% of domains have at least one AI Discovery File, 87.5% have no AI-specific crawler policy in their robots.txt, and the average AI readiness tier is 2.2 out of 5.0. Data is collected quarterly; the approach is described in our full methodology documentation.

AI Discovery Files are a set of 10 standardised root-level files — including llms.txt, ai.txt, ai.json, identity.json, and brand.txt — that help AI systems such as ChatGPT, Claude, and Gemini discover, interpret, and correctly represent a website. This research tracks their real-world adoption among the domains most likely to be referenced by AI systems when answering user questions.

Key Findings

The first quarterly crawl establishes a baseline for AI Discovery File adoption across the web's most prominent domains. At 6.5%, adoption is nascent but already ahead of humans.txt (2.5%) and, by this report's comparison, ahead of where robots.txt and Schema.org stood at equivalent points in their lifecycle. The more pressing finding is not how few files exist but how many of them are broken: 56% fail validation. The web is not ignoring AI; it simply has no infrastructure for it yet.

Quality is the real gap, not awareness
Of the 100 AI Discovery Files found across all domains, 56% are invalid: they contain URL dumps, placeholder content, or malformed syntax. Only 36 files are complete and specification-compliant. Previous industry studies that concluded llms.txt had "zero measurable impact" never examined whether the files they found actually worked. Having a file and having it work are two different things.

llms.txt leads adoption but carries a high error rate
55 domains serve an llms.txt file (3.8% of those crawled), making it the most adopted AI Discovery File. However, 20 of those files are invalid, a 36% failure rate. The llms.html companion file fares worse: 41 found, but only 7 valid and just 1 complete. Early adoption is concentrating around the files AI systems are most likely to request, but without validation, many of these files are actively misleading.

87.5% of top websites have no AI crawler policy whatsoever
The vast majority of websites have not yet addressed AI crawlers in their robots.txt. Of the 12.5% that have, most block selectively, targeting specific crawlers rather than setting a blanket policy. CCBot is the most frequently blocked agent (9.6%), followed by ClaudeBot (8.6%) and GPTBot (8.5%). Only 0.8% of sites explicitly allow AI access. The web has not decided how to handle AI crawlers; it has mostly not considered the question at all.

Nobody has reached Tier 5: the ceiling is still open
Zero domains in the sample achieved "AI-Optimised" status. The highest tier reached is Tier 4 (AI-Ready), occupied by just 22 domains, or 1.5% of the sample. Meanwhile, 76.5% sit at Tier 2 (Passive), meaning they have a robots.txt but no AI-specific signals. Reaching Tier 4 today places a business ahead of 98.5% of the web's top domains. The competitive advantage for early movers is large precisely because the bar is still so low.

Adoption is already outpacing comparable web standards
At 6.5%, AI Discovery File adoption has already surpassed humans.txt (2.5%), a standard proposed in 2011. For context, robots.txt took roughly 15 years to reach 25% adoption; Schema.org took 8 years. ADF adoption appears to be on a significantly compressed timeline, likely driven by the commercial urgency of AI integration. If security.txt (12.7%, RFC published 2022) is any guide, ADF adoption could reach double digits within the next two to three quarters.

Early adopters span industries and geographies
The 22 Tier 4 domains are not clustered in a single sector. They include global technology platforms (Shopify, Stripe, Opera), developer infrastructure (SourceForge, Dynatrace, Optimizely), UK public services (ScotRail, English Heritage, Energy Saving Trust), recruitment (Reed), and SaaS platforms (Mailchimp, Qualtrics, OneTrust). AI readiness is not only a tech-sector concern; it is a cross-industry infrastructure decision.

— AI Visibility Research, February 2026

ADF Adoption by File Type

How many of the top websites have each AI Discovery File — and how many of those files pass structural validation. Files are checked at their canonical root-level URL (e.g., example.com/llms.txt) and validated against the ADF specification.

AI Discovery File adoption across 1,460 domains
File	Found	Valid	Complete
llms.txt	55	35	35
llms.html	41	7	1
ai.txt	1	0	0
ai.json	1	0	0
identity.json	0	0	0
brand.txt	0	0	0
faq-ai.txt	0	0	0
developer-ai.txt	0	0	0
robots-ai.txt	1	1	0
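
The adoption and failure rates quoted in the findings can be recomputed from the Found and Valid columns above. The following is a minimal sketch rather than the report's own tooling; the names `ADOPTION` and `file_rates` are illustrative, and only the rows with at least one file found are included.

```python
# Found / valid / complete counts copied from the table above.
# Rows with zero files found are omitted.
ADOPTION = {
    "llms.txt": (55, 35, 35),
    "llms.html": (41, 7, 1),
    "ai.txt": (1, 0, 0),
    "ai.json": (1, 0, 0),
    "robots-ai.txt": (1, 1, 0),
}
DOMAINS_CRAWLED = 1460


def file_rates(found, valid, crawled=DOMAINS_CRAWLED):
    """Return (adoption %, failure %) for one file type, rounded to 0.1."""
    adoption = 100 * found / crawled
    failure = 100 * (found - valid) / found if found else 0.0
    return round(adoption, 1), round(failure, 1)


for name, (found, valid, _complete) in ADOPTION.items():
    adoption, failure = file_rates(found, valid)
    print(f"{name}: {adoption}% of domains, {failure}% of found files invalid")
```

For llms.txt this gives 3.8% adoption and a 36.4% failure rate, in line with the figures in Key Findings.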

AI Crawler Access Policies

How websites use robots.txt to manage access for 15 known AI user agents — from OpenAI's GPTBot to Anthropic's ClaudeBot. Each domain is classified into one of five access policies based on its aggregate behaviour across all agents. The per-agent table below shows which AI crawlers are most frequently blocked.

AI Crawler	Company	Purpose	Blocked	Blocked %	Allowed	Allowed %
CCBot	Common Crawl	Training	140	9.6%	5	0.3%
ClaudeBot	Anthropic	Training	126	8.6%	8	0.5%
GPTBot	OpenAI	Training	124	8.5%	13	0.9%
Bytespider	ByteDance	Training	118	8.1%	2	0.1%
Applebot-Extended	Apple	Training	101	6.9%	4	0.3%
meta-externalagent	Meta	Training	98	6.7%	1	0.1%
Google-Extended	Google	Training	96	6.6%	10	0.7%
PerplexityBot	Perplexity	Search	96	6.6%	13	0.9%
Diffbot	Diffbot	Extraction	89	6.1%	2	0.1%
cohere-ai	Cohere	Training	87	6.0%	0	0.0%
Amazonbot	Amazon	Training	72	4.9%	2	0.1%
ChatGPT-User	OpenAI	Retrieval	70	4.8%	13	0.9%
OAI-SearchBot	OpenAI	Search	70	4.8%	14	1.0%
FacebookBot	Meta	Preview	67	4.6%	3	0.2%
Claude-User	Anthropic	Retrieval	48	3.3%	3	0.2%
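
A per-agent check of this kind can be approximated with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions, not the crawler used for this report: the agent list is a subset of the 15 above, and `blocked_agents` and `SAMPLE` are illustrative names.

```python
from urllib.robotparser import RobotFileParser

# A subset of the AI user agents from the table above.
AI_AGENTS = ["CCBot", "ClaudeBot", "GPTBot", "PerplexityBot"]


def blocked_agents(robots_txt, agents=AI_AGENTS, path="/"):
    """Return the agents that may not fetch `path` under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Note: can_fetch falls back to the "User-agent: *" group, so a
    # site-wide Disallow is also reported as blocking each agent here.
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE))  # ['CCBot', 'GPTBot']
```

Because of the wildcard fallback, separating an AI-specific policy from a general one would additionally require checking whether each agent is named explicitly in the file.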

AI crawler access policy distribution in robots.txt
Policy	Domains	Percentage
Blocks All AI	19	1.3%
Blocks Selectively	151	10.3%
Rate-Limits AI	2	0.1%
Explicitly Allows	11	0.8%
No AI Policy	1,277	87.5%
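
As a consistency check, the percentages above and the 12.5% "has some AI policy" figure follow directly from the domain counts. A small sketch, with `POLICY_COUNTS` as an illustrative name:

```python
# Domain counts copied from the policy table above.
POLICY_COUNTS = {
    "Blocks All AI": 19,
    "Blocks Selectively": 151,
    "Rate-Limits AI": 2,
    "Explicitly Allows": 11,
    "No AI Policy": 1277,
}

total = sum(POLICY_COUNTS.values())
shares = {policy: round(100 * n / total, 1) for policy, n in POLICY_COUNTS.items()}

# Domains that address AI crawlers in any way.
with_policy = total - POLICY_COUNTS["No AI Policy"]
with_policy_share = round(100 * with_policy / total, 1)

print(total, shares)
print(with_policy, with_policy_share)
```

The five categories sum to the 1,460 crawled domains, and the four non-empty policies together cover 183 domains, or 12.5%.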

File Quality Distribution

Among the ADF files that were found, how many meet the full specification versus providing only minimal content or containing errors. Quality is assessed using per-file structural checks — required fields must pass for a file to be considered valid; recommended fields distinguish "complete" from "minimal" implementations.

Quality	Share
Complete	36.0%
Minimal	8.0%
Invalid	56.0%
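
The three quality labels can be read as a simple decision rule. The sketch below is an interpretation of the description above, not the published validator: it assumes each file yields a list of pass/fail results for required checks and another for recommended checks, and that "complete" means every recommended check passes. The function name `classify_quality` is hypothetical.

```python
def classify_quality(required_ok, recommended_ok):
    """Classify one file from its check results.

    required_ok / recommended_ok: iterables of booleans, one per check.
    Assumption: 'complete' requires all recommended checks to pass.
    """
    if not all(required_ok):
        return "invalid"  # any failed required check invalidates the file
    return "complete" if all(recommended_ok) else "minimal"


print(classify_quality([True, True], [True, True]))   # complete
print(classify_quality([True, True], [True, False]))  # minimal
print(classify_quality([True, False], [True, True]))  # invalid
```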

AI Readiness Tiers

Each domain receives a readiness tier from 0 (Unaware) to 5 (AI-Optimised) based on three inputs: valid ADF file count, AI crawler policy in robots.txt, and Schema.org presence on the homepage. The tier model is deterministic with no opaque weights — the full calculation logic is published.

AI readiness tier distribution (average score: 2.2 / 5.0)
Tier	Domains	Percentage
Tier 5: AI-Optimised	0	0.0%
Tier 4: AI-Ready	22	1.5%
Tier 3: Partially Ready	284	19.5%
Tier 2: Passive	1,117	76.5%
Tier 1: Actively Blocking	19	1.3%
Tier 0: Unaware	18	1.2%
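
The 2.2 average and the "ahead of 98.5%" figure can both be derived from the tier counts above. A minimal sketch, assuming the average is a plain domain-weighted mean of tier numbers (the name `TIER_COUNTS` is illustrative):

```python
# Tier number -> domain count, copied from the table above.
TIER_COUNTS = {5: 0, 4: 22, 3: 284, 2: 1117, 1: 19, 0: 18}

total = sum(TIER_COUNTS.values())

# Domain-weighted mean tier.
average = sum(tier * n for tier, n in TIER_COUNTS.items()) / total

# Share of domains sitting below Tier 4.
below_tier4 = total - TIER_COUNTS[4] - TIER_COUNTS[5]
below_tier4_share = 100 * below_tier4 / total

print(total, round(average, 1), round(below_tier4_share, 1))  # 1460 2.2 98.5
```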

ADF vs Other Web Standards

Comparing AI Discovery File adoption against established web standards. This contextualises where ADF adoption sits relative to conventions like robots.txt (RFC 9309), ads.txt (IAB Tech Lab), security.txt (RFC 9116), and humans.txt, all of which also rely on files served from a fixed, predictable location on the domain (the root, or /.well-known/ in the case of security.txt).

AI Discovery File adoption compared with established web standards
Standard	Adoption
robots.txt	45.3%
ads.txt	15.3%
Schema.org	25.6%
security.txt	12.7%
humans.txt	2.5%
Any ADF file	6.5%

Notable Adopters

The top 20 domains by AI readiness tier, showing which high-profile websites are leading ADF adoption. Readiness tiers are calculated using the combinatorial scoring model.

Domain	Rank	Category	Files Found	Files Valid	Readiness
bmmagazine.co.uk	739	UK Top 1,000	1	1	AI-Ready
classlink.com	854	Global Top 1,000	1	1	AI-Ready
dynatrace.com	546	Global Top 1,000	1	1	AI-Ready
energysavingtrust.org.uk	489	UK Top 1,000	1	1	AI-Ready
english-heritage.org.uk	227	UK Top 1,000	1	1	AI-Ready
kingsfund.org.uk	905	UK Top 1,000	1	1	AI-Ready
mailchimp.com	694	Global Top 1,000	1	1	AI-Ready
mainlinemenswear.co.uk	951	UK Top 1,000	1	1	AI-Ready
netgear.com	891	Global Top 1,000	1	1	AI-Ready
onetrust.com	558	Global Top 1,000	1	1	AI-Ready
opera.com	87	Global Top 1,000	1	1	AI-Ready
optimizely.com	566	Global Top 1,000	1	1	AI-Ready
qualtrics.com	855	Global Top 1,000	1	1	AI-Ready
reed.co.uk	291	UK Top 1,000	1	1	AI-Ready
scotrail.co.uk	957	UK Top 1,000	1	1	AI-Ready
shopify.com	164	Global Top 1,000	1	1	AI-Ready
singular.net	751	Global Top 1,000	1	1	AI-Ready
smartsurvey.co.uk	625	UK Top 1,000	1	1	AI-Ready
sourceforge.net	214	Global Top 1,000	1	1	AI-Ready
stripe.com	261	Global Top 1,000	1	1	AI-Ready

Download the Data

Raw datasets from this quarter's crawl, licensed under CC BY 4.0. Use them for your own research, analysis, or reporting. When citing, please reference the quarter (e.g., "Q1 2026") and link to the methodology.

CSV: Summary Statistics. Headline adoption rates, readiness scores, and crawler access breakdowns.
CSV: Readiness Scores. Per-domain readiness tiers, ADF file quality scores, and schema.org presence.
CSV: Crawler Blocking. Per-domain AI crawler blocking data for 15 major AI user agents.

Methodology

How We Collect This Data

Our crawler checks the top 1,000 global and top 1,000 UK domains (deduplicated to 1,995) for all 10 AI Discovery Files, validates each against the specification, analyses robots.txt AI crawler policies across 15 known agents, and scores each domain's overall AI readiness using a deterministic tier model. The full methodology, including validation rules, soft 404 detection, redirect classification, and scoring logic, is published for transparency.
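
To illustrate two of the steps named above, the sketch below builds the root-level URLs to check for a domain and applies one possible soft 404 heuristic. It is not the published methodology: the `https://` scheme, the function names, and the rule itself (a 200 response that returns an HTML page where a text or JSON file was expected) are assumptions made for illustration. `ADF_FILES` contains only the nine file names listed in this report.

```python
# The nine AI Discovery File names that appear in this report's table.
ADF_FILES = [
    "llms.txt", "llms.html", "ai.txt", "ai.json", "identity.json",
    "brand.txt", "faq-ai.txt", "developer-ai.txt", "robots-ai.txt",
]


def candidate_urls(domain):
    """Root-level URLs to request for one domain (https is an assumption)."""
    return [f"https://{domain}/{name}" for name in ADF_FILES]


def looks_like_soft_404(filename, status, content_type, body):
    """Hypothetical heuristic: a 200 that is really an HTML fallback page."""
    if status != 200:
        return False  # a real error status is not a *soft* 404
    if filename.endswith(".html"):
        return False  # HTML is the expected format for this file
    starts_like_html = body.lstrip().lower().startswith(("<!doctype html", "<html"))
    return "text/html" in content_type.lower() or starts_like_html


print(candidate_urls("example.com")[0])
print(looks_like_soft_404("llms.txt", 200, "text/html; charset=utf-8", "<!DOCTYPE html>"))
```

A heuristic like this only catches the most common case, where a site answers every unknown path with its homepage or a styled error page; a production crawler would likely combine it with other signals.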

Full methodology
