Q1 2026 Report

6.5% adoption

AI Discovery File Adoption Research

Measuring how the world's top websites prepare for AI systems. Based on a crawl list of 1,995 domains, of which 1,460 were crawled.

Domains Crawled: 1,460 (of 1,995 total)
ADF Adoption: 6.5% (95 domains)
Avg Readiness: 2.2 out of 5.0
No AI Policy in robots.txt: 87.5%

Summary

This Q1 2026 report analyses AI Discovery File adoption across 1,460 of the web's most prominent domains. 6.5% of domains have at least one AI Discovery File, 87.5% have no AI-specific crawler policy in their robots.txt, and the average AI readiness tier is 2.2 out of 5.0. Data is collected quarterly using the methodology described in our full methodology documentation.

AI Discovery Files are a set of 10 standardised root-level files — including llms.txt, ai.txt, ai.json, identity.json, and brand.txt — that help AI systems such as ChatGPT, Claude, and Gemini discover, interpret, and correctly represent a website. This research tracks their real-world adoption among the domains most likely to be referenced by AI systems when answering user questions.

Key Findings

The first quarterly crawl establishes a baseline for AI Discovery File adoption across the web's most prominent domains. At 6.5%, adoption is nascent but already outpacing several established web standards at equivalent points in their lifecycle. The more pressing finding is not how few files exist, but how many of those that do exist are broken: over half fail validation. The web is not ignoring AI — it simply has no infrastructure for it yet.

  1. Quality is the real gap, not awareness
     Of the 100 AI Discovery Files found across all domains, 56% are invalid, containing URL dumps, placeholder content, or malformed syntax. Only 36 files are complete and specification-compliant. Previous industry studies that concluded llms.txt had "zero measurable impact" never examined whether the files they found actually worked. Having a file and having it work are two different things.
  2. llms.txt leads adoption but carries the highest error rate
     55 domains serve an llms.txt file (3.8% of those crawled), making it the most adopted AI Discovery File by a wide margin. However, 20 of those files are invalid, a 36% failure rate. The llms.html companion file fares worse: 41 found, but only 7 valid and just 1 complete. Early adoption is concentrating around the files AI systems are most likely to request, but without validation, many of these files are actively misleading.
  3. 87.5% of top websites have no AI crawler policy whatsoever
     The vast majority of websites have not yet addressed AI crawlers in their robots.txt. Of the 12.5% that have, most block selectively, targeting specific crawlers rather than setting a blanket policy. CCBot is the most frequently blocked agent (9.6%), followed by ClaudeBot (8.6%) and GPTBot (8.5%). Only 0.8% of sites explicitly allow AI access. The web has not decided how to handle AI crawlers; it has mostly not considered the question at all.
  4. Nobody has reached Tier 5: the ceiling is still open
     Zero domains in the sample achieved "AI-Optimised" status. The highest tier reached is Tier 4 (AI-Ready), occupied by just 22 domains, or 1.5% of the sample. Meanwhile, 76.5% sit at Tier 2 (Passive), meaning they have a robots.txt but no AI-specific signals. Reaching Tier 4 today places a business ahead of 98.5% of the web's top domains. The competitive advantage for early movers is extraordinary, precisely because the bar is still so low.
  5. Adoption is already outpacing comparable web standards
     At 6.5%, AI Discovery File adoption has already surpassed humans.txt (2.5%), a standard proposed in 2011. For context, robots.txt took roughly 15 years to reach 25% adoption; Schema.org took 8 years. ADF adoption appears to be on a significantly compressed timeline, likely driven by the commercial urgency of AI integration. If security.txt (12.7%, RFC published 2022) is any guide, ADF adoption could reach double digits within the next two to three quarters.
  6. Early adopters span industries and geographies
     The 22 Tier 4 domains are not clustered in a single sector. They include global technology platforms (Shopify, Stripe, Opera), developer infrastructure (SourceForge, Dynatrace, Optimizely), UK public services (ScotRail, English Heritage, Energy Saving Trust), recruitment (Reed), and SaaS platforms (Mailchimp, Qualtrics, OneTrust). AI readiness is not a tech-sector concern; it is a cross-industry infrastructure decision.

— AI Visibility Research, February 2026

ADF Adoption by File Type

How many of the top websites have each AI Discovery File — and how many of those files pass structural validation. Files are checked at their canonical root-level URL (e.g., example.com/llms.txt) and validated against the ADF specification.

AI Discovery File adoption across 1,460 domains
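| File | Found | Valid | Complete |
|---|---|---|---|
| llms.txt | 55 | 35 | 35 |
| llms.html | 41 | 7 | 1 |
| ai.txt | 1 | 0 | 0 |
| ai.json | 1 | 0 | 0 |
| identity.json | 0 | 0 | 0 |
| brand.txt | 0 | 0 | 0 |
| faq-ai.txt | 0 | 0 | 0 |
| developer-ai.txt | 0 | 0 | 0 |
| robots-ai.txt | 1 | 1 | 0 |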
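
The percentages quoted in the key findings can be reproduced from the table above. The following is a minimal sketch of that arithmetic, not the report's own tooling: the function names are hypothetical, and the only inputs are the table's figures and the 1,460 crawled domains.

```python
# Figures copied from the adoption table: file -> (found, valid, complete).
ADOPTION = {
    "llms.txt": (55, 35, 35),
    "llms.html": (41, 7, 1),
    "ai.txt": (1, 0, 0),
    "ai.json": (1, 0, 0),
    "identity.json": (0, 0, 0),
    "brand.txt": (0, 0, 0),
    "faq-ai.txt": (0, 0, 0),
    "developer-ai.txt": (0, 0, 0),
    "robots-ai.txt": (1, 1, 0),
}
DOMAINS_CRAWLED = 1460


def adoption_rate(found: int, crawled: int = DOMAINS_CRAWLED) -> float:
    """Share of crawled domains serving the file, as a percentage."""
    return 100.0 * found / crawled


def invalid_rate(found: int, valid: int) -> float:
    """Share of found files that fail validation, as a percentage."""
    # Guard against files that were never found (0 / 0).
    return 100.0 * (found - valid) / found if found else 0.0


for name, (found, valid, _complete) in ADOPTION.items():
    print(f"{name}: {adoption_rate(found):.1f}% adoption, "
          f"{invalid_rate(found, valid):.1f}% invalid")
```

For llms.txt this gives roughly 3.8% adoption and a 36% invalid rate, matching the figures in the key findings.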

AI Crawler Access Policies

How websites use robots.txt to manage access for 15 known AI user agents — from OpenAI's GPTBot to Anthropic's ClaudeBot. Each domain is classified into one of five access policies based on its aggregate behaviour across all agents. The per-agent table below shows which AI crawlers are most frequently blocked.

| AI Crawler | Company | Purpose | Blocked | Blocked % | Allowed | Allowed % |
|---|---|---|---|---|---|---|
| CCBot | Common Crawl | Training | 140 | 9.6% | 5 | 0.3% |
| ClaudeBot | Anthropic | Training | 126 | 8.6% | 8 | 0.5% |
| GPTBot | OpenAI | Training | 124 | 8.5% | 13 | 0.9% |
| Bytespider | ByteDance | Training | 118 | 8.1% | 2 | 0.1% |
| Applebot-Extended | Apple | Training | 101 | 6.9% | 4 | 0.3% |
| meta-externalagent | Meta | Training | 98 | 6.7% | 1 | 0.1% |
| Google-Extended | Google | Training | 96 | 6.6% | 10 | 0.7% |
| PerplexityBot | Perplexity | Search | 96 | 6.6% | 13 | 0.9% |
| Diffbot | Diffbot | Extraction | 89 | 6.1% | 2 | 0.1% |
| cohere-ai | Cohere | Training | 87 | 6.0% | 0 | 0.0% |
| Amazonbot | Amazon | Training | 72 | 4.9% | 2 | 0.1% |
| ChatGPT-User | OpenAI | Retrieval | 70 | 4.8% | 13 | 0.9% |
| OAI-SearchBot | OpenAI | Search | 70 | 4.8% | 14 | 1.0% |
| FacebookBot | Meta | Preview | 67 | 4.6% | 3 | 0.2% |
| Claude-User | Anthropic | Retrieval | 48 | 3.3% | 3 | 0.2% |

AI crawler access policy distribution in robots.txt
| Policy | Domains | Percentage |
|---|---|---|
| Blocks All AI | 19 | 1.3% |
| Blocks Selectively | 151 | 10.3% |
| Rate-Limits AI | 2 | 0.1% |
| Explicitly Allows | 11 | 0.8% |
| No AI Policy | 1,277 | 87.5% |
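
To make the five-way classification concrete, here is a rough sketch of how a single robots.txt could be mapped to one of the policies above. The exact rules the report uses are in its methodology and are not reproduced here. Everything in this block is an assumption for illustration: the parser is deliberately simplified, only `Disallow: /` counts as a block, and the helper names `parse_groups` and `classify` are hypothetical.

```python
# The 15 AI user agents listed in the per-agent table.
AI_AGENTS = [
    "CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Applebot-Extended",
    "meta-externalagent", "Google-Extended", "PerplexityBot", "Diffbot",
    "cohere-ai", "Amazonbot", "ChatGPT-User", "OAI-SearchBot",
    "FacebookBot", "Claude-User",
]


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to the rules in its group."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []   # agents named by the group being read
    in_rules = False          # True once the group has at least one rule
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:      # a new User-agent line starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow", "crawl-delay"):
            in_rules = True
            for agent in current:
                groups[agent].append((field, value))
    return groups


def classify(robots_txt: str) -> str:
    """Assumed decision order; the published methodology may differ."""
    groups = parse_groups(robots_txt)
    named = {a: groups[a.lower()] for a in AI_AGENTS if a.lower() in groups}
    if not named:
        return "No AI Policy"
    blocked = [a for a, rules in named.items() if ("disallow", "/") in rules]
    if len(blocked) == len(AI_AGENTS):
        return "Blocks All AI"
    if blocked:
        return "Blocks Selectively"
    if any(f == "crawl-delay" for rules in named.values() for f, _ in rules):
        return "Rate-Limits AI"
    return "Explicitly Allows"
```

Under these assumptions, a file that only names `User-agent: *` is classed as "No AI Policy", because no AI agent is addressed by name, while a file that disallows `/` for GPTBot alone is classed as "Blocks Selectively".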

File Quality Distribution

Among the ADF files that were found, how many meet the full specification versus providing only minimal content or containing errors. Quality is assessed using per-file structural checks — required fields must pass for a file to be considered valid; recommended fields distinguish "complete" from "minimal" implementations.

Complete: 36.0%
Minimal: 8.0%
Invalid: 56.0%
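
The three quality labels follow directly from the rule described above: required checks decide validity, recommended checks separate "complete" from "minimal". A minimal sketch of that rule follows, assuming each check has already been reduced to a pass/fail boolean. The function names and the input shape are hypothetical, not the report's validator.

```python
from collections import Counter


def classify_quality(required: list[bool], recommended: list[bool]) -> str:
    """Label one file from its per-check pass/fail results."""
    if not all(required):
        return "invalid"      # any failed required check invalidates the file
    return "complete" if all(recommended) else "minimal"


def distribution(labels: list[str]) -> dict[str, float]:
    """Percentage share of each label, rounded to one decimal place."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Reproduce the reported split: 36 complete, 8 minimal, 56 invalid files.
labels = ["complete"] * 36 + ["minimal"] * 8 + ["invalid"] * 56
print(distribution(labels))
```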

AI Readiness Tiers

Each domain receives a readiness tier from 0 (Unaware) to 5 (AI-Optimised) based on three inputs: valid ADF file count, AI crawler policy in robots.txt, and Schema.org presence on the homepage. The tier model is deterministic with no opaque weights — the full calculation logic is published.

AI readiness tier distribution (average score: 2.2 / 5.0)
| Tier | Domains | Percentage |
|---|---|---|
| Tier 5: AI-Optimised | 0 | 0.0% |
| Tier 4: AI-Ready | 22 | 1.5% |
| Tier 3: Partially Ready | 284 | 19.5% |
| Tier 2: Passive | 1,117 | 76.5% |
| Tier 1: Actively Blocking | 19 | 1.3% |
| Tier 0: Unaware | 18 | 1.2% |
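
The headline average of 2.2 is the domain-weighted mean of the tier distribution above. The short check below uses only the counts from the table; it does not reproduce the tier model itself, which is described in the methodology.

```python
# Tier -> number of domains, copied from the tier distribution table.
TIER_COUNTS = {5: 0, 4: 22, 3: 284, 2: 1117, 1: 19, 0: 18}

total = sum(TIER_COUNTS.values())
average = sum(tier * n for tier, n in TIER_COUNTS.items()) / total

# Share of domains sitting below Tier 4, as a percentage.
below_tier_4 = 100.0 * sum(n for tier, n in TIER_COUNTS.items() if tier < 4) / total

print(f"{total} domains, average tier {average:.1f}, "
      f"{below_tier_4:.1f}% below Tier 4")
```

The counts sum to the 1,460 crawled domains, the weighted mean is 3,193 / 1,460 ≈ 2.19, and about 98.5% of domains sit below Tier 4.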

ADF vs Other Web Standards

Comparing AI Discovery File adoption against established web standards. This contextualises where ADF adoption sits relative to conventions like robots.txt (RFC 9309), ads.txt (IAB Tech Lab), security.txt (RFC 9116), and humans.txt, all of which also rely on a file served from a fixed location on the domain: the root, or the /.well-known/ path in the case of security.txt.

AI Discovery File adoption compared with established web standards
| Standard | Adoption |
|---|---|
| robots.txt | 45.3% |
| ads.txt | 15.3% |
| Schema.org | 25.6% |
| security.txt | 12.7% |
| humans.txt | 2.5% |
| Any ADF file | 6.5% |

Notable Adopters

The top 20 domains by AI readiness tier, showing which high-profile websites are leading ADF adoption. Readiness tiers are calculated using the combinatorial scoring model.

| Domain | Rank | Category | Files Found | Files Valid | Readiness |
|---|---|---|---|---|---|
| bmmagazine.co.uk | 739 | UK Top 1,000 | 1 | 1 | AI-Ready |
| classlink.com | 854 | Global Top 1,000 | 1 | 1 | AI-Ready |
| dynatrace.com | 546 | Global Top 1,000 | 1 | 1 | AI-Ready |
| energysavingtrust.org.uk | 489 | UK Top 1,000 | 1 | 1 | AI-Ready |
| english-heritage.org.uk | 227 | UK Top 1,000 | 1 | 1 | AI-Ready |
| kingsfund.org.uk | 905 | UK Top 1,000 | 1 | 1 | AI-Ready |
| mailchimp.com | 694 | Global Top 1,000 | 1 | 1 | AI-Ready |
| mainlinemenswear.co.uk | 951 | UK Top 1,000 | 1 | 1 | AI-Ready |
| netgear.com | 891 | Global Top 1,000 | 1 | 1 | AI-Ready |
| onetrust.com | 558 | Global Top 1,000 | 1 | 1 | AI-Ready |
| opera.com | 87 | Global Top 1,000 | 1 | 1 | AI-Ready |
| optimizely.com | 566 | Global Top 1,000 | 1 | 1 | AI-Ready |
| qualtrics.com | 855 | Global Top 1,000 | 1 | 1 | AI-Ready |
| reed.co.uk | 291 | UK Top 1,000 | 1 | 1 | AI-Ready |
| scotrail.co.uk | 957 | UK Top 1,000 | 1 | 1 | AI-Ready |
| shopify.com | 164 | Global Top 1,000 | 1 | 1 | AI-Ready |
| singular.net | 751 | Global Top 1,000 | 1 | 1 | AI-Ready |
| smartsurvey.co.uk | 625 | UK Top 1,000 | 1 | 1 | AI-Ready |
| sourceforge.net | 214 | Global Top 1,000 | 1 | 1 | AI-Ready |
| stripe.com | 261 | Global Top 1,000 | 1 | 1 | AI-Ready |

Download the Data

Raw datasets from this quarter's crawl, licensed under CC BY 4.0. Use them for your own research, analysis, or reporting. When citing, please reference the quarter (e.g., "Q1 2026") and link to the methodology.

Methodology

How We Collect This Data

Our crawler checks the top 1,000 global and top 1,000 UK domains (deduplicated to ~1,995) for all 10 AI Discovery Files, validates each against the specification, analyses robots.txt AI crawler policies across 15 known agents, and scores each domain's overall AI readiness using a deterministic tier model. The full methodology — including validation rules, soft 404 detection, redirect classification, and scoring logic — is published for transparency.
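
The first two steps described above, merging the two ranked lists and deriving the URLs to check, might look roughly like this. It is a sketch under stated assumptions rather than the crawler's actual code: `merge_domains` and `candidate_urls` are hypothetical names, the `https://` scheme is assumed, and the file list contains only the nine file names that appear in the adoption table of this report.

```python
# File names taken from the adoption table in this report (nine are listed there).
ADF_FILES = [
    "llms.txt", "llms.html", "ai.txt", "ai.json", "identity.json",
    "brand.txt", "faq-ai.txt", "developer-ai.txt", "robots-ai.txt",
]


def merge_domains(global_top: list[str], uk_top: list[str]) -> list[str]:
    """Union of both lists, case-insensitive, preserving first-seen order."""
    seen: dict[str, None] = {}
    for domain in [*global_top, *uk_top]:
        seen.setdefault(domain.strip().lower(), None)
    return list(seen)


def candidate_urls(domain: str) -> list[str]:
    """Canonical root-level URL for each file (https scheme is an assumption)."""
    return [f"https://{domain}/{name}" for name in ADF_FILES]


domains = merge_domains(["Shopify.com", "stripe.com"], ["stripe.com", "reed.co.uk"])
for domain in domains:
    print(domain, candidate_urls(domain)[0])
```

A domain that appears in both lists is kept once, which is how two lists of 1,000 can collapse to roughly 1,995 unique entries.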
