AI Discovery Files vs Web Standards

AI Discovery Files don't replace existing web standards — they fill specific gaps that current standards leave open when AI systems try to understand your website.

Here's exactly what each existing standard does, what it doesn't do, and which AI Discovery File bridges the gap.

[Illustration: traditional web standards on the left and AI Discovery Files on the right, joined by bridge-like lines to show how they complement each other]

The Core Principle: Complement, Don't Replace

Every existing web standard was designed for a specific purpose — and each one does that job well. But none of them were designed to answer the questions AI systems now ask:

  • What is this business, exactly? — Not what a page is about, but who the entity behind the website is
  • What can I say about them? — Not whether I can crawl, but whether I can cite, quote, or recommend
  • What should I never claim? — Explicit boundaries that prevent hallucinated services or brand conflation
  • How should I refer to them? — Correct name, capitalisation, pronunciation, and terminology

AI Discovery Files answer these questions using simple, standardised file formats that AI systems can consume directly. The Interoperability Guide defines clear precedence rules for when information overlaps.

Existing Standard

robots.txt

What robots.txt Does

Controls which pages and directories web crawlers may access. Uses User-agent, Allow, and Disallow directives to grant or deny access at the URL path level. In use since 1994, supported by all major search engine crawlers, and standardised as RFC 9309 in 2022.
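As a minimal sketch of how a crawler evaluates these directives, the example below uses Python's standard-library urllib.robotparser. The robots.txt content and the example.com URLs are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/blog/post"))     # True
print(can_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/private/data"))  # False
print(can_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/blog/post"))        # False
```

Each answer is a plain yes or no for a given user agent and path; nothing in the result says what the crawler may do with the content once it has been fetched.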

What It Doesn't Do for AI

  • No built-in distinction between search and AI use: crawlers can only be targeted by user-agent name
  • Can't express "allow crawling but don't use for training"
  • No granular AI-specific permissions (citation, quoting, recommendation)
  • Binary allow/disallow only — no nuance

What robots-ai.txt Adds

AI-specific crawler directives with granular control over how different AI systems interact with your content. Extends the concept of robots.txt into AI-specific territory.

  • Named AI crawler directives (GPTBot, ClaudeBot, etc.)
  • Separate permissions for crawling vs training vs citation
  • Content-type specific rules (allow blog posts, restrict client data)
  • Works alongside robots.txt — robots.txt always takes precedence
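The rule that robots.txt always takes precedence can be sketched as a small function. This is an illustration only: the permission names crawl, train, and cite are hypothetical labels for the crawling, training, and citation permissions mentioned above, not the actual robots-ai.txt syntax.

```python
from typing import Dict


def effective_permissions(robots_allows: bool,
                          ai_permissions: Dict[str, bool]) -> Dict[str, bool]:
    """Combine a robots.txt decision with AI-specific permissions.

    robots_allows: whether robots.txt lets this crawler access the URL.
    ai_permissions: hypothetical permissions read from robots-ai.txt.
    """
    if not robots_allows:
        # robots.txt blocks the crawler, so nothing in robots-ai.txt applies.
        return {name: False for name in ai_permissions}
    # robots.txt allows access, so the AI-specific permissions apply as written.
    return dict(ai_permissions)


requested = {"crawl": True, "train": False, "cite": True}
print(effective_permissions(True, requested))
# {'crawl': True, 'train': False, 'cite': True}
print(effective_permissions(False, requested))
# {'crawl': False, 'train': False, 'cite': False}
```

The AI-specific file can only narrow or refine what robots.txt already permits; it never widens it.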

Precedence: robots.txt always wins. If robots.txt blocks a crawler, robots-ai.txt cannot override that restriction. See interoperability rules.

Existing Standard

Schema.org Structured Data

What Schema.org Does

Provides structured data vocabulary for describing content within HTML pages. Powers rich search results (knowledge panels, FAQs, product listings). Embedded as JSON-LD, Microdata, or RDFa within individual pages.

What It Doesn't Do for AI

  • Page-scoped, not site-scoped — no single place for canonical identity
  • Describes content, not business identity or AI permissions
  • No mechanism for "do not claim we offer X" boundaries
  • Requires parsing HTML — not a standalone file AI systems can fetch directly
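To make the HTML-parsing point concrete, here is a rough sketch of what a consumer has to do to read Schema.org JSON-LD from a page, using only Python's standard library. The page markup and the business name "Example Ltd" are made up for illustration.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.items: List[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Illustrative page (an assumption for this sketch).
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example Ltd"}
</script>
</head><body><p>Hello</p></body></html>"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
print(extractor.items[0]["name"])  # Example Ltd
```

A standalone JSON file skips the extraction step entirely: the consumer fetches it and calls json.loads on the response body.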

What identity.json Adds

A single, authoritative, standalone JSON file at the website root declaring canonical business identity. AI systems can fetch one file and know exactly who you are.

  • Site-wide canonical identity: name, description, URL, contact
  • Explicit service declarations and exclusions
  • Social profiles and authoritative URLs
  • Standalone file — no HTML parsing required
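When identity.json and Schema.org markup disagree about the business name, identity.json is treated as authoritative. A minimal sketch of that resolution follows. The "name" key in identity.json is a hypothetical field name, and falling back to Schema.org when identity.json is absent is an assumption of this sketch rather than a documented rule.

```python
from typing import Any, Dict, Optional


def resolve_business_name(identity: Optional[Dict[str, Any]],
                          schema_org: Optional[Dict[str, Any]]) -> Optional[str]:
    """Pick the canonical business name, preferring identity.json."""
    # identity.json wins for naming whenever it supplies a name.
    if identity and identity.get("name"):
        return identity["name"]
    # Assumed fallback: use the Schema.org name if identity.json has none.
    if schema_org and schema_org.get("name"):
        return schema_org["name"]
    return None


# Illustrative data (hypothetical business and field names).
identity_json = {"name": "Example Ltd", "url": "https://example.com"}
schema_markup = {"@type": "Organization", "name": "Example Limited UK"}

print(resolve_business_name(identity_json, schema_markup))  # Example Ltd
print(resolve_business_name(None, schema_markup))           # Example Limited UK
```

Schema.org still describes individual pages; only the site-wide name and identity are resolved this way.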

Precedence: identity.json takes precedence over Schema.org for business naming and identity when information conflicts. Schema.org remains authoritative for page-level content description. See interoperability rules.

Existing Standard

security.txt

What security.txt Does

Standardised file (RFC 9116) for publishing security vulnerability disclosure policies. Tells security researchers how to report issues, who to contact, and what your disclosure policy is. Published at /.well-known/security.txt.

What It Doesn't Do for AI

  • Addresses security researchers, not AI systems
  • No mechanism for AI-specific usage permissions
  • Can't declare what AI systems may or may not say about you
  • No coverage for content licensing, citation, or attribution rules

What ai.txt Adds

The AI equivalent of security.txt. A simple text file declaring your website's policies for AI system interaction — what's permitted, what's restricted, and how attribution should work.

  • Explicit AI usage permissions (training, citation, quoting)
  • Content licensing declarations
  • Attribution requirements for AI-generated citations
  • Opt-in/opt-out signals for AI use cases

Complementary: security.txt and ai.txt serve completely different audiences with no overlap. Both can and should coexist.

Existing Standard

humans.txt

What humans.txt Does

A plain text file crediting the people behind a website — developers, designers, project managers. An informal convention (not an RFC) for human-readable acknowledgement. Published at the website root.

What It Doesn't Do for AI

  • Informal, no standardised structure
  • Credits individuals, doesn't define brand identity
  • No naming rules, capitalisation guidelines, or terminology preferences
  • Not designed for machine parsing

What brand.txt Adds

Machine-readable brand guidelines for AI systems. Defines how your brand name should be written, pronounced, and referenced — and what terms to avoid.

  • Correct brand name, capitalisation, and spacing
  • Pronunciation guides for voice AI systems
  • Terms, abbreviations, and names to avoid
  • Structured format that AI systems can parse reliably

Complementary: humans.txt credits people; brand.txt defines how AI systems should refer to the brand. No conflict or overlap.

Existing Standard

ads.txt

What ads.txt Does

Declares authorised digital advertising sellers for a domain (IAB Tech Lab standard). Prevents ad fraud by letting advertisers verify that ad inventory is sold through legitimate channels. Plain text file at the website root.

What It Doesn't Do for AI

  • Specific to advertising supply chain
  • No mechanism for AI interaction permissions
  • Can't declare content licensing or usage restrictions
  • Addresses advertising platforms, not AI systems

What ai.json Adds

The machine-parseable counterpart to ai.txt. Where ads.txt declares authorised ad sellers, ai.json declares authorised AI interaction rules in a structured JSON format with JSON Schema validation.

  • Structured AI permissions and restrictions
  • Programmatic access for automated tools and validators
  • JSON Schema for automated validation
  • Granular content-type specific rules

Precedence: ai.json takes precedence over ai.txt for permissions when both exist and conflict. See interoperability rules.
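One way to picture this precedence rule is as a dictionary merge in which ai.json values overwrite conflicting ai.txt values. The permission keys below (training, citation, quoting) are hypothetical names chosen for the sketch, and both files are assumed to have been parsed into simple key-to-boolean maps already.

```python
from typing import Dict


def merge_permissions(ai_txt: Dict[str, bool],
                      ai_json: Dict[str, bool]) -> Dict[str, bool]:
    """Merge permissions from both files; ai.json wins on conflict."""
    merged = dict(ai_txt)    # start from the ai.txt declarations
    merged.update(ai_json)   # ai.json overrides any conflicting keys
    return merged


# Illustrative parsed permissions (hypothetical keys and values).
from_ai_txt = {"training": True, "citation": True}
from_ai_json = {"training": False, "quoting": True}

print(merge_permissions(from_ai_txt, from_ai_json))
# {'training': False, 'citation': True, 'quoting': True}
```

Permissions that appear in only one file pass through unchanged; only the conflicting key (training here) is decided by precedence.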

What Only AI Discovery Files Provide

Some AI Discovery Files have no existing web standard equivalent at all. These address needs that simply didn't exist before AI systems became primary information sources.

llms.txt

AI-readable business context in Markdown format

No existing standard provides a structured, AI-optimised summary of a business. llms.txt gives AI systems a single document that explains who you are, what you do, and what context is important — written specifically for LLM consumption.

faq-ai.txt

Authoritative Q&A for AI retrieval

While Schema.org can mark up FAQ pages, faq-ai.txt is a standalone file of pre-authored answers specifically for AI citation. It ensures AI systems use your approved answers rather than generating their own from scattered page content.

developer-ai.txt

Technical context for AI systems

No standard communicates your technical platform, API availability, versioning conventions, or integration context to AI systems. developer-ai.txt gives AI the technical metadata it needs to provide accurate developer-facing responses.

llms.html

Human-readable reference version

A formatted HTML presentation of llms.txt content, giving humans a readable reference of what AI systems see. Bridges the gap between machine-readable and human-inspectable.

Quick Reference

Existing Standard | Purpose                    | AI Gap                                  | AI Discovery File
robots.txt        | Crawler access control     | No AI-specific granularity              | robots-ai.txt
Schema.org        | Page-level structured data | No site-wide canonical identity         | identity.json
security.txt      | Vulnerability disclosure   | No AI usage permissions                 | ai.txt
humans.txt        | Team credits               | No brand/naming rules for AI            | brand.txt
ads.txt           | Authorised ad sellers      | No AI interaction rules                 | ai.json
None              | n/a                        | No AI-readable business summary         | llms.txt
None              | n/a                        | No pre-authored AI-ready FAQs           | faq-ai.txt
None              | n/a                        | No technical/developer context for AI   | developer-ai.txt

Frequently Asked Questions

Do AI Discovery Files replace robots.txt?

No. AI Discovery Files complement robots.txt; they do not replace it. robots.txt controls crawler access at the URL path level. robots-ai.txt adds AI-specific granular directives, while other AI Discovery Files address identity, permissions, and context that robots.txt was never designed to handle.

Why not just use Schema.org for AI visibility?

Schema.org provides structured data for search engine result features like rich snippets. It is embedded within HTML pages and describes page-level content. AI Discovery Files are standalone root-level files that declare site-wide identity, permissions, and context specifically for AI systems. They address questions Schema.org was not designed for: What is the canonical business name? What can AI say about us? What services should AI not claim we offer?

Can I use both AI Discovery Files and existing standards?

Yes, and you should. AI Discovery Files are designed to work alongside existing standards, not replace them. The Interoperability Guide defines clear precedence rules for when information overlaps or conflicts between AI Discovery Files and standards like robots.txt or Schema.org.

What if AI Discovery Files conflict with robots.txt?

The Interoperability Guide establishes clear precedence: robots.txt always takes precedence over robots-ai.txt for access control. If robots.txt blocks a crawler, robots-ai.txt cannot override that. AI Discovery Files can only grant additional AI-specific permissions within the boundaries set by existing standards.

Start Implementing

AI Discovery Files work alongside your existing web infrastructure. You don't need to change anything you already have — just add the files that fill the gaps.
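As a rough sketch of what "adding the files" means in practice, the snippet below builds the root-level URL at which each file named on this page would be published. It assumes every file sits at the website root, as described above for identity.json. The example.com address is a placeholder.

```python
from typing import List
from urllib.parse import urljoin

# File names as they appear on this page.
AI_DISCOVERY_FILES = [
    "llms.txt",
    "llms.html",
    "ai.txt",
    "ai.json",
    "identity.json",
    "brand.txt",
    "faq-ai.txt",
    "developer-ai.txt",
    "robots-ai.txt",
]


def discovery_urls(site: str) -> List[str]:
    """Return the assumed root-level URL for each AI Discovery File."""
    # A leading slash makes urljoin resolve against the site root,
    # whatever path the supplied site URL contains.
    return [urljoin(site, "/" + name) for name in AI_DISCOVERY_FILES]


for url in discovery_urls("https://example.com/blog/"):
    print(url)
```

The first line printed is https://example.com/llms.txt. A validator could request each URL in turn and report which files are missing.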

Free WordPress Plugin

Generate AI Discovery Files from your dashboard

Using WordPress? Install the plugin and create all 10 files in minutes — no coding, no configuration files to edit manually.


Register in the AI Visibility Directory

Once your AI Discovery Files are published, register your website in the AI Visibility Directory — the verified registry of websites implementing AI Discovery Files. Registration validates your implementation and lists your site for AI systems and industry peers to discover.

Basic Listing

Card entry in the directory with automated file validation. Open to any site with a valid llms.txt file. No cost.

Full Listing (Recommended)

Dedicated profile page on the directory with dofollow backlinks to your website — a genuine SEO authority signal from a topically relevant, verified source. Includes an attribution badge and enhanced visibility.