Relationship to Other Standards

How the AI Discovery Files specification relates to adjacent standards and conventions.

Status: Stable

This page documents how AI Discovery Files relates to neighbouring standards. References will be updated as the standards landscape evolves. See specification conventions for status definitions.

Last updated: 11 May 2026

Purpose

The AI Discovery Files specification does not exist in isolation. This page positions it relative to adjacent standards and conventions so implementers can understand where the specification overlaps, where it complements, and where it deliberately diverges.

1. llmstxt.org

llmstxt.org is the original proposal by Jeremy Howard for a Markdown file at /llms.txt that provides AI-readable context about a website. The AI Discovery Files llms.txt specification (ADF-001) directly builds on this proposal.

Relationship: backwards compatible. An llmstxt.org-conformant file is a valid llms.txt at the AI Discovery Files Essential conformance class. Publishers who follow the llmstxt.org convention need no migration. The AI Discovery Files specification adds:

A formal data model documented in JSON and YAML twins
Conformance class membership (Essential, Recommended, Complete)
Cross-file relationships (llms.txt is paired with llms.html and is the redirect target of llm.txt)
Validation rules and a reference validator
Companion files (identity.json, ai.json, etc.) that together form the AI Discovery Files suite
An optional Lang: header at the top of the file (BCP 47 language declaration)

llmstxt.org and AI Discovery Files share the goal of making web content machine-readable for AI systems; AI Discovery Files extends the surface area beyond a single Markdown file.
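As a rough illustration of the optional Lang: header, the sketch below reads a language tag from the first non-blank line of an llms.txt file. The exact header syntax (`Lang: en-GB` on its own line) and the helper name `read_lang_header` are assumptions made for this example, not taken from ADF-001, and the pattern only loosely approximates BCP 47.

```python
import re
from typing import Optional

# Assumed header shape: "Lang: <tag>" on the first non-blank line.
# The pattern is a loose approximation of BCP 47, not a full validator.
LANG_HEADER = re.compile(r"^Lang:\s*([A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*)\s*$")


def read_lang_header(llms_txt: str) -> Optional[str]:
    """Return the language tag from a leading 'Lang:' line, or None if absent."""
    for line in llms_txt.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        match = LANG_HEADER.match(line.strip())
        # Only the first non-blank line is inspected.
        return match.group(1) if match else None
    return None


with_header = "Lang: en-GB\n# Example Site\n\n> A short summary.\n"
without_header = "# Example Site\n\n> A short summary.\n"

print(read_lang_header(with_header))
print(read_lang_header(without_header))
```

A file without the header still parses, which matches the backwards-compatibility point above: the header is optional.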

2. IETF AI Preferences (draft)

The IETF has discussed AI Preferences mechanisms in working group drafts (notably around opt-out signals for AI training). The drafts have proposed approaches including HTTP headers, content-level metadata, and crawler-specific directives.

Relationship: complementary, not competing. If the IETF standardises a mechanism (e.g. an AI-Preferences HTTP header), AI Discovery Files SHOULD honour it. AI Discovery Files currently use file-level declarations because:

File-level signals work without server-side configuration changes (publishers can edit a file; HTTP headers require server access)
File-level signals are inspectable by humans and validators without needing to send HTTP requests with specific headers
File-level signals can be archived (a snapshot of ai.json at a point in time is meaningful; a snapshot of a header is harder to capture)

The AI Discovery Files maintainer tracks IETF developments and will publish a mapping document if a stable IETF mechanism emerges. See the governance page on external standards-body engagement.

3. robots.txt

RFC 9309 (the Robots Exclusion Protocol) defines robots.txt. It governs which paths a crawler may fetch.

Relationship: complementary; robots.txt always takes precedence for access control. A publisher's AI Discovery File MUST NOT contradict robots.txt. The Interoperability Guide documents the precedence matrix.

The two serve different layers:

robots.txt: "May this URL be fetched at all?" Protocol-level access control.
AI Discovery Files: "Given that you may fetch, here's what we are, how we want to be cited, what permissions apply, and what context is useful." Content-level guidance.

The robots-ai.txt specification (ADF-010) supplements robots.txt with AI-specific directives without contradicting it.
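The two layers can be sketched from the consumer's side using Python's standard `urllib.robotparser`. This is a minimal illustration, not the precedence matrix from the Interoperability Guide: the `GUIDANCE` dictionary, its keys, and the `guidance_for` helper are hypothetical stand-ins for whatever a discovery file actually declares.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical content-level guidance; the keys are invented for this sketch.
GUIDANCE = {"cite_as": "Example Ltd"}


def guidance_for(user_agent: str, url: str) -> Optional[dict]:
    """Layer 1: robots.txt decides access. Layer 2: guidance applies only if allowed."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    if not parser.can_fetch(user_agent, url):
        return None  # access denied, so content-level guidance is never consulted
    return GUIDANCE


print(guidance_for("ExampleAIBot", "https://example.com/about.html"))
print(guidance_for("ExampleAIBot", "https://example.com/private/report.html"))
```

The second call returns `None`: the access-control layer answers first, and the content-level layer has nothing to say about a URL that may not be fetched.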

4. Schema.org

Schema.org is the vocabulary used in JSON-LD structured data on web pages. It covers Organization, LocalBusiness, Article, FAQPage, and many more types.

Relationship: alignment, not duplication. The identity.json specification (ADF-006) is explicitly aligned with Schema.org Organization vocabulary. Field names match (name, alternateName, foundingDate, contactPoint, sameAs) so a publisher who maintains JSON-LD on their pages can derive an identity.json with minimal additional work.

Why a separate file when Schema.org JSON-LD on the homepage exists already:

Schema.org JSON-LD is embedded in HTML and requires page rendering to extract
A publisher's homepage often carries multiple Schema.org objects (Organization, WebPage, Product, FAQPage) which makes extracting "the canonical organisation identity" ambiguous
A separate identity.json at a predictable path is more discoverable for an AI consumer and unambiguous about which object is the canonical identity

Publishers SHOULD maintain both. The data overlaps; the surfaces differ.
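Because the field names match, deriving an identity.json from existing JSON-LD can be close to a field copy. The sketch below assumes the page's JSON-LD has already been extracted into a list of objects, picks the first Organization object, and copies only the five aligned fields named above. The helper name `derive_identity` and the sample data are hypothetical, and the real ADF-006 schema may require fields not shown here.

```python
import json

# The aligned field names listed in this section.
ALIGNED_FIELDS = ("name", "alternateName", "foundingDate", "contactPoint", "sameAs")


def derive_identity(json_ld_objects: list) -> dict:
    """Copy the aligned fields from the first Organization object found."""
    for obj in json_ld_objects:
        if obj.get("@type") == "Organization":
            return {key: obj[key] for key in ALIGNED_FIELDS if key in obj}
    raise ValueError("no Organization object found")


# Sample JSON-LD as it might appear on a homepage carrying several objects.
page_json_ld = [
    {"@context": "https://schema.org", "@type": "WebPage", "name": "Home"},
    {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": "Example Ltd",
        "foundingDate": "2012-03-01",
        "sameAs": ["https://www.linkedin.com/company/example"],
    },
]

identity = derive_identity(page_json_ld)
print(json.dumps(identity, indent=2))
```

Choosing "the first Organization object" is exactly the kind of guess a consumer has to make when reading the homepage alone, which is the ambiguity a dedicated identity.json removes.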

5. Common Crawl and crawler-specific user agents

Common Crawl publishes a regular crawl of the public web that many AI training pipelines consume; its crawler identifies as CCBot. AI-specific crawlers and tokens include GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google's AI-specific robots.txt product token), and PerplexityBot (Perplexity).

Relationship: the crawlers are the audience, not the standard. AI Discovery Files exist to be read by these crawlers. The robots-ai.txt specification includes guidance for crafting directives that target specific AI user agents.

Publishers SHOULD treat each AI crawler as a distinct user agent: CCBot, GPTBot, and ClaudeBot are separate agents and can be given separate rules. Blanket disallow rules block them all; per-user-agent rules let publishers make finer choices.
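The difference between blanket and per-user-agent rules can be checked with Python's standard `urllib.robotparser`. The two rule sets below are invented for illustration and use plain robots.txt syntax; they do not show any robots-ai.txt directive, and the `allowed` helper is a hypothetical wrapper.

```python
from urllib.robotparser import RobotFileParser

# One rule for everyone: every crawler is blocked.
BLANKET = """\
User-agent: *
Disallow: /
"""

# Separate groups: GPTBot is blocked, CCBot is kept out of /drafts/ only,
# and every other agent falls through to the default group.
PER_AGENT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /drafts/

User-agent: *
Allow: /
"""

URL = "https://example.com/blog/post.html"


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("GPTBot", "CCBot", "ClaudeBot"):
    print(agent, allowed(BLANKET, agent, URL), allowed(PER_AGENT, agent, URL))
```

Under the blanket rules all three agents are refused. Under the per-agent rules only GPTBot is refused for this URL, while CCBot and ClaudeBot are allowed.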

6. W3C AI Use Disclosure

The W3C has discussed AI use disclosure (whether and how websites should indicate AI-generated content, and whether AI systems should disclose use of website content). Concrete drafts have not stabilised at the time of writing.

Relationship: adjacent, watch this space. If the W3C publishes a stable disclosure standard, AI Discovery Files will publish a mapping. The current ai.json specification covers usage permissions and attribution from the publisher side; downstream disclosure by AI consumers is covered by the AI Consumer Guidance.

7. BCP 14 (RFC 2119 + RFC 8174)

All AI Discovery Files specifications use the requirement keywords defined in BCP 14. See the conventions page for the rules.

Relationship: normative dependency. AI Discovery Files MUST be read with BCP 14 in mind: only uppercase MUST, SHOULD, MAY, etc. carry requirement weight; lowercase forms are ordinary English.
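The case rule is easy to apply mechanically. The following sketch extracts only the uppercase BCP 14 keywords from a sentence and ignores lowercase forms; the function name `requirement_keywords` is hypothetical, and this is an illustration of the rule rather than part of any reference validator.

```python
import re

# Keywords from RFC 2119 as clarified by RFC 8174. Longer phrases come first
# so that "MUST NOT" is matched whole rather than as "MUST".
BCP14_KEYWORDS = re.compile(
    r"\b(MUST NOT|SHALL NOT|SHOULD NOT|NOT RECOMMENDED|"
    r"MUST|REQUIRED|SHALL|SHOULD|RECOMMENDED|MAY|OPTIONAL)\b"
)


def requirement_keywords(text: str) -> list:
    """Return the uppercase requirement keywords in order of appearance."""
    # Matching is case-sensitive on purpose: lowercase forms are ordinary English.
    return BCP14_KEYWORDS.findall(text)


sentence = (
    "Publishers SHOULD maintain both; a file MUST NOT contradict robots.txt, "
    "and you may add more."
)
print(requirement_keywords(sentence))  # ['SHOULD', 'MUST NOT']
```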

8. JSON Schema 2020-12

The ai.json and identity.json specifications publish formal JSON Schemas conforming to JSON Schema 2020-12.

Relationship: normative dependency. Publishers SHOULD reference these schemas via $schema in their files. The spec meta-schema and the validator-output schema are also JSON Schema 2020-12 documents.
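The `$schema` keyword plays two roles here, which the sketch below keeps apart. The dialect URI is the published JSON Schema 2020-12 identifier; the schema URL under `example.org` is a placeholder, since the real location is defined by the ai.json specification, and all other ai.json fields are omitted.

```python
import json
from typing import Optional

# Published identifier of the JSON Schema 2020-12 dialect.
DIALECT = "https://json-schema.org/draft/2020-12/schema"

# Placeholder URL, assumed for this sketch only.
AI_JSON_SCHEMA_URL = "https://example.org/schemas/ai.schema.json"

# Inside a schema document, $schema names the dialect the schema is written in.
schema_document = {"$schema": DIALECT, "$id": AI_JSON_SCHEMA_URL, "type": "object"}

# Inside a publisher's file, $schema points at the schema the file claims to follow.
ai_json_text = json.dumps({"$schema": AI_JSON_SCHEMA_URL}, indent=2)


def declared_schema(document_text: str) -> Optional[str]:
    """Return the $schema value a JSON file declares, or None if it declares none."""
    document = json.loads(document_text)
    value = document.get("$schema") if isinstance(document, dict) else None
    return value if isinstance(value, str) else None


print(declared_schema(ai_json_text) == schema_document["$id"])  # True
```

A validator can use the declared value to select which schema to validate against; the validation step itself needs a JSON Schema library and is left out of this sketch.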

9. Semantic Versioning 2.0.0

Semantic Versioning 2.0.0 governs how every AI Discovery File specification is versioned.

Relationship: normative dependency. See the Versioning and Deprecation Policy for the project's specific MAJOR / MINOR / PATCH commitments.
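For orientation, the generic Semantic Versioning rule can be sketched as follows. This covers only the MAJOR.MINOR.PATCH core (no pre-release or build metadata) and the general SemVer meaning of a MAJOR bump; the project's own commitments are in the Versioning and Deprecation Policy. The names `Version`, `parse_version`, and `is_breaking_upgrade` are hypothetical.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


# Core version only; SemVer forbids leading zeros in numeric identifiers.
_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple."""
    match = _CORE.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def is_breaking_upgrade(old: Version, new: Version) -> bool:
    """A higher MAJOR signals incompatible changes under SemVer.

    Note: SemVer treats 0.y.z as initial development, where anything may change.
    """
    return new.major > old.major


current = parse_version("1.4.2")
print(parse_version("1.5.0") > current)                    # True
print(is_breaking_upgrade(current, parse_version("1.5.0")))  # False
print(is_breaking_upgrade(current, parse_version("2.0.0")))  # True
```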

10. Other AI-discovery conventions

Various community conventions have appeared for specific AI use cases. Examples include domain-level Acceptable Use Policy files, ads.txt for advertising authorisation (the inspiration for ai.txt's key:value structure), and humans.txt for site attribution.

Relationship: independent. AI Discovery Files do not interact with these conventions and do not impose constraints on publishers who use them. A publisher MAY publish ads.txt, humans.txt, and the full AI Discovery Files suite without conflict.

References

llmstxt.org: original proposal that llms.txt builds on.
RFC 9309: Robots Exclusion Protocol.
Schema.org: the vocabulary identity.json aligns with.
Common Crawl: large-scale public web crawl consumed by many AI pipelines.
BCP 14: requirement keywords.
JSON Schema 2020-12: schema dialect used in this project.
Semantic Versioning 2.0.0.
Interoperability Guide: precedence rules between AI Discovery Files and external standards.
Governance and Editorial Process: how standards-body engagement is managed.