AI Consumer Guidance

What AI systems should do with AI Discovery Files.

Status: Stable

This page documents the consumer side of the AI Discovery Files contract. It uses RFC 2119 keywords to describe expected behaviour for AI systems that choose to honour publisher signals. See specification conventions for status definitions.

Purpose

AI Discovery Files describe what publishers want AI systems to do with their content. This page documents what AI systems SHOULD do in return. It is the consumer side of the contract: the missing half of a specification family that has so far focused on the publisher side.

1. Audience

This guidance is for any AI system that fetches and processes content from the open web, including:

The specification cannot force compliance. The guidance below describes what a conformant AI consumer SHOULD do; honouring it builds trust with publishers and improves the accuracy of AI-generated responses.

2. The four AI uses

Publisher permission and restriction declarations distinguish four kinds of AI use:

Training use
Including the content in a corpus used to train or fine-tune a model. Publishers grant permission via ai.json permissions[].action = "train" and declare a restriction via restrictions[].action = "train".
Retrieval use
Fetching the content at query time, embedding it in a context window, and using it to generate a response. Includes RAG, AI search, and browsing assistants. Publishers signal this use via permissions[].action = "retrieve" or similar.
Citation use
Naming the publisher or linking to the source when the publisher's content informs a response. Publishers state their requirements in ai.json under attribution.
Conversation use
Discussing the publisher (their business, services, brand) in conversational responses, including answering "tell me about X". Publishers provide guidance via brand.txt and identity.json.
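The four uses draw on different files and fields. The Python sketch below records that mapping and looks up what an already-parsed ai.json says about a given action. It assumes permissions and restrictions are top-level arrays of objects with an action key, as the permissions[].action notation suggests. The names SIGNAL_SOURCES and declared_signal, and the rule that a restriction outranks a permission for the same action, are illustrative assumptions rather than part of the specification.

```python
from typing import Any

# Where the publisher signal for each AI use lives, per the definitions above.
SIGNAL_SOURCES: dict[str, str] = {
    "training": "ai.json permissions[]/restrictions[] with action 'train'",
    "retrieval": "ai.json permissions[]/restrictions[] with action 'retrieve' (or similar)",
    "citation": "ai.json attribution",
    "conversation": "brand.txt and identity.json",
}


def declared_signal(ai_json: dict[str, Any], action: str) -> str:
    """Return 'restricted', 'permitted' or 'unspecified' for an action.

    Hypothetical helper. Assumes top-level 'permissions' and 'restrictions'
    arrays whose entries are objects carrying an 'action' key.
    """

    def actions(key: str) -> set:
        entries = ai_json.get(key) or []
        return {e.get("action") for e in entries if isinstance(e, dict)}

    # Restrictions are checked first, so a restriction wins over a
    # conflicting permission for the same action (an assumption).
    if action in actions("restrictions"):
        return "restricted"
    if action in actions("permissions"):
        return "permitted"
    return "unspecified"


if __name__ == "__main__":
    doc = {"permissions": [{"action": "retrieve"}], "restrictions": [{"action": "train"}]}
    print(declared_signal(doc, "train"))      # restricted
    print(declared_signal(doc, "retrieve"))   # permitted
    print(declared_signal(doc, "summarise"))  # unspecified
```

Returning "unspecified" rather than a boolean keeps the absence of a declaration distinct from an explicit permission, which lets each consumer type apply its own default.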

3. Training-time consumers

An AI system that ingests web content into a training corpus SHOULD:

If ai.json declares aiTrainingAllowed = false or includes a restriction with action "train", the consumer SHOULD NOT include the content in training. Honouring this signal is the clearest way for an AI system to show publishers that it can be trusted.
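The training-time rule above reduces to a small gate applied before content enters a corpus. This is a minimal Python sketch, assuming aiTrainingAllowed is an optional top-level boolean in the parsed ai.json and restrictions is an optional top-level array of objects with an action key; the function name may_train_on is hypothetical. A missing aiTrainingAllowed is treated as "no opt-out declared", which is also an assumption: a stricter consumer might exclude content whenever the field is absent.

```python
from typing import Any, Optional


def may_train_on(ai_json: Optional[dict[str, Any]]) -> bool:
    """Return False when the publisher's ai.json opts out of training.

    Hypothetical helper. Assumes an optional top-level boolean
    'aiTrainingAllowed' and an optional top-level 'restrictions' array
    whose entries carry an 'action' key.
    """
    if not ai_json:
        # No ai.json (or an empty one): this rule has nothing to honour.
        return True
    # Only an explicit false counts as an opt-out; a missing field does not.
    if ai_json.get("aiTrainingAllowed") is False:
        return False
    # Any restriction with action "train" also excludes the content.
    for entry in ai_json.get("restrictions") or []:
        if isinstance(entry, dict) and entry.get("action") == "train":
            return False
    return True


if __name__ == "__main__":
    print(may_train_on({"aiTrainingAllowed": False}))             # False
    print(may_train_on({"restrictions": [{"action": "train"}]}))  # False
    print(may_train_on({"permissions": [{"action": "train"}]}))   # True
```

The check uses "is False" rather than a falsy test so that a missing or null field is not mistaken for an explicit opt-out.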

4. Retrieval-time consumers

An AI system that fetches content at query time (RAG, AI search, browsing assistant) SHOULD:

Retrieval-time access is often the most visible to users because the response cites the publisher, so this is where getting identity and brand handling right matters most.
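A retrieval pipeline can apply a similar check just before fetched content enters the context window, while keeping the publisher's name alongside each passage so the response can cite it. The text above only names permissions[].action = "retrieve" "or similar", so this Python sketch assumes the literal action string "retrieve" and treats a retrieve restriction as a hard exclusion, mirroring the training-time rule in section 3. The Passage type, the select_passages and retrieval_restricted functions, and the businessName field in identity.json are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional

# (url, page text, parsed ai.json or None, parsed identity.json or None)
Fetched = tuple[str, str, Optional[dict[str, Any]], Optional[dict[str, Any]]]


@dataclass
class Passage:
    url: str
    text: str
    publisher: Optional[str]  # name to cite, if identity.json supplied one


def retrieval_restricted(ai_json: Optional[dict[str, Any]]) -> bool:
    # Assumption: the restriction uses the literal action string "retrieve".
    entries = (ai_json or {}).get("restrictions") or []
    return any(isinstance(e, dict) and e.get("action") == "retrieve" for e in entries)


def select_passages(fetched: list[Fetched]) -> list[Passage]:
    """Drop restricted pages and attach publisher identity to the rest."""
    selected = []
    for url, text, ai_json, identity in fetched:
        if retrieval_restricted(ai_json):
            continue  # publisher restricted retrieval; keep it out of the context
        # 'businessName' is a hypothetical identity.json field name.
        publisher = (identity or {}).get("businessName")
        selected.append(Passage(url=url, text=text, publisher=publisher))
    return selected


if __name__ == "__main__":
    pages = [
        ("https://a.example/x", "alpha", {"restrictions": [{"action": "retrieve"}]}, None),
        ("https://b.example/y", "beta", None, {"businessName": "Beta Ltd"}),
    ]
    for p in select_passages(pages):
        print(p.url, p.publisher)  # https://b.example/y Beta Ltd
```

Carrying the publisher name on each passage, rather than looking it up after generation, means the citation is tied to the exact content that informed the response.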

5. Citation conventions

When citing a publisher's content in an AI response:

6. Conversational behaviour

When discussing a publisher in conversation, an AI consumer SHOULD:

7. Handling contradictions

If the publisher's files contradict each other (for example, the business name in identity.json differs from the one in brand.txt), the AI consumer SHOULD:

8. Defensive posture

AI Discovery Files are vouched for only by the HTTPS connection to the publisher's host; the files themselves are not signed, so today there is no cryptographic integrity guarantee beyond transport security. Treat the files as advisory rather than authoritative for security-sensitive decisions:

9. Reporting back

AI consumers that act on AI Discovery Files SHOULD make that fact discoverable. Options:

This makes the relationship reciprocal: publishers invest effort in publishing AI Discovery Files because AI consumers visibly honour them, and the network effect builds from both sides.

References