HTTP Behaviour

How AI Discovery Files behave at the HTTP protocol level.

Status: Stable

This page documents the HTTP-level contract that AI Discovery Files specifications collectively assume. Per-file HTTP requirements are documented on the individual specification pages. See specification conventions for status definitions.


Purpose

AI Discovery Files are fetched over HTTP. This page documents the HTTP-level expectations that apply to every AI Discovery File specification: which status codes mean what, how redirects are followed, how validators detect soft-404s, how query strings are treated, and how publishers and validators should handle caching and rate limiting.

1. Transport

AI Discovery Files SHOULD be served over HTTPS. Files served over plain HTTP carry no transport integrity beyond what the network provides and SHOULD be treated as untrusted by consumers. Validators MAY follow an HTTP-to-HTTPS redirect but MUST NOT report a file as conformant if its final canonical URL is HTTP.

Each file is fetched at its canonical path (e.g. /llms.txt) on the publisher's canonical host. The canonical host is the host that answers https://<host>/ once all redirects have been followed. A publisher SHOULD publish files only on the canonical host; mirrors on sibling hostnames are out of scope for this specification.
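The transport rule above can be sketched as a small check on the final URL. This is a minimal illustration, not a reference implementation; the function name transport_is_conformant and its parameters are hypothetical and do not come from the specification.

```python
from urllib.parse import urlsplit


def transport_is_conformant(final_url: str, canonical_host: str) -> bool:
    """Return True if the final URL (after redirects) is HTTPS on the canonical host.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    parts = urlsplit(final_url)
    # urlsplit lower-cases the scheme; .hostname is lower-cased and excludes the port.
    return parts.scheme == "https" and parts.hostname == canonical_host.lower()
```

A validator would call this with the URL it ended up at after following redirects, for example transport_is_conformant("https://example.com/llms.txt", "example.com").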

2. Status codes

The following status code expectations apply:

200 OK
The file is present and the response body is the file's content. Validators MUST treat 200 as "file fetched successfully" and proceed to format validation.
301 Moved Permanently / 308 Permanent Redirect
The file has moved permanently. Validators MUST follow the redirect up to a reasonable cap (5 hops is typical). The llm.txt specification (ADF-002) explicitly recommends a 301 redirect to /llms.txt; that's the canonical use case.
302 Found / 303 See Other / 307 Temporary Redirect
Temporary redirects. Validators MUST follow them but MAY warn that the publisher has not chosen a permanent location. For long-lived AI Discovery Files, 301 or 308 is strongly preferred.
304 Not Modified
Conditional response to If-Modified-Since or If-None-Match. Validators SHOULD support conditional requests and MUST NOT fail a file that returns 304 to a conditional request.
401 Unauthorized / 403 Forbidden
The file exists but is not publicly accessible. AI Discovery Files are public by design; a 401 or 403 means the file does not satisfy the specification. Validators MUST report this as a failure.
404 Not Found
The file does not exist. For optional files (any file outside the Essential conformance class for the target site), this is not a conformance failure, only the absence of that file. For files required at the publisher's claimed conformance class, this is a conformance failure.
410 Gone
The file has been permanently removed. Treat as 404 for conformance purposes, but a 410 is a stronger signal that re-checking is unlikely to find the file.
429 Too Many Requests
The server is rate-limiting the validator. Validators MUST honour Retry-After if present and MUST back off. Publishers SHOULD set rate limits high enough that periodic validator checks aren't blocked.
5xx Server Error
Transient publisher-side failure. Validators SHOULD retry once after a short delay before reporting a failure. Persistent 5xx is reported as "fetch failed", distinct from "file not present".
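The status code expectations above can be sketched as a lookup plus a capped redirect loop. This is an illustrative sketch under stated assumptions: the Outcome names, classify_status, follow_redirects, and the injected fetch callable (which returns a status code and an optional Location value) are all hypothetical. The cap of 5 hops is the "typical" figure quoted above.

```python
from enum import Enum
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}
MAX_HOPS = 5  # "5 hops is typical" per the status code list


class Outcome(Enum):
    FETCHED = "fetched"              # 200: proceed to format validation
    REDIRECT = "redirect"            # 301/302/303/307/308: follow
    NOT_MODIFIED = "not_modified"    # 304: cached copy is still valid
    ACCESS_DENIED = "access_denied"  # 401/403: reported as a failure
    ABSENT = "absent"                # 404/410: file not present
    RATE_LIMITED = "rate_limited"    # 429: back off, honour Retry-After
    FETCH_FAILED = "fetch_failed"    # 5xx: retry once, then report
    UNSPECIFIED = "unspecified"      # codes this page does not cover


def classify_status(status: int) -> Outcome:
    """Map an HTTP status code to the outcome described in the list above."""
    if status == 200:
        return Outcome.FETCHED
    if status in REDIRECT_CODES:
        return Outcome.REDIRECT
    if status == 304:
        return Outcome.NOT_MODIFIED
    if status in (401, 403):
        return Outcome.ACCESS_DENIED
    if status in (404, 410):
        return Outcome.ABSENT
    if status == 429:
        return Outcome.RATE_LIMITED
    if 500 <= status <= 599:
        return Outcome.FETCH_FAILED
    return Outcome.UNSPECIFIED


def follow_redirects(
    url: str,
    fetch: Callable[[str], Tuple[int, Optional[str]]],
    max_hops: int = MAX_HOPS,
) -> Tuple[str, int]:
    """Follow redirects up to max_hops; return (final_url, final_status)."""
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or location is None:
            return url, status
        # Location may be relative; resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("redirect cap exceeded")
```

Injecting fetch keeps the sketch independent of any particular HTTP client; a real validator would wrap its client so that automatic redirect following is disabled and each hop is visible.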

3. Redirect handling

Validators MUST follow redirects with these constraints:

4. Soft-404 detection

Some hosts return an HTTP 200 response for any path, including paths that don't exist, by serving a generic "page not found" HTML page. Validators MUST detect this pattern (the soft-404) and treat it as a 404 for conformance purposes.

Detection heuristics that a validator MAY apply:

Publishers SHOULD configure their hosts to return a real 404 for missing AI Discovery Files rather than a soft-404. This makes diagnostics clearer and avoids ambiguity.
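As an illustration of soft-404 detection, the sketch below flags a 200 response whose body looks like an HTML page when the requested file is a text or JSON file. The specific heuristics (HTML media type, HTML doctype or root tag at the start of the body) and the name looks_like_soft_404 are assumptions chosen for this example; they are not taken from the specification.

```python
def looks_like_soft_404(path: str, content_type: str, body: str) -> bool:
    """Heuristic: a non-HTML discovery file answered with an HTML page.

    Assumed heuristics, for illustration only:
      1. the Content-Type media type is text/html, or
      2. the body starts with an HTML doctype or <html> tag.
    """
    if path.endswith(".html"):
        # HTML is the expected format here, so these heuristics cannot decide.
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Ignore a leading BOM and whitespace, then inspect the first bytes only.
    head = body.lstrip("\ufeff \t\r\n")[:256].lower()
    return media_type == "text/html" or head.startswith(("<!doctype html", "<html"))
```

A complementary approach, also an assumption here, is to request a random path that cannot exist on the host and compare that response with the one under test.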

5. Query strings

The canonical URL for every AI Discovery File is its path without a query string. Validators fetch the canonical URL directly. Publishers MUST NOT require query-string parameters for the file to be served correctly.

Validators MAY add a cache-busting query string parameter (e.g. ?_v=<timestamp>) when bypassing intermediary caches. Publishers SHOULD serve the canonical content regardless of unknown query parameters.

If a publisher serves different content based on query strings (e.g. language variants), they SHOULD use distinct URL paths instead. Multi-language addressing is documented in the forthcoming i18n guidance (see roadmap).
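The query-string rules can be sketched with the standard library's URL helpers. The function names below are hypothetical; the _v parameter name follows the example given above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def cache_busting_url(url: str, timestamp: int) -> str:
    """Append ?_v=<timestamp> to the canonical URL to bypass intermediary caches."""
    parts = urlsplit(canonical_url(url))
    query = urlencode({"_v": timestamp})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

For example, cache_busting_url("https://example.com/llms.txt", 1700000000) yields https://example.com/llms.txt?_v=1700000000, and a conforming publisher would serve the same content for both forms.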

6. Content negotiation

AI Discovery Files use deterministic file extensions and content types per the specification registry. Content negotiation via Accept headers is not part of the contract.

Specifically:

Publishers SHOULD NOT include a UTF-8 byte-order mark (BOM). A BOM is permitted, but many parsers reject BOMs in JSON files.

7. Caching

Publishers SHOULD send realistic Cache-Control headers on AI Discovery Files. Recommendations:

Validators SHOULD respect cache headers under normal operation and SHOULD support a cache-bypass mode for explicit re-checks.
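One way a validator might respect cache headers while still offering a bypass mode is sketched below. It is a deliberate simplification: only max-age, no-store and no-cache are interpreted, no-cache is treated as "always re-fetch", and the function names and the bypass flag are assumptions for illustration.

```python
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the freshness lifetime in seconds, or None if none is declared.

    Simplification: no-store and no-cache are both mapped to 0.
    """
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        name, _, value = directive.partition("=")
        if name.strip() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return None


def should_refetch(age_seconds: int, cache_control: str, bypass: bool = False) -> bool:
    """Decide whether a cached copy of the given age must be fetched again."""
    if bypass:
        # Explicit re-check requested: ignore the cached copy.
        return True
    max_age = max_age_seconds(cache_control)
    # With no declared lifetime, this sketch errs on the side of re-fetching.
    return max_age is None or age_seconds >= max_age
```

With Cache-Control: public, max-age=3600, a copy fetched 100 seconds ago is reused under normal operation and re-fetched only when bypass is set.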

8. Rate limits

Publishers MAY rate-limit AI Discovery File requests but SHOULD set the limit generously: these are small, infrequent fetches by validators, AI crawlers, and curious humans. A per-IP limit of 10 requests per minute is typical; publishers SHOULD NOT set a tighter limit without good reason.
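On the validator side, honouring Retry-After on a 429 response requires parsing the header, which HTTP allows to be either a number of seconds or an HTTP-date. The sketch below handles both forms; the function name and the injectable now parameter are assumptions made so the example can be tested deterministically.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return how long to wait, in seconds, or None if the header is unparseable."""
    value = value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "120"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # A date in the past means no further waiting is required.
    return max(0.0, (when - current).total_seconds())
```

When the header is missing or unparseable, a validator still has to back off; falling back to a fixed or exponential delay is one reasonable choice.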

Validators SHOULD:

9. CORS

AI Discovery Files are public resources. Publishers SHOULD serve Access-Control-Allow-Origin: * on every file so JavaScript validators (including the AI Visibility Checker) can fetch them from any origin.

If a publisher cannot or does not want to enable CORS, server-side validators (those that don't run in a browser) still work; only browser-based validators are affected. The reference validator and most production validators are server-side.
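A validator can report whether a file is reachable from browser-based tools by inspecting the response headers. The helper below is a hypothetical sketch that only checks for the wildcard value recommended above.

```python
from typing import Mapping


def allows_any_origin(headers: Mapping[str, str]) -> bool:
    """Return True if the response carries Access-Control-Allow-Origin: *."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "access-control-allow-origin":
            return value.strip() == "*"
    return False
```

A False result is informational rather than a failure, since server-side validators are unaffected by CORS.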

10. Media types and IANA registration

The AI Discovery Files specification reuses existing IANA-registered media types rather than introducing new ones. Each file is served as one of the following:

File              Media type                                                 Source registration
llms.txt          text/plain; charset=utf-8 or text/markdown; charset=utf-8  text/plain / text/markdown
llm.txt           text/plain; charset=utf-8 (or redirect)                    text/plain
llms.html         text/html; charset=utf-8                                   text/html
ai.txt            text/plain; charset=utf-8                                  text/plain
ai.json           application/json; charset=utf-8                            application/json
identity.json     application/json; charset=utf-8                            application/json
brand.txt         text/plain; charset=utf-8                                  text/plain
faq-ai.txt        text/plain; charset=utf-8                                  text/plain
developer-ai.txt  text/plain; charset=utf-8                                  text/plain
robots-ai.txt     text/plain; charset=utf-8                                  text/plain
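The table above can be expressed as a lookup that a validator checks the Content-Type header against. This is a sketch, not a normative check: the names are hypothetical, and treating a missing charset=utf-8 parameter as a mismatch is an assumption based on the table rather than a stated rule.

```python
from typing import Dict, Set, Tuple

# Media types per file, transcribed from the table above.
EXPECTED_MEDIA_TYPES: Dict[str, Set[str]] = {
    "llms.txt": {"text/plain", "text/markdown"},
    "llm.txt": {"text/plain"},
    "llms.html": {"text/html"},
    "ai.txt": {"text/plain"},
    "ai.json": {"application/json"},
    "identity.json": {"application/json"},
    "brand.txt": {"text/plain"},
    "faq-ai.txt": {"text/plain"},
    "developer-ai.txt": {"text/plain"},
    "robots-ai.txt": {"text/plain"},
}


def parse_content_type(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type header into (media_type, parameters), lower-cased."""
    media_type, _, rest = header.partition(";")
    params: Dict[str, str] = {}
    for part in rest.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip().lower()] = value.strip().strip('"').lower()
    return media_type.strip().lower(), params


def media_type_matches(filename: str, content_type_header: str) -> bool:
    """Check a response's Content-Type against the table for the given file."""
    expected = EXPECTED_MEDIA_TYPES.get(filename)
    if expected is None:
        return False  # not a file listed in the table
    media_type, params = parse_content_type(content_type_header)
    # Assumption: the charset parameter must be present and equal to utf-8.
    return media_type in expected and params.get("charset") == "utf-8"
```

For example, media_type_matches("llms.txt", "text/markdown; charset=utf-8") is true, while a bare text/plain without a charset parameter is flagged under the assumption noted above.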

10.1 Why no bespoke media types

This specification deliberately does not introduce new media types such as application/vnd.ai-visibility.identity+json or text/vnd.ai-discovery.llms+plain. The reasons:

  1. IANA registration is heavy. The process for registering a new media type (RFC 6838 procedure) is appropriate for protocols and formats with broad cross-organisation governance. For a single-team specification, the overhead outweighs the benefit.
  2. Existing types already work. text/plain and application/json are universally supported by HTTP infrastructure, content delivery networks, and browser developer tools. A bespoke type would require every CDN and validator to add it to its whitelist.
  3. The filename identifies the format. A consumer fetching /llms.txt already knows what it expects to receive. The media type is a fallback signal, not the primary identifier.
  4. Versioning is handled elsewhere. A bespoke media type per file would either lock in a version (forcing a new type per MAJOR release) or carry no version information (defeating the purpose). The versioning policy handles version negotiation cleanly via the schema URL pattern.

10.2 Charset and structured-syntax suffix

Two media-type details apply uniformly:

10.3 Content negotiation: out of scope

Section 6 of this page (content negotiation) documents that AI Discovery Files are not negotiated via Accept headers; the canonical file lives at the canonical path. The media-type table above lists what a publisher SHOULD send, not what a consumer SHOULD ask for. A consumer SHOULD send Accept: */* or omit the header entirely; consumers MUST NOT use Accept to request alternative representations because the specification does not define any.

10.4 Future position

If the AI Discovery Files specification is taken to a standards body (see the roadmap entry on IETF / W3C engagement), the question of formal media-type registration MAY be revisited as part of that engagement. Until that decision is made, the position above stands: existing IANA types, no bespoke registrations.
