HTTP Behaviour

How AI Discovery Files behave at the HTTP protocol level.

Status: Stable

This page documents the HTTP-level contract that AI Discovery Files specifications collectively assume. Per-file HTTP requirements are documented on the individual specification pages. See specification conventions for status definitions.

Last updated: 11 May 2026

Purpose

AI Discovery Files are fetched over HTTP. This page documents the HTTP-level expectations that apply to every AI Discovery File specification: which status codes mean what, how redirects are followed, how validators detect soft-404s, how query strings are treated, and how publishers and validators should handle caching and rate limiting.

1. Transport

AI Discovery Files SHOULD be served over HTTPS. Files served over plain HTTP carry no transport integrity beyond what the network provides and SHOULD be treated as untrusted by consumers. Validators MAY follow an HTTP-to-HTTPS redirect but MUST NOT report a file as conformant if its final canonical URL is HTTP.

Each file is fetched at its canonical path (e.g. /llms.txt) on the publisher's canonical host. The canonical host is the one served by the publisher's https://<host>/ after all redirects. A publisher SHOULD publish files only on the canonical host; mirrors on sibling hostnames are out of scope for this specification.

2. Status codes

The following status code expectations apply:

200 OK: The file is present and the response body is the file's content. Validators MUST treat 200 as "file fetched successfully" and proceed to format validation.
301 Moved Permanently / 308 Permanent Redirect: The file has moved permanently. Validators MUST follow the redirect, subject to the 5-hop cap in section 3. The llm.txt specification (ADF-002) explicitly recommends a 301 redirect to /llms.txt, which is the canonical use case.
302 Found / 303 See Other / 307 Temporary Redirect: Temporary redirects. Validators MUST follow them but MAY warn that the publisher has not chosen a permanent location. For long-lived AI Discovery Files, 301 or 308 is strongly preferred.
304 Not Modified: Conditional response to If-Modified-Since or If-None-Match. Validators SHOULD support conditional requests and MUST NOT fail a file that returns 304 in response to one; the cached copy remains the file's content.
401 Unauthorized / 403 Forbidden: The file exists but is not publicly accessible. AI Discovery Files are public by design; a 401 or 403 means the file does not satisfy the specification. Validators MUST report this as a failure.
404 Not Found: The file does not exist. For optional files (any file outside the Essential conformance class for the target site), this is not a conformance failure, only the absence of that file. For files required at the publisher's claimed conformance class, this is a conformance failure.
410 Gone: The file has been permanently removed. Treat as 404 for conformance purposes, but a 410 is a stronger signal that re-checking is unlikely to find the file.
429 Too Many Requests: The server is rate-limiting the validator. Validators MUST honour Retry-After if present and MUST back off. Publishers SHOULD set rate limits high enough that periodic validator checks aren't blocked.
5xx Server Error: Transient publisher-side failure. Validators SHOULD retry once after a short delay before reporting a failure. Persistent 5xx is reported as "fetch failed", distinct from "file not present".
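
The expectations above can be condensed into a small classifier. This is a minimal sketch rather than a reference implementation: the name `classify_status`, the `Outcome` labels, and the `required` flag are assumptions introduced for illustration and do not come from any AI Discovery Files tooling.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels; the specification does not define these names.
    FETCHED = "fetched"                    # 200: proceed to format validation
    FOLLOW_REDIRECT = "follow-redirect"    # 301/302/303/307/308
    NOT_MODIFIED = "not-modified"          # 304: reuse the cached copy
    ACCESS_DENIED = "access-denied"        # 401/403: file is not public, failure
    ABSENT = "absent"                      # 404/410 on an optional file
    MISSING_REQUIRED = "missing-required"  # 404/410 on a required file, failure
    RATE_LIMITED = "rate-limited"          # 429: back off, honour Retry-After
    FETCH_FAILED = "fetch-failed"          # 5xx: report only if a retry also fails
    UNEXPECTED = "unexpected"              # any code this page does not cover


def classify_status(status: int, required: bool = False) -> Outcome:
    """Map an HTTP status code to a validator outcome."""
    if status == 200:
        return Outcome.FETCHED
    if status in (301, 302, 303, 307, 308):
        return Outcome.FOLLOW_REDIRECT
    if status == 304:
        return Outcome.NOT_MODIFIED
    if status in (401, 403):
        return Outcome.ACCESS_DENIED
    if status in (404, 410):
        return Outcome.MISSING_REQUIRED if required else Outcome.ABSENT
    if status == 429:
        return Outcome.RATE_LIMITED
    if 500 <= status <= 599:
        return Outcome.FETCH_FAILED
    return Outcome.UNEXPECTED
```

A caller would pass `required=True` only for files in the publisher's claimed conformance class, so that a 404 on an optional file is recorded as an absence rather than a failure.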

3. Redirect handling

Validators MUST follow redirects with these constraints:

Maximum hops. No more than 5 redirect hops. Beyond that, the validator MUST stop and report a redirect failure, whether the cause is a loop or simply an overlong chain.
Cross-host redirects. A redirect that crosses to a different hostname (other than the canonical www-to-apex or apex-to-www form) MUST be reported as a warning. The publisher's canonical host should serve the file directly.
Mixed-protocol redirects. HTTP-to-HTTPS is normal and expected. HTTPS-to-HTTP is a failure; report and do not follow.
Method preservation. Validators only ever issue GET; the method-preservation distinction between 301/302/303 and 307/308 does not apply in practice.
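
The constraints above can be checked against a recorded redirect chain. The sketch below assumes the validator has already collected the URLs it visited, in order; `check_redirect_chain` and the message wording are hypothetical. A live validator would apply the same tests hop by hop and stop before following an HTTPS-to-HTTP redirect.

```python
from urllib.parse import urlsplit

MAX_HOPS = 5  # cap stated in this section


def _strip_www(host: str) -> str:
    """Treat www.example.com and example.com as the same site."""
    return host[4:] if host.startswith("www.") else host


def check_redirect_chain(urls: list[str]) -> tuple[list[str], list[str]]:
    """Inspect a chain [start, hop1, ..., final]; return (failures, warnings)."""
    failures: list[str] = []
    warnings: list[str] = []
    hops = len(urls) - 1
    if hops > MAX_HOPS:
        failures.append(f"too many redirects: {hops} hops (max {MAX_HOPS})")
    for src, dst in zip(urls, urls[1:]):
        a, b = urlsplit(src), urlsplit(dst)
        # HTTPS-to-HTTP is a failure; HTTP-to-HTTPS is normal and ignored here.
        if a.scheme == "https" and b.scheme == "http":
            failures.append(f"HTTPS-to-HTTP downgrade: {src} -> {dst}")
        host_a, host_b = a.hostname or "", b.hostname or ""
        # Cross-host redirects are warnings unless they are www/apex swaps.
        if host_a != host_b and _strip_www(host_a) != _strip_www(host_b):
            warnings.append(f"cross-host redirect: {host_a} -> {host_b}")
    return failures, warnings
```

For example, a chain from `http://example.com/llm.txt` to `https://example.com/llm.txt` to `https://www.example.com/llms.txt` yields no failures and no warnings under this sketch.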

4. Soft-404 detection

Some hosts return an HTTP 200 response for any path, including paths that don't exist, by serving a generic "page not found" HTML page. Validators MUST detect this pattern (the soft-404) and treat it as a 404 for conformance purposes.

Detection heuristics that a validator MAY apply:

Content-Type mismatch: a file expected to be text/plain or application/json that returns text/html is almost certainly a soft-404.
Content does not parse as the expected format (e.g. ai.json body is not valid JSON).
Response body matches a known generic error template for the publisher's CMS or hosting platform.
Response body is identical to a request for a clearly-nonexistent sibling path.

Publishers SHOULD configure their hosts to return a real 404 for missing AI Discovery Files rather than a soft-404. This makes diagnostics clearer and avoids ambiguity.
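
Three of the detection heuristics in this section are mechanical enough to sketch. The function below is illustrative only: `looks_like_soft_404` and its parameters are assumptions, and the known-error-template heuristic is omitted because it depends on platform-specific fingerprints this page does not define.

```python
import json
from typing import Optional


def looks_like_soft_404(content_type: str, body: bytes, expected_type: str,
                        probe_body: Optional[bytes] = None) -> bool:
    """Return True if a 200 response is probably a soft-404.

    expected_type is the bare media type, e.g. "text/plain".
    probe_body is the body returned for a clearly nonexistent sibling
    path, if the validator fetched one.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Content-Type mismatch: HTML where a text or JSON file was expected.
    if media_type == "text/html" and expected_type != "text/html":
        return True
    # Body does not parse as the expected format (only JSON is checked here).
    if expected_type == "application/json":
        try:
            json.loads(body.decode("utf-8-sig"))
        except ValueError:  # covers UnicodeDecodeError and JSONDecodeError
            return True
    # Body is identical to the response for a path that cannot exist.
    if probe_body is not None and body == probe_body:
        return True
    return False
```

Because these are heuristics, a validator might prefer to report which check fired rather than a bare boolean, so that publishers can see why a 200 response was treated as a 404.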

5. Query strings

The canonical URL for every AI Discovery File is its path without a query string. Validators fetch the canonical URL directly. Publishers MUST NOT require query-string parameters for the file to be served correctly.

Validators MAY add a cache-busting query string parameter (e.g. ?_v=<timestamp>) when bypassing intermediary caches. Publishers SHOULD serve the canonical content regardless of unknown query parameters.
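
A cache-busting URL of the form shown above could be built as follows. This is a sketch: the helper name `with_cache_buster` and the injectable `now` argument are assumptions made so the example is deterministic and testable.

```python
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, now: Optional[float] = None) -> str:
    """Return url with a _v=<timestamp> parameter appended."""
    parts = urlsplit(url)
    # Keep any existing parameters, but replace a stale _v value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "_v"]
    stamp = int(time.time() if now is None else now)
    query.append(("_v", str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The result is used only for the fetch; the canonical URL reported to the publisher stays the path without a query string.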

If a publisher serves different content based on query strings (e.g. language variants), they SHOULD use distinct URL paths instead. Multi-language addressing is documented in the forthcoming i18n guidance (see roadmap).

6. Content negotiation

AI Discovery Files use deterministic file extensions and content types per the specification registry. Content negotiation via Accept headers is not part of the contract.

Specifically:

llms.txt, ai.txt, brand.txt, faq-ai.txt, developer-ai.txt, robots-ai.txt: served as text/plain; charset=utf-8 (or, for llms.txt served as Markdown, optionally text/markdown; charset=utf-8, per the table in section 10).
ai.json, identity.json: served as application/json; charset=utf-8.
llms.html: served as text/html; charset=utf-8.

A UTF-8 byte-order mark is not forbidden, but publishers SHOULD NOT include one; many parsers reject a BOM at the start of a JSON file.

7. Caching

Publishers SHOULD send realistic Cache-Control headers on AI Discovery Files. Recommendations:

Cache-Control: max-age=3600, s-maxage=43200 (1 hour in browsers, 12 hours in shared caches) for relatively stable files. This site's own files use this profile.
Files that change rarely (e.g. identity.json for an established business) MAY use a longer TTL.
ETag and Last-Modified headers SHOULD be present so conditional requests work.
Aggressive caching (e.g. max-age over a week) is reasonable for stable content but SHOULD be paired with cache-purge processes for when content does change.

Validators SHOULD respect cache headers under normal operation and SHOULD support a cache-bypass mode for explicit re-checks.
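
On the validator side, respecting cache headers comes down to two steps: reading the freshness lifetime from Cache-Control, and revalidating with the stored ETag or Last-Modified value once that lifetime expires. The helpers below are a minimal sketch with hypothetical names. They deliberately ignore directives such as no-store and no-cache, which a fuller implementation would need to honour.

```python
from typing import Optional


def freshness_lifetime(cache_control: str, shared: bool = False) -> Optional[int]:
    """Return the freshness lifetime in seconds, or None if none is stated.

    For a shared cache, s-maxage takes precedence over max-age (RFC 9111).
    """
    directives = {}
    for item in cache_control.split(","):
        name, _, value = item.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    for key in (("s-maxage", "max-age") if shared else ("max-age",)):
        value = directives.get(key, "")
        if value.isdigit():
            return int(value)
    return None


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> dict[str, str]:
    """Build revalidation headers from values saved on a previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```

With the recommended profile, `freshness_lifetime("max-age=3600, s-maxage=43200")` gives 3600 for a private cache and 43200 when `shared=True`. A cache-bypass mode would simply skip both helpers and fetch unconditionally.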

8. Rate limits

Publishers MAY rate-limit AI Discovery File requests but SHOULD set the limit generously: these are small, infrequent fetches by validators, AI crawlers, and curious humans. A per-IP limit of 10 requests per minute is typical and should not be tighter without good reason.

Validators SHOULD:

Identify themselves with a clear User-Agent string
Honour Retry-After on 429 responses
Back off exponentially on repeated 429 or 5xx
Not bombard a publisher with re-checks; cache validation results for a reasonable interval
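
The Retry-After and backoff behaviour can be sketched as a single delay calculation. The function name, the 1-second base, and the 300-second cap are assumptions chosen for illustration. Only the delay-seconds form of Retry-After is handled; the HTTP-date form falls through to the computed backoff.

```python
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A numeric Retry-After value always wins over the computed delay and
    is not capped, since validators are expected to honour it.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    # Exponential backoff: base, 2*base, 4*base, ... up to the cap.
    return min(cap, base * (2 ** attempt))
```

A production validator would typically add random jitter to the computed delay so that many validator instances do not retry in lockstep.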

9. CORS

AI Discovery Files are public resources. Publishers SHOULD serve Access-Control-Allow-Origin: * on every file so JavaScript validators (including the AI Visibility Checker) can fetch them from any origin.

If a publisher cannot or does not want to enable CORS, server-side validators (those that don't run in a browser) still work; only browser-based validators are affected. The reference validator and most production validators are server-side.
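
A server-side validator can still note whether browser-based tools would be able to fetch the file. The check below is a small sketch; `allows_any_origin` is a hypothetical helper that takes response headers as a plain dictionary.

```python
def allows_any_origin(headers: dict[str, str]) -> bool:
    """Return True if the response carries Access-Control-Allow-Origin: *."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "access-control-allow-origin":
            return value.strip() == "*"
    return False
```

Because CORS is a SHOULD, a missing or restrictive header would reasonably be reported as a warning rather than a failure.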

10. Media types and IANA registration

The AI Discovery Files specification reuses existing IANA-registered media types rather than introducing new ones. Each file is served as one of the following:

File	Media type	Source registration
`llms.txt`	`text/plain; charset=utf-8` or `text/markdown; charset=utf-8`	text/plain / text/markdown
`llm.txt`	`text/plain; charset=utf-8` (or redirect)	text/plain
`llms.html`	`text/html; charset=utf-8`	text/html
`ai.txt`	`text/plain; charset=utf-8`	text/plain
`ai.json`	`application/json; charset=utf-8`	application/json
`identity.json`	`application/json; charset=utf-8`	application/json
`brand.txt`	`text/plain; charset=utf-8`	text/plain
`faq-ai.txt`	`text/plain; charset=utf-8`	text/plain
`developer-ai.txt`	`text/plain; charset=utf-8`	text/plain
`robots-ai.txt`	`text/plain; charset=utf-8`	text/plain
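
The table above can be expressed as a lookup that a validator compares against the Content-Type response header. This is a sketch: the helper names are assumptions, and the header parsing is deliberately simple (it does not handle quoted parameter values that contain semicolons).

```python
# Media types per file, transcribed from the table above.
EXPECTED_MEDIA_TYPES = {
    "llms.txt": {"text/plain", "text/markdown"},
    "llm.txt": {"text/plain"},
    "llms.html": {"text/html"},
    "ai.txt": {"text/plain"},
    "ai.json": {"application/json"},
    "identity.json": {"application/json"},
    "brand.txt": {"text/plain"},
    "faq-ai.txt": {"text/plain"},
    "developer-ai.txt": {"text/plain"},
    "robots-ai.txt": {"text/plain"},
}


def parse_content_type(header: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type header into (media type, parameters), lowercased."""
    media_type, *raw_params = header.split(";")
    params = {}
    for item in raw_params:
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip().strip('"').lower()
    return media_type.strip().lower(), params


def media_type_matches(filename: str, header: str) -> bool:
    """Return True if the header's media type is one listed for the file."""
    media_type, _ = parse_content_type(header)
    return media_type in EXPECTED_MEDIA_TYPES.get(filename, set())
```

The charset parameter is returned separately by `parse_content_type` so that a validator can warn when it is missing without treating the media type itself as wrong.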

10.1 Why no bespoke media types

This specification deliberately does not introduce new media types such as application/vnd.ai-visibility.identity+json or text/vnd.ai-discovery.llms+plain. The reasons:

IANA registration is heavy. The process for registering a new media type (RFC 6838 procedure) is appropriate for protocols and formats with broad cross-organisation governance. For a single-team specification, the overhead outweighs the benefit.
Existing types already work. text/plain and application/json are universally supported by HTTP infrastructure, content delivery networks, and browser developer tools. A bespoke type would require every CDN and validator to add it to its whitelist.
The filename identifies the format. A consumer fetching /llms.txt already knows what it expects to receive. The media type is a fallback signal, not the primary identifier.
Versioning is handled elsewhere. A bespoke media type per file would either lock in a version (forcing a new type per MAJOR release) or carry no version information (defeating the purpose). The versioning policy handles version negotiation cleanly via the schema URL pattern.

10.2 Charset and structured-syntax suffix

Two media-type details apply uniformly:

charset=utf-8. Every file MUST be UTF-8 encoded; the charset parameter SHOULD appear on the Content-Type response header for clarity. A consumer that receives a file without the charset parameter MUST still treat the file as UTF-8, because UTF-8 is mandated by every per-file specification.
The +json structured-syntax suffix. JSON files (ai.json, identity.json) use the bare application/json type rather than a bespoke application/...+json variant. This is consistent with the rationale above: validators and HTTP stacks already handle application/json universally.
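
In practice, the charset rule means a consumer decodes every body as UTF-8 without consulting the Content-Type parameter. The sketch below uses a hypothetical `decode_body` helper. It also reports whether a byte-order mark was present, since section 6 discourages BOMs.

```python
import codecs


def decode_body(body: bytes) -> tuple[str, bool]:
    """Decode a response body as UTF-8; return (text, had_bom).

    The charset parameter is deliberately not consulted: the files are
    UTF-8 by specification whether or not Content-Type says so.
    Raises UnicodeDecodeError if the body is not valid UTF-8.
    """
    had_bom = body.startswith(codecs.BOM_UTF8)
    # "utf-8-sig" strips a leading BOM if present and is otherwise plain UTF-8.
    text = body.decode("utf-8-sig")
    return text, had_bom
```

Stripping the BOM before parsing lets a validator report it as a warning instead of failing with an opaque JSON parse error.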

10.3 Content negotiation: out of scope

Section 6 of this page (content negotiation) documents that AI Discovery Files are not negotiated via Accept headers; the canonical file lives at the canonical path. The media-type table above lists what a publisher SHOULD send, not what a consumer SHOULD ask for. A consumer SHOULD send Accept: */* or omit the header entirely; consumers MUST NOT use Accept to request alternative representations because the specification does not define any.

10.4 Future position

If the AI Discovery Files specification is taken to a standards body (see the roadmap entry on IETF / W3C engagement), the question of formal media-type registration MAY be revisited as part of that engagement. Until that decision is made, the position above stands: existing IANA types, no bespoke registrations.

References

RFC 9110: HTTP Semantics. Authoritative reference for status codes and conditional requests.
RFC 9111: HTTP Caching. Authoritative reference for cache-control behaviour.
Specification Conventions: editorial and structural conventions.
Specification Registry: per-file media types and canonical paths.
Interoperability Guide: precedence rules between robots.txt and AI Discovery Files.
Security and Privacy Considerations: HTTPS expectations, logging concerns, integrity primitives.