HTTP Behaviour
How AI Discovery Files behave at the HTTP protocol level.
This page documents the HTTP-level contract that AI Discovery Files specifications collectively assume. Per-file HTTP requirements are documented on the individual specification pages. See specification conventions for status definitions.
Last updated:
AI Discovery Files are fetched over HTTP. This page documents the HTTP-level expectations that apply to every AI Discovery File specification: which status codes mean what, how redirects are followed, how validators detect soft-404s, how query strings are treated, which media types each file is served as, and how publishers and validators should handle caching, rate limiting, and CORS.
1. Transport
AI Discovery Files SHOULD be served over HTTPS. Files served over plain HTTP carry no transport integrity beyond what the network provides and SHOULD be treated as untrusted by consumers. Validators MAY follow an HTTP-to-HTTPS redirect but MUST NOT report a file as conformant if its final canonical URL is HTTP.
Each file is fetched at its canonical path (e.g. /llms.txt) on the publisher's canonical host. The canonical host is the host that https://<host>/ resolves to after all redirects have been followed. A publisher SHOULD publish files only on the canonical host; mirrors on sibling hostnames are out of scope for this specification.
2. Status codes
The following status code expectations apply:
- 200 OK: The file is present and the response body is the file's content. Validators MUST treat 200 as "file fetched successfully" and proceed to format validation.
- 301 Moved Permanently / 308 Permanent Redirect: The file has moved permanently. Validators MUST follow the redirect, subject to the 5-hop cap in section 3. The llm.txt specification (ADF-002) explicitly recommends a 301 redirect to /llms.txt; that is the canonical use case.
- 302 Found / 303 See Other / 307 Temporary Redirect: Temporary redirects. Validators MUST follow them but MAY warn that the publisher has not chosen a permanent location. For long-lived AI Discovery Files, 301 or 308 is strongly preferred.
- 304 Not Modified: Conditional response to If-Modified-Since or If-None-Match. Validators SHOULD support conditional requests and MUST NOT fail a file that returns 304 in response to a conditional request; the previously cached copy remains valid.
- 401 Unauthorized / 403 Forbidden: The file exists but is not publicly accessible. AI Discovery Files are public by design; a 401 or 403 means the file does not satisfy the specification. Validators MUST report this as a failure.
- 404 Not Found: The file does not exist. For optional files (any file outside the Essential conformance class for the target site), this is not a conformance failure, only an absence of that file. For files required by the publisher's claimed conformance class, this is a conformance failure.
- 410 Gone: The file has been permanently removed. Treat as 404 for conformance purposes, but a 410 is a stronger signal that re-checking is unlikely to find the file.
- 429 Too Many Requests: The server is rate-limiting the validator. Validators MUST honour Retry-After if present and MUST back off. Publishers SHOULD set rate limits high enough that periodic validator checks aren't blocked.
- 5xx Server Error: Transient publisher-side failure. Validators SHOULD retry once after a short delay before reporting a failure. Persistent 5xx is reported as "fetch failed", distinct from "file not present".
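As a minimal sketch of how a validator might encode the status-code list above, the Python below maps a status code to an outcome and decides which terminal outcomes are conformance failures. The names FetchOutcome, classify_status and is_conformance_failure are hypothetical, and the required flag assumes the caller already knows whether the file belongs to the publisher's claimed conformance class.

```python
from enum import Enum


class FetchOutcome(Enum):
    """Hypothetical outcome categories for one fetch, following section 2."""
    OK = "ok"                                  # 200: go on to format validation
    REDIRECT_PERMANENT = "redirect-permanent"  # 301/308: follow
    REDIRECT_TEMPORARY = "redirect-temporary"  # 302/303/307: follow, MAY warn
    NOT_MODIFIED = "not-modified"              # 304: keep the cached copy
    NOT_PUBLIC = "not-public"                  # 401/403: always a failure
    ABSENT = "absent"                          # 404/410: file not present
    RATE_LIMITED = "rate-limited"              # 429: honour Retry-After, back off
    SERVER_ERROR = "server-error"              # 5xx: retry once, then "fetch failed"
    UNEXPECTED = "unexpected"                  # anything the list does not cover


def classify_status(status: int) -> FetchOutcome:
    """Map an HTTP status code to the outcome described in section 2."""
    if status == 200:
        return FetchOutcome.OK
    if status in (301, 308):
        return FetchOutcome.REDIRECT_PERMANENT
    if status in (302, 303, 307):
        return FetchOutcome.REDIRECT_TEMPORARY
    if status == 304:
        return FetchOutcome.NOT_MODIFIED
    if status in (401, 403):
        return FetchOutcome.NOT_PUBLIC
    if status in (404, 410):
        return FetchOutcome.ABSENT
    if status == 429:
        return FetchOutcome.RATE_LIMITED
    if 500 <= status <= 599:
        return FetchOutcome.SERVER_ERROR
    return FetchOutcome.UNEXPECTED


def is_conformance_failure(outcome: FetchOutcome, required: bool) -> bool:
    """True if this terminal outcome fails conformance for the file.

    required (an assumption of this sketch) is True when the file is part of
    the conformance class the publisher claims. Persistent 5xx responses are
    reported separately as "fetch failed", so they are not counted here.
    """
    if outcome is FetchOutcome.NOT_PUBLIC:
        return True
    if outcome is FetchOutcome.ABSENT:
        return required
    return False
```

Grouping 404 and 410 into one ABSENT bucket mirrors the rule that 410 is treated as 404 for conformance; a validator that wants to record the stronger 410 signal could split them.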
3. Redirect handling
Validators MUST follow redirects with these constraints:
- Maximum hops. No more than 5 redirect hops. Beyond that, the validator MUST stop and report a redirect-loop failure.
- Cross-host redirects. A redirect that crosses to a different hostname (other than the canonical www-to-apex or apex-to-www form) MUST be reported as a warning. The publisher's canonical host should serve the file directly.
- Mixed-protocol redirects. HTTP-to-HTTPS is normal and expected. HTTPS-to-HTTP is a failure; report and do not follow.
- Method preservation. Validators only ever issue GET; the method-preservation distinction between 301/302/303 and 307/308 does not apply in practice.
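The redirect constraints above can be applied one hop at a time, which lets a validator stop as soon as a hop fails. The following is a minimal sketch under stated assumptions: evaluate_hop, final_url_is_conformant and MAX_HOPS are hypothetical names, the caller is assumed to have resolved any relative Location header into an absolute URL (for example with urllib.parse.urljoin), and the www-to-apex exemption is approximated by stripping a leading "www.".

```python
from urllib.parse import urlsplit

MAX_HOPS = 5  # section 3: no more than 5 redirect hops


def _bare_host(host: str) -> str:
    # Treat www.example.com and example.com as the same site (the
    # www-to-apex / apex-to-www exemption), approximated by prefix stripping.
    return host[4:] if host.startswith("www.") else host


def evaluate_hop(src: str, dst: str, hop_number: int) -> tuple[str, str]:
    """Decide what to do with one redirect from src to dst.

    dst must already be absolute. hop_number counts from 1. Returns
    (verdict, reason), where verdict is "follow", "warn" (follow, but
    report a warning) or "fail" (stop and report).
    """
    if hop_number > MAX_HOPS:
        return "fail", f"redirect-loop: more than {MAX_HOPS} hops"
    s, d = urlsplit(src), urlsplit(dst)
    if s.scheme == "https" and d.scheme == "http":
        return "fail", "HTTPS-to-HTTP redirect; not followed"
    s_host, d_host = s.hostname or "", d.hostname or ""
    if _bare_host(s_host) != _bare_host(d_host):
        return "warn", f"cross-host redirect: {s_host} -> {d_host}"
    return "follow", "same site"


def final_url_is_conformant(url: str) -> bool:
    # Section 1: a file whose final URL is plain HTTP is never conformant.
    return urlsplit(url).scheme == "https"
```

Evaluating hop by hop, rather than after the whole chain has been fetched, is what allows the validator to refuse an HTTPS-to-HTTP hop without ever requesting the HTTP URL.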
4. Soft-404 detection
Some hosts return an HTTP 200 response for any path, including paths that don't exist, by serving a generic "page not found" HTML page. Validators MUST detect this pattern (the soft-404) and treat it as a 404 for conformance purposes.
Detection heuristics that a validator MAY apply:
- Content-Type mismatch: a file expected to be text/plain or application/json that returns text/html is almost certainly a soft-404.
- Content does not parse as the expected format (e.g. an ai.json body that is not valid JSON).
- Response body matches a known generic error template for the publisher's CMS or hosting platform.
- Response body is identical to the response for a clearly nonexistent sibling path.
Publishers SHOULD configure their hosts to return a real 404 for missing AI Discovery Files rather than a soft-404. This makes diagnostics clearer and avoids ambiguity.
5. Query strings
The canonical URL for every AI Discovery File is its path without a query string. Validators fetch the canonical URL directly. Publishers MUST NOT require query-string parameters for the file to be served correctly.
Validators MAY add a cache-busting query string parameter (e.g. ?_v=<timestamp>) when bypassing intermediary caches. Publishers SHOULD serve the canonical content regardless of unknown query parameters.
If a publisher serves different content based on query strings (e.g. language variants), they SHOULD use distinct URL paths instead. Multi-language addressing is documented in the forthcoming i18n guidance (see roadmap).
6. Content negotiation
AI Discovery Files use deterministic file extensions and content types per the specification registry. Content negotiation via Accept headers is not part of the contract.
Specifically:
- llms.txt, ai.txt, brand.txt, faq-ai.txt, developer-ai.txt, robots-ai.txt: served as text/plain; charset=utf-8 (or, for Markdown, optionally text/markdown; charset=utf-8).
- ai.json, identity.json: served as application/json.
- llms.html: served as text/html; charset=utf-8.
A publisher MAY include a UTF-8 byte-order mark, but SHOULD NOT; many parsers reject BOMs in JSON files.
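A consumer can check the served representation against this list without any negotiation. The sketch below is a minimal illustration: expected_media_types and check_representation are hypothetical names, and it assumes the Markdown allowance (text/markdown) applies to every .txt file in the list above.

```python
UTF8_BOM = b"\xef\xbb\xbf"

# Media types from the list above. Assumption of this sketch: the Markdown
# allowance (text/markdown) is accepted for every .txt file in that list.
_TEXT_FILES = {
    "llms.txt", "ai.txt", "brand.txt", "faq-ai.txt",
    "developer-ai.txt", "robots-ai.txt",
}
_JSON_FILES = {"ai.json", "identity.json"}


def expected_media_types(filename: str) -> set[str]:
    if filename in _TEXT_FILES:
        return {"text/plain", "text/markdown"}
    if filename in _JSON_FILES:
        return {"application/json"}
    if filename == "llms.html":
        return {"text/html"}
    raise KeyError(f"not an AI Discovery File: {filename}")


def check_representation(filename: str, content_type: str, body: bytes) -> list[str]:
    """Return problems with the served representation (empty list if none)."""
    problems = []
    # Compare only the media type; parameters such as charset are ignored here.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in expected_media_types(filename):
        problems.append(f"unexpected media type {media_type!r}")
    if body.startswith(UTF8_BOM):
        problems.append("UTF-8 byte-order mark present (SHOULD NOT)")
    return problems
```

Reporting the BOM as a separate problem, rather than a hard failure, reflects that the text above makes it a SHOULD NOT rather than a MUST NOT.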
7. Caching
Publishers SHOULD send realistic Cache-Control headers on AI Discovery Files. Recommendations:
- Cache-Control: max-age=3600, s-maxage=43200 (1 hour in browsers, 12 hours in shared caches) for relatively stable files. The site's own files use this profile.
- Files that change rarely (e.g. identity.json for an established business) MAY use a longer TTL.
- ETag and Last-Modified headers SHOULD be present so conditional requests work.
- Aggressive caching (e.g. max-age over a week) is reasonable for stable content but SHOULD be paired with a cache-purge process for when content does change.
Validators SHOULD respect cache headers under normal operation and SHOULD support a cache-bypass mode for explicit re-checks.
8. Rate limits
Publishers MAY rate-limit AI Discovery File requests but SHOULD set the limit generously: these are small, infrequent fetches by validators, AI crawlers, and curious humans. A per-IP limit of 10 requests per minute is typical and should not be tighter without good reason.
Validators SHOULD:
- Identify themselves with a clear User-Agent string
- Honour Retry-After on 429 responses
- Back off exponentially on repeated 429 or 5xx responses
- Not bombard a publisher with re-checks; cache validation results for a reasonable interval
9. CORS
AI Discovery Files are public resources. Publishers SHOULD serve Access-Control-Allow-Origin: * on every file so JavaScript validators (including the AI Visibility Checker) can fetch them from any origin.
If a publisher cannot or does not want to enable CORS, server-side validators (those that don't run in a browser) still work; only browser-based validators are affected. The reference validator and most production validators are server-side.
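To tie the publisher-side recommendations together, the sketch below assembles a response-header set for one file: Content-Type from the media-type table in section 10, the Cache-Control profile from section 7, ETag and Last-Modified, and the wildcard CORS header recommended above. The function name discovery_headers and the choice of a truncated SHA-256 digest as the ETag are assumptions of this sketch; any value that changes whenever the content changes would serve.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime

# Media types from the table in section 10 (llms.txt MAY use text/markdown instead).
MEDIA_TYPES = {
    "llms.txt": "text/plain; charset=utf-8",
    "llm.txt": "text/plain; charset=utf-8",
    "llms.html": "text/html; charset=utf-8",
    "ai.txt": "text/plain; charset=utf-8",
    "ai.json": "application/json; charset=utf-8",
    "identity.json": "application/json; charset=utf-8",
    "brand.txt": "text/plain; charset=utf-8",
    "faq-ai.txt": "text/plain; charset=utf-8",
    "developer-ai.txt": "text/plain; charset=utf-8",
    "robots-ai.txt": "text/plain; charset=utf-8",
}


def discovery_headers(filename: str, body: bytes, modified: datetime) -> dict[str, str]:
    """Response headers for one AI Discovery File; modified must be timezone-aware."""
    # Assumed ETag scheme: a digest of the body, so it changes with the content.
    etag = '"' + hashlib.sha256(body).hexdigest()[:32] + '"'
    return {
        "Content-Type": MEDIA_TYPES[filename],
        # Section 7 profile: 1 hour in browsers, 12 hours in shared caches.
        "Cache-Control": "max-age=3600, s-maxage=43200",
        "ETag": etag,
        "Last-Modified": format_datetime(modified.astimezone(timezone.utc), usegmt=True),
        # Section 9: let browser-based validators read the file from any origin.
        "Access-Control-Allow-Origin": "*",
    }
```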
10. Media types and IANA registration
The AI Discovery Files specification reuses existing IANA-registered media types rather than introducing new ones. Each file is served as one of the following:
| File | Media type | Source registration |
|---|---|---|
| llms.txt | text/plain; charset=utf-8 or text/markdown; charset=utf-8 | text/plain / text/markdown |
| llm.txt | text/plain; charset=utf-8 (or redirect) | text/plain |
| llms.html | text/html; charset=utf-8 | text/html |
| ai.txt | text/plain; charset=utf-8 | text/plain |
| ai.json | application/json; charset=utf-8 | application/json |
| identity.json | application/json; charset=utf-8 | application/json |
| brand.txt | text/plain; charset=utf-8 | text/plain |
| faq-ai.txt | text/plain; charset=utf-8 | text/plain |
| developer-ai.txt | text/plain; charset=utf-8 | text/plain |
| robots-ai.txt | text/plain; charset=utf-8 | text/plain |
10.1 Why no bespoke media types
This specification deliberately does not introduce new media types such as application/vnd.ai-visibility.identity+json or text/vnd.ai-discovery.llms+plain. The reasons:
- IANA registration is heavy. The process for registering a new media type (RFC 6838 procedure) is appropriate for protocols and formats with broad cross-organisation governance. For a single-team specification, the overhead outweighs the benefit.
- Existing types already work. text/plain and application/json are universally supported by HTTP infrastructure, content delivery networks, and browser developer tools. A bespoke type would require every CDN and validator to add it to its whitelist.
- The filename identifies the format. A consumer fetching /llms.txt already knows what it expects to receive. The media type is a fallback signal, not the primary identifier.
- Versioning is handled elsewhere. A bespoke media type per file would either lock in a version (forcing a new type per MAJOR release) or carry no version information (defeating the purpose). The versioning policy handles version negotiation cleanly via the schema URL pattern.
10.2 Charset and structured-syntax suffix
Two media-type details apply uniformly:
- charset=utf-8. Every file MUST be UTF-8 encoded; the charset parameter SHOULD appear on the Content-Type response header for clarity. A consumer encountering a UTF-8 file without the charset parameter MUST still treat the file as UTF-8 because UTF-8 is mandated by every per-file specification.
- The +json structured-syntax suffix. JSON files (ai.json, identity.json) use the bare application/json type rather than a bespoke application/...+json variant. This is consistent with the rationale above: validators and HTTP stacks already handle application/json universally.
10.3 Content negotiation: out of scope
Section 6 of this page (content negotiation) documents that AI Discovery Files are not negotiated via Accept headers; the canonical file lives at the canonical path. The media-type table above lists what a publisher SHOULD send, not what a consumer SHOULD ask for. A consumer SHOULD send Accept: */* or omit the header entirely; consumers MUST NOT use Accept to request alternative representations because the specification does not define any.
10.4 Future position
If the AI Discovery Files specification is taken to a standards body (see the roadmap entry on IETF / W3C engagement), the question of formal media-type registration MAY be revisited as part of that engagement. Until that decision is made, the position above stands: existing IANA types, no bespoke registrations.
References
- RFC 9110: HTTP Semantics. Authoritative reference for status codes and conditional requests.
- RFC 9111: HTTP Caching. Authoritative reference for cache-control behaviour.
- Specification Conventions: editorial and structural conventions.
- Specification Registry: per-file media types and canonical paths.
- Interoperability Guide: precedence rules between robots.txt and AI Discovery Files.
- Security and Privacy Considerations: HTTPS expectations, logging concerns, integrity primitives.