Security and Privacy Considerations

Cross-cutting security and privacy guidance for publishers and consumers of AI Discovery Files.

Status Stable

This guidance is published and current. It documents responsibilities that publishers and consumers SHOULD observe; specific normative requirements are surfaced in the individual specifications that introduce them. See specification conventions for status definitions.

Last updated:

Purpose

AI Discovery Files describe a publisher's identity, scope, permissions, and context to AI systems. The files sit at predictable root-level paths, are publicly fetched, and are meant to be trusted by consumers. This page documents the security and privacy implications of that contract, what the specification does and does not enforce, and what publishers and consumers SHOULD do to handle the gaps responsibly.

1. Trust model

The trust model for AI Discovery Files rests on three assumptions:

  1. The publisher controls the host. A file served from https://example.com/ai.json is assumed to represent the operator of example.com. The HTTPS host is the trust anchor.
  2. The transport is HTTPS. Consumers SHOULD fetch files over HTTPS. Files served over plain HTTP carry no integrity guarantees beyond what the transport provides; consumers MAY treat them as untrusted.
  3. The publisher accepts public disclosure. Anything written in an AI Discovery File is public. There is no privacy boundary inside the file format itself.

The specification does NOT guarantee:

Integrity guarantees beyond TLS are a forthcoming capability. See section 6.

2. Content injection and prompt-style manipulation

AI Discovery Files are read by automated systems including large language models. A publisher SHOULD NOT include content designed to manipulate an AI consumer's downstream behaviour beyond the documented field semantics.

Patterns to avoid:

Consumers SHOULD validate field content against the format documented in each specification, treat instruction-shaped content in non-instruction fields as untrusted prose, and decline to act on it.

3. Personal data and GDPR

Several AI Discovery Files describe people: identity.json can list founders and notable employees; brand.txt can mention spokespeople; contact fields can include personal email addresses or telephone numbers.

Publishers SHOULD treat the file as a public statement to which all UK GDPR / EU GDPR principles apply:

For sole traders, single-person businesses, and home-based businesses where the "company contact" is the operator's personal contact, this distinction matters. Publishers SHOULD use a dedicated business email address and a separate business contact channel where practical.

4. Access control

AI Discovery Files describe usage preferences. They do NOT enforce access control. A file declaring "AI-Training: No" is a request; it does not technically prevent an AI crawler from training on the content.

For enforcement, publishers SHOULD layer multiple mechanisms:

A publisher relying solely on AI Discovery Files for enforcement is relying on goodwill. The files are useful and increasingly respected, but they are not a security boundary.

5. Logging and request fingerprinting

The files at /llms.txt, /ai.txt, etc. are fetched by AI crawlers, validators, the AI Visibility Checker, indexing tools, and curious humans. Each fetch leaves a log line on the publisher's server.

Implications:

6. File integrity and signing

The current specification does not include file-integrity primitives. A consumer that fetches identity.json over HTTPS gets the transport-level integrity TLS provides but no signed assertion that the file's content originated from the claimed publisher. This section documents the current baseline, the candidate mechanisms for a future signing capability, and the cross-cutting concerns that hold signing back to a future MAJOR release.

6.1 Current baseline (1.x)

For the entire 1.x line of the specification, the trust model relies on:

This is enough for the specification's stated purpose: helping AI systems discover, interpret, and safely use publishers' machine-readable identity and context. It is not enough to defend against an attacker who controls the publisher's hosting (e.g. via DNS hijack or compromised hosting credentials).

6.2 Candidate signing mechanisms

The maintainer is tracking four candidate mechanisms for a future MAJOR release. Each has different trade-offs:

DKIM-style detached signatures via DNS
Publishers store a signing public key in a DNS TXT record at a well-known subdomain. A detached signature accompanies each AI Discovery File (either as a sidecar file or as a header). Consumers verify the signature against the DNS-published key. Pros: reuses DKIM operational practice; DNS already anchors host identity. Cons: ties signing to DNS operators; DNS TXT records have size limits; key rotation requires DNS coordination.
Sigstore-style transparency-log signing
Publishers obtain short-lived signing certificates from a public certificate authority (e.g. Sigstore's Fulcio), sign the files, and the signing event is recorded in a public transparency log. Consumers verify both the signature and the log inclusion proof. Pros: short-lived certificates remove long-term key custody burden; transparency log gives third parties a way to detect malicious signing. Cons: mature for software supply chain, less mature for static content; requires Sigstore-like infrastructure to remain operational.
JOSE / JWS attached signatures
Each file (or a manifest of files) is wrapped in a JWS structure. The signing key is published at a well-known URL on the publisher's host (e.g. /.well-known/ai-discovery-keys.json). Pros: JOSE is well-understood and well-implemented across languages. Cons: wraps the file content in a JOSE envelope, changing the on-the-wire format; tooling integration cost.
HTTPS-anchored manifest with content hashes
A single signed manifest file (e.g. /.well-known/ai-discovery-manifest.json) contains SHA-256 hashes of every other AI Discovery File. Consumers fetch the manifest, verify its signature, and then verify each individual file against its hash. Pros: single signature covers all files; individual files stay unmodified. Cons: manifest staleness becomes a failure mode; manifest signing key custody still has to be solved (likely via one of the three mechanisms above).

The maintainer has no commitment to any single mechanism. The roadmap entry on integrity / signing (see roadmap) records the status as On hold (2.0 territory) while these designs are evaluated.

6.3 Cross-cutting concerns

Signing is held back to a future MAJOR release because it has implications that ripple across the entire ecosystem:

  1. Key management. Whichever mechanism is chosen, publishers must generate, store, and protect a signing key. Small-business publishers (the bulk of the AI Discovery Files audience) do not have key-management practice; a signing requirement that demands one effectively prices them out of conformance.
  2. Key rotation. Long-lived keys are a liability. The chosen mechanism MUST support routine rotation without breaking historical verifications. This is non-trivial: a signature from a rotated-out key must still validate against the key's revocation status at signing time, not at verification time.
  3. Downstream caching. CDNs cache AI Discovery Files. A signature attached to the cached body must remain valid after intermediate caching. CDNs that mangle whitespace, normalise line endings, or rewrite headers will invalidate signatures unless the signing canonicalisation explicitly accounts for them.
  4. Validator behaviour. Validators (the AI Visibility Checker, third-party tools, AI consumers) must verify signatures consistently. A validator that fails open on a missing or invalid signature gives publishers no incentive to sign correctly; a validator that fails closed locks legitimate publishers out during transition. The deprecation timeline for signed-by-default would need to be at least 12 months (per the versioning policy).
  5. Revocation. A publisher who discovers their signing key has been compromised needs a path to revoke. The chosen mechanism MUST define how revocation propagates to consumers without requiring real-time lookups for every file fetch (which would re-introduce the centralised dependency the specification deliberately avoids).
  6. Conformance class membership. Today, a publisher signs nothing. After signing ships, the question is whether Essential / Recommended / Complete classes include a "signed" requirement. The maintainer's working position is that signing will introduce a separate axis (Signed vs Unsigned) rather than collapsing into the existing class hierarchy.

6.4 What publishers can do now

While integrity primitives are pending, publishers SHOULD adopt operational practices that reduce the risk of unauthorised modification:

6.5 What consumers should do now

Until file integrity ships, consumers SHOULD:

References