Internationalisation & Accessibility

Multi-language publication and accessibility considerations for AI Discovery Files.

Status: Stable

The guidance on this page is normative where marked. Multi-variant file naming is currently experimental; see section 3. See specification conventions for status definitions.

Last updated: 11 May 2026

Purpose

The AI Discovery Files specification is, by design, language-neutral: the file formats accept any UTF-8 content and the BCP 47 language declaration field already lets publishers tag the language of a file. This page consolidates the guidance for non-English publication, multi-language sites, right-to-left languages, locale-tagged identity, and the accessibility responsibilities of the human-readable llms.html surface.

1. Single-language declaration (normative recap)

Every text-based AI Discovery File MAY include a single-language declaration. The mechanism was added in Phase 2 (specification release 1.5.0) and is documented in detail on the specification conventions page.

Text files (llms.txt, ai.txt, brand.txt, faq-ai.txt, developer-ai.txt): a Lang: <BCP-47-tag> header on its own line, near the top of the file
JSON files (ai.json, identity.json): a "language": "<BCP-47-tag>" property at the top level
llms.html: standard HTML <html lang="<BCP-47-tag>"> attribute

Publishers writing primarily in a single non-English language SHOULD declare that language. When a file declares no language, consumers fall back to detection heuristics, which are unreliable for short files.
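
As an illustration of the three declaration forms listed above, the following is a minimal sketch of how a consumer might read the declared language. The function name declared_language, the dispatch on file extension, and the 20-line limit used to interpret "near the top" are assumptions made for this example, not part of the specification.

```python
import json
import re
from html.parser import HTMLParser
from typing import Optional

# Matches a "Lang: <tag>" header on its own line (header name from the list above).
LANG_HEADER = re.compile(r"^Lang:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


class _HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self._seen_root = False

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self._seen_root:
            self._seen_root = True
            self.lang = dict(attrs).get("lang")


def declared_language(filename: str, content: str) -> Optional[str]:
    """Return the declared BCP 47 tag, or None if the file declares none."""
    if filename.endswith(".json"):
        data = json.loads(content)
        value = data.get("language") if isinstance(data, dict) else None
        return value if isinstance(value, str) else None
    if filename.endswith(".html"):
        parser = _HtmlLangParser()
        parser.feed(content)
        return parser.lang
    # Text files. Assumption: "near the top" means within the first 20 lines.
    head = "\n".join(content.splitlines()[:20])
    match = LANG_HEADER.search(head)
    return match.group(1) if match else None
```

A consumer would apply its detection heuristics only when this returns None.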

2. Locale-tagged fields inside identity.json

For publishers operating across multiple languages with distinct trading names or descriptions per locale, the JSON files (especially identity.json) MAY include locale-tagged variants alongside the canonical English (or other primary-language) value. The convention adapts the JSON-LD @language pattern, familiar from Schema.org markup, for plain JSON:

{
  "name": "Acme Corporation",
  "nameLocalized": {
    "en-GB": "Acme Corporation",
    "fr-FR": "Acme Société Anonyme",
    "ja-JP": "アクメ株式会社"
  },
  "description": "A worked example for documentation purposes.",
  "descriptionLocalized": {
    "en-GB": "A worked example for documentation purposes.",
    "fr-FR": "Un exemple pratique à des fins de documentation."
  }
}

Rules:

The canonical fields (name, description, etc.) MUST remain populated. Localised variants are additive.
Each key of the *Localized object MUST be a valid BCP 47 language tag. Validators MAY reject unknown tags.
If the publisher has a primary language declared in "language", the value at that tag in a *Localized object MUST match the canonical value.
Locale-tagged variants are not the same as alternateName. alternateName is for trading names regardless of language; nameLocalized is for the same name in different writing systems or translations.
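
The first three rules above lend themselves to a mechanical check. The sketch below is one possible shape for it; check_localized is a hypothetical name, and the tag pattern is a deliberate simplification that accepts only language[-Script][-REGION] tags rather than the full BCP 47 grammar. The third rule is checked only when the primary-language tag is actually present in the *Localized object.

```python
import re
from typing import Dict, List

# Simplified well-formedness check: language[-Script][-REGION] only.
# Full BCP 47 also allows variants, extensions, and private-use subtags.
SIMPLE_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-(?:[A-Za-z]{2}|[0-9]{3}))?$")


def check_localized(identity: Dict) -> List[str]:
    """Return rule violations for every *Localized object in the record."""
    problems: List[str] = []
    primary = identity.get("language")
    for key, variants in identity.items():
        if not key.endswith("Localized") or not isinstance(variants, dict):
            continue
        canonical_key = key[: -len("Localized")]
        canonical = identity.get(canonical_key)
        # Rule 1: the canonical field must remain populated.
        if not canonical:
            problems.append(f"{canonical_key}: canonical value missing")
        # Rule 2: every key must be a language tag.
        for tag in variants:
            if not SIMPLE_TAG.match(tag):
                problems.append(f"{key}: '{tag}' is not a well-formed tag")
        # Rule 3: the primary-language variant must equal the canonical value.
        if primary in variants and variants[primary] != canonical:
            problems.append(f"{key}: value at '{primary}' differs from {canonical_key}")
    return problems
```

An empty list means no violations were found under these simplified checks.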

Consumers that surface the publisher's name in citations or summaries SHOULD prefer the variant matching the user's locale, falling back to the canonical value.
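
A minimal sketch of that preference order, under stated assumptions: localized_value is a hypothetical helper, tag comparison is case-insensitive, and a same-language match (for example fr-CA against fr-FR) is treated as better than falling straight back to the canonical value. The specification text above only requires the exact-match preference and the fallback.

```python
from typing import Dict, Optional


def localized_value(identity: Dict, field: str, user_locale: Optional[str]) -> Optional[str]:
    """Pick the variant for user_locale, falling back to the canonical field."""
    variants = identity.get(f"{field}Localized") or {}
    if user_locale:
        lowered = {tag.lower(): value for tag, value in variants.items()}
        wanted = user_locale.lower()
        if wanted in lowered:
            return lowered[wanted]
        # Assumption: accept another region of the same language before giving up.
        language = wanted.split("-")[0]
        for tag, value in lowered.items():
            if tag.split("-")[0] == language:
                return value
    return identity.get(field)
```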

3. Multi-variant file naming (experimental)

For publishers who genuinely produce different file content per language (not just translations of one identity, but different operational footprints in different regions), the specification proposes a multi-variant naming convention as an experimental mechanism. This is not yet in any conformance class.

The convention:

/llms.txt: the default (publisher-canonical) variant
/llms-fr.txt: the French-language variant
/llms-ja.txt: the Japanese-language variant
... and so on for any BCP 47 primary language subtag

The publisher MUST declare each variant's language via the Lang: header. The default variant SHOULD declare its language too, even though it could be inferred from convention.

The default variant SHOULD be in the publisher's primary language. A site whose primary audience is German-speaking SHOULD make /llms.txt the German variant and offer English, French, and other variants as /llms-en.txt, /llms-fr.txt, and so on.

Consumers that support multi-variant fetching SHOULD use HTTP content negotiation (Accept-Language) as a hint to choose a variant, but MUST fall back to /llms.txt if a specific variant is not present. Path-based fetching (a consumer choosing to request /llms-fr.txt directly) is also valid.
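
One way a consumer might combine the two approaches is to turn a language preference, expressed in Accept-Language syntax, into an ordered list of paths to try. The sketch below assumes exactly that; variant_paths is a hypothetical name, only the primary language subtag is used (matching the naming convention above), and the wildcard entry is ignored.

```python
from typing import List


def variant_paths(accept_language: str, base: str = "llms", ext: str = "txt") -> List[str]:
    """Order candidate paths by preference, always ending with the default variant."""
    weighted = []
    for position, item in enumerate(accept_language.split(",")):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0]
        if not tag or tag == "*":
            continue
        quality = 1.0  # q defaults to 1 when omitted
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            primary = tag.split("-")[0].lower()
            # Sort by descending quality, then by original order.
            weighted.append((-quality, position, primary))
    paths: List[str] = []
    for _, _, primary in sorted(weighted):
        path = f"/{base}-{primary}.{ext}"
        if path not in paths:
            paths.append(path)
    paths.append(f"/{base}.{ext}")  # the required fallback
    return paths
```

The consumer would request each path in turn and use the first one that exists, so /llms.txt is reached whenever no specific variant is published.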

This mechanism is experimental because:

It interacts with HTTP content negotiation in ways that aren't fully tested across CDNs
The interaction with identity.json locale-tagged fields needs more deployment evidence to know whether both mechanisms are sustainable
Validators do not yet have a conformance class that includes multi-variant publication

Publishers MAY publish multi-variant files today; the maintainer recommends starting with the default variant plus nameLocalized / descriptionLocalized in identity.json, and adding language variants only when there is clear demand from non-default-language consumers.

4. Right-to-left languages

Arabic, Hebrew, Persian, Urdu, and several other languages are written right-to-left. The AI Discovery Files specification handles RTL languages without special syntax, but a few practical considerations apply:

Text files: the file content is RTL where the language is RTL. No directional control characters are required; consumers MUST handle bidirectional text per the Unicode Bidirectional Algorithm.
JSON files: property names remain ASCII (e.g. name, not a translated property name). Property values MAY be RTL. JSON serialisers and parsers handle this transparently.
llms.html: the publisher MUST set dir="rtl" on the <html> element (or the relevant block element) when the document content is in an RTL language. Validators MAY warn when an RTL lang attribute is paired with no dir declaration.
Mixed content: a file containing both LTR and RTL content (e.g. an English description with Arabic identity fields) MAY use Unicode Bidirectional Algorithm marks (U+200E LRM and U+200F RLM) at boundary positions. This is rarely needed in machine-readable contexts.
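
The validator warning described for llms.html could be approximated as follows. This is a sketch under assumptions: rtl_dir_warning is a hypothetical name, the language set contains only the four languages named above, and only the <html> element is inspected, so a dir attribute placed on a block element would not be noticed.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

# Only the four languages named in this section; a real validator needs a fuller list.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


class _RootAttrs(HTMLParser):
    """Captures the attributes of the first <html> element."""

    def __init__(self) -> None:
        super().__init__()
        self.attrs: Optional[Dict[str, Optional[str]]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.attrs is None:
            self.attrs = dict(attrs)


def rtl_dir_warning(html: str) -> Optional[str]:
    """Return a warning when an RTL lang is not paired with dir='rtl'."""
    parser = _RootAttrs()
    parser.feed(html)
    attrs = parser.attrs or {}
    lang = (attrs.get("lang") or "").lower()
    direction = (attrs.get("dir") or "").lower()
    if lang.split("-")[0] in RTL_LANGUAGES and direction != "rtl":
        return f"lang='{lang}' is RTL but <html> does not declare dir='rtl'"
    return None
```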

5. Non-Latin scripts

UTF-8 encoding is mandated for every AI Discovery File. This makes non-Latin scripts (CJK ideographs, Devanagari, Cyrillic, Greek, Arabic, etc.) work without special syntax. Publishers MAY freely use:

Native-script business names: "name": "株式会社サンプル"
Native-script descriptions and contexts
Native-script headings inside llms.txt

Publishers SHOULD additionally provide a Latin-script transliteration via alternateName (in identity.json) or via Pronunciation: entries (in brand.txt). The transliteration helps AI consumers cite the publisher in contexts where the native script is not displayable. It does not replace the native-script canonical name.
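
A publisher-side check for the transliteration recommendation might look like the sketch below. It covers only the alternateName route in identity.json, not Pronunciation: entries in brand.txt. The helper names are hypothetical, and "Latin-script" is approximated by looking for the word LATIN in each letter's Unicode character name.

```python
import unicodedata
from typing import Dict


def has_latin_letters(text: str) -> bool:
    """True if any alphabetic character in text is a Latin-script letter."""
    return any(ch.isalpha() and "LATIN" in unicodedata.name(ch, "") for ch in text)


def needs_transliteration(identity: Dict) -> bool:
    """True when name has no Latin letters and no alternateName supplies any."""
    name = identity.get("name") or ""
    if has_latin_letters(name):
        return False
    alternates = identity.get("alternateName") or []
    if isinstance(alternates, str):
        alternates = [alternates]
    return not any(has_latin_letters(alt) for alt in alternates)
```

For the example name above, needs_transliteration({"name": "株式会社サンプル"}) evaluates to True until a Latin-script alternateName is added.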

6. Legal, trading, and locale-specific names

A publisher may have several names: a registered legal name, one or more trading names, and locale-specific variants of any of these. The specification supports this:

Legal name: The publisher's registered company name. Place in identity.json as legalName (Schema.org-aligned). Example: "Acme Holdings Limited".
Canonical name: The publisher's preferred public name. Place in identity.json as name. Example: "Acme Corporation". This is the name consumers SHOULD use in citations.
Trading names: Alternative names the publisher trades under. Place in identity.json as entries in the alternateName array. Example: "Acme", "Acme Corp".
Locale-specific variants: Translations or transliterations of the canonical name. Place in identity.json as nameLocalized (see section 2). Example: "ja-JP": "アクメ株式会社".
Pronunciation: Audio pronunciation guidance for any of the above. Place in brand.txt using the Pronunciation: convention. Especially useful for non-English names in English-language conversations.

The identity.json specification and the brand.txt specification are the normative source of truth for each of these fields.
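
To show how the name fields sit together, the sketch below assembles them into one record using only the example values from the list above, then serialises it. The citation_name helper is hypothetical; it simply reflects the statement that consumers SHOULD cite the canonical name.

```python
import json

# Example values taken from the list above; a real record has many more fields.
identity = {
    "legalName": "Acme Holdings Limited",
    "name": "Acme Corporation",
    "alternateName": ["Acme", "Acme Corp"],
    "nameLocalized": {"ja-JP": "アクメ株式会社"},
}


def citation_name(record: dict) -> str:
    """Name to cite: the canonical name, not the legal or a trading name."""
    return record["name"]


# ensure_ascii=False keeps native-script values readable in the UTF-8 output.
serialized = json.dumps(identity, ensure_ascii=False, indent=2)
```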

7. Regional contact details

A publisher with offices in multiple countries SHOULD publish each office as an entry in the contactPoint array of identity.json, with the areaServed property indicating the region. Each contact point MAY have a language preference declared via availableLanguage.

Consumers serving users in a particular region SHOULD prefer contact points whose areaServed matches the region. Consumers serving users in a particular language SHOULD prefer contact points whose availableLanguage matches the language. Both signals are hints; falling back to the first listed contact point is acceptable.
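
The selection logic can be sketched as a simple scoring pass. Several details here are assumptions for illustration: pick_contact is a hypothetical name, areaServed and availableLanguage are treated as plain strings or lists of strings compared exactly, and a region match is weighted above a language match when the two disagree.

```python
from typing import Dict, List, Optional


def _as_list(value) -> List[str]:
    """Normalise a missing, single, or multi-valued property to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def pick_contact(identity: Dict, region: Optional[str] = None,
                 language: Optional[str] = None) -> Optional[Dict]:
    """Choose a contact point by region and language hints, else the first listed."""
    points = identity.get("contactPoint") or []
    if not points:
        return None

    def score(point: Dict) -> int:
        total = 0
        if region and region in _as_list(point.get("areaServed")):
            total += 2  # assumption: region outranks language
        if language and language in _as_list(point.get("availableLanguage")):
            total += 1
        return total

    # max() returns the first of several equally scored entries, so when
    # nothing matches the first listed contact point is returned.
    return max(points, key=score)
```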

8. Accessibility of llms.html

llms.html is the human-readable presentation of llms.txt. It is intended to be read by humans (and assistive technology) as well as by AI consumers. Accessibility considerations:

Semantic HTML. Use <h1> for the publisher name, <h2> for top-level sections, native <ul> / <ol> for lists. Validators MAY warn on heading-level skips.
Language attribute. The <html> element MUST have a lang attribute with the document's BCP 47 language tag.
Direction attribute. The <html> element MUST have a dir attribute when the document language is RTL.
Link text. Links MUST have descriptive accessible names. Validators MAY warn on link text consisting solely of "click here", "read more", or the destination URL.
Schema.org JSON-LD. llms.html SHOULD include the same Organization JSON-LD that the publisher uses elsewhere on the site. Consistency between identity.json, llms.html's JSON-LD, and the site's homepage JSON-LD reduces ambiguity for AI consumers.
No JavaScript dependency. The content of llms.html MUST be present in the initial HTML response. AI consumers do not execute JavaScript; assistive technology may not execute it reliably.
WCAG considerations. The page SHOULD meet WCAG 2.1 AA contrast, focus, and keyboard-navigation criteria. This is good practice for any human-facing page; the specification does not mandate it but strongly recommends it.
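
Three of the checks above (language attribute, heading-level skips, link text) can be approximated with a single pass over the markup. The following is a sketch only: scan_llms_html is a hypothetical name, link names are taken from text content alone (attributes such as aria-label are ignored), and the remaining items in the list are not checked at all.

```python
from html.parser import HTMLParser
from typing import List, Optional

GENERIC_LINK_TEXT = {"click here", "read more"}


class _A11yScan(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.warnings: List[str] = []
        self.has_lang = False
        self._last_level = 0
        self._href: Optional[str] = None  # not None while inside an <a>
        self._link_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            self.has_lang = True
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            if level > self._last_level + 1:
                self.warnings.append(
                    f"heading level jumps from {self._last_level} to {level}")
            self._last_level = level
        elif tag == "a":
            self._href = attributes.get("href") or ""
            self._link_text = []

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._link_text).split())
            if text.lower() in GENERIC_LINK_TEXT or (text and text == self._href):
                self.warnings.append(f"non-descriptive link text: '{text}'")
            self._href = None


def scan_llms_html(html: str) -> List[str]:
    """Return accessibility warnings for an llms.html document."""
    scanner = _A11yScan()
    scanner.feed(html)
    scanner.close()
    if not scanner.has_lang:
        scanner.warnings.insert(0, "missing lang attribute on <html>")
    return scanner.warnings
```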

References

Specification conventions: language declaration: the normative source for the single-language declaration mechanism.
BCP 47: language tag format used throughout this specification.
identity.json specification: source of name, alternateName, legalName, and contactPoint; nameLocalized is described on this page (section 2).
brand.txt specification: source of Pronunciation:, alternate-name guidance.
llms.html specification: the surface to which the accessibility guidance in section 8 applies.
WCAG 2.1: accessibility benchmark referenced in section 8.
Roadmap entry: i18n & accessibility: planning context for this page.