Internationalisation & Accessibility

Multi-language publication and accessibility considerations for AI Discovery Files.

Status: Stable

The guidance on this page is normative where marked. Multi-variant file naming is currently experimental; see section 3. See specification conventions for status definitions.

Last updated:

Purpose

The AI Discovery Files specification is, by design, language-neutral: the file formats accept any UTF-8 content and the BCP 47 language declaration field already lets publishers tag the language of a file. This page consolidates the guidance for non-English publication, multi-language sites, right-to-left languages, locale-tagged identity, and the accessibility responsibilities of the human-readable llms.html surface.

1. Single-language declaration (normative recap)

Every text-based AI Discovery File MAY include a single-language declaration. The mechanism was added in Phase 2 (specification release 1.5.0) and is documented in detail on the specification conventions page.

Publishers writing primarily in a single non-English language SHOULD declare that language. When a file declares no language, consumers fall back to detection heuristics, which are unreliable for short files.

2. Locale-tagged fields inside identity.json

For publishers operating across multiple languages with distinct trading names or descriptions per locale, the JSON files (especially identity.json) MAY include locale-tagged variants alongside the canonical English (or other primary-language) value. The convention follows the JSON-LD @language pattern used in Schema.org markup, adapted for plain JSON:

{
  "name": "Acme Corporation",
  "nameLocalized": {
    "en-GB": "Acme Corporation",
    "fr-FR": "Acme Société Anonyme",
    "ja-JP": "アクメ株式会社"
  },
  "description": "A worked example for documentation purposes.",
  "descriptionLocalized": {
    "en-GB": "A worked example for documentation purposes.",
    "fr-FR": "Un exemple pratique à des fins de documentation."
  }
}

Rules:

  1. The canonical fields (name, description, etc.) MUST remain populated. Localised variants are additive.
  2. Each key of the *Localized object MUST be a valid BCP 47 language tag. Validators MAY reject unknown tags.
  3. If the publisher has a primary language declared in "language", the value at that tag in a *Localized object MUST match the canonical value.
  4. Locale-tagged variants are not the same as alternateName. alternateName is for trading names regardless of language; nameLocalized is for the same name in different writing systems or translations.
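
Rules 1 to 3 lend themselves to a mechanical check. The following is a minimal sketch, not a reference validator: the function name check_localized_fields is hypothetical, and the regular expression only tests a simplified well-formedness subset of BCP 47 (language, optional script, optional region, optional variants) rather than validating tags against the IANA registry.

```python
import re

# Simplified well-formedness check: language[-Script][-REGION][-variant...].
# Assumption: this approximates BCP 47 (RFC 5646); it is not a full validator.
_TAG_RE = re.compile(
    r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-(?:[A-Za-z]{2}|[0-9]{3}))?(-[A-Za-z0-9]{5,8})*$"
)

LOCALIZED_SUFFIX = "Localized"


def check_localized_fields(identity: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    primary = identity.get("language")  # rule 3 only applies when declared
    for key, variants in identity.items():
        if not key.endswith(LOCALIZED_SUFFIX) or not isinstance(variants, dict):
            continue
        canonical_key = key[: -len(LOCALIZED_SUFFIX)]
        canonical = identity.get(canonical_key)
        # Rule 1: the canonical field must remain populated.
        if not canonical:
            problems.append(f"{canonical_key}: canonical value missing")
        # Rule 2: every key must look like a BCP 47 language tag.
        for tag in variants:
            if not _TAG_RE.match(tag):
                problems.append(f"{key}: malformed language tag {tag!r}")
        # Rule 3: the primary-language variant must equal the canonical value.
        if primary in variants and variants[primary] != canonical:
            problems.append(f"{key}[{primary}]: differs from {canonical_key}")
    return problems
```

A publisher could run a check of this kind before deployment; whether a given validator rejects or merely warns on unknown tags is left open by rule 2.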

Consumers that surface the publisher's name in citations or summaries SHOULD prefer the variant matching the user's locale, falling back to the canonical value.
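
One way a consumer might implement this preference is sketched below. The helper name pick_localized is hypothetical, and the middle step (accepting a variant that shares only the primary language subtag, so that fr-CA can be served the fr-FR value) is an assumption that goes beyond the text above; a stricter consumer could skip it and fall straight back to the canonical value.

```python
def pick_localized(identity: dict, field: str, user_locale: str) -> str:
    """Pick the best value of `field` for `user_locale`, else the canonical one."""
    variants = identity.get(f"{field}Localized", {})
    # Language tags compare case-insensitively, so normalise before matching.
    by_lower = {tag.lower(): value for tag, value in variants.items()}
    wanted = user_locale.lower()

    # 1. Exact locale match.
    if wanted in by_lower:
        return by_lower[wanted]

    # 2. Assumption: accept any variant with the same primary language subtag.
    language = wanted.split("-")[0]
    for tag, value in by_lower.items():
        if tag.split("-")[0] == language:
            return value

    # 3. Fall back to the canonical field, which must always be populated.
    return identity[field]


identity = {
    "name": "Acme Corporation",
    "nameLocalized": {
        "en-GB": "Acme Corporation",
        "fr-FR": "Acme Société Anonyme",
        "ja-JP": "アクメ株式会社",
    },
}
print(pick_localized(identity, "name", "ja-JP"))  # アクメ株式会社
print(pick_localized(identity, "name", "de-DE"))  # Acme Corporation
```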

3. Multi-variant file naming (experimental)

For publishers who genuinely produce different file content per language (not just translations of one identity, but different operational footprints in different regions), the specification proposes a multi-variant naming convention as an experimental mechanism. This is not yet in any conformance class.

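The convention: the default variant is served at /llms.txt, and each additional language variant at a language-suffixed path such as /llms-en.txt or /llms-fr.txt.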

The publisher MUST declare each variant's language via the Lang: header. The default variant SHOULD declare its language too, even though it could be inferred from convention.

The default variant SHOULD be in the publisher's primary language. A site whose primary audience is German-speaking SHOULD make /llms.txt the German variant and offer English, French, and other variants as /llms-en.txt, /llms-fr.txt, and so on.

Consumers that support multi-variant fetching SHOULD use HTTP content negotiation (Accept-Language) as a hint to choose a variant, but MUST fall back to /llms.txt if a specific variant is not present. Path-based fetching (a consumer choosing to request /llms-fr.txt directly) is also valid.
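
The selection logic might look like the sketch below. It rests on assumptions: the function names are hypothetical, the /llms-<lang>.txt pattern with a lower-case primary language subtag is inferred from the /llms-en.txt and /llms-fr.txt examples, and is_available stands in for whatever existence check the consumer uses (for instance an HTTP request that returns 200).

```python
from typing import Callable

DEFAULT_PATH = "/llms.txt"


def parse_accept_language(header: str) -> list[str]:
    """Return the language tags of an Accept-Language value, highest q first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        tag, params = tag.strip(), params.strip()
        q = 1.0  # a missing q parameter means full weight
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable weight as "not acceptable"
        if tag and tag != "*" and q > 0:
            # Sort by descending weight, then by original position.
            weighted.append((-q, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def candidate_paths(accept_language: str) -> list[str]:
    """List variant paths to try in order, always ending with the default."""
    paths = []
    for tag in parse_accept_language(accept_language):
        path = f"/llms-{tag.split('-')[0].lower()}.txt"  # assumed naming pattern
        if path not in paths:
            paths.append(path)
    paths.append(DEFAULT_PATH)
    return paths


def choose_variant(accept_language: str, is_available: Callable[[str], bool]) -> str:
    """Return the first available language variant, else the default file."""
    for path in candidate_paths(accept_language)[:-1]:
        if is_available(path):
            return path
    return DEFAULT_PATH  # mandatory fallback when no variant is present
```

Treating Accept-Language purely as an ordering hint keeps the fallback unconditional: whatever the header contains, the consumer ends up at /llms.txt if nothing more specific exists.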

This mechanism is experimental because:

Publishers MAY publish multi-variant files today; the maintainer recommends starting with the default variant plus nameLocalized / descriptionLocalized in identity.json, and adding language variants only when there is clear demand from non-default-language consumers.

4. Right-to-left languages

Arabic, Hebrew, Persian, Urdu, and several other languages are written right-to-left. The AI Discovery Files specification handles RTL languages without special syntax, but a few practical considerations apply:

5. Non-Latin scripts

UTF-8 encoding is mandated for every AI Discovery File. This makes non-Latin scripts (CJK ideographs, Devanagari, Cyrillic, Greek, Arabic, etc.) work without special syntax. Publishers MAY freely use:

Publishers SHOULD additionally provide a Latin-script transliteration via alternateName (in identity.json) or via Pronunciation: entries (in brand.txt). The transliteration helps AI consumers cite the publisher in contexts where the native script is not displayable. It does not replace the native-script canonical name.

6. Legal names, trading names, and locale variants

A publisher may have several names: a registered legal name, one or more trading names, and locale-specific variants of any of these. The specification supports this:

Legal name
  The publisher's registered company name. Place in identity.json as legalName (Schema.org-aligned). Example: "Acme Holdings Limited".
Canonical name
  The publisher's preferred public name. Place in identity.json as name. Example: "Acme Corporation". This is the name consumers SHOULD use in citations.
Trading names
  Alternative names the publisher trades under. Place in identity.json as entries in the alternateName array. Example: "Acme", "Acme Corp".
Locale-specific variants
  Translations or transliterations of the canonical name. Place in identity.json as nameLocalized (see section 2). Example: "ja-JP": "アクメ株式会社".
Pronunciation
  Audio pronunciation guidance for any of the above. Place in brand.txt using the Pronunciation: convention. Especially useful for non-English names in English-language conversations.

The identity.json specification and the brand.txt specification are the normative sources of truth for each of these fields.

7. Regional contact details

A publisher with offices in multiple countries SHOULD publish each office as an entry in the contactPoint array of identity.json, with the areaServed property indicating the region. Each contact point MAY have a language preference declared via availableLanguage.

Consumers serving users in a particular region SHOULD prefer contact points whose areaServed matches the region. Consumers serving users in a particular language SHOULD prefer contact points whose availableLanguage matches the language. Both signals are hints; falling back to the first listed contact point is acceptable.
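
A consumer could combine the two hints as in the sketch below. Several details are assumptions rather than requirements: the function name pick_contact_point is hypothetical, areaServed and availableLanguage are taken to hold plain strings (or lists of strings) that match exactly, and a region match is weighted above a language match, a priority the guidance above does not specify.

```python
from typing import Optional


def _as_list(value) -> list:
    """Normalise a scalar-or-list property to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def pick_contact_point(
    identity: dict,
    region: Optional[str] = None,
    language: Optional[str] = None,
) -> Optional[dict]:
    """Return the best-matching contact point, or None if none are published."""
    points = identity.get("contactPoint", [])
    if not points:
        return None

    def score(point: dict) -> int:
        total = 0
        if region and region in _as_list(point.get("areaServed")):
            total += 2  # assumption: region outranks language
        if language and language in _as_list(point.get("availableLanguage")):
            total += 1
        return total

    # max() returns the first maximal element, so when nothing matches
    # (all scores are 0) the first listed contact point is chosen.
    return max(points, key=score)


# Hypothetical data for illustration only.
identity = {
    "contactPoint": [
        {"email": "uk@example.com", "areaServed": "GB", "availableLanguage": ["en"]},
        {"email": "fr@example.com", "areaServed": "FR", "availableLanguage": ["fr", "en"]},
        {"email": "de@example.com", "areaServed": ["DE", "AT"], "availableLanguage": "de"},
    ]
}
print(pick_contact_point(identity, region="AT")["email"])  # de@example.com
```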

8. Accessibility of llms.html

llms.html is the human-readable presentation of llms.txt. It is intended to be read by humans (and assistive technology) as well as by AI consumers. Accessibility considerations:

References