Internationalisation & Accessibility
Multi-language publication and accessibility considerations for AI Discovery Files.
The guidance on this page is normative where marked. Multi-variant file naming is currently experimental; see section 3. See specification conventions for status definitions.
Last updated:
The AI Discovery Files specification is, by design, language-neutral: the file formats accept any UTF-8 content and the BCP 47 language declaration field already lets publishers tag the language of a file. This page consolidates the guidance for non-English publication, multi-language sites, right-to-left languages, locale-tagged identity, and the accessibility responsibilities of the human-readable llms.html surface.
1. Single-language declaration (normative recap)
Every text-based AI Discovery File MAY include a single-language declaration. The mechanism was added in Phase 2 (specification release 1.5.0) and is documented in detail on the specification conventions page.
- Text files (llms.txt, ai.txt, brand.txt, faq-ai.txt, developer-ai.txt): a Lang: <BCP-47-tag> header on its own line, near the top of the file
- JSON files (ai.json, identity.json): a "language": "<BCP-47-tag>" property at the top level
- llms.html: standard HTML <html lang="<BCP-47-tag>"> attribute
Publishers writing primarily in a single non-English language SHOULD declare that language. When a file declares no language, consumers fall back to detection heuristics, which are unreliable for short files.
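As a rough illustration of how a consumer might read the declaration before resorting to detection heuristics, the sketch below covers the text-file header and the JSON property only. The function names are hypothetical, and the 10-line scan window is an assumption: this page says only "near the top of the file" and does not fix a number.

```python
import json
import re
from typing import Optional

# Assumption: "near the top" is read as the first 10 lines; the page sets no limit.
HEADER_SCAN_LINES = 10
LANG_HEADER = re.compile(r"^Lang:\s*(\S+)\s*$")


def declared_language_text(content: str) -> Optional[str]:
    """Return the tag from a `Lang:` header line, or None if no header is found."""
    for line in content.splitlines()[:HEADER_SCAN_LINES]:
        match = LANG_HEADER.match(line.strip())
        if match:
            return match.group(1)
    return None


def declared_language_json(content: str) -> Optional[str]:
    """Return the top-level "language" property of a JSON file, or None."""
    data = json.loads(content)
    if isinstance(data, dict):
        value = data.get("language")
        if isinstance(value, str):
            return value
    return None
```

A consumer would call one of these first and run language detection only when the result is None.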
2. Locale-tagged fields inside identity.json
For publishers operating across multiple languages with distinct trading names or descriptions per locale, the JSON files (especially identity.json) MAY include locale-tagged variants alongside the canonical English (or other primary-language) value. The convention adapts the @language pattern from JSON-LD (the syntax commonly used for Schema.org markup) to plain JSON:
{
  "name": "Acme Corporation",
  "nameLocalized": {
    "en-GB": "Acme Corporation",
    "fr-FR": "Acme Société Anonyme",
    "ja-JP": "アクメ株式会社"
  },
  "description": "A worked example for documentation purposes.",
  "descriptionLocalized": {
    "en-GB": "A worked example for documentation purposes.",
    "fr-FR": "Un exemple pratique à des fins de documentation."
  }
}
Rules:
- The canonical fields (name, description, etc.) MUST remain populated. Localised variants are additive.
- Each key of the *Localized object MUST be a valid BCP 47 language tag. Validators MAY reject unknown tags.
- If the publisher has a primary language declared in "language", the value at that tag in a *Localized object MUST match the canonical value.
- Locale-tagged variants are not the same as alternateName. alternateName is for trading names regardless of language; nameLocalized is for the same name in different writing systems or translations.
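The first three rules lend themselves to a mechanical check. The sketch below is one possible shape for such a check, not a reference validator: the function name is hypothetical, the tag pattern is a deliberately simplified language[-script][-region] shape test rather than full BCP 47 validation, and tags are compared as exact strings even though BCP 47 tags are case-insensitive.

```python
import re
from typing import Any, Dict, List

# Assumption: simplified shape check (language[-script][-region]).
# This is NOT a complete BCP 47 validator.
SIMPLE_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$")


def check_localized(doc: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in *Localized objects."""
    problems: List[str] = []
    primary = doc.get("language")
    for key, variants in doc.items():
        if not key.endswith("Localized") or not isinstance(variants, dict):
            continue
        canonical_key = key[: -len("Localized")]
        canonical = doc.get(canonical_key)
        # Rule 1: the canonical field must remain populated.
        if not canonical:
            problems.append(f"{canonical_key}: canonical value missing")
        # Rule 2: every key must look like a language tag.
        for tag in variants:
            if not SIMPLE_TAG.match(tag):
                problems.append(f"{key}: '{tag}' is not a well-formed tag")
        # Rule 3: the variant at the primary language must equal the canonical value.
        if primary in variants and variants[primary] != canonical:
            problems.append(f"{key}: value at '{primary}' differs from {canonical_key}")
    return problems
```

Run against the example above with "language": "en-GB" added, the function returns an empty list, because the en-GB variants equal the canonical values.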
Consumers that surface the publisher's name in citations or summaries SHOULD prefer the variant matching the user's locale, falling back to the canonical value.
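A consumer-side lookup along these lines might look like the following sketch. The function name is hypothetical, and the intermediate step (accepting any variant that shares the user's primary language subtag, such as fr-FR for a fr-CA user) is an assumption; this page specifies only the exact match and the canonical fallback.

```python
from typing import Any, Dict


def localized_value(doc: Dict[str, Any], field: str, user_locale: str) -> Any:
    """Pick the variant for user_locale, falling back to the canonical field."""
    variants = doc.get(f"{field}Localized") or {}
    # Compare case-insensitively, since BCP 47 tags are case-insensitive.
    lowered = {tag.lower(): value for tag, value in variants.items()}
    wanted = user_locale.lower()
    if wanted in lowered:
        return lowered[wanted]
    # Assumption: a variant sharing the primary language subtag is acceptable.
    primary = wanted.split("-")[0]
    for tag, value in lowered.items():
        if tag.split("-")[0] == primary:
            return value
    # Fall back to the canonical value.
    return doc.get(field)


identity = {
    "name": "Acme Corporation",
    "nameLocalized": {
        "en-GB": "Acme Corporation",
        "fr-FR": "Acme Société Anonyme",
        "ja-JP": "アクメ株式会社",
    },
}
print(localized_value(identity, "name", "de-DE"))  # no German variant, so the canonical name
```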
3. Multi-variant file naming (experimental)
For publishers who genuinely produce different file content per language (not just translations of one identity, but different operational footprints in different regions), the specification proposes a multi-variant naming convention as an experimental mechanism. This is not yet in any conformance class.
The convention:
- /llms.txt: the default (publisher-canonical) variant
- /llms-fr.txt: the French-language variant
- /llms-ja.txt: the Japanese-language variant
- ... and so on for any BCP 47 primary language subtag
The publisher MUST declare each variant's language via the Lang: header. The default variant SHOULD declare its language too, even though it could be inferred from convention.
The default variant SHOULD be the publisher's primary language. A site whose primary audience is German-speaking SHOULD make /llms.txt the German variant and offer English, French, and other variants as /llms-en.txt, /llms-fr.txt, and so on.
Consumers that support multi-variant fetching SHOULD use HTTP content negotiation (Accept-Language) as a hint to choose a variant, but MUST fall back to /llms.txt if a specific variant is not present. Path-based fetching (a consumer choosing to request /llms-fr.txt directly) is also valid.
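One way a consumer could turn an Accept-Language value into an ordered list of paths to try is sketched below. The helper names are hypothetical, the header parser is simplified (it ignores the wildcard and malformed parameters), and the `available` set stands in for whatever mechanism the consumer uses to learn which variants exist, for example probing each path.

```python
from typing import Iterable, List, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return language ranges ordered by descending q-value (simplified parser)."""
    weighted: List[Tuple[float, int, str]] = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag or tag == "*":
            continue
        quality = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Negative quality sorts highest first; position keeps header order on ties.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def candidate_paths(header: str, base: str = "llms") -> List[str]:
    """Variant paths in preference order, always ending with the default file."""
    paths: List[str] = []
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0].lower()
        path = f"/{base}-{primary}.txt"
        if path not in paths:
            paths.append(path)
    paths.append(f"/{base}.txt")
    return paths


def choose_variant(header: str, available: Iterable[str]) -> str:
    """Pick the first preferred path that exists, else fall back to /llms.txt."""
    present = set(available)
    for path in candidate_paths(header):
        if path in present:
            return path
    return "/llms.txt"
```

For the header "fr-CH, fr;q=0.9, en;q=0.8", the candidates are /llms-fr.txt, /llms-en.txt, then /llms.txt, so a site publishing only the default file is still served correctly.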
This mechanism is experimental because:
- It interacts with HTTP content negotiation in ways that aren't fully tested across CDNs
- The interaction with identity.json locale-tagged fields needs more deployment evidence to know whether both mechanisms are sustainable
- Validators do not yet have a conformance class that includes multi-variant publication
Publishers MAY publish multi-variant files today; the maintainer recommends starting with the default variant plus nameLocalized / descriptionLocalized in identity.json, and adding language variants only when there is clear demand from non-default-language consumers.
4. Right-to-left languages
Arabic, Hebrew, Persian, Urdu, and several other languages are written right-to-left. The AI Discovery Files specification handles RTL languages without special syntax, but a few practical considerations apply:
- Text files: the file content is RTL where the language is RTL. No directional control characters are required; consumers MUST handle bidirectional text per the Unicode Bidirectional Algorithm.
- JSON files: property names remain ASCII (e.g. name, not a translated property name). Property values MAY be RTL. JSON serialisers and parsers handle this transparently.
- llms.html: the publisher MUST set dir="rtl" on the <html> element (or the relevant block element) when the document content is in an RTL language. Validators MAY warn when an RTL lang attribute is paired with no dir declaration.
- Mixed content: a file containing both LTR and RTL content (e.g. an English description with Arabic identity fields) MAY use Unicode Bidirectional Algorithm marks (U+200E LRM and U+200F RLM) at boundary positions. This is rarely needed in machine-readable contexts.
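The validator warning described for llms.html could be sketched as follows, using only the Python standard library. The class and function names are hypothetical, the RTL set is a short illustrative list rather than a complete one, and the check looks at the <html> element only, so a document that sets dir on a block element instead would still be flagged.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Assumption: illustrative, non-exhaustive set of RTL primary language subtags.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


class HtmlRootAttributes(HTMLParser):
    """Capture the attributes of the first <html> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.attributes: Optional[Dict[str, Optional[str]]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.attributes is None:
            self.attributes = dict(attrs)


def direction_warnings(document: str) -> List[str]:
    """Warn when an RTL lang attribute is not paired with dir="rtl" on <html>."""
    parser = HtmlRootAttributes()
    parser.feed(document)
    attrs = parser.attributes or {}
    lang = (attrs.get("lang") or "").lower()
    direction = (attrs.get("dir") or "").lower()
    if lang.split("-")[0] in RTL_LANGUAGES and direction != "rtl":
        return [f'lang="{lang}" is an RTL language but dir="rtl" is not set on <html>']
    return []
```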
5. Non-Latin scripts
UTF-8 encoding is mandated for every AI Discovery File. This makes non-Latin scripts (CJK ideographs, Devanagari, Cyrillic, Greek, Arabic, etc.) work without special syntax. Publishers MAY freely use:
- Native-script business names: "name": "株式会社サンプル"
- Native-script descriptions and contexts
- Native-script headings inside llms.txt
Publishers SHOULD additionally provide a Latin-script transliteration via alternateName (in identity.json) or via Pronunciation: entries (in brand.txt). The transliteration helps AI consumers cite the publisher in contexts where the native script is not displayable. It does not replace the native-script canonical name.
6. Legal, trading, and locale-specific names
A publisher may have several names: a registered legal name, one or more trading names, and locale-specific variants of any of these. The specification supports this:
- Legal name: The publisher's registered company name. Place in identity.json as legalName (Schema.org-aligned). Example: "Acme Holdings Limited".
- Canonical name: The publisher's preferred public name. Place in identity.json as name. Example: "Acme Corporation". This is the name consumers SHOULD use in citations.
- Trading names: Alternative names the publisher trades under. Place in identity.json as entries in the alternateName array. Example: "Acme", "Acme Corp".
- Locale-specific variants: Translations or transliterations of the canonical name. Place in identity.json as nameLocalized (see section 2). Example: "ja-JP": "アクメ株式会社".
- Pronunciation: Audio pronunciation guidance for any of the above. Place in brand.txt using the Pronunciation: convention. Especially useful for non-English names in English-language conversations.
The identity.json specification and the brand.txt specification are the normative source of truth for each of these fields.
7. Regional contact details
A publisher with offices in multiple countries SHOULD publish each office as an entry in the contactPoint array of identity.json, with the areaServed property indicating the region. Each contact point MAY have a language preference declared via availableLanguage.
Consumers serving users in a particular region SHOULD prefer contact points whose areaServed matches the region. Consumers serving users in a particular language SHOULD prefer contact points whose availableLanguage matches the language. Both signals are hints; falling back to the first listed contact point is acceptable.
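The preference order above can be expressed as a simple scoring pass. In the sketch below the function names are hypothetical, areaServed and availableLanguage are assumed to hold plain strings or lists of strings (Schema.org also permits structured values, which are not handled), and giving region and language equal weight is an assumption, since this page treats both only as hints.

```python
from typing import Any, Dict, List, Optional


def _as_list(value: Any) -> List[Any]:
    """Normalise a missing, single, or list value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def choose_contact_point(
    contact_points: List[Dict[str, Any]],
    region: Optional[str] = None,
    language: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Prefer entries matching region and language; otherwise the first listed."""
    if not contact_points:
        return None

    def score(point: Dict[str, Any]) -> int:
        total = 0
        if region and region in _as_list(point.get("areaServed")):
            total += 1
        if language and language in _as_list(point.get("availableLanguage")):
            total += 1
        return total

    # max() returns the first entry among ties, which gives the
    # "first listed contact point" fallback when nothing matches.
    return max(contact_points, key=score)


# Hypothetical contact points for illustration only.
points = [
    {"contactType": "sales", "areaServed": "GB", "availableLanguage": ["en"]},
    {"contactType": "support", "areaServed": ["FR", "BE"], "availableLanguage": ["fr", "nl"]},
]
```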
8. Accessibility of llms.html
llms.html is the human-readable presentation of llms.txt. It is intended to be read by humans (and assistive technology) as well as by AI consumers. Accessibility considerations:
- Semantic HTML. Use <h1> for the publisher name, <h2> for top-level sections, native <ul>/<ol> for lists. Validators MAY warn on heading-level skips.
- Language attribute. The <html> element MUST have a lang attribute with the document's BCP 47 language tag.
- Direction attribute. The <html> element MUST have a dir attribute when the document language is RTL.
- Link text. Links MUST have descriptive accessible names. Validators MAY warn on link text consisting solely of "click here", "read more", or the destination URL.
- Schema.org JSON-LD. llms.html MAY (and SHOULD) include the same Organization JSON-LD that the publisher uses elsewhere on the site. Consistency between identity.json, llms.html's JSON-LD, and the site's homepage JSON-LD reduces ambiguity for AI consumers.
- No JavaScript dependency. The content of llms.html MUST be present in the initial HTML response. AI consumers do not execute JavaScript; assistive technology may not execute it reliably.
- WCAG considerations. The page SHOULD meet WCAG 2.1 AA contrast, focus, and keyboard-navigation criteria. This is good practice for any human-facing page; the specification does not mandate it but strongly recommends it.
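Two of the optional validator warnings above (heading-level skips and non-descriptive link text) are straightforward to sketch with the Python standard library. The class and function names are hypothetical, and the check is intentionally narrow: it does not evaluate aria-label, image alt text, or any other source of an accessible name.

```python
from html.parser import HTMLParser
from typing import List, Optional

GENERIC_LINK_TEXT = {"click here", "read more"}


class AccessibilityLinter(HTMLParser):
    """Collect warnings for heading-level skips and non-descriptive link text."""

    def __init__(self) -> None:
        super().__init__()
        self.warnings: List[str] = []
        self._last_heading = 0
        self._in_link = False
        self._href: Optional[str] = None
        self._link_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            if self._last_heading and level > self._last_heading + 1:
                self.warnings.append(
                    f"heading skips from h{self._last_heading} to h{level}"
                )
            self._last_heading = level
        elif tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._link_text = []

    def handle_data(self, data):
        if self._in_link:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Collapse whitespace before comparing.
            text = " ".join("".join(self._link_text).split())
            if text.lower() in GENERIC_LINK_TEXT or (text and text == self._href):
                self.warnings.append(f"non-descriptive link text: '{text}'")
            self._in_link = False


def lint(document: str) -> List[str]:
    parser = AccessibilityLinter()
    parser.feed(document)
    parser.close()
    return parser.warnings
```

Because the parser reads only the markup it is given, running it on the initial HTML response also serves as an indirect check that the content does not depend on JavaScript.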
References
- Specification conventions: language declaration: the normative source for the single-language declaration mechanism.
- BCP 47: language tag format used throughout this specification.
- identity.json specification: source of name, alternateName, legalName, contactPoint, nameLocalized (this page).
- brand.txt specification: source of Pronunciation:, alternate-name guidance.
- llms.html specification: the surface to which the accessibility guidance applies.
- WCAG 2.1: accessibility benchmark referenced in section 8.
- Roadmap entry: i18n & accessibility: planning context for this page.