Engineering

What llms.txt actually does, and what most teams get wrong.

Half the brands we audit ship an llms.txt. About one in ten ship one that does what they think it does. What the file actually does, the three failure modes, and what a correct one looks like.

AYAN Platform Engineering · May 19, 2026 · 3 min read

Half the brands we audit ship an llms.txt at /llms.txt. About one in ten ship one that does what they think it does. This post is for the team that has just been asked by marketing to “add an llms.txt” and wants to do it right the first time.

What it actually is

llms.txt is a Markdown file served from the root of your domain at /llms.txt, modeled loosely on robots.txt in name and placement. It is not yet a formal standard: it began as a 2024 proposal from Jeremy Howard of Answer.AI, discussion continues in the broader retrieval-augmented generation (RAG) community, and a handful of model providers already read it. Today, treat it as a polite directive rather than an enforced contract.

The file describes which pages of your site you would like AI assistants to prioritise when summarising your brand or your product. It does not block crawling; that is robots.txt’s job. It does not control whether a model trains on your content; that is handled by separate mechanisms, such as crawler-specific robots.txt rules, response headers and request blocking. What it does is help the model understand, at retrieval time, which of your pages are canonical.
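To make the format concrete: the llmstxt.org proposal describes a Markdown file with an H1 title (the only required element), an optional blockquote summary, and H2 sections whose list items are links of the form "- [Title](url): notes". The Python below is a minimal sketch of a parser for just those pieces, assuming well-formed input. LlmsTxt, parse_llms_txt and the example.com sample are illustrative names, not part of any published library.

```python
import re
from dataclasses import dataclass, field

# One "file list" entry: "- [Title](url)" optionally followed by ": notes".
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""    # the H1, the only required element in the proposal
    summary: str = ""  # the first blockquote line, if any
    # H2 section name -> list of (title, url, notes) tuples, in file order
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Minimal parser: ignores free-form prose and nested lists."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections.setdefault(current, [])
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["notes"] or "")
                )
    return doc


# Hypothetical sample file for illustration.
SAMPLE = """# Example Brand

> Example Brand is a hypothetical company used for illustration.

## Brand
- [Our story](https://example.com/about): Founding narrative and mission
- [Press](https://example.com/press)
"""

doc = parse_llms_txt(SAMPLE)
```

Note that nothing in this structure names a user agent or a crawl rule; the file only lists and describes pages.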

What teams typically get wrong

Three failure modes show up over and over.

First, treating it like a sitemap. We see llms.txt files that list every page on the site. This defeats the purpose. The point of the file is to signal canonicality: which pages, out of the hundreds you have, the model should treat as authoritative. A list of 2,000 URLs is not signal; it is noise.
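A sitemap-shaped file is easy to catch mechanically. The sketch below counts Markdown link targets and flags the file when it exceeds a URL budget or repeats URLs. The 40-URL default is borrowed from the ten-to-forty range recommended in this post, and check_not_a_sitemap is a hypothetical helper name.

```python
import re

# Assumed budget, taken from the "ten to forty URLs" guidance in this post.
MAX_URLS = 40

# Absolute http(s) link targets inside Markdown links: "](https://...)".
URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def check_not_a_sitemap(text: str, max_urls: int = MAX_URLS) -> list[str]:
    """Return warnings if the file looks like a sitemap dump, not a curated list."""
    urls = URL_RE.findall(text)
    warnings = []
    if len(urls) > max_urls:
        warnings.append(
            f"{len(urls)} URLs listed; a curated file should stay at or under {max_urls}."
        )
    duplicates = len(urls) - len(set(urls))
    if duplicates:
        warnings.append(f"{duplicates} duplicate URL(s).")
    return warnings
```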

Second, treating it like robots.txt. We see llms.txt files that try to block specific user agents. The format has no blocking syntax, and the agents that respect llms.txt at all are the ones that already respect your other directives. Use robots.txt for blocking. Use llms.txt for prioritisation.
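Robots-style rules in an llms.txt are just as easy to spot. This hypothetical lint, find_robots_directives, reports lines that begin with robots.txt field names (User-agent, Allow, Disallow, Crawl-delay, Sitemap) so they can be moved to robots.txt, where crawlers that honour them actually look. The field list is an assumption and could be extended.

```python
import re

# robots.txt field names; they carry no defined meaning inside llms.txt.
ROBOTS_DIRECTIVE_RE = re.compile(
    r"^\s*(user-agent|allow|disallow|crawl-delay|sitemap)\s*:", re.IGNORECASE
)


def find_robots_directives(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like robots.txt rules."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if ROBOTS_DIRECTIVE_RE.match(line)
    ]
```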

Third, writing it once and forgetting it. The pages a model should treat as canonical change every quarter: campaigns ship, products launch, founder narratives evolve. An llms.txt that was correct in March is misleading by September.
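Staleness shows up as drift between the URLs the file lists and the pages that are canonical today. A minimal sketch, assuming you can produce both as sets of URLs; where the current canonical set comes from is left open here, and audit_drift is an illustrative name.

```python
def _normalise(url: str) -> str:
    # Treat "https://example.com/about/" and "https://example.com/about" as one page.
    return url.strip().rstrip("/")


def audit_drift(listed: set[str], canonical_now: set[str]) -> dict[str, list[str]]:
    """Report entries that went stale and canonical pages the file is missing."""
    listed_n = {_normalise(u) for u in listed}
    current_n = {_normalise(u) for u in canonical_now}
    return {
        "stale": sorted(listed_n - current_n),    # still listed, no longer canonical
        "missing": sorted(current_n - listed_n),  # canonical now, not yet listed
    }
```

A spring campaign page that is still listed in September would appear under "stale"; an autumn launch that never made it into the file would appear under "missing".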

What a correct llms.txt looks like for a brand

Short. Curated. Maintained. For a brand operating across multiple markets, a useful llms.txt typically contains: the brand’s primary About or Story page; the founder narrative, if it carries differentiation; the canonical product index for each category; the most recent sustainability or trust report; and the canonical contact and press pages.

That is between ten and forty URLs, not two thousand. Each entry should carry a short description so the model knows why this page is canonical. Group entries by section. Keep it under 200 lines.
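The shape described above can be enforced at generation time. This sketch renders entries into the Markdown layout the llmstxt.org proposal describes (H1, blockquote summary, one H2 per group, one described link per line) and refuses to emit a file outside the ten-to-forty URL range or over 200 lines. Entry and render_llms_txt are hypothetical names, and treating a missing description as an error is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    section: str      # group heading, e.g. "Brand", "Products", "Trust"
    title: str
    url: str
    description: str  # why this page is canonical


def render_llms_txt(name: str, summary: str, entries: list[Entry],
                    min_urls: int = 10, max_urls: int = 40,
                    max_lines: int = 200) -> str:
    """Render a short, grouped llms.txt; raise ValueError if it breaks the budget."""
    if not min_urls <= len(entries) <= max_urls:
        raise ValueError(f"expected {min_urls}-{max_urls} entries, got {len(entries)}")
    undescribed = [e.url for e in entries if not e.description.strip()]
    if undescribed:
        raise ValueError(f"entries without a description: {undescribed}")

    lines = [f"# {name}", "", f"> {summary}"]
    sections: dict[str, list[Entry]] = {}
    for e in entries:
        sections.setdefault(e.section, []).append(e)  # keeps first-seen section order
    for section, items in sections.items():
        lines += ["", f"## {section}"]
        lines += [f"- [{e.title}]({e.url}): {e.description}" for e in items]

    if len(lines) > max_lines:
        raise ValueError(f"{len(lines)} lines exceeds the {max_lines}-line budget")
    return "\n".join(lines) + "\n"
```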

How AYAN handles llms.txt

For every brand we work with, the llms.txt is generated from the Brand Base, the structured source of truth that captures your brand’s canonical pages, claims and proof. Every week, we regenerate it, diff against the live version, and open a pull request against your repository for review and merge. The version that ships always reflects the current state of your brand, not last quarter’s.
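The comparison step of a regenerate-and-diff loop can be sketched with Python's standard difflib module. This is an illustration of the diff only, assuming the live and regenerated files are available as strings; it is not the production implementation, and opening the pull request is out of scope. diff_llms_txt is a hypothetical name.

```python
import difflib


def diff_llms_txt(live: str, regenerated: str) -> str:
    """Return a unified diff of the two versions, or "" when nothing changed."""
    return "".join(
        difflib.unified_diff(
            live.splitlines(keepends=True),
            regenerated.splitlines(keepends=True),
            fromfile="live/llms.txt",
            tofile="regenerated/llms.txt",
        )
    )
```

An empty result means there is nothing to review that week; a non-empty one is the natural body for the review request.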

We pair the llms.txt with three other artifacts. First, validated JSON-LD schemas on the canonical pages, so the model has structured data to parse, not just prose. Second, a Model Context Protocol (MCP) feed for the model surfaces that support it, so the assistant can query your brand directly. Third, a small set of retailer product detail page (PDP) rewrites that align off-domain copy with the brand truth, because retailer copy is the second-largest source of model interpretation drift, after old press cycles.
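A basic sanity check for the JSON-LD side: extract every script element of type application/ld+json from a page, confirm it parses as JSON, and confirm each top-level object declares @context and either @type or @graph. This sketch uses only the standard library and stops well short of full schema.org validation; check_jsonld is a hypothetical name.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # script text may arrive in several chunks


def check_jsonld(html: str) -> list[str]:
    """Return a list of problems; an empty list means every block passed."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    if not collector.blocks:
        return ["no JSON-LD block found"]
    problems = []
    for i, raw in enumerate(collector.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"block {i}: invalid JSON ({exc.msg})")
            continue
        # A block may hold a single object or a list of objects.
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict) or "@context" not in item:
                problems.append(f"block {i}: missing @context")
            elif "@type" not in item and "@graph" not in item:
                problems.append(f"block {i}: missing @type or @graph")
    return problems
```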

The validation harness

Every change we ship to llms.txt, JSON-LD schemas or MCP feeds goes through a pre-merge validation harness. The harness does three things: it confirms the file parses under the current version of the proposal, confirms the URLs in it resolve and serve canonical content, and runs a small set of model probes to verify that the change actually shifts what the model retrieves in the expected direction. We have caught more regressions in that harness than in any other part of the platform.
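The second check in such a harness (URLs resolve and serve canonical content) can be approximated like this: every URL returns HTTP 200, and the page's rel=canonical link points back to the same URL. The fetcher is injected as a plain function returning (status, body), an assumption that keeps network access out of the sketch and makes it testable. check_urls is an illustrative name, and the model probes are not covered.

```python
from html.parser import HTMLParser
from typing import Callable, Optional, Tuple

# Assumed fetcher shape: takes a URL, returns (HTTP status, HTML body).
Fetch = Callable[[str], Tuple[int, str]]


class _CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> on the page."""

    def __init__(self):
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and self.canonical is None and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")


def check_urls(urls: list[str], fetch: Fetch) -> dict[str, str]:
    """Map each failing URL to a reason; an empty dict means every URL passed."""
    failures = {}
    for url in urls:
        status, body = fetch(url)
        if status != 200:
            failures[url] = f"HTTP {status}"
            continue
        finder = _CanonicalFinder()
        finder.feed(body)
        finder.close()
        if finder.canonical is None:
            failures[url] = "no rel=canonical link"
        elif finder.canonical.rstrip("/") != url.rstrip("/"):
            failures[url] = f"canonical points elsewhere: {finder.canonical}"
    return failures
```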

Bottom line

llms.txt is one of the cheapest interventions in your AI brand surface and one of the easiest to get wrong. Keep it short, keep it canonical, regenerate it on a cadence, and pair it with structured data. If you want the AYAN reference implementation, our own llms.txt lives at ayanlabs.ai/llms.txt. We eat our own dog food.

