Build an llms.txt authority stack
Turn llms.txt from a glorified sitemap into a stacked, redundantly crawled authority system AI bots cannot miss.
Run this when you want AI overviews and LLMs to comprehend and cite a brand, not just crawl its sitemap. A default plugin-pushed llms.txt is "a glorified sitemap" and gets ignored. The point here is to over-invest in it the way you would over-invest in custom schema: build a stack of files, engineer redundant crawl paths so no bot can miss them, and stake authorship so the provenance is defensible. Use it on brand/money sites where AI-search visibility matters and you have real proof points to feed in.
Build the on-page file stack
- Create the base `llms.txt` and `llms-full.txt`. Do not leave them as the plugin default.
- Write a custom header as the "AI elevator pitch." Who you are, what you do, where you do it, your audience, the solutions you provide, plus About / Mission / Team pages and background. Put it first, since first content carries more weight. Use clean semantic markdown (headers, bold, lists).
- Keep the body as the sitemap linking to the rest of the site.
- Write a custom footer of third-party validation: reviews, business citations, directory listings, media mentions, industry recognition. Make footer claims align with header claims so they form a bidirectional "authority loop" with no contradictions.
- Create sub-directory `llms.txt` files per section: one per blog, per service, per product, and per location (for geo-relevancy), each with messaging dedicated to that section. Focused files beat one massive catalog.
- Create these three standalone supplemental files (NOT inside `llms-full.txt`):
  - FAQ / People-Also-Ask file (markdown): 40-50 real audience questions in Q&A form. Keep brand-as-solution language out of the first one or two sentences, and put it in the secondary paragraph.
  - Glossary file: define every key term and link each definition to the relevant site section, building the entity / internal-linking loop.
  - Review file: pull reviews from Google, Yelp, LinkedIn, Clutch, and industry directories into one file; add a one-sentence overview per review plus a thematic analysis of recurring strengths.
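The header / body / footer layout above can be sketched as a small renderer. This is a minimal illustration, not the speaker's tooling: the `LlmsFile` class, its field names, the section headings, and the example.com URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LlmsFile:
    """Hypothetical container for the three parts of a custom llms.txt."""
    brand: str
    pitch: str                      # header: the "AI elevator pitch"
    pages: list[tuple[str, str]]    # body: (title, url) sitemap links
    proof: list[tuple[str, str]]    # footer: (label, url) third-party validation
    related: list[tuple[str, str]] = field(default_factory=list)  # other llms files

    def render(self) -> str:
        """Render header first, then the sitemap body, then the validation footer."""
        lines = [f"# {self.brand}", "", f"> {self.pitch}", "", "## Pages", ""]
        lines += [f"- [{title}]({url})" for title, url in self.pages]
        lines += ["", "## Third-party validation", ""]
        lines += [f"- [{label}]({url})" for label, url in self.proof]
        if self.related:
            lines += ["", "## Related files", ""]
            lines += [f"- [{label}]({url})" for label, url in self.related]
        return "\n".join(lines) + "\n"


# Assumed example data; swap in the real pitch, pages, and proof points.
root_file = LlmsFile(
    brand="Example Co",
    pitch="Example Co installs solar panels for homeowners in Austin, Texas.",
    pages=[("Services", "https://example.com/services/")],
    proof=[("Google reviews", "https://example.com/llms-reviews.md")],
    related=[("Glossary", "https://example.com/llms-glossary.md")],
)
text = root_file.render()
```

The same class could render each sub-directory file, with a pitch and page list scoped to that one section.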
Force the files to get crawled
- Cross-reference internally: in each file's footer, link to all the other related files (review file, glossary, `llms-full`, etc.).
- Create a dedicated LLMS XML sitemap of all the llms files and push it into your existing SEO plugin's sitemap.
- Inject site-wide `link rel=` meta tags pointing to the files from every page.
- Reference the files in `robots.txt` as a discovery mechanism.
- Add link-prefetch tags from relevant pages (prefetch a service's sub-directory file from that service page's header).
- Add custom MIME types in the HTML meta tags to label what each file is.
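Three of the crawl paths above (the LLMS XML sitemap, the `link` tags, and the robots.txt reference) can be generated from one list of file URLs. The sketch below makes several assumptions the talk did not specify: the helper names are hypothetical, `rel="alternate"` is only one plausible `rel` value, `text/plain` and `text/markdown` stand in for the "custom MIME types," and the robots.txt reference uses the standard `Sitemap:` directive because robots.txt has no llms-specific directive.

```python
from html import escape
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_llms_sitemap(urls):
    """Return a sitemap XML string listing every llms file URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def link_tag(url, mime, rel="alternate"):
    """Build one <link> tag; pass rel="prefetch" for the per-page prefetch hint."""
    return f'<link rel="{escape(rel)}" type="{escape(mime)}" href="{escape(url)}">'


def robots_line(sitemap_url):
    """Build the robots.txt line that points crawlers at the LLMS sitemap."""
    return f"Sitemap: {sitemap_url}"


# Assumed example URLs.
llms_files = [
    "https://example.com/llms.txt",
    "https://example.com/services/llms.txt",
]
sitemap_xml = build_llms_sitemap(llms_files)
site_wide = link_tag(llms_files[0], "text/plain")
service_prefetch = link_tag(llms_files[1], "text/markdown", rel="prefetch")
robots = robots_line("https://example.com/llms-sitemap.xml")
```

Generating all three from the same list keeps the paths in sync, so a new sub-directory file is not left out of one of them.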
Add the JSON llms-index + manifest
- Create a structured JSON `llms-index` overview that points bots toward all the other files.
- Add a corrections section: list known AI hallucinations about the business and supply the correct facts.
- Add an agent-guidance section: scenario-based instructions, guardrails, and explicit "do NOT recommend me for X audience / X service" rules.
- Add an actions & routing section: what a visitor can do (quote, form, purchase, download, watch video) and the URL to route them to.
- Add a file manifest: a SHA-256 hash of every site file, so a crawler that re-fetches the manifest can tell which files changed by comparing hashes.
- Add an AI-Discovery HTML file linked in the site-wide footer: a plain, human-readable and AI-parsable page describing who/what/where, carrying schema markup, that points to all the other files.
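A rough sketch of the JSON index and its SHA-256 manifest follows. The talk did not give a schema, so every key name (`files`, `corrections`, `agent_guidance`, `actions`, `manifest`) and both helper functions are assumptions chosen for illustration.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_llms_index(files, corrections, guidance, actions) -> str:
    """Serialize a hypothetical llms-index; `files` maps path -> file bytes."""
    index = {
        "files": sorted(files),
        "corrections": corrections,      # known hallucination -> correct fact
        "agent_guidance": guidance,      # scenarios, guardrails, "do NOT recommend" rules
        "actions": actions,              # visitor action -> URL to route to
        "manifest": {path: sha256_hex(body) for path, body in sorted(files.items())},
    }
    return json.dumps(index, indent=2)


def changed_files(old_manifest, new_manifest):
    """Paths whose hash is new or differs between two manifests."""
    return sorted(p for p in new_manifest if old_manifest.get(p) != new_manifest[p])


# Assumed example content.
files_v1 = {"/llms.txt": b"# Example Co\n", "/llms-glossary.md": b"## Terms\n"}
files_v2 = {"/llms.txt": b"# Example Co (updated)\n", "/llms-glossary.md": b"## Terms\n"}
index_v1 = json.loads(build_llms_index(files_v1, [], [], {"quote": "https://example.com/quote/"}))
index_v2 = json.loads(build_llms_index(files_v2, [], [], {"quote": "https://example.com/quote/"}))
changed = changed_files(index_v1["manifest"], index_v2["manifest"])
```

Sorting the paths makes the output deterministic, so the index itself only changes when a file or a section changes.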
Stake and defend authorship
- Add a DMCA badge to the site-wide footer so each new publish auto-creates a timestamped, author-attributed DMCA page that links back.
- Auto-push every new publish to the Wayback Machine via its API, and put the resulting archive URL into your schema (archive attribute) to timestamp ownership.
- Issue a blockchain certificate at publish via ScoreDetect / "Scored Intent" (the speaker cited roughly $12/month for ~100 credits) or your own network for higher volume.
- Apply C2PA content authentication. The speaker extends it from media to text (blog posts, service pages) so each carries a verifiable, checkable certificate. [UNKNOWN: the exact C2PA-on-text mechanism was described as a workaround he found but not detailed.]
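The Wayback step can be illustrated without making a network call. The sketch below assumes the source's "archive attribute" corresponds to schema.org's `archivedAt` property, and it formats a snapshot URL in the Wayback Machine's public `web/<timestamp>/<url>` pattern; in practice the archive URL would come back from the save request itself. The function names and example values are hypothetical.

```python
import json
from datetime import datetime, timezone


def wayback_snapshot_url(page_url, captured_at):
    """Format a snapshot URL as web.archive.org/web/<YYYYMMDDhhmmss>/<url>."""
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{page_url}"


def article_jsonld(page_url, headline, author, published_at, archive_url):
    """Build Article markup carrying the archive URL (archivedAt is an assumed mapping)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "url": page_url,
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published_at.isoformat(),
        "archivedAt": archive_url,
    }


# Assumed example values.
published = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
page = "https://example.com/blog/solar-costs/"
snapshot = wayback_snapshot_url(page, published)
markup = json.dumps(article_jsonld(page, "Solar costs", "Jane Doe", published, snapshot), indent=2)
```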
Measure it
- Check Common Crawl (commoncrawl.org) for whether your URLs are indexed. It is the source of truth for non-Google / non-Bing AI bots. The speaker built a custom tool to query which URLs have been seen.
- Track across Common Crawl's roughly quarterly data sets to see growth or decline. Watch two of its web-graph metrics: PageRank (which the speaker reads as content quality vs competitors) and harmonic centrality rank (a lower rank is better, meaning closer to hubs like Wikipedia). Improve centrality by earning links from big sites.
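The speaker's custom tool was not shown, so the following is only a sketch of how such a check might work against Common Crawl's public URL index at index.commoncrawl.org. It builds the query URL and parses the newline-delimited JSON the index returns; fetching is left out so the example stays offline. The crawl IDs and sample records are placeholders, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def cdx_query_url(crawl_id, url_pattern):
    """Build an index query URL for one crawl, e.g. crawl_id='CC-MAIN-2024-10'."""
    query = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{query}"


def seen_urls(ndjson_text):
    """Collect the distinct URLs from a newline-delimited JSON index response."""
    seen = set()
    for line in ndjson_text.splitlines():
        if line.strip():
            seen.add(json.loads(line)["url"])
    return seen


def coverage_by_crawl(responses):
    """Map crawl ID -> number of distinct URLs seen, to compare across crawls."""
    return {crawl_id: len(seen_urls(body)) for crawl_id, body in sorted(responses.items())}


# Placeholder responses standing in for what two crawl indexes might return.
sample = {
    "CC-MAIN-2024-10": '{"url": "https://example.com/"}\n',
    "CC-MAIN-2024-18": '{"url": "https://example.com/"}\n{"url": "https://example.com/llms.txt"}\n',
}
query = cdx_query_url("CC-MAIN-2024-10", "example.com/*")
coverage = coverage_by_crawl(sample)
```

Comparing the counts from one crawl to the next gives the growth-or-decline signal described above.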
Pitfalls
- Treating llms.txt as a plugin default. Left as the "glorified sitemap," it gets ignored. The value is entirely in the custom header, footer, and stacked files.
- Header and footer claims that contradict each other. They must align to form the authority loop; mismatched claims break it.
- Brand-stuffing the FAQ file's opening sentences. Brand-as-solution language belongs in the secondary paragraph, not the first one or two sentences.
- Folding the FAQ, glossary, and review files into `llms-full.txt`. They are deliberately standalone files.
- Building the stack but not the crawl paths. Without internal cross-references, the LLMS XML sitemap, meta `link rel=`, robots.txt, prefetch, and MIME tags, bots may never reach the files.
- One massive catalog instead of sub-directory files. Per-section focus reportedly earns higher citation rates than a single giant file.
- Publishing without staking authorship. With no DMCA / Wayback / blockchain / C2PA trail, provenance is undefended.
- Note on intent: the off-page authority-manufacturing layer (listicles, AI-written interviews, bought/built roundups, advertorials, PBNs, owned Reddit/wiki clones, WP-multisite networks) is deliberately gray-hat in the source talk. This SOP covers the on-page file stack, crawl paths, provenance, and measurement; apply the off-page layer at your own risk.
Source
Distilled from Brian Winum's session "LLM Authority Hacking + Stacking" (Day 1). This is derived from the method described in the talk, not a vendor-tested procedure. Speaker name confirmed via deck and brianwinum.com; the transcript's "Whinam" is a phonetic error. Phonetic / unconfirmed in the source: "Scored Intent" vs "ScoreDetect" (same blockchain tool), and the credited names for the elevator-pitch and footer-HTML ideas.
Connect it
- Get cited by AI: the Q&A paragraph format that feeds the FAQ file.
- AI search visibility: the broader AEO / AI-overview context.
- Entities & schema: the entity and internal-linking loop the glossary file feeds.
- Advanced schema: wiring the AI-Discovery page and archive attributes into your graph.
- Digital PR: conversion & demand: the off-page citation and authority layer.
- Off-page multichannel links: link sources that improve Common Crawl centrality.
- Publish a page: the per-page discipline that produces the content this stack points at.