Building @dualnova/llms-txt: a zero-dependency validator for the AI-crawler standard
A deep dive on @dualnova/llms-txt — the TypeScript library + CLI to parse, validate and generate /llms.txt files, the emerging standard for guiding AI search crawlers.
TL;DR: /llms.txt is to AI search what robots.txt was to Google in the 90s — a small, well-known file at the root of a domain that gives crawlers a curated map of what matters. We published @dualnova/llms-txt, a TypeScript library and CLI to parse, validate and generate these files. Zero runtime dependencies, ESM-only, MIT. It's the same code that generates the file you can see at dualnova.org/llms.txt today. This post is the long version of why we built it, what's in it, and the honest reality of how much it actually moves the needle right now.
What /llms.txt is, in 30 seconds
/llms.txt is a convention proposed at llmstxt.org for publishing a Markdown summary of a site at the well-known path /llms.txt. The format is intentionally tiny:
# Site title
> One-sentence description of what the site is.
## Section heading
- [Page title](url): optional description
- [Another page](url)
## Another section
- [More pages](url)
That's the entire spec. No JSON Schema, no XML, no required fields beyond the H1 title. The goal is to be the smallest possible artifact an LLM can ingest and immediately understand what the site is about and which pages to read for more.
The motivation: AI crawlers don't have the patience that Google's renderer does. They want signal, not noise. A 50-line /llms.txt is more useful to a Perplexity bot in three seconds than thirty cached HTML pages full of nav, footer and modal overlays.
Why publish one if no AI engine officially uses it yet
This is the part most posts skip. Let's be honest: as of May 2026, primary sources from Google (John Mueller in Sept 2024, Gary Illyes in Oct 2024), the SE Ranking 300,000-domain study, and the OtterlyAI server-log audit all report that no major AI search engine currently uses /llms.txt as a citation-ranking signal. Crawlers don't even consistently request it.
So why bother?
- The cost is near zero. A 100-line static file you write once and regenerate from a build script.
- The standard may be adopted. Anthropic, OpenAI, and Perplexity have all separately acknowledged the convention exists. If any of them starts using it, you want to already have one.
- It forces clarity. Writing a one-page summary of your business in 200 words is a useful exercise regardless of who reads it. Several of our team's clearest copywriting decisions started inside their own /llms.txt.
- Other tools are starting to ingest it. The Cloudflare "Agent Ready" audit reads it. Some Brand Radar tools index it. Independent agent harnesses use it as a default discovery path.
What /llms.txt does not do: bypass the work of being citable. The big movers for AI citations are still brand mentions on YouTube, Reddit, Wikipedia, and GitHub — Ahrefs' December 2025 study of 75,000 brands found these correlate three times more strongly with AI citations than backlinks do, and /llms.txt correlated with nothing measurable.
If a vendor sells you on /llms.txt as the lever for AI visibility, walk away. It's hygiene, not strategy.
What's in the library
The npm package exports three functions and ships a CLI.
parseLlmsTxt(source: string): ParsedLlmsTxt
A tolerant Markdown parser that returns typed sections and links. It does not throw on missing fields — you get back whatever was present.
import { parseLlmsTxt } from '@dualnova/llms-txt';
const { title, description, sections } = parseLlmsTxt(source);
for (const section of sections) {
console.log(`## ${section.heading} — ${section.links.length} links`);
}
validateLlmsTxt(source: string): ValidationResult
Surfaces issues with severity levels (error, warning, info). Errors fail validation; warnings and info don't.
import { validateLlmsTxt } from '@dualnova/llms-txt';
const { valid, issues } = validateLlmsTxt(source);
if (!valid) {
  for (const issue of issues.filter((i) => i.severity === 'error')) {
    console.error(`✗ ${issue.message}${issue.at ? ` (${issue.at})` : ''}`);
  }
  process.exit(1);
}
Things it catches:
- Missing H1 title → error
- Missing > description blockquote → warning
- Description shorter than 40 chars → info
- Empty sections → warning
- Invalid URLs (not http(s)://... and not site-relative /...) → error
- No H2 sections at all → warning
The full check runs in microseconds — there's no markdown-to-AST step, just careful line-by-line regex.
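To make "line-by-line regex" concrete, here is a minimal sketch of how such a parser could look. It is not the library's parser.ts: the names parseSketch, H1, H2, QUOTE and LINK are hypothetical, and the exact patterns are assumptions based on the format shown earlier in this post.

```typescript
// Hypothetical sketch, not the library's parser.ts: one pass, one regex per line kind.
type Link = { title: string; url: string; description?: string };
type Section = { heading: string; links: Link[] };
type Parsed = { title?: string; description?: string; sections: Section[] };

const H1 = /^#\s+(.+)$/;
const H2 = /^##\s+(.+)$/;
const QUOTE = /^>\s*(.+)$/;
// "- [title](url)" with an optional ": description" tail.
const LINK = /^-\s+\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.+))?$/;

function parseSketch(source: string): Parsed {
  const doc: Parsed = { sections: [] };
  let current: Section | undefined;

  for (const raw of source.split(/\r?\n/)) {
    const line = raw.trim();

    const h2 = H2.exec(line);
    if (h2) {
      current = { heading: h2[1] ?? '', links: [] };
      doc.sections.push(current);
      continue;
    }
    const h1 = H1.exec(line);
    if (h1) {
      // Keep the first H1 only; later ones are ignored.
      if (doc.title === undefined) doc.title = h1[1] ?? '';
      continue;
    }
    const quote = QUOTE.exec(line);
    if (quote) {
      if (doc.description === undefined) doc.description = quote[1] ?? '';
      continue;
    }
    const link = LINK.exec(line);
    if (link && current) {
      const entry: Link = { title: link[1] ?? '', url: link[2] ?? '' };
      if (link[3]) entry.description = link[3];
      current.links.push(entry);
    }
    // Any other line is skipped instead of thrown on, which keeps the parser tolerant.
  }
  return doc;
}
```

Because every line is matched against a handful of anchored patterns and nothing is backtracked across lines, the cost grows linearly with file length, which is consistent with the speed claim above.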
buildLlmsTxt(options: BuildOptions): string
Generate a well-formed /llms.txt from a typed object. The function exists so you can wire it into your build pipeline and stop hand-editing the file.
import { buildLlmsTxt } from '@dualnova/llms-txt';
import { writeFileSync } from 'node:fs';
const md = buildLlmsTxt({
  title: 'Acme',
  description: 'Acme builds autonomous warehouse robots.',
  sections: [
    {
      heading: 'Products',
      links: [
        { title: 'ARO-100', url: 'https://acme.example/products/aro-100', description: 'mid-size AMR' },
        { title: 'ARO-200', url: 'https://acme.example/products/aro-200' },
      ],
    },
  ],
});
writeFileSync('./public/llms.txt', md);
The output is identical to what parseLlmsTxt would consume — we round-trip the format in our test suite to guarantee compatibility.
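For readers curious what the "typed object to Markdown" step involves, here is a rough sketch of a serializer for this format. It is an illustration only: serializeSketch is a hypothetical name, the field names are taken from the examples in this post, and the blank-line layout is an assumption rather than the library's guaranteed output.

```typescript
// Hypothetical sketch of a serializer for this format; the real builder.ts may differ.
type Link = { title: string; url: string; description?: string };
type Section = { heading: string; links?: Link[]; freeform?: string };
type Options = { title: string; description?: string; sections: Section[] };

function serializeSketch(opts: Options): string {
  const out: string[] = [`# ${opts.title}`];
  if (opts.description) out.push('', `> ${opts.description}`);

  for (const section of opts.sections) {
    out.push('', `## ${section.heading}`);
    if (section.freeform) out.push('', section.freeform);
    if (section.links && section.links.length > 0) {
      out.push('');
      for (const link of section.links) {
        // The ": description" tail is only emitted when a description exists.
        const tail = link.description ? `: ${link.description}` : '';
        out.push(`- [${link.title}](${link.url})${tail}`);
      }
    }
  }
  // End with a single trailing newline, as most text tooling expects.
  return out.join('\n') + '\n';
}
```

A round-trip test then amounts to asserting that parsing the serialized string gives back the same title, description, headings, and links that went in.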
CLI
# Validate the file in the current directory
llms-txt validate
# Validate a specific local file
llms-txt validate ./public/llms.txt
# Fetch and validate a remote one (handy for CI checks against production)
llms-txt validate --url https://dualnova.org/llms.txt
Sample output:
Validating /home/me/site/public/llms.txt
Title: DualNova
Description: DualNova is a blockchain and AI software development company …
Sections: 5
Links: 18
WARN Section "Recent client work" has no links and no content. (sections[3])
✓ 0 error(s), 1 warning(s)
Exit codes: 0 clean, 1 validation errors, 2 couldn't read the file. CI-ready.
Using it in a Next.js build
The pattern we use on dualnova.org. Add a route at app/llms.txt/route.ts:
import { buildLlmsTxt } from '@dualnova/llms-txt';
export function GET() {
  const body = buildLlmsTxt({
    title: 'Acme',
    description: 'Acme builds autonomous warehouse robots used at 40+ distribution centers.',
    sections: [
      {
        heading: 'What Acme does',
        freeform:
          'Autonomous mobile robots (AMRs) for warehouse pick-pack-ship. Fleet orchestration software. Open-source SDK.',
      },
      {
        heading: 'Key pages',
        links: [
          { title: 'Home', url: 'https://acme.example/' },
          { title: 'Products', url: 'https://acme.example/products' },
          { title: 'Customers', url: 'https://acme.example/customers' },
        ],
      },
    ],
  });
  return new Response(body, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}
export const dynamic = 'force-static';
Two things this gives you over hand-editing the static file:
- Single source of truth. If you already have your pages listed in a sitemap or MDX frontmatter, pipe that into the call and stop drifting.
- CI breaks if you publish a malformed file. The validator runs at build time.
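The route above does not show a validation call, so here is one possible way to wire it in. This is a sketch under assumptions: assertValidLlmsTxt is a hypothetical helper (not part of the package), and the Issue and ValidationResult shapes below only mirror the fields used in the validateLlmsTxt example earlier in this post. The validator is passed in as a parameter so the helper stays independent of how you import it.

```typescript
// Shapes mirror the fields used in this post's validateLlmsTxt example;
// the library's exported types may carry more.
type Issue = { severity: 'error' | 'warning' | 'info'; message: string; at?: string };
type ValidationResult = { valid: boolean; issues: Issue[] };

// Hypothetical helper: returns the body unchanged when it validates, and throws
// otherwise so the build step that called it fails loudly.
function assertValidLlmsTxt(
  body: string,
  validate: (source: string) => ValidationResult,
): string {
  const { valid, issues } = validate(body);
  if (!valid) {
    const errors = issues
      .filter((issue) => issue.severity === 'error')
      .map((issue) => `${issue.message}${issue.at ? ` (${issue.at})` : ''}`);
    throw new Error(`Invalid /llms.txt:\n${errors.join('\n')}`);
  }
  return body;
}
```

In the route handler this would read: const body = assertValidLlmsTxt(buildLlmsTxt({ ... }), validateLlmsTxt). Since the route is marked force-static, the handler should run during the build, so a thrown error would stop a malformed file from shipping.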
GitHub Action: fail the build on broken /llms.txt
# .github/workflows/llms-txt.yml
name: Validate llms.txt
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npx -y @dualnova/llms-txt validate ./public/llms.txt
Three steps and your /llms.txt is now part of your CI gate, same as your TypeScript compiler.
How it's built (the boring details)
Zero runtime dependencies. The whole package is five files under src/, four for the library plus the CLI entry point:
src/
├── index.ts ← re-exports the public API
├── parser.ts ← regex-driven Markdown parser
├── validator.ts ← rules + severity classification
├── builder.ts ← typed object → Markdown serializer
└── cli.ts ← argv parser + colored output
Tested with Node's built-in test runner (no jest, no vitest), which keeps install size tiny. Ships ESM-only, since import.meta and top-level await have been stable in Node 20+ for over a year.
The parser deliberately accepts malformed input and emits structured warnings instead of throwing — the validator is the only place that decides whether something is fatal. This separation matters when you're using the library inside a CMS that allows partial drafts.
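As an illustration of that split, the sketch below takes an already-parsed structure and is the only place that assigns severities. It is not the library's validator.ts: validateSketch, URL_OK and the message strings are hypothetical, the rules follow the list earlier in this post, and the link-level "at" path format is an assumption.

```typescript
// Hypothetical sketch of the parser/validator split; not the library's validator.ts.
type Severity = 'error' | 'warning' | 'info';
type Issue = { severity: Severity; message: string; at?: string };
type Link = { title: string; url: string };
type Section = { heading: string; links: Link[]; freeform?: string };
type Parsed = { title?: string; description?: string; sections: Section[] };

// Absolute http(s) URLs or site-relative paths ("/docs"), but not "//host" forms.
const URL_OK = /^(https?:\/\/\S+|\/(?!\/)\S*)$/;

function validateSketch(doc: Parsed): { valid: boolean; issues: Issue[] } {
  const issues: Issue[] = [];
  const add = (severity: Severity, message: string, at?: string): void => {
    issues.push(at === undefined ? { severity, message } : { severity, message, at });
  };

  if (!doc.title) add('error', 'Missing H1 title');
  if (!doc.description) add('warning', 'Missing description blockquote');
  else if (doc.description.length < 40) add('info', 'Description shorter than 40 chars');
  if (doc.sections.length === 0) add('warning', 'No H2 sections');

  doc.sections.forEach((section, i) => {
    if (section.links.length === 0 && !section.freeform) {
      add('warning', `Section "${section.heading}" has no links and no content.`, `sections[${i}]`);
    }
    section.links.forEach((link, j) => {
      if (!URL_OK.test(link.url)) {
        add('error', `Invalid URL: ${link.url}`, `sections[${i}].links[${j}]`);
      }
    });
  });

  // Only errors are fatal; warnings and info are reported but do not fail validation.
  return { valid: !issues.some((issue) => issue.severity === 'error'), issues };
}
```

A CMS holding a partial draft can call the parser alone and render whatever is there, then run a validator like this one only at publish time.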
Where this fits in the GEO stack
/llms.txt is the second-cheapest thing you can do for Generative Engine Optimization. The cheapest is unblocking AI crawlers in robots.txt (and stopping Cloudflare from over-blocking them). The full stack, in rough order of return on effort:
1. Unblock AI crawlers in robots.txt and Cloudflare AI Crawl Control.
2. Publish /llms.txt with this library.
3. Add a multi-entity @graph Schema.org block with Organization + WebSite + ProfessionalService.
4. Publish 134–167-word citable passages on each key page (the "X is [definition]" pattern).
5. Build brand mentions on YouTube, Reddit, GitHub, Wikipedia, by far the biggest lever.
The library handles step 2. Steps 1, 3, and 4 are code changes in your repo. Step 5 is months of patient brand work.
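For the Schema.org step, a multi-entity @graph block could be assembled roughly like this. Everything here is a placeholder: the Acme names, the acme.example URLs, and the choice of publisher and parentOrganization to link the nodes are assumptions for illustration, not a prescription.

```typescript
// Hypothetical values throughout (Acme, acme.example); swap in your own entity data.
const site = 'https://acme.example';
const orgId = `${site}/#organization`;

const graph = {
  '@context': 'https://schema.org',
  '@graph': [
    { '@type': 'Organization', '@id': orgId, name: 'Acme', url: `${site}/` },
    {
      '@type': 'WebSite',
      '@id': `${site}/#website`,
      url: `${site}/`,
      name: 'Acme',
      publisher: { '@id': orgId }, // points at the Organization node by @id
    },
    {
      '@type': 'ProfessionalService',
      '@id': `${site}/#service`,
      name: 'Acme Robotics Consulting',
      url: `${site}/`,
      parentOrganization: { '@id': orgId },
    },
  ],
};

// Escape "<" so the JSON cannot terminate the surrounding <script> element early.
const jsonLd = JSON.stringify(graph).replace(/</g, '\\u003c');
const scriptTag = `<script type="application/ld+json">${jsonLd}</script>`;
```

Referencing nodes by @id keeps each entity defined once, so the three types describe one connected business instead of three unrelated records.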
Install and try
npm install @dualnova/llms-txt
# or run the CLI without installing
npx @dualnova/llms-txt validate --url https://dualnova.org/llms.txt
If you find a bug, open an issue. If you use it in your stack, drop a note in the discussions tab — we're collecting reference deployments for the README.
- GitHub: github.com/DualNova/llms-txt
- npm: @dualnova/llms-txt
- Sister libraries in the same release: @dualnova/agent-skills and tokenization-templates
Built by DualNova — blockchain and AI software development for LATAM and the US. Bilingual team in Caracas, Bogotá, and Miami. If you're shipping a blockchain or AI-agent product and want a 30-minute technical scoping call, book one here.