Standards Glossary
The AI readability landscape is new and fragmented — multiple proposals, conventions, and specifications are evolving simultaneously.
llms.txt
What it is: A proposed convention for a Markdown-formatted text file at a domain's root (/llms.txt), giving AI systems a concise, structured summary of a website's identity and offerings.
Status: Community-driven convention, not a ratified standard. No formal governing body; adoption is voluntary and growing organically.
Learn more: See llms.txt for implementation details.
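As a rough illustration of the file's shape, the sketch below renders a minimal llms.txt body with a title, a one-line summary, and a section of links. It assumes the layout commonly described for the proposal (H1 title, blockquote summary, H2 sections containing link lists). The helper name `build_llms_txt` and the Example Co data are hypothetical.

```python
def build_llms_txt(title, summary, sections):
    """Render a minimal llms.txt body: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data, for illustration only.
llms_txt = build_llms_txt(
    title="Example Co",
    summary="Example Co sells refurbished laptops and publishes repair guides.",
    sections={
        "Docs": [
            ("Repair guides", "https://example.com/guides.md",
             "Step-by-step repair instructions"),
        ],
    },
)
print(llms_txt)
```

The resulting text would be served at the domain root as /llms.txt.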
Content Signals (AI Preferences)
What it is: A proposed extension to robots.txt allowing site owners to declare granular preferences for AI training, search inclusion, and AI input usage, separately from basic crawl access.
Status: IETF Internet-Draft (draft-romm-aipref-contentsignals), submitted by engineers at Cloudflare. As of this writing, the draft has expired and has not been adopted by a formal IETF working group. Not binding on any crawler.
Learn more: See Content Signals for implementation details.
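To make the idea concrete, here is a small sketch that formats a Content-Signal line and places it in a robots.txt group. The signal names (search, ai-input, ai-train) and the yes/no syntax reflect one reading of the draft and may change, so check the current draft text before relying on them. The helper `content_signal_line` is hypothetical.

```python
# Signal names as understood from the draft; treat these as an assumption.
ALLOWED_SIGNALS = ("search", "ai-input", "ai-train")


def content_signal_line(preferences):
    """Format a dict such as {"search": True} as a Content-Signal line."""
    unknown = set(preferences) - set(ALLOWED_SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    parts = [
        f"{name}={'yes' if preferences[name] else 'no'}"
        for name in ALLOWED_SIGNALS
        if name in preferences
    ]
    return "Content-Signal: " + ", ".join(parts)


signal_line = content_signal_line(
    {"search": True, "ai-input": True, "ai-train": False}
)

# The signal sits alongside ordinary crawl rules; it does not replace them.
robots_txt = "\n".join(["User-agent: *", signal_line, "Allow: /"]) + "\n"
print(robots_txt)
```

In this sketch the site allows crawling and search indexing but asks not to be used for model training. As the status above notes, no crawler is bound to honor the preference.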
robots.txt
What it is: The long-standing standard for declaring crawler access permissions at a domain level, originally designed for search engines and now extended to cover AI crawlers.
Status: Long-established web standard, originally proposed in 1994 and formalized by the IETF as RFC 9309 in 2022. Broadly respected by compliant crawlers, though compliance is voluntary and not legally enforced.
Learn more: See robots.txt for AI-specific implementation guidance.
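The sketch below shows how a compliant crawler might evaluate per-agent rules, using Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleBot` agent are made up for illustration. `GPTBot` is used only as a sample AI crawler token.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching each URL.
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/pricing")
other_allowed = parser.can_fetch("ExampleBot", "https://example.com/pricing")

print("GPTBot allowed:", gptbot_allowed)
print("ExampleBot allowed:", other_allowed)
```

This only models the behavior of crawlers that choose to check the file. Nothing in robots.txt technically prevents a non-compliant client from fetching the page.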
Schema.org
What it is: A shared vocabulary for structured data markup, founded by Google, Microsoft, Yahoo, and Yandex and maintained collaboratively, used to describe entities like organizations, products, articles, and FAQs in a machine-readable format embedded directly in HTML (typically as JSON-LD, Microdata, or RDFa).
Status: Mature, widely adopted standard since 2011. Actively maintained.
Learn more: Validate your implementation with Google's Rich Results Test. See AI Credibility for how schema affects your AIA Score.
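As a minimal sketch of what embedded markup can look like, the snippet below builds a Schema.org Organization object and wraps it in a JSON-LD script tag. The organization details are placeholders and the helper `jsonld_script_tag` is hypothetical. Real markup would usually carry more properties.

```python
import json

# Placeholder entity data, for illustration only.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
    "sameAs": ["https://www.linkedin.com/company/example-co"],
}


def jsonld_script_tag(data):
    """Serialize a dict as JSON-LD inside a script tag for the page head."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


tag = jsonld_script_tag(organization)
print(tag)
```

The printed tag can be pasted into a page template and then checked with a validator such as the Rich Results Test mentioned above.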
sitemap.xml
What it is: An XML file listing a site's URLs to help search engines discover and crawl pages efficiently.
Status: Mature, widely adopted standard, supported by all major search engines.
Relevance to AI readability: While built for traditional search crawling, a sitemap can also list AI-readable resources such as llms.txt and /semantic/index.json. A Sitemap: directive in robots.txt then points crawlers to the sitemap itself, improving discoverability.
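A small sketch of this setup, assuming a site at the placeholder domain example.com: it generates a sitemap that lists a regular page alongside the AI-readable resources, plus the robots.txt line that advertises the sitemap. The helper `build_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document listing the given absolute URLs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/llms.txt",
    "https://example.com/semantic/index.json",
])

# The robots.txt directive that tells crawlers where the sitemap lives.
robots_sitemap_line = "Sitemap: https://example.com/sitemap.xml"

print(sitemap_xml)
print(robots_sitemap_line)
```

Whether a given crawler actually follows sitemap entries for non-HTML resources is up to that crawler. The sitemap only makes them easier to find.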
AIA Score
What it is: A weighted, three-factor scoring framework (Structure, Explicitness, Accessibility) for quantifying how readily content can be interpreted, trusted, and cited by generative AI systems.
Status: Original framework introduced by Kayvan Momeni, December 2025. Used as the core methodology behind AIA Matrix's scanning and scoring product.
Learn more: AIA Score Explained, or the full paper: The AI Interpretation & Accessibility Score (AIA Score)
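For readers unfamiliar with weighted scoring, the sketch below shows the general mechanics of combining three factor scores into one number. It is not the published AIA Score formula: the weights, the 0 to 100 scale, and the function name `weighted_score` are all placeholder assumptions chosen for illustration. Refer to the paper for the actual methodology.

```python
import math

# Placeholder weights, NOT the published AIA Score weights.
HYPOTHETICAL_WEIGHTS = {
    "structure": 0.4,
    "explicitness": 0.3,
    "accessibility": 0.3,
}


def weighted_score(factors, weights=HYPOTHETICAL_WEIGHTS):
    """Combine per-factor scores (assumed 0-100) into one weighted average."""
    if set(factors) != set(weights):
        raise ValueError("factors and weights must cover the same names")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    for name, value in factors.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    return sum(weights[name] * factors[name] for name in weights)


# Made-up factor scores for a hypothetical page.
example = weighted_score(
    {"structure": 80, "explicitness": 60, "accessibility": 90}
)
print(round(example, 1))
```

The point of the weighting is that a weak score on a heavily weighted factor pulls the total down more than the same weakness on a lightly weighted one.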
Generative Engine Optimization (GEO)
What it is: An emerging discipline focused on optimizing content for interpretation and citation by generative AI systems, distinct from traditional Search Engine Optimization (SEO), which optimizes for human click-through and ranking.
Status: Emerging term, not governed by any standards body — used descriptively across the industry to refer to this broader practice area, of which the AIA Score and AI-readable file layers are practical implementations.
A Note on This Fast-Moving Space
Because most of these standards are new, proposed, or voluntary, none of them guarantees specific outcomes: no AI crawler is legally required to honor llms.txt, Content Signals, or even robots.txt itself. Implementing them is a matter of good-faith signaling and early positioning, not contractual compliance. If adoption grows and these standards formalize, sites that implemented the conventions early will be well positioned to meet them, though final specifications may differ from today's drafts.
This page will be updated as new standards emerge or existing drafts change status.