Content Signals
Content Signals is a proposed extension to robots.txt that lets website owners declare specific preferences for how AI systems may use their content — separately from whether crawlers can access it at all.
The Problem It Solves
Standard robots.txt answers only one question: may a crawler fetch this page or not? It has no way to express more nuanced preferences, such as:
"You may crawl this page and use it to answer user questions in real time, but you may not use it as training data for your model."
Content Signals introduces a structured way to declare exactly that distinction.
The Three Signals
A single Content-Signal line carries all three, for example:

```
Content-Signal: ai-train=no, search=yes, ai-input=yes
```

| Signal | Controls |
|---|---|
| ai-train | Whether AI companies may use this content to train models |
| search | Whether this content may appear in search engine results |
| ai-input | Whether this content may be retrieved and used as live context when an AI system answers a user's question |
Each signal accepts yes or no.
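To make the value format concrete, here is a minimal Python sketch that reads one Content-Signal value into a dictionary of signal name to allowed/not allowed. The function parse_content_signal is hypothetical. Rejecting unknown names or values other than yes/no is a choice made for this sketch, because it catches typos in your own file. It is not a rule taken from the draft.

```python
from typing import Dict

# Signal names from the table above. Rejecting anything else is a choice
# made for this sketch (it catches typos), not a rule from the draft.
KNOWN_SIGNALS = {"ai-train", "search", "ai-input"}


def parse_content_signal(value: str) -> Dict[str, bool]:
    """Hypothetical helper: turn 'ai-train=no, search=yes' into
    {'ai-train': False, 'search': True}."""
    signals: Dict[str, bool] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        name, sep, setting = entry.partition("=")
        name, setting = name.strip(), setting.strip()
        if not sep or name not in KNOWN_SIGNALS or setting not in ("yes", "no"):
            raise ValueError(f"unrecognised Content-Signal entry: {entry!r}")
        signals[name] = setting == "yes"
    # Signals left out of the line are simply absent from the result.
    return signals


print(parse_content_signal("ai-train=no, search=yes, ai-input=yes"))
# prints {'ai-train': False, 'search': True, 'ai-input': True}
```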
How to Implement It
Add Content-Signal directives under the relevant User-agent blocks in your robots.txt:
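```
User-agent: *
Content-Signal: ai-train=no, search=yes, ai-input=yes

User-agent: GPTBot
Content-Signal: ai-train=no, search=yes, ai-input=yes

User-agent: ClaudeBot
Content-Signal: ai-train=no, search=yes, ai-input=yes
```

The directive is repeated in each named block because a crawler that finds a group addressed to its own user-agent follows that group and ignores the User-agent: * group. See robots.txt for the full directive structure, and AI Crawlers for a complete list of user-agents worth covering.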
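If you maintain the file programmatically, the per-crawler groups can be generated from one default setting plus optional overrides. The Python sketch below is illustrative only. The names build_robots_txt and format_signal are hypothetical, the override mechanism is an assumption for this example, and it emits only User-agent and Content-Signal lines. Any Allow/Disallow rules you already have would still need to be merged into the matching groups.

```python
from typing import Iterable, Mapping, Optional


def format_signal(signals: Mapping[str, bool]) -> str:
    """Render {'ai-train': False, ...} as 'ai-train=no, ...'."""
    return ", ".join(
        f"{name}={'yes' if allowed else 'no'}" for name, allowed in signals.items()
    )


def build_robots_txt(
    user_agents: Iterable[str],
    default: Mapping[str, bool],
    overrides: Optional[Mapping[str, Mapping[str, bool]]] = None,
) -> str:
    """Hypothetical generator: one User-agent group per crawler, each with its
    own Content-Signal line. Crawlers missing from overrides get the default."""
    overrides = overrides or {}
    groups = []
    for agent in user_agents:
        signals = overrides.get(agent, default)
        groups.append(f"User-agent: {agent}\nContent-Signal: {format_signal(signals)}")
    # A blank line between groups keeps the file readable.
    return "\n\n".join(groups) + "\n"


default = {"ai-train": False, "search": True, "ai-input": True}
print(build_robots_txt(["*", "GPTBot", "ClaudeBot"], default))
```

Called with the three user-agents from the example, it produces the same three groups. Passing an overrides entry such as {"GPTBot": {...}} changes only that crawler's line, which is the main reason to keep one group per crawler.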
Choosing Your Settings
Most businesses publishing public-facing marketing content — services, about pages, contact information — benefit from being discoverable and citable, while having less interest in their specific copy being absorbed into model training data.
A common, reasonable default:
```
Content-Signal: ai-train=no, search=yes, ai-input=yes
```

This says: don't train on my content, but do show it in search results, and do use it to answer questions about my business in real time.
If you operate a content business where the text itself is the product (journalism, paid research, proprietary analysis), you may want stricter settings:
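```
Content-Signal: ai-train=no, search=yes, ai-input=no
```

This keeps your pages in search results but asks AI systems neither to train on them nor to pull them into real-time answers.

Validating Your Settings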
You can check whether your Content Signals are correctly configured using a free scan at isitagentready.com:
```http
POST https://isitagentready.com/api/scan
Content-Type: application/json

{"url": "https://yourdomain.com"}
```

Check that checks.botAccessControl.contentSignals.status returns "pass".
See Validation for more on confirming your full AI-readiness setup.
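To run that check from a script or CI job, something like the Python sketch below could work. Only the endpoint, the request body, and the field path checks.botAccessControl.contentSignals.status come from the text above. The rest of the response shape, and the helper names request_scan and content_signals_status, are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Any, Dict

# Endpoint and request body as described above; everything else about the
# response beyond the documented field path is an assumption.
SCAN_ENDPOINT = "https://isitagentready.com/api/scan"


def request_scan(site_url: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Hypothetical helper: POST {"url": site_url} and decode the JSON reply."""
    body = json.dumps({"url": site_url}).encode("utf-8")
    request = urllib.request.Request(
        SCAN_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def content_signals_status(report: Dict[str, Any]) -> Any:
    """Follow checks.botAccessControl.contentSignals.status.
    Returns None if any level of that path is missing."""
    node: Any = report
    for key in ("checks", "botAccessControl", "contentSignals", "status"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node
```

A script would then compare content_signals_status(request_scan("https://yourdomain.com")) with "pass" and fail the job otherwise. Keeping the field lookup separate from the network call makes it easy to test against a saved response.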
Current Standards Status
Content Signals originates from an IETF Internet-Draft (draft-romm-aipref-contentsignals) submitted by engineers at Cloudflare. As of this writing, the draft has expired and has not been adopted by an IETF working group. It therefore has no standards status, and nothing obliges an AI crawler to honor it.
This doesn't make it worthless to implement. Cloudflare's own AI Crawl Control product already supports Content Signals configuration, and adopting it early costs only a few lines in your robots.txt. As with llms.txt, being early to an emerging convention is low-cost and puts your site in a good position if the convention is later formalized.
For more on where this fits among other emerging standards, see Standards Glossary.
How AIA Matrix Implements This
AIA Matrix automatically includes Content Signals directives in every generated robots.txt, scoped to each major AI crawler individually. Default settings follow the ai-train=no, search=yes, ai-input=yes pattern, and Professional plan users can request custom settings per crawler through their dashboard.