For thirty years, the deal was simple. You put a robots.txt at the root of your site, it said allow or disallow, and crawlers obeyed. It was a gentlemen's agreement: you stated a preference, well-behaved bots respected it, and in exchange Google sent you traffic. Crawl for free, get visitors back. Everyone won.
That deal is dead. Not dying — dead. And most site owners are still writing robots.txt files as if it were 2010, guarding a door that the most important visitors no longer use.
Here's what actually changed: the bot reading your site is no longer a passive indexer that wants to list you. It's an agent that wants to use you — to answer a question, complete a task, train a model — often without ever sending a human your way. The relationship stopped being "access or no access." It became a negotiation over what the access is worth. And robots.txt, a file that only knows the words "allow" and "disallow," cannot have that conversation.
1. robots.txt Answers the Wrong Question
robots.txt was built to answer one question: can you fetch this URL? Yes or no.
But that's no longer the question that matters. When an AI system reads your page, the consequential questions are downstream of fetching:
- Can it show your content in a search result?
- Can it synthesize your content into a generated answer the user reads instead of visiting you?
- Can it use your content to train a model?
These are three completely different things with completely different value to you. A publisher might happily allow the first, tolerate the second for attribution, and absolutely refuse the third. robots.txt collapses all of it into a single allow/disallow that can't tell them apart. You're answering "can you come in?" when the real question is "what are you going to do with what's inside?"
This is exactly the gap Content Signals was built to fill — an extension to robots.txt with three separate declarations: search, ai-input, and ai-train. For the first time you can say: index me, yes; use me to generate answers, sure; train on me, no. It's the difference between a doorman and a contract.
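Here is roughly what that looks like from the agent's side. This is a minimal sketch, assuming the `Content-Signal: search=yes, ai-input=yes, ai-train=no` line format from the published Content Signals proposal. The function name `parse_content_signals` and the sample file are illustrative, and the sketch deliberately ignores per-user-agent grouping.

```python
# Sample robots.txt body (illustrative) carrying one Content-Signal line.
ROBOTS_TXT = """\
User-Agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
"""


def parse_content_signals(robots_txt: str) -> dict:
    """Collect Content-Signal values from a robots.txt body.

    Simplification: user-agent groups are ignored, so if a signal
    appears more than once, the last value seen wins.
    """
    signals = {}
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "content-signal":
            continue
        for pair in value.split(","):
            name, eq, answer = pair.partition("=")
            if eq:
                signals[name.strip().lower()] = answer.strip().lower() == "yes"
    return signals


print(parse_content_signals(ROBOTS_TXT))
# prints {'search': True, 'ai-input': True, 'ai-train': False}
```

Three independent answers come back instead of one allow/disallow bit, which is the whole point.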
2. The Bots Don't Even Obey the Old File
Even if robots.txt asked the right question, there's a more brutal problem: a large share of AI crawlers simply ignore it.
The old gentlemen's agreement held because the crawlers that mattered belonged to search engines, which needed publishers' goodwill as much as publishers needed their traffic. That leverage is gone. The Perplexity crawler controversy made it undeniable: the question of whether AI crawlers actually respect stated access rules went from academic to front-page, and the answer for many was "no." A directive that isn't enforced isn't a policy. It's a wish.
Which is why the frontier moved from declaring preferences to enforcing them. This is the domain of AI crawl control: identity verification for bots (is this really who it claims to be — the foundation of efforts like Web Bot Auth), blocking at the CDN edge instead of trusting a text file, and auditing which bots actually visit and whether they comply. The order is unforgiving: measure first, set policy second, deploy tools third. A robots.txt written without knowing who's actually crawling you is a decision made blind.
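The "measure first" step can start small. The sketch below counts visits from a few AI crawlers and checks each request against your own robots.txt using Python's standard `urllib.robotparser`. The token list, the sample rules, and the log records are all illustrative assumptions; a real audit would parse your actual access logs.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Illustrative user-agent substrings; extend with the bots you actually see.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

# Sample policy: one bot is refused, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical pre-parsed log records: (User-Agent header, requested path).
LOG_RECORDS = [
    ("Mozilla/5.0 (compatible; GPTBot/1.0)", "/articles/1"),
    ("Mozilla/5.0 (compatible; ClaudeBot/1.0)", "/articles/1"),
    ("Mozilla/5.0 (compatible; GPTBot/1.0)", "/articles/2"),
    ("Mozilla/5.0 (Windows NT 10.0) Firefox/126.0", "/articles/1"),
]


def audit(records, robots_txt):
    """Return (visits, violations) per AI bot, judged against robots_txt."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    visits, violations = Counter(), Counter()
    for user_agent, path in records:
        bot = next(
            (t for t in AI_BOT_TOKENS if t.lower() in user_agent.lower()), None
        )
        if bot is None:
            continue  # not one of the AI crawlers being tracked
        visits[bot] += 1
        if not rules.can_fetch(bot, path):
            violations[bot] += 1  # fetched a path the file disallows
    return visits, violations


visits, violations = audit(LOG_RECORDS, ROBOTS_TXT)
print(dict(visits))      # prints {'GPTBot': 2, 'ClaudeBot': 1}
print(dict(violations))  # prints {'GPTBot': 2}
```

One caveat: matching on the User-Agent header only tells you what a bot claims to be. A crawler that lies about its identity won't show up here, which is the gap identity verification is meant to close.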
3. "Free Crawl for Traffic" Was a Trade — and One Side Stopped Paying
The original bargain had an economic engine: you let crawlers in, they sent you visitors, visitors became revenue. That's why nobody charged for crawling. The traffic was the payment.
Generative answers broke the engine. When Perplexity or ChatGPT reads your page and answers the user directly, the user never arrives. You provided the input; someone else captured the value; you got nothing. The trade didn't get renegotiated — one side just quietly stopped paying.
So the other side started pricing the access it used to give away. Pay-per-crawl flips the default: instead of blocking AI or letting it feed for free, you set a price per crawl or per query, and agents pay through micropayments to get in. Platforms like TollBit automate the haggling and settlement so a small site doesn't have to negotiate with each AI company by hand — the same way you don't manually negotiate every display ad. This is programmatic content access: set your policy once, let the market transact against it. Access becomes a market, not a checkbox.
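At the protocol level, one plausible shape for this is the long-dormant HTTP 402 Payment Required status: quote a price, serve the page only when the agent's offer meets it. The sketch below is a toy under stated assumptions. The price table, the crawler name, and the `X-Example-*` header names are invented for illustration and are not taken from TollBit or any other real product.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical price list, in USD per request, keyed by crawler name.
PRICE_PER_CRAWL = {"ExampleAIBot": 0.002}
DEFAULT_PRICE = 0.01


def quote_or_serve(crawler: str, offered_price: Optional[float]):
    """Return (status, headers) for a crawl request.

    The header names are made up for this sketch; real platforms
    define their own fields and settlement flow.
    """
    price = PRICE_PER_CRAWL.get(crawler, DEFAULT_PRICE)
    if offered_price is not None and offered_price >= price:
        # Offer meets the price: serve the content and record the charge.
        return HTTPStatus.OK, {"X-Example-Crawl-Charged": f"{price:.4f}"}
    # No offer, or too low: refuse with 402 and quote the price.
    return HTTPStatus.PAYMENT_REQUIRED, {"X-Example-Crawl-Price": f"{price:.4f}"}


print(quote_or_serve("ExampleAIBot", None)[0].value)   # prints 402
print(quote_or_serve("ExampleAIBot", 0.005)[0].value)  # prints 200
```

The publisher sets the table once; every request after that is a transaction against it rather than a negotiation.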
4. The Lawyers Caught Up Before the Standards Did
Here's the part that turns this from "nice to have" into "do it now": the legal system is already treating your access declarations as binding signals.
Under the EU's text-and-data-mining exception (Article 4 of the 2019 Copyright in the Digital Single Market Directive), AI systems are broadly permitted to crawl for training unless the rightsholder has expressly reserved that right, and for content published online the reservation is expected to be machine-readable. The new signal layer is built to serve as exactly that kind of opt-out: declaring ai-train=no. The implication is sharp. If you don't express the refusal, you may lose the ability to later claim the crawling was improper. Silence is read as consent. Meanwhile cases like NYT v. OpenAI grind through the courts, but as the licensing market already understands, waiting for a verdict isn't a strategy. Deals are being signed regardless of what any court eventually rules.
Your robots.txt stopped being just a technical convenience. It became a legal instrument — and an empty or outdated one is a position you're taking by default, whether you meant to or not.
5. What "Access" Means Now — and What to Actually Do
So the mental model has to change. Stop thinking of site access as a gate you open or close. Start thinking of it as a standing offer you publish — terms that agents read, evaluate, and act on, automatically, at machine speed.
In practice:
- Separate the uses. Don't lump indexing, answer-generation, and training together. Declare them independently with Content Signals (search, ai-input, ai-train), because they're worth different things to you.
- Measure before you decide. Audit which AI bots actually crawl you and whether they obey anything. Policy without data is guesswork.
- Decide your posture deliberately. Block, allow, or price. "Do nothing" is also a decision, and usually the worst one, because it gives away the valuable uses (training, answers) in exchange for traffic that no longer arrives.
- Treat the declaration as binding. Assume both agents and courts will read it literally. Say what you mean, because silence is now an answer.
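Publishing the offer can be as mechanical as rendering it from a posture you have written down. A minimal sketch, again assuming the Content Signals `name=yes|no` line format; `render_robots_txt` and its arguments are hypothetical names. It refuses to produce a file until all three uses have been decided, so "do nothing" can't slip through by accident.

```python
def render_robots_txt(posture: dict, allow_fetch: bool = True) -> str:
    """Render a robots.txt body from an explicit per-use posture.

    `posture` maps each use (search, ai-input, ai-train) to True or False.
    """
    signal_order = ("search", "ai-input", "ai-train")
    missing = [name for name in signal_order if name not in posture]
    if missing:
        # Force a deliberate decision for every use.
        raise ValueError(f"undecided uses: {', '.join(missing)}")
    signals = ", ".join(
        f"{name}={'yes' if posture[name] else 'no'}" for name in signal_order
    )
    lines = [
        "User-Agent: *",
        f"Content-Signal: {signals}",
        "Allow: /" if allow_fetch else "Disallow: /",
    ]
    return "\n".join(lines) + "\n"


print(render_robots_txt({"search": True, "ai-input": True, "ai-train": False}))
# prints:
# User-Agent: *
# Content-Signal: search=yes, ai-input=yes, ai-train=no
# Allow: /
```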
The Reframe
robots.txt was a sign on a door in an era when the only thing a visitor could do was look around and leave. That era is over. The visitor now reads your content, acts on it, profits from it, and may never send anyone back — and it decides what to do based on terms it negotiates, not permission it requests.
The old file asked may I come in? The new reality asks what is it worth, and who's paying? You can keep answering the first question while the web has moved on to the second — or you can start publishing terms instead of permissions.
The web stopped asking your robots.txt for permission. It started reading it for the price.