The Agentic Web Is Solving the Same Problem Nine Times

by lukasz | Jun 11, 2026 | Essays

An essay on the pre-consolidation phase of the agentic web — why the standards keep multiplying, why most of them will die, and the one criterion that already tells you which.

Here is what the last three years have felt like from the publisher's side of the agentic web. Every few months a new file or protocol appears, addressed to AI agents, each with its own format and its own manifesto. You learn it. You consider implementing it. And then the next one arrives, solving — as far as you can tell — the same problem, partly from its own idea and partly from pieces borrowed from the others. The closer you look at the stack, the blurrier it gets.

The cheap reading of this is "standards chaos," followed by a shrug. I want to argue something stronger. This is not chaos. It is a recognizable phase — the web has been through it at least twice before — and phases of this kind end the same way every time. The agentic web is in its pre-consolidation period. Consolidation will happen by extinction, not by merger. And the survivors are already predictable by a single criterion: a standard survives when it has a consumer with a runtime, not just a specification with a website.

This essay is the argument for that claim. The implementation side — what to actually deploy on a working site, file by file, as of mid-2026 — lives in a companion piece on Webflux, in Polish. Here I am after the mechanism.

Nine answers to three questions

Count the mechanisms currently addressed to agents and you get roughly nine: robots.txt, sitemap.xml, schema.org, llms.txt, NLWeb, OpenAPI/MCP, agent.json, agents.txt, and the academic attempt to name agent actions known as Web Verbs. Nine artifacts, nine formats, nine governance stories. But put them on the table and ask what question each one answers, and the nine collapse into three.

Who is asking? That is robots.txt and everything now growing out of its grave — the access and identity layer, which I have written about separately and will not re-argue here. What should be read? That is sitemap.xml, schema.org, and llms.txt — discovery and content. How can an agent act? That is OpenAPI and MCP, with NLWeb as a connector, and the long tail of declared-intent experiments behind them.

Three questions. Nine mechanisms. That ratio is the whole problem — and it hides an asymmetry that I think is the most useful structural observation available about this landscape.

The read path genuinely does reduce to "serve structured text: XML, JSON, or Markdown." A sitemap is a list. A schema.org block is annotated facts. An llms.txt is a curated reading list. This is why half the "standards" of the agentic web are literally text files sitting at a root path: reading is stateless and safe, so a static file is a sufficient answer. The action path does not reduce to a file and never will, because an action needs identity, a session, authorization, and increasingly money. You cannot sign a Markdown file into a transaction.
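The read-path half of that claim can be made concrete. What follows is a minimal sketch, assuming a site serves robots.txt, sitemap.xml, and llms.txt at their conventional root locations (a sitemap can also live elsewhere and be declared in robots.txt). The function names and the example.com origin are illustrative and come from no specification.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Conventional root-path locations for three read-path files.
# Assumption: the site uses the default locations.
READ_PATH_FILES = ["/robots.txt", "/sitemap.xml", "/llms.txt"]


def read_path_urls(origin: str) -> list[str]:
    """Build the candidate URLs for an origin such as 'https://example.com'."""
    return [urljoin(origin, path) for path in READ_PATH_FILES]


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a plain GET answers 200. No cookie, session, or credential is sent."""
    try:
        with urlopen(Request(url, method="GET"), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # OSError covers HTTP errors, DNS failures, and timeouts;
        # ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    for url in read_path_urls("https://example.com"):
        print(url, probe(url))
```

Nothing in probe carries state from one request to the next. That is the property that lets a static file answer the read question, and it is the one an action, with its identity, session, authorization, and payment, cannot have.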

So the two halves of the stack are not just different layers. They live under different selection pressures. Files compete on attention — whether any crawler bothers to read them. Protocols compete on capability — whether anything can be done through them that cannot be done without them. Attention markets produce many cheap entrants and mass die-offs. Capability markets produce few entrants and durable winners. Keep that asymmetry in mind; it does most of the predictive work later.

Four genealogies that never met

Why does the stack feel so sprawling — as if each mechanism reinvented the others slightly wrong? Because these nine artifacts come from four lineages that never coordinated.

The first is the crawl era: robots.txt in 1994, sitemaps in 2005, written for a web whose only non-human reader was a search indexer. The second is the semantic web remnant: schema.org, born 2011, the one survivor of a much grander vision — and it survived for an unromantic reason: Google paid rent on it, in rich snippets, for over a decade. The third is the API world: OpenAPI, built for machine-to-machine integration long before anyone said "agent." The fourth is the LLM-era patchwork of 2023–2025 — llms.txt, MCP, NLWeb, agent.json — written by people who needed something now and could not wait five years for a W3C working group.

Each new layer cites the old ones. "Inspired by robots.txt" appears in practically every proposal of the fourth lineage. But citation is courtesy, not integration. Integrating with an existing standard means entering a standards process, and this market visibly outruns every standards process that exists. The most telling artifact of 2026 is that meta-standards are now emerging whose explicit pitch is that they can absorb new protocols faster than traditional standardization — the fragmentation has begun producing standards about the standards. That is not a sign of maturity. It is a sign that everyone can see the sprawl and nobody can stop contributing to it.

We have watched this movie. Twice.

If this all feels unprecedented, it is only because the last screenings were a while ago.

Between 1999 and 2005, the web had four incompatible specifications for the same humble object, a content feed: RSS 0.9, RSS 1.0, RSS 2.0, and Atom. Each had a community, a philosophy, and a roadmap. None of them merged. The market consolidated by extinction — publishers and readers drifted to the variants that the dominant consumers actually parsed, and the rest simply stopped being generated. (And then, in a darker epilogue worth exactly one sentence, the entire category was eaten by platforms that had no interest in open feeds at all.)

The second screening was WS-* versus REST. One side built a complete, committee-designed capability stack — security, transactions, addressing, reliability, each its own specification. The other side was barely a standard at all: a convention with running consumers. The convention won, comprehensively. Not because it was better designed — it demonstrably was not, by the criteria the committees cared about — but because at every moment there were more programs actually consuming it.

Extract the lesson explicitly, because it is the hinge of this essay: in both cases the survivor was not the most complete specification. It was the one with a runtime on the other end. Completeness is an argument; a consumer is a fact.

The survival criterion, applied

So here is the criterion, stated as plainly as I can: a standard survives when it has a consumer with a runtime, not a specification with a website. Apply it to the nine, and the landscape stops being confusing. As of June 2026:

The crawl-era pair, robots.txt and sitemap.xml, survives trivially: three decades of consumers for the first, two for the second, and every crawler ever shipped. Schema.org survives and is, if anything, being promoted: what was decoration for rich snippets is becoming the primary data channel agents trust. That shift has its own pathologies, which I documented in the crawl-scale markup data: much of what sites "say" in markup is automated, accidental, and out of date. But it is consumed, and consumption is what the criterion measures.

MCP survives, and the reason generalizes the argument I made in the MCP essay: it did not win on elegance, it won on the mathematics of a runtime ecosystem — clients that ship, servers that multiply, each side making the other more valuable. By the criterion, MCP is the most alive object on this list, because it is the one where the consumer arrived before the hype.

llms.txt is the live experiment, and the most instructive case on the board. It has everything except the one thing that matters. Adoption is real: 10.13% of domains in SE Ranking's crawl of nearly 300,000 — and that is the generous measurement; narrower crawls of top-ranked domains and vertical samples land at 6–7%. Tooling is real; Google has folded a check for it into Lighthouse's Agentic Browsing audits, which as of May 2026 ship in the tool's default configuration — though in Chrome itself the category is, at this writing, visible only in Canary builds, not yet in stable DevTools. What it does not have is the crawler it was written for. No major AI crawler officially commits to consuming it, and the effect on citations is not just unproven but measured as absent: in the SE Ranking data, only one of the fifty most AI-cited domains had the file, and removing llms.txt as a variable from their citation-prediction model improved the model. A separate 90-day server-log experiment found that of more than 62,000 AI bot visits, about 0.1% touched the file at all. This is supply without demand — the same pattern the markup data exposed in schema.org's long tail, repeating in a newer file format.
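To put the supply and the demand side by side in absolute terms, here is the arithmetic on the figures quoted above. It is a rough sketch: "nearly 300,000" and "more than 62,000" are rounded in the source, so the derived counts are approximations, and the variable names are mine. The two measurements come from different studies, so they can be compared but not divided into one another.

```python
# Figures as quoted in the text; the totals are rounded, so results are approximate.
crawled_domains = 300_000        # SE Ranking crawl, "nearly 300,000"
adoption_rate = 0.1013           # share of those domains publishing llms.txt
ai_bot_visits = 62_000           # 90-day server-log experiment, "more than 62,000"
llms_txt_touch_rate = 0.001      # "about 0.1%" of those visits touched the file
top_cited_with_file, top_cited_total = 1, 50

domains_with_file = round(crawled_domains * adoption_rate)
visits_touching_file = round(ai_bot_visits * llms_txt_touch_rate)
top_cited_share = top_cited_with_file / top_cited_total

print(f"domains publishing llms.txt: ~{domains_with_file:,}")
print(f"AI bot visits touching it:   ~{visits_touching_file}")
print(f"share among top-cited 50:     {top_cited_share:.0%}")
```

On these inputs, that is roughly thirty thousand publishers in one sample against roughly sixty reads in the other.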

And yet — here is the detail that makes llms.txt the perfect illustration of the criterion rather than just its victim. The file does have a consumer with a runtime. It is simply not the one its adopters imagine. Developer tools and RAG frameworks — Cursor, Continue, Aider and their kin — actively read llms.txt when it exists. So the file is alive in exactly one niche, the one with a runtime on the other end, and inert everywhere else. The same artifact, split down the middle by the same criterion. Either a search-side consumer materializes too, in which case llms.txt graduates overnight from hedge to requirement — or it remains what it is today: documentation infrastructure for developer tools, wearing a GEO costume. Adoption statistics do not vote. Consumers vote.

NLWeb is not actually a competitor in this race, and judging it as one produces confusion. It is a bridge — from one survivor (schema.org, the data you already publish) to another (MCP, the protocol agents already speak). Bridges are honest infrastructure, but they have a structural property worth naming: they live exactly as long as both banks. NLWeb's fate is not in NLWeb's hands.

And then the rest: agent.json, agents.txt, Web Verbs. Specifications with websites. I intend no malice toward any of them — each contains a real idea, and agents.txt in particular gestures at a policy layer the web will eventually need. But the criterion does not ask whether an idea is good. It asks whether anything consumes it, and today nothing does.

One honest hedge, in the house tradition of reporting straight: the criterion predicts which class survives, not which instance. A consumer can materialize late and resurrect a dead-looking spec — that is precisely the llms.txt question. Which is why every verdict above carries a date, and why I expect to be wrong about at least one of them.
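For readers who prefer the test in executable form, the verdicts above can be encoded in a few lines. This is a sketch under my own assumptions: the field names, the audience labels, and the three-way status are an illustrative encoding, not a formal model, and OpenAPI is left out because the text gives it no separate verdict.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Standard:
    name: str
    intended_for: str                                      # consumer its adopters have in mind
    consumed_by: List[str] = field(default_factory=list)   # runtimes reading it today
    banks: List[str] = field(default_factory=list)         # for bridges: what it connects


# Verdicts as stated in the text, as of June 2026. Labels are illustrative.
STANDARDS = {s.name: s for s in [
    Standard("robots.txt", "crawlers", ["crawlers"]),
    Standard("sitemap.xml", "crawlers", ["crawlers"]),
    Standard("schema.org", "search and agents", ["search and agents"]),
    Standard("MCP", "agent clients", ["agent clients"]),
    Standard("llms.txt", "AI crawlers", ["developer tools", "RAG frameworks"]),
    Standard("NLWeb", "agents", banks=["schema.org", "MCP"]),
    Standard("agent.json", "agents"),
    Standard("agents.txt", "agents"),
    Standard("Web Verbs", "agents"),
]}


def status(name: str) -> str:
    """Classify a standard as 'alive', 'niche', or 'inert' by who consumes it."""
    s = STANDARDS[name]
    if s.banks:  # a bridge lives exactly as long as both banks
        return "alive" if all(status(b) == "alive" for b in s.banks) else "inert"
    if s.intended_for in s.consumed_by:
        return "alive"
    return "niche" if s.consumed_by else "inert"


for name in STANDARDS:
    print(f"{name:12} {status(name)}")
```

A late-arriving consumer is a one-line change to consumed_by, which is the hedge in code: the function encodes the test, and the data carries the date.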

What this predicts

A theory that cannot be wrong is not worth publishing, so here are three predictions, dated June 2026, with a first public check in twelve months.

First: the identity layer consolidates fastest. Of the three questions, "who is asking" is the only one with a business model attached — traffic control — and the only one being pushed by infrastructure players with deployment power and by an actual IETF track. Money plus runtime is the strongest combination the criterion knows. Expect signed-agent identification to be boring, settled plumbing before the content layer settles anything.

Second: the read-path files consolidate by extinction within roughly two years. At most one curated-content convention survives alongside sitemap and schema — and only if a major crawler publicly commits to consuming it. If no consumer steps up, the answer to "what should be read" remains what it already is in practice: schema.org plus clean HTML, and the special files quietly stop being generated, RSS-style.

Third: the action layer does not consolidate to one protocol, and that is fine. Capability layers historically tolerate plurality; REST-style HTTP APIs and gRPC coexist without anyone calling it chaos. What the action layer does consolidate to is a single discovery convention: one agreed way for an agent to find out which protocols a site speaks. The interesting standards fight of 2027 will not be MCP versus anything. It will be over the index.
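What "the index" would have to do is small enough to sketch. Everything here is hypothetical: no such agreed convention exists as of this writing, and the document shape, the field names, and the example.com endpoints are invented purely for illustration.

```python
import json
from typing import List, Optional

# HYPOTHETICAL index document. The shape and field names are assumptions;
# no agreed discovery convention of this kind exists yet.
HYPOTHETICAL_INDEX = """
{
  "protocols": [
    {"name": "openapi", "endpoint": "https://example.com/openapi.json"},
    {"name": "mcp", "endpoint": "https://example.com/mcp"}
  ]
}
"""


def pick_endpoint(index_text: str, agent_preference: List[str]) -> Optional[str]:
    """Return the endpoint of the first protocol the agent prefers that the site lists."""
    offered = {p["name"]: p["endpoint"] for p in json.loads(index_text)["protocols"]}
    for name in agent_preference:
        if name in offered:
            return offered[name]
    return None  # the site speaks nothing this agent understands


print(pick_endpoint(HYPOTHETICAL_INDEX, ["mcp", "openapi"]))
```

The sketch leaves the protocols themselves plural and settles only where an agent looks first.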

If you own a website, the practical conclusion is short, and it is the same one the criterion gives: you do not need to predict the winner. You need the test. Implement what has a consumer today; put a date on everything else and watch. The file-by-file version of that advice — what to deploy, what to hedge, what to ignore on a working WordPress site — is the companion piece on Webflux.

I will come back to the three predictions above in June 2027 and score them in public, including the ones that did not go to plan. That is the house style. It would be strange to exempt my own forecasts from it.

Verdicts and figures in this essay are stated as of June 2026 and will age. The companion implementation guide, in Polish, is published on Webflux; the access layer, the MCP economics, and the markup data each have their own earlier essays on Senteri, linked above.