I Tried to Use the Most Powerful Public AI Model for Security Work. It Kept Handing Me to a Weaker One.

by lukasz | Jun 10, 2026 | Essays

A Senteri field report. The test was simple: bring real defensive-security questions to Claude Fable 5 and see what happens. What happened is the whole point.

On 9 June 2026, Anthropic made Claude Fable 5 generally available — the first public release of a Mythos-class model, the tier that sits above Opus, the one the company spent April telling everyone was too dangerous to ship. The pitch is straightforward and, on its own terms, honest: you get frontier capability everywhere except the places where frontier capability is dangerous, and in those places the model quietly steps down.

I work in those places. So this was less a test I designed than a wall I walked into within the first hour.

The setup

I run a security-intelligence practice. My day is incident analysis, prompt-injection patterns, defensive tooling — the ordinary, lawful work of someone who writes about how attacks work so that people can stop them. Fable 5 is marketed as Anthropic's most capable generally available model. The obvious question for someone like me is the narrow one: is the most powerful public model actually usable for defensive security work?

I expected to spend a few hours mapping where its safety boundary sat — running paired questions, comparing against ordinary Claude, looking for the seam. I did not get that far. The model told me where the seam was before I could measure it.

What actually happened

I asked a defensive question, the kind that lives in every security textbook and in my own published work. The interface returned a notice in plain language: the request had been flagged, and I had been switched to Opus 4.8. It explained that Fable 5 has safety measures that flag messages on most cybersecurity or biology topics, that these measures may flag safe, normal content as well, and that this is the trade Anthropic made to bring Mythos-level capability to other areas sooner.

This is not a refusal. That distinction is the entire report. A refusal is a wall: "I won't help with that." What I got was a redirect: "someone else will help you with that," delivered quietly, automatically, with a weaker model standing in. The work still gets done. It just gets done by a less capable system, and the only reason I know is that, in this domain, Anthropic chose to tell me.

I went and read the documentation to make sure I was describing the mechanism correctly, not guessing from one screen. I was not guessing.

What the mechanics actually are

Anthropic's own materials lay it out, and the details matter more than the headline.

Fable 5 and Mythos 5 are the same underlying model. Fable is the public version with safety classifiers switched on; Mythos is the same thing with some of those classifiers lifted, restricted to vetted partners through a program called Project Glasswing. When Fable's classifiers detect a request touching cybersecurity, biology and chemistry, or model distillation, the response is handled by Claude Opus 4.8 instead — a real, capable model, but a generation down from the one you were promised.

Three things in the documentation are worth pulling into the light.

First, Anthropic states plainly that the classifiers are tuned to be cautious — "still stricter than would be ideal," in their words — and that benign requests will sometimes be flagged. This is not a leak or a criticism from outside. It is the vendor saying, in the launch announcement, that the filter will catch safe work along with unsafe.

Second, the scale. Anthropic's early data reports that more than 95% of Fable sessions involve no fallback at all, and that false positives run under 5%. This is the number that keeps the report honest. For the overwhelming majority of users, Fable 5 is exactly as powerful as advertised, and nothing in this piece applies to them. The catch is structural. That figure is an average across all users, and the flags cluster in a few domains. If your work lives inside that 5%, if your normal day is cybersecurity, then a false positive is not an edge case you occasionally hit. It is the texture of the tool.
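To make the base-rate point concrete, here is a minimal sketch in Python. Every number in it is an illustrative assumption of mine, not a figure from Anthropic: the share of sessions that touch security, the per-domain flag rates, and the practitioner's workload mix are all hypothetical. The only thing it is meant to show is that a population-wide fallback rate under 5% is fully compatible with one kind of user being rerouted most of the time.

```python
def expected_flag_rate(domain_mix, flag_rate):
    """Expected fraction of sessions flagged, given the share of
    sessions in each domain and the chance a session in that
    domain is flagged."""
    return sum(share * flag_rate[domain] for domain, share in domain_mix.items())


# Hypothetical per-domain flag probabilities (assumed, not measured).
FLAG_RATE = {"security": 0.80, "other": 0.01}

# Hypothetical traffic mix for the whole user base: 4% security sessions.
population_mix = {"security": 0.04, "other": 0.96}

# Hypothetical traffic mix for a defensive-security practitioner.
practitioner_mix = {"security": 0.90, "other": 0.10}

overall = expected_flag_rate(population_mix, FLAG_RATE)
practitioner = expected_flag_rate(practitioner_mix, FLAG_RATE)

print(f"population-wide flagged share: {overall:.1%}")
print(f"practitioner flagged share:    {practitioner:.1%}")
```

Under these assumed inputs the population-wide share comes out a little over 4%, while the practitioner's share is roughly 72%. Change the assumptions and the numbers move, but the shape of the result does not: an average taken over everyone says very little about the people concentrated in the flagged domains.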

Third, and this is the part that should bother practitioners most: the cyber and biology reroute is visible. It tells you. But the documentation also describes a separate safeguard for "frontier LLM development" — model distillation, ML research — where the model's effectiveness may be limited through prompt modification, steering vectors, or fine-tuning, and the user is not notified. Anthropic estimates this affects a tiny fraction of traffic. The number is small. The principle is not. One reroute respects you enough to say it is happening. The other does not.

Why it matters

Here is the thesis, and it is narrower and more useful than "Fable is crippled."

Anthropic built exactly the system it described in April, and shipped it faster than anyone forecast. The model that was "too dangerous to release" is now public — because the dangerous parts are fenced off rather than the whole thing withheld. As an engineering and policy choice, this is coherent. Fencing the capability is more useful to the world than hoarding it. I am not arguing they got the trade wrong.

I am pointing at who bears the friction. The people most affected by a cybersecurity classifier are not attackers. Attackers, as Anthropic itself notes, are motivated enough to work around safeguards, and the company put over a thousand hours of bug-bounty testing into hardening against exactly them. The people who feel it daily are defenders: the security engineers, incident responders, and analysts whose entirely lawful work trips the same wire. We are the false positives. The most capable public model is, for the defensive-security professional, the one model in the lineup that steps down precisely when the topic is ours.

There is a door, and it is worth naming so this stays fair: Mythos 5 — the unfenced version — exists, through Project Glasswing's vetted-access program for cyberdefenders. The full capability is not denied to defenders. It is moved behind a trust gate. That is a defensible design. It also means the answer to "can I use the most powerful public model for security work" is, for now, no — use Opus, or apply for the gate.

What to carry out of this

If you do defensive security and you are reaching for Fable 5 because it is the most capable option on the menu, stop. In your domain it is not the most capable option — it is Opus 4.8 wearing a more expensive name, because that is what you will actually be talking to. Reach for Opus directly, or pursue trusted access to Mythos if your work justifies it.

And watch for the distinction that matters beyond this one model. A reroute that announces itself is a product being honest about its limits. A handicap that does not announce itself is something else. The first is friction. The second is a confounder sitting silently inside your work. As more labs adopt this template, and Anthropic's approach reads like a template others will follow, the line worth holding them to is not "do you limit the model?" It is "do you tell me when you do?"

Observed 9–10 June 2026, on the public release of Claude Fable 5. Mechanism details drawn from Anthropic's published documentation and system card for Fable 5 and Mythos 5. Reported straight, including the part where the model behaved exactly as its makers said it would — which is its own kind of result.