Anthropic Built an AI Model So Powerful It Won't Let Anyone Use It

Anthropic has a model it considers too dangerous to release. That alone should command attention.

The San Francisco-based AI company, best known for its Claude chatbot, has developed an internal system codenamed Mythopoeia — or Mythos for short — that reportedly exceeds the capabilities of anything currently available to the public. According to Business Insider, the model has been kept behind closed doors because it triggered safety thresholds that Anthropic itself established, making it the first major AI system to be deliberately withheld not for commercial reasons, but because its creator judged it too capable.

Not too buggy. Not too expensive. Too powerful.

The details are thin, deliberately so. Anthropic has not published a technical paper or issued a formal announcement about Mythos. What has emerged comes from internal communications and people familiar with the company’s safety evaluation process, as reported by Business Insider. The model apparently demonstrated abilities in autonomous operation, persuasion, and certain forms of scientific reasoning that crossed the red lines Anthropic had pre-committed to in its Responsible Scaling Policy, the framework the company uses to determine when a model is safe enough to deploy.

This is a company that has staked its identity on being the safety-first AI lab. Co-founded in 2021 by Dario and Daniela Amodei, both former OpenAI executives, Anthropic has consistently positioned itself as the counterweight to what it views as reckless acceleration in the industry. Its Responsible Scaling Policy, introduced in 2023 and updated since, establishes specific capability thresholds — called AI Safety Levels, or ASLs — that function as trip wires. When a model hits a certain level of capability in domains like cybersecurity, biological weapons knowledge, or autonomous self-replication, the corresponding safety infrastructure must be in place before that model can be deployed or even tested more broadly.

Mythos, according to the reporting, tripped ASL-3 thresholds and possibly approached ASL-4 territory. ASL-3 covers models that could meaningfully assist in the creation of weapons of mass destruction or conduct sophisticated cyberattacks with minimal human guidance. ASL-4, which Anthropic has described but not yet invoked, contemplates systems that could pose catastrophic risks even with significant safeguards in place. The fact that Anthropic may be operating near that boundary is a signal the industry should take seriously.

What Happens When a Company Decides Its Own Product Is Too Dangerous

The obvious question: if Anthropic won’t release Mythos, what’s the point? The answer is more complex than it appears.

First, the model’s existence serves as a proof of concept. Anthropic can now demonstrate to investors, regulators, and partners that it possesses frontier capabilities that exceed what competitors have publicly deployed — without actually putting those capabilities into the wild. That’s a powerful negotiating position. The company closed a $2 billion funding round led by Google earlier this year, and its valuation has been reported at roughly $60 billion. Having a model that’s too advanced to release is, paradoxically, a commercial asset. It signals technical dominance while simultaneously reinforcing the brand narrative around safety.

Second, Mythos functions as a stress test for Anthropic’s own governance structure. The Responsible Scaling Policy was always somewhat theoretical — a set of commitments that hadn’t yet been tested by a model powerful enough to trigger them. Now they have been. And the company appears to have held the line, at least according to what’s been reported. Whether that restraint persists under competitive pressure from OpenAI, Google DeepMind, and an increasingly aggressive xAI remains an open question.

Third, there’s the research value. Anthropic can study Mythos internally, probing its capabilities and failure modes, without exposing the broader world to its risks. This is consistent with the company’s longstanding argument that the safest place for the most powerful AI systems is inside a lab with the expertise to understand them — not deployed at scale to hundreds of millions of users.

But critics will note the tension. Anthropic is simultaneously arguing that AI is so dangerous it requires extreme caution and that the best entity to exercise that caution is… Anthropic. It’s a self-reinforcing logic that concentrates power in a small number of private actors with no democratic accountability. The company’s safety board includes external advisors, but ultimate decision-making authority rests with leadership.

And the competitive dynamics are ferocious. OpenAI is pushing ahead with GPT-5 and its reasoning models. Google DeepMind continues to advance Gemini. Meta has committed to open-sourcing its Llama models, a fundamentally different philosophy that makes containment nearly impossible once capabilities reach a certain level. Elon Musk’s xAI has been scaling Grok with characteristic aggression and minimal public discussion of safety frameworks.

In this environment, Anthropic’s decision to hold back Mythos is either a principled stand or a temporary pause before commercial imperatives win out. Possibly both.

The timing also matters. Washington is paying closer attention. The Senate Commerce Committee has held multiple hearings on AI safety, and bipartisan interest in some form of federal oversight has grown — though legislation remains stalled. California’s SB 1047, which would have imposed liability on developers of frontier AI models, was vetoed by Governor Gavin Newsom last year, but similar proposals keep surfacing. The European Union’s AI Act is now in effect, with its strictest provisions covering high-risk systems. Anthropic’s restraint with Mythos gives it credibility in these policy conversations that competitors may lack.

There’s a deeper philosophical issue at work too. The AI safety community has long warned about a scenario in which a model becomes capable enough to resist human control — to deceive its operators, pursue goals misaligned with human values, or take actions its creators didn’t intend. These aren’t science fiction concerns; they’re the explicit focus of Anthropic’s research agenda. The company has published extensively on Constitutional AI, its method for training models to follow a set of principles, and on interpretability, the effort to understand what’s happening inside neural networks at a mechanistic level.

If Mythos demonstrated emergent capabilities that surprised even its creators — and the Business Insider report suggests something along these lines — then the containment decision reflects genuine uncertainty about what the model might do in uncontrolled settings. That’s a sobering data point for an industry that has spent the last two years in a headlong sprint toward ever-larger, ever-more-capable systems.

So where does this leave us? Anthropic has built something it won’t ship. The rest of the industry is building things it will ship, possibly with less rigorous internal evaluation. And regulators are still largely on the sidelines, debating frameworks while the technology advances at a pace that makes most legislative timelines look glacial.

The Mythos episode may end up being remembered as one of two things: the moment a leading AI company proved that self-governance can work, or the moment it became clear that the most consequential technology decisions of our era are being made behind closed doors by a handful of people with no obligation to explain themselves to anyone.

Both interpretations might be correct at the same time. And that uncomfortable ambiguity is precisely the problem no one — not Anthropic, not its competitors, not Washington — has figured out how to resolve.
