Anthropic's Mythos Delivers Security Wins at a Staggering Compute Cost

Anthropic has built something formidable. Its unreleased frontier model, Claude Mythos Preview, finds critical software flaws at a pace that leaves human teams far behind. Yet the same system burns through resources so quickly that it threatens to strain even the well-funded budgets of its select users.

The company announced Project Glasswing in early April. This controlled program grants access to Mythos Preview for defensive purposes only. Twelve initial partners include Amazon Web Services, Apple, Google, Microsoft, Cisco, CrowdStrike, NVIDIA, Palo Alto Networks, JPMorgan Chase and the Linux Foundation. More than 40 additional organizations that maintain critical software also gained entry. Anthropic’s Project Glasswing page details the effort to secure foundational code before adversaries can strike.

Results arrived fast. Partners reported finding over 10,000 high- or critical-severity vulnerabilities within the first month. The model scanned more than 1,000 open-source projects that support much of the internet. It surfaced 6,262 high- or critical-severity issues among 23,019 total findings. Independent security research firms assessed 1,752 of those. They confirmed 90.6 percent as true positives, and 62.4 percent of the assessed findings were confirmed as high or critical. Applied to the 6,262 flagged issues, that rate puts the project on track, even without further discoveries, to expose nearly 3,900 such flaws in open-source code alone. Anthropic’s initial update on Project Glasswing lays out these validation numbers and the steep drop-off between discovery and remediation.
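The extrapolation to nearly 3,900 can be checked with quick arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and which base the 62.4 percent applies to is an assumption. Applying the rate directly to the 6,262 flagged issues reproduces the reported figure, while compounding it with the true-positive rate yields a lower number.

```python
# Figures quoted in the article.
flagged_high_or_critical = 6_262   # issues the model rated high or critical
assessed = 1_752                   # findings reviewed by independent firms
true_positive_rate = 0.906         # share of assessed findings confirmed valid
high_critical_rate = 0.624         # share confirmed as high or critical

# Counts implied within the assessed sample.
confirmed_valid = round(assessed * true_positive_rate)
confirmed_severe = round(assessed * high_critical_rate)

# Reading A (assumption): 62.4% is a share of ALL assessed findings,
# so it is applied directly to the flagged total.
projected_direct = round(flagged_high_or_critical * high_critical_rate)

# Reading B (assumption): 62.4% is a share of the true positives only,
# so the two rates compound.
projected_compounded = round(
    flagged_high_or_critical * true_positive_rate * high_critical_rate
)

print(f"Confirmed valid in sample:  {confirmed_valid}")
print(f"Confirmed severe in sample: {confirmed_severe}")
print(f"Projection, reading A:      {projected_direct}")      # about 3,900
print(f"Projection, reading B:      {projected_compounded}")  # about 3,540
```

Only the first reading lands near the figure the article cites, which suggests the 62.4 percent rate is measured against every assessed finding rather than against the true positives alone.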

Palo Alto Networks saw critical vulnerabilities identified five times faster than before. Microsoft, Oracle and others reported accelerated patching cycles. Among the finds: a 27-year-old bug in OpenBSD, a 16-year-old flaw in FFmpeg that had survived five million automated tests, and chained exploits in the Linux kernel. These examples illustrate the model’s reach. It doesn’t just scan. It reasons, chains vulnerabilities and generates working exploits.

Benchmarks back the claims. On CyberGym, Mythos Preview scored 83.1 percent compared with 66.6 percent for its predecessor, Opus 4.6. It posted 77.8 percent on SWE-bench Pro versus 53.4 percent for Opus 4.6, and 82.0 percent on Terminal-Bench 2.0 against 65.4 percent. The model achieved these gains while using 4.9 times fewer tokens on BrowseComp tasks. Such efficiency matters. Or it would, if the overall consumption stayed reasonable.

Here’s the tension. Mythos testing devours compute. The Information first reported how quickly the system runs up millions of dollars in API costs during intensive scans. One partner described sessions that ran up bills in the high six figures within days. The model requires substantial context windows and repeated inference passes to verify and chain findings. That adds up. Fast.

Anthropic committed up to $100 million in usage credits to support Glasswing participants. It added $4 million in direct donations. Those funds flow to groups like the Alpha-Omega project, OpenSSF and the Apache Software Foundation. The gesture signals seriousness about open-source health. It also underscores the expense. After the preview, access carries a price of $25 per million input tokens and $125 per million output tokens. That is roughly five times the rates for Opus 4.6.
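To see how those per-token rates turn into large bills, a rough estimate helps. The sketch below uses the quoted prices of $25 and $125 per million tokens and the $100 million credit pool; the token volumes for the example scan are hypothetical, chosen only for illustration, and the function name is not from any Anthropic tool.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 25.0
OUTPUT_PRICE_PER_MTOK = 125.0

# Credit commitment quoted in the article, in USD.
CREDIT_POOL_USD = 100_000_000


def scan_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in USD for a given token volume."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# Hypothetical scan volume (an assumption, not a reported figure):
# 2 billion input tokens and 400 million output tokens.
example_cost = scan_cost_usd(2_000_000_000, 400_000_000)

# How many scans of that size the full credit pool would cover.
scans_covered = CREDIT_POOL_USD / example_cost

print(f"Estimated cost of one scan: ${example_cost:,.0f}")
print(f"Scans covered by credits:   {scans_covered:,.0f}")
```

Under these assumed volumes a single scan costs $100,000, so the entire credit pool would fund about 1,000 of them across all participants. Larger context windows or repeated verification passes would push the per-scan figure higher.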

TechCrunch covered the April 7 announcement and its roots in a prior data leak. The model had been referred to internally as Capybara in an accidentally exposed memo. That document called it Anthropic’s most powerful creation to date, with exceptional agentic coding and reasoning abilities. The limited rollout stemmed from clear worries. A weaponized version could let attackers discover and exploit flaws at machine speed. TechCrunch noted the company’s decision to keep Mythos Preview out of general release while sharing lessons through partners.

Forbes examined the broader boardroom implications. The publication highlighted how Mythos and the related Glasswing release have shifted perceptions of defensive capabilities. It reported the model had identified almost 10,000 severe flaws. The selective distribution aims to give defenders a temporary edge. Yet that advantage depends on organizations’ willingness to absorb the associated compute bills. Forbes outlined the good-guys-versus-bad-guys dynamic and set it against CISA’s Known Exploited Vulnerabilities catalog.

Questions have surfaced about whether the results justify the spend. Some cybersecurity practitioners point out that open-source tools and smaller models can uncover similar bugs at far lower cost. A YouTube discussion featuring Jaya Baloo, COO and CISO at Aisle, suggested that cheaper alternatives achieved comparable outcomes on certain examples highlighted by Anthropic. The debate continues. Verification and patching remain the true bottlenecks. Anthropic’s own update acknowledges that a high- or critical-severity bug found by Mythos still takes two weeks on average to patch. Human triage, coordinated disclosure and vendor response times have not accelerated at the same rate as discovery.

So the industry faces a new equation. Discovery scales. Remediation does not. Enterprises must decide how much they will pay to surface flaws that might otherwise linger for years or decades. Government partners have entered the conversation. Recent reports indicate Anthropic offered the European Union Agency for Cybersecurity (ENISA) access to Mythos Preview for defensive work. Discussions with U.S. federal officials continue despite separate legal tensions.

Mythos Preview outperforms its predecessor across multiple security and coding evaluations. The performance gap appears real. Yet the cost profile creates a barrier. Smaller security teams or open-source maintainers may find the credits insufficient or the token pricing prohibitive. The $100 million commitment helps the chosen participants. It does not solve the problem for the wider software supply chain.

Anthropic plans further public reports within 90 days of the initial update. Those documents will detail additional findings, patching progress and newly released defensive tools. The company has already introduced a Claude Security beta that has addressed more than 2,100 issues, along with skills harnesses, threat model builders and benchmarks such as ExploitBench.

Expansion looms. More government entities could join the consortium. Safeguarded versions of the model may reach a broader audience later, provided safety thresholds are met. For now, the program functions as a high-stakes experiment. It proves frontier models can transform vulnerability management. It simultaneously exposes the economic realities of running them at scale.

Security leaders now weigh speed against sustainability. They ask how many flaws they can afford to find before the bills force a slowdown. The answer will shape defensive strategies for years. Anthropic has handed the industry a powerful instrument. The challenge lies in paying for it without breaking the bank.
