Mozilla's AI Bug Hunt Exposes 271 Firefox Flaws With Near-Zero Noise

Mozilla shipped Firefox 150 this week packed with fixes for 271 vulnerabilities. All but a handful came from Anthropic’s experimental model, Claude Mythos Preview. The numbers stunned even the browser’s own engineers. One month earlier the same team had credited 22 bugs to an earlier Anthropic model. Now the total had jumped more than twelvefold.

Distinguished Engineer Brian Grinstead didn’t mince words. “In terms of the bugs coming out on the other side, there are almost no false positives,” he told Ars Technica. That claim stands in sharp contrast to earlier experiments with large language models. Those produced what Mozilla engineers called “unwanted slop” — hallucinated reports that wasted hours of human review.

But something changed. Mythos Preview arrived with greater reasoning depth. Mozilla paired it with a custom harness built on the browser’s existing fuzzing infrastructure. The harness hands the model targeted files, read-write access, and a sanitizer build that crashes on memory errors. It loops the model until it produces a reproducible test case. A second model then grades the output. The result is a tight feedback loop that discards noise before it reaches developers.
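The loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Mozilla's harness: `propose`, `crashes`, and `grade` are hypothetical stand-ins for the model call, a run against the sanitizer build, and the second grading model, and the retry limit is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    test_case: str  # candidate input that reproducibly crashed the sanitizer build
    attempts: int   # number of loop iterations it took to get there


def run_harness(
    propose: Callable[[str], str],   # hypothetical: feedback in, candidate test case out
    crashes: Callable[[str], bool],  # hypothetical: True if the sanitizer build crashes
    grade: Callable[[str], bool],    # hypothetical: second model accepts or rejects
    max_attempts: int = 10,
    runs: int = 2,
) -> Optional[Finding]:
    """Keep asking for candidates until one crashes on every run and passes grading."""
    feedback = "start"
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        # Deterministic success signal: the crash must reproduce on every run.
        if not all(crashes(candidate) for _ in range(runs)):
            feedback = f"attempt {attempt}: no reproducible crash"
            continue
        # Second filter: a grader discards reports that crash but are not useful.
        if not grade(candidate):
            feedback = f"attempt {attempt}: grader rejected the report"
            continue
        return Finding(candidate, attempt)
    return None  # nothing verifiable was found, so nothing reaches a developer


# Usage with toy stand-ins: only the third candidate crashes.
candidates = iter(["benign", "also benign", "boom"])
result = run_harness(
    propose=lambda feedback: next(candidates),
    crashes=lambda test_case: test_case == "boom",
    grade=lambda test_case: True,
)
print(result)
```

The design point is that a candidate reaches a human only after passing a mechanical check, which is one plausible reason a loop like this would produce few false positives.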

“With these harnesses, so long as you can define a deterministic and clear success signal or task verification signal, you can just keep telling it to keep working,” Grinstead explained in the same Ars Technica report. That capability unlocked scale. Over two months the pipeline surfaced 180 high-severity issues exploitable through normal web browsing, 80 moderate, and 11 low. Mozilla grouped them into three roll-up CVEs for the Firefox 150 advisory.

The April 2026 release cycle tells a broader story. Mozilla fixed 423 security bugs total. That figure dwarfs the 31 handled in April 2025. One-third came directly from the Mythos-driven pipeline. Another third used similar AI methods with other models. The rest relied on traditional fuzzing or internal discovery. Over 100 contributors touched code, triage, testing, and releases to push the most secure Firefox yet, according to Mozilla’s own technical write-up.

Firefox CTO Bobby Holley captured the shift in tone. “Defenders finally have a chance to win, decisively,” he wrote in Mozilla’s April blog post. The post, titled “The zero-days are numbered,” described the initial reaction inside the team as vertigo. A single memory-safety bug in a hardened browser once triggered red alerts. Suddenly hundreds arrived at once. Yet Holley and his colleagues saw the long game. Attackers would gain the same tools soon enough. Better to drain the reservoir of latent flaws now.

The vulnerabilities themselves reveal how Mythos thinks. Mozilla published full Bugzilla reports for 12 of them on May 7 to illustrate the range. One involved a 15-year-old error in parsing an HTML element. Complex interactions among recursion limits, expando properties, and cycle collection triggered the flaw. Another exposed a use-after-free through a race condition over IPC in IndexedDB reference counting. A third let a raw NaN value masquerade as a JavaScript object pointer when crossing process boundaries, creating a fake-object primitive in the parent process.
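The reports do not spell out the mechanism here, but one common way a NaN can be mistaken for a pointer is NaN-boxing, where an engine stores non-double values in the spare payload bits of a NaN. The Python sketch below uses a toy layout to show the idea. The tag value, shift, and function names are assumptions made for illustration and should not be read as Firefox's actual value representation.

```python
import math
import struct

TAG_SHIFT = 47                        # toy layout: tag bits sit above a 47-bit payload
OBJECT_TAG = 0x1FFFC                  # hypothetical tag meaning "payload is an object pointer"
PAYLOAD_MASK = (1 << TAG_SHIFT) - 1
CANONICAL_NAN = 0x7FF8000000000000    # the one NaN pattern a real double should use


def bits_of(value: float) -> int:
    """Return the raw 64-bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def double_from(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def box_object(address: int) -> int:
    """Pack an address into the payload of a tagged NaN."""
    return (OBJECT_TAG << TAG_SHIFT) | (address & PAYLOAD_MASK)


def is_object(bits: int) -> bool:
    """Check whether a boxed value carries the object tag."""
    return (bits >> TAG_SHIFT) == OBJECT_TAG


def canonicalize(bits: int) -> int:
    """Collapse every NaN to the canonical pattern before trusting the bits."""
    return CANONICAL_NAN if math.isnan(double_from(bits)) else bits


# An untrusted sender supplies a "double" whose bits happen to carry the object tag.
forged = box_object(0x41414141)
print(math.isnan(double_from(forged)))   # the value is a NaN as far as arithmetic goes
print(is_object(forged))                 # but an unchecked receiver reads it as a pointer
print(is_object(canonicalize(forged)))   # canonicalizing first removes the ambiguity
```

In a scheme like this, any double that arrives from a less trusted process has to be canonicalized before it is stored as a boxed value, otherwise the sender chooses what the receiver will treat as an object.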

Still others touched XSLT hash tables, WebTransport certificate flooding, color-picker event loops, and even a 20-year-old reentrancy bug in XSLT key functions. One standout abused rowspan=0 semantics in HTML tables to overflow a 16-bit layout field after appending more than 65,000 rows. Fuzzers had missed it for years. Many bugs assumed a compromised content process and focused on sandbox escapes. The model was explicitly allowed to patch only sandboxed code during its search, mirroring real attacker constraints.
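The table bug comes down to a counter outgrowing its storage. The short Python sketch below mimics that failure, assuming an unsigned 16-bit field that silently drops high bits. The field's real type and the surrounding layout code are not described in the article, so the function names here are hypothetical.

```python
U16_MAX = 0xFFFF  # largest value an unsigned 16-bit field can hold (65,535)


def store_u16(value: int) -> int:
    """Mimic an unchecked store into a 16-bit field: high bits are silently dropped."""
    return value & U16_MAX


def store_u16_checked(value: int) -> int:
    """Refuse values that do not fit instead of wrapping."""
    if not 0 <= value <= U16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return value


rows = 65_537                   # more rows than the field can represent
print(store_u16(rows))          # the stored span wraps around to a tiny number

try:
    store_u16_checked(rows)
except OverflowError as error:
    print(error)                # the checked version surfaces the problem instead
```

Once the stored value disagrees with the real row count, any later code that sizes or indexes a buffer from it can read or write out of bounds, which is the kind of memory error a sanitizer build is designed to catch.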

These examples matter. They show Mythos does not invent novel vulnerability classes. “Encouragingly, we also haven’t seen any bugs that couldn’t have been found by an elite human researcher,” Mozilla stated in its April 21 blog. The model simply reasons across large codebases faster than humans can. It traces data flows, anticipates edge cases in garbage collection, and constructs proof-of-concept test cases that trigger sanitizer crashes. Elite humans have always done this. The difference is volume and speed.

That speed forces hard choices. Holley told WIRED that every piece of software must make this transition. Latent bugs now sit exposed to automated discovery. Large companies talk of reassigning thousands of engineers for months. Smaller open-source projects face steeper hurdles. They lack both early access to frontier models and the staffing to triage thousands of findings. “The underlying economics haven’t changed,” added Mozilla CTO Raffi Krikorian in the same WIRED story. Valuable infrastructure often rests on volunteer labor while corporations profit from it.

Mozilla’s response mixes transparency with pragmatism. The team built a full lifecycle pipeline — discovery, deduplication, triage, tracking, and shipping. They iterate on prompts, parallelize across virtual machines, and integrate with continuous integration where possible. Early experiments started small, targeting sandbox escapes with Claude Opus 4.6. Results impressed them enough to scale. When Mythos Preview became available, swapping it in proved straightforward. The harness did most of the heavy lifting.
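The deduplication stage can be pictured as bucketing crash reports by a signature. The Python sketch below groups reports by sanitizer error type and the top few stack frames, a common heuristic in crash triage. Mozilla has not been quoted on how its pipeline deduplicates, so the report fields, the frame names, and the depth of three are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def signature(kind: str, frames: List[str], depth: int = 3) -> Tuple[str, ...]:
    """Build a bucket key from the error type and the top `depth` stack frames."""
    return (kind, *frames[:depth])


def deduplicate(reports: Iterable[dict]) -> Dict[Tuple[str, ...], List[str]]:
    """Group report ids that share a signature so each bucket is triaged once."""
    buckets: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for report in reports:
        buckets[signature(report["kind"], report["frames"])].append(report["id"])
    return dict(buckets)


# Hypothetical reports: the first two differ only below the top three frames.
reports = [
    {"id": "r1", "kind": "heap-use-after-free", "frames": ["Release", "Unlink", "Collect", "RunA"]},
    {"id": "r2", "kind": "heap-use-after-free", "frames": ["Release", "Unlink", "Collect", "RunB"]},
    {"id": "r3", "kind": "heap-buffer-overflow", "frames": ["Reflow", "Layout", "Paint"]},
]
for key, ids in deduplicate(reports).items():
    print(key, ids)
```

A shallow signature merges more aggressively and a deeper one splits more finely. Either way the goal is that one underlying defect produces one item for a human to triage.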

Critics have questioned the headline numbers. Some pointed out that only three CVEs carried direct credit to Anthropic’s red team in earlier releases. Others noted the 271 bugs collapsed into three omnibus entries rather than individual CVEs. Mozilla counters that each bug met its internal security criteria, produced reproducible test cases, and received fixes. The low false-positive rate — fewer than 15 by some internal counts — separates this effort from prior AI audits that drowned teams in noise.

And the work continues. Mozilla plans to scan patches as they land, not just entire codebases. It shares lessons publicly so other projects can build similar harnesses. “Anyone building software can start using a harness with a modern model to find bugs and harden their code today,” the hacks.mozilla.org post advises. The current moment feels perilous. It also offers opportunity.

Security has long favored the offense. Attackers needed only one viable bug, while defenders had to find and fix them all. Fuzzing helped but left gaps in complex, stateful code. Human red teams filled some of those gaps at high cost. Now automated reasoning closes the loop. The defects, Mozilla argues, are finite. The browser’s modular design keeps it comprehensible to humans even as models probe it relentlessly.

Of course risks remain. Future models could complicate codebases faster than they find flaws in them. Over-reliance on AI during development might erode the very human understandability that makes systematic fixing possible. Mozilla flags that concern in a footnote. Maintain human-comprehensible architecture, it warns, especially in browsers and operating systems.

For now the scoreboard favors defenders who act. Firefox 150 is measurably stronger. The 271 bugs no longer sit waiting for a well-resourced adversary. Similar pipelines will spread. The question shifts from whether AI can find memory-safety issues to how quickly organizations can fix them at scale. Mozilla has shown one path. It demands focus, engineering grit, and a willingness to reprioritize. It also delivers something the industry has chased for decades: a realistic shot at driving exploit costs so high that even sophisticated actors think twice.

The zero-days, it seems, really are numbered.
