GPT-5.5 Matches Mythos: AI's Cyber Prowess Forces New Security Reckoning

A UK government body has tested OpenAI’s latest model and found it neck-and-neck with Anthropic’s closely held Mythos in cybersecurity skills. The Information reported the evaluation from the AI Security Institute (AISI), noting GPT-5.5’s performance puts it roughly on par with a model Anthropic deems too potent for broad release.

Released just weeks ago, GPT-5.5 already handles complex tasks with less guidance. But its cyber chops stand out. AISI’s tests showed the model scoring 71.4% on expert-level narrow cyber tasks, ahead of GPT-5.4 and Opus 4.7. More striking: in a 32-step corporate network attack simulation (reconnaissance, credential theft, lateral movement, supply-chain pivot, database exfiltration), GPT-5.5 succeeded in two of 10 attempts. It is only the second model to complete the simulation end to end, after Mythos.

Human experts peg that sim at 20 hours. GPT-5.5? Eleven minutes. Cost: $1.73.

Anthropic’s Mythos Preview, unveiled in April, set the bar high. Its system card details zero-day discoveries in major OSes and browsers, 100% on Cybench, full exploits from Firefox bugs. No public access. Instead, Project Glasswing shares it with AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan, Microsoft, NVIDIA, Palo Alto—over 40 firms patching critical software. AISI’s earlier Mythos eval confirmed gains in CTF challenges and multi-step attacks.

Enter OpenAI. GPT-5.5 hits 81.8% on CyberGym, versus Claude’s 73.1%; 93.3% on cyber range scenarios, up from GPT-5.4’s 73.3%. UK AISI called it strongest yet on narrow tasks at 90.5% pass@5, per Vellum. XBOW, a security firm with early access, tested vulnerability detection. Miss rate dropped to 10% from GPT-5’s 40% and Opus 4.6’s 18%, even black-box without source code (The New Stack).

OpenAI’s wide release tests safeguards like never before.

Mythos stays vaulted. GPT-5.5 rolls to subscribers—Plus, Pro, Enterprise via ChatGPT, Codex. API soon, with safeguards. OpenAI rates it “High” cyber risk, below Critical, per its system card. Trusted Access vets pros for sensitive use. But AISI found a universal jailbreak in six hours of red-teaming, eliciting malicious cyber outputs across queries. OpenAI patched; AISI couldn’t fully verify due to config issues.

David Sacks captured the shift on X: “Mythos is not magic… OpenAI’s GPT-5.5-cyber can now do the same.” He predicts an equilibrium between AI offense and defense once the upgrade cycle plays out, with defenders getting the tools first. Chinese models, by his estimate, are six months out.

Sam Altman announced GPT-5.5-Cyber for “critical cyber defenders,” initial rollout imminent to trusted entities (The Verge). Claude Security beta followed for Enterprise, scanning codebases, validating finds, suggesting patches. Cursor added PR reviewers and scanners.

Risks loom. Unauthorized access to Mythos through a vendor sparked probes. The White House pushed back on a broader rollout. AISI warns that GPT-5.5 shows potential for autonomous end-to-end attacks on weakly defended enterprise networks once initial access is gained (Transformer News).

Industry insiders see acceleration. Albert Ziegler of XBOW: “Every missed vulnerability is a real-life liability.” Vellum notes that cyber gains are outpacing safeguards: a 93.3% pass rate on cyber range scenarios, alongside a jailbreak found within hours. OpenAI’s move democratizes power once reserved for elites. Defenders gain tools; so do bad actors if guards slip.

Upgrade now. Or lag behind.

Banks eye Anthropic; UK talks Mythos for finance (Seeking Alpha). OpenAI pushes agentic edges—82.7% Terminal-Bench 2.0, topping Opus 4.7. But cyber defines the stakes. AISI’s verdict: Comparable. Available.

The race intensifies. Patch fast.
