Developers noticed something off with Claude Code in early March. Outputs grew repetitive. The tool seemed to forget prior steps mid-task. Complex coding jobs faltered. Anthropic’s April 23 postmortem lays it bare: three separate changes, shipped over six weeks, combined to erode performance across Sonnet 4.6, Opus 4.6, and Opus 4.7. Not a model downgrade. Product tweaks gone wrong. Anthropic Engineering Blog.
It started March 4. Engineers switched default reasoning effort from high to medium. Goal: cut latency in long-thinking mode. High effort delivers sharper results but drags on UI. Medium speeds things up—for most tasks. Users felt the hit on tougher ones. Reverted April 7 after complaints piled up.
Then March 26. A caching fix aimed to prune old thinking from idle sessions over an hour. Meant to trim costs and latency on resume. Bug triggered it every turn. Reasoning blocks vanished repeatedly. Cache misses snowballed. Claude repeated itself. Drained usage limits faster. Tool calls suffered most. Fixed April 10 in v2.1.101.
April 16 brought the third. System prompt tweak: keep text between tool calls under 25 words, final responses under 100 unless needed. Intended to curb verbosity. Paired with other prompts, it dulled coding smarts. Evaluations later showed a 3% drop. Reverted April 20 in v2.1.116. All fixed now.
Why so hard to spot? Each change hit different traffic slices on staggered schedules. Looked like broad inconsistency. Internal tests passed—unit, end-to-end, dogfooding. No easy reproduction. User reports spiked online, but noisy evals muddied signals. Back-testing with Opus 4.7 caught the caching flaw; 4.6 missed it.
Users paid the price. Faster token burn from cache issues. Locked out mid-project. AMD’s Stella Laurenzo crunched numbers on 6,852 sessions: thinking length plunged 73% from January to March, files read per edit halved, retries spiked 80x. X post by @realarmaansidhu. Developers vented on X and Reddit for weeks. Anthropic reset all subscriber limits April 23 as remedy.
Transparency wins praise. But gaps glare. Internal teams ran non-public builds. Now they’ll match production exactly. System prompt changes get stricter evals, ablations, audits. Intelligence tweaks demand soak periods, wider tests, rollouts. Code Review tool ships to customers soon, with repo context for better debugging. Anthropic Engineering Blog.
And the timing stings. Competitors pounced. OpenAI dropped GPT-5.5 agents same day. NVIDIA opened 80 free frontier models. Qwen 3.6 matched Opus on benchmarks from a laptop. Community fled to OpenClaw, free hosts. Pricing shifts to enterprise didn’t help. Kingy.ai.
One X user nailed it: community pressure forced the postmortem. Initial denials blamed harnesses. Bugs confirmed. Fixes shipped. But enterprise AI demands ironclad stability. Auto-updates could wreck workflows. Gating releases? SRE basics from the ’90s. AI labs still learn them. X post by @ViC305.
API stayed solid throughout. No inference tweaks. Claude’s core held. Product layer slipped. Developers test v2.1.116 now. Workflows decide if trust returns. Market moves fast. Bugs like these? Open doors wide.


WebProNews is an iEntry Publication