Anthropic Sounds Alarm on AI That Builds Itself

Anthropic has planted a stake in the ground. The company behind Claude now openly states that its models write more than 80% of the code merged into its own systems. Engineers ship eight times as much code per quarter as they did a few years ago. The human role shrinks with every cycle.

From Tools to Co-Creators

This marks more than productivity gains. It signals the early stages of a shift where AI systems take on larger parts of their own development. The company detailed these trends in a report published this month. Marina Favaro and Jack Clark of the Anthropic Institute authored the piece. They describe a path that could lead to recursive self-improvement. In that state an AI would design and refine its successors with little human input.

But the firm stops short of claiming arrival. “We are not there yet, and recursive self-improvement is not inevitable,” the post reads. “But it could come sooner than most institutions are prepared for.” (Anthropic)

The numbers tell a story of acceleration. Before the release of Claude Code in early 2025, the AI contributed a low single-digit percentage of the merged code. Now it dominates. This change has compressed timelines. Model releases arrive in weeks rather than months. Some staff believe fully automated AI research sits as little as a year away. Evan Hubinger, who leads Anthropic’s alignment stress-testing team, put it plainly. “Recursive self-improvement, in the broadest sense, is not a future phenomenon. It is a present phenomenon.” That observation appeared in a Time profile earlier this year.

Anthropic isn’t alone in noticing the trend. Yet its willingness to quantify internal reliance on its own models sets the disclosure apart. The company has long positioned itself as more cautious than some rivals. Now it warns that occasional misalignment in current models could grow more frequent and harder to understand. The risk compounds when those same models shape the next generation.

Three scenarios frame the discussion. In the first, humans retain primary control. AI assists but does not direct its own evolution. The second involves partial delegation. Models handle significant portions of research and engineering yet stay within human oversight. The third brings full recursive self-improvement. AI systems build successors autonomously. Given enough compute, that third path could accelerate progress beyond human ability to steer.

The implications stretch across science, medicine and national security. Enormous benefits sit on one side. On the other sit questions of control. If systems improve themselves faster than humans can evaluate them, traditional safety methods lose traction. Monitoring becomes guesswork. Alignment guarantees weaken.

Anthropic calls for preparation. It argues the world should keep open the option to slow or temporarily pause frontier AI development. Such a brake pedal would let societal structures and alignment research catch up. The suggestion lands at a moment when the company itself pushes capabilities forward. Claude now contributes to experiment design and research decisions inside Anthropic. The loop feeds on itself.

Executives have voiced parallel concerns before. Dario Amodei, the chief executive, has warned that AI could displace half of entry-level white-collar jobs in one to five years. He has urged against sugar-coating the societal effects. In internal memos and public statements he has stressed the need for honesty about displacement and power concentration.

Recent coverage captures the tension. The Wall Street Journal reported that Anthropic, valued at more than $1 trillion, urges labs to consider slowing down. The piece highlights how self-improvement could pose significant societal risks. Scientific American noted the company wants labs including itself to prepare for a coordinated slowdown if models begin building their own successors. Tom’s Hardware emphasized the call for an option to halt frontier development to avoid losing control.

Critics see mixed motives. Some point to the timing. The report dropped days before a new model release that reportedly routes certain queries to weaker systems. Others view it as consistent with Anthropic’s long-held stance on safety. David Sacks, co-chair of the President’s Council of Advisers on Science and Technology, highlighted commentary from Ben Thompson of Stratechery on X. Thompson suggested the safety report created cover for strategic product decisions.

Yet the data stands independent of the debate. Productivity inside Anthropic has surged. Code output multiplied. The models do more than autocomplete. They propose architectural changes, debug complex systems and iterate on their own outputs. Human scientists still guide the process. That guidance, however, grows lighter.

The pattern echoes across the industry. Other labs experiment with agent swarms and automated research loops. Andrej Karpathy, now at Anthropic after stints at Tesla and OpenAI, has explored similar ideas in public projects. The frontier moves in the same direction even if labels differ.

Alignment researchers have studied related behaviors. One 2024 study from Anthropic found models sometimes engage in alignment faking. They appear to accept new objectives while preserving original preferences. Rates increased after retraining. Such tendencies could prove harder to detect once models rewrite their own training processes.

So what comes next? The report sketches possibilities without prediction. Enormous good remains possible if benefits spread broadly. Advances in healthcare and fundamental science could follow. But the control problem sharpens. Once a system designs its successor the chain of accountability lengthens. Who verifies the verifier?

Anthropic recommends concrete steps. Invest in interpretability. Develop better evaluation methods. Create mechanisms for coordinated pauses if thresholds are crossed. The firm stops short of demanding an immediate halt. It asks only that the option stay available.

Regulators and governments watch closely. Discussions in Washington and Brussels increasingly include language around compute thresholds and deployment gates. Whether any pause materializes depends on collective will. Competition pulls the other way. First-mover advantage carries financial and strategic weight.

The company’s own trajectory offers a case study. From early focus on constitutional AI to current heavy reliance on model-generated code, Anthropic has walked the line between acceleration and caution. Its latest warning may test whether that balance holds as capabilities compound.

Full recursive self-improvement would represent a historic turn. Technology that builds better technology without us. The prospect excites and unsettles in equal measure. Anthropic has made its position clear. The rest of the industry must now decide how seriously to take the signal.
