AI Codes Fast, But Can't Scale the Real Engineering Walls

Lowe’s CEO Marvin Ellison drew a sharp line at the Shoptalk 2026 conference. AI can write code. It can’t climb a 12-foot ladder. His point landed amid hype about tech job wipeouts. But the analogy sticks for software too. Machines churn lines. Humans handle the mess.

Ellison’s words echo broader doubts. Yahoo Finance captured the moment, tying it to labor shifts. Harvard Business School research backs the nuance. Post-ChatGPT, postings for repetitive tasks dropped 13%. Demand for analytical tech roles rose 20%. Finance and technology saw the biggest cuts. Yet augmentation wins. Humans plus AI, not AI alone. “Generative AI creates new demand in augmentation-prone roles,” says Suraj Srinivasan in Harvard Business Review. Software engineering fits here. Code generation surges. Judgment doesn’t.

Take the numbers. Big Tech plans $720 billion in AI capex this year. Amazon alone commits $200 billion. Lowe’s skips the frenzy, pledging $250 million to train 250,000 tradespeople by 2035. Why? Construction faces a 41% retirement wave in five years. It needs 350,000 workers for infrastructure. Physical limits. Digital ones mirror them.

AI coding tools flood teams. Developers accept 27-30% of suggestions. They reject 70%. Potential security flaws show up in 29.1% of the Python that GitHub Copilot generates. That’s from a Medium analysis of 2025 trends. Three in four developers review every snippet. Manually. Trust hovers at 46%. The common complaint: output that is “almost right but not fully correct.” Speed? Patchy.

A METR study hit hard. Sixteen pros tackled real open-source bugs from million-line repos. They predicted 24% faster with AI like Cursor and Claude 3.5 Sonnet. Post-task, they swore 20% gains. Reality: 19% slower. Tasks passed code review standards. Developers had LLM hours under their belts. Still, overhead won. Non-productive prompting. Tool friction. The gap reveals illusion. METR’s blog details it.

Early vendor studies glowed. GitHub, Google, Microsoft clocked 20-55% task speedups. Bain & Company called real savings “unremarkable.” Developers code 20-40% of time. Gains dilute. Stanford saw young developer jobs drop 20% from 2022-2025. Correlation? Maybe. MIT Technology Review rounds up the discord.
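The dilution is simple arithmetic. A rough sketch, assuming a vendor's "X% speedup" means the coding task takes X% less time and that nothing outside coding gets faster. The function name and that reading of the figures are illustrative assumptions, not taken from the studies.

```python
def overall_time_saved(coding_share: float, task_time_reduction: float) -> float:
    """Fraction of total working time saved when only the coding share speeds up.

    coding_share: fraction of the week spent writing code (0.20-0.40 in the article).
    task_time_reduction: fraction of coding time removed by the tool
        (assumption: this is what a vendor "speedup" of 20-55% means).
    """
    for name, value in (("coding_share", coding_share),
                        ("task_time_reduction", task_time_reduction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Time outside coding is untouched, so the saving scales with the coding share.
    return coding_share * task_time_reduction


if __name__ == "__main__":
    low = overall_time_saved(0.20, 0.20)   # least coding time, smallest task gain
    high = overall_time_saved(0.40, 0.55)  # most coding time, largest task gain
    print(f"Overall time saved: {low:.0%} to {high:.0%}")
```

Under those assumptions, even the best vendor figure shrinks to roughly a fifth of the working week, and the low end to a few percent. Review, rework, and prompting overhead are not modeled here and would cut the number further.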

SWE-bench tells part of the story. On SWE-bench Verified, the benchmark OpenAI introduced in 2024, top models started at 33% of bugs fixed. Now they score above 70%. Progress. But benchmarks lie. Real repos? Context windows choke on millions of lines. Logical knots baffle. Long-term design? Absent. A Cornell-MIT-Stanford-Berkeley paper nails it. AI falters on sweeping scopes, extended contexts, complexity, planning. “Programming without these tools just feels primitive,” says MIT’s Armando Solar-Lezama. Yet as a collaborator, AI still lags humans. IEEE Spectrum breaks it down.

Pragmatic Engineer surveyed 900+. Thirty percent hit token limits. Even on $200/month plans. Builders debug AI bugs nonstop. “AI slop” floods reviews. Shippers rush debt. Coasters upskill but overwhelm. Costs mount. $100-200 per head monthly. Unsustainable, say 15%. “I ran up $600 bills with Cursor,” one CPTO gripes. Firms switch to Claude Code at $100. Europe balks more. The Pragmatic Engineer maps the trends.

Anthropic’s work stings. AI impairs understanding. Scores drop 17% when developers learn new libraries with AI help. They fall below 40% when AI writes all the code. Zero speed edge. Prompting skips thinking. Gaps hit production. X posts amplify: Priyanka Vergadia flagged the paper. Mel Andrews called gains insignificant.

Teams adapt. Coinbase hits 90% on refactors, tests. Flat elsewhere. GitClear spots 10% more durable code since 2022. Quality dips. Review queues balloon. LinearB notes 30% more PRs, 2% feature releases. Bottleneck shifted. Not code. Delivery.

Security? Vulnerabilities in AI code are up 23.7%. Hallucinations persist. Nonexistent packages. Conventions ignored. Bill Harding of GitClear: “AI has this overwhelming tendency to not understand existing conventions.” Tech debt piles. Juniors struggle to spot confidently wrong output.

But. Builders refactor faster. Agents handle P1s in days, 2,500 lines. Solo devs thrive. Managers code again. Pie expands. “I ship more quality code faster,” a staffer boasts. Orchestration rules. What to build. Not how.

Harness report flags risks. Faster code, lagging DevOps. More rework, burnout. Stack Overflow: 65% weekly users. Trust falls.

So AI writes code. Fine. It scales prototypes, boilerplate, bugs. Fails architecture. Integration. Production grit. Like Ellison’s ladder. Digital walls loom. Experienced hands climb them. For now.

Invest in humans. Reskill for judgment, oversight. Measure comprehension, not lines. Costs will bite. Limits too. True gains demand balance. AI assists. Doesn’t replace. Engineering endures.
