Why AI Masters Code but Stumbles on Taste: OpenAI's Codex Leader on Design's Enduring Edge

Andrew Ambrosino leads the team behind OpenAI’s Codex desktop app. Nearly 100% of OpenAI employees use it weekly. Not just the engineers. The surge in adoption has been dramatic. Internal data released this week shows non-developer usage exploding since last August.

Yet when the conversation turns to design, Ambrosino grows cautious. He appeared on Lenny’s Podcast recently and laid out a core friction. Business Insider captured his blunt assessment. “I think design’s a little bit harder to grade than software,” he said. “Creating a loop where you can train the model on what’s good design and what’s bad design is just a little bit more tedious and onerous than, you know, does the code compile?”

Short pause. The difference matters. Code either works or it doesn’t. Design lives in taste. In judgment. In the hard-to-quantify feel that separates the memorable from the merely functional. Ambrosino didn’t mince words. “Let’s give it up for the human brain for now.”

A leading voice in the design world echoes the view. Dylan Field, CEO of Figma, has argued for months that AI won’t displace human designers. Models train on the distribution of existing data, so they produce outputs that feel average. Field made the point during a “Hard Fork” live event: AI generates what people already recognize. It struggles to break new ground.

This tension sits at the heart of today’s product teams. Tools like Codex have transformed how code gets written. OpenAI’s own research paper, published days ago, quantifies the shift. According to OpenAI, through August 2025 the average employee spent less than 10% of their tokens on Codex. By June 2026, every department treated it as its primary AI tool, accounting for over 85% of output tokens per worker and 99.8% company-wide.

Non-developer adoption grew even faster. The number of individual non-developer users jumped 137-fold. Organizational non-developer usage climbed 189-fold. Inside OpenAI itself, the multiplier hit 12x. Tasks once limited to engineers now span legal, finance, and recruiting. Agents handle multi-hour work: nearly a quarter of Codex requests now involve work that would take a human more than an hour. Some power users generate more than 60 hours of agent work daily.

But the same systems that accelerate code review, bug fixes, and automation hit a wall on subjective decisions. Prototypes look polished. They seduce. Yet they often rest on shallow assumptions. Teams risk shipping before the underlying taste has been stress-tested.

Field sees this dynamic creating a new competitive reality. In his October 2025 conversation on Lenny’s Newsletter, he called taste the new moat. When anyone can generate interfaces or marketing assets in minutes, the ability to discern quality, to refine craft, to insist on coherence becomes the differentiator. Startups that treat design as an afterthought will drown in a sea of competent but forgettable output.

Ambrosino’s team discovered this internally before the Codex app even launched publicly. Early dogfooding in January and February 2026 revealed strong product-market fit for engineering and research workflows. Engineers were thrilled. Researchers too. Yet design tasks required heavier human guidance. The feedback loop for taste proved slower and more expensive to close.

Recent coverage reinforces the pattern. A June 29 report from Business Insider ties the podcast remarks directly to broader questions about creative jobs. AI handles boilerplate. It accelerates iteration. Human taste still sets direction. Musician Bas put it plainly at a 2024 Harvard event: some level of human taste is still required to prompt outputs that audiences will actually enjoy.

Industry watchers have begun tracking downstream effects. Productivity studies on AI coding tools show mixed results. Some developers report speed gains on familiar tasks. Others, especially on complex unfamiliar codebases, see slowdowns when debugging AI suggestions. A METR study found experienced developers 19% slower in certain scenarios. The cost of reviewing and correcting generated code can offset initial gains.

So what does this mean for product leaders in 2026? Teams must split work differently. Let agents own the predictable, measurable pieces. Reserve human attention for the ambiguous, high-stakes judgment calls. That division isn’t trivial. It demands new processes, new evaluation methods, new ways of pairing human intuition with machine scale.

OpenAI’s data suggests the frontier is moving fast. Agents already tackle tasks that once required days. They cross departmental lines. A recruiter can now automate data transformations that once waited on engineering tickets. Legal teams generate and review drafts at speeds previously unimaginable. The economic potential, the research paper argues, expands as these tools improve and reach more workers.

Yet the ceiling remains visible. Design systems, brand expression, user empathy: these elements resist pure automation. They require taste honed through deliberate practice. Field has said design matters more than ever in an AI world. The proliferation of generative tools doesn’t diminish craft. It elevates it.

Executives ignore this split at their peril. Companies flooding the market with AI-generated interfaces will compete on polish that feels interchangeable. Those investing in refined taste, in quality that can’t be easily replicated, will stand apart. The moat isn’t the tool. It’s the discernment applied to the tool’s output.

Ambrosino’s frank admission carries weight precisely because it comes from inside the company pushing the frontier hardest. OpenAI employees live the productivity gains daily. They also encounter the limits. When even the builders concede that design grading resists easy optimization, the message lands. Human judgment isn’t a temporary bottleneck. For the foreseeable future it remains the critical input.

The next phase of product work will likely look hybrid by necessity. Agents for volume and speed. Humans for direction and refinement. Getting the balance right will separate organizations that merely adopt AI from those that compound its strengths with irreplaceable human judgment. Taste, after all, doesn’t compile. But it decides what gets built. And what gets remembered.