TL;DR

Google’s May 2026 whitepaper, “The New SDLC With Vibe Coding,” argues that software development is shifting from writing code to directing AI systems with clearer intent and stronger verification. The paper says the model is only a small part of agent performance, while prompts, tools, context, tests, evals and oversight carry much of the practical risk and value.

Google has published a May 2026 whitepaper, “The New SDLC With Vibe Coding,” arguing that the central change in software engineering is a move from manually writing code to expressing intent and verifying machine-generated output. The shift matters because, by the paper’s own count, most professional developers now use AI coding agents regularly.

The paper, written by Addy Osmani, Shubham Saboo and Sokratis Kartakis, reports that 85% of professional developers regularly use AI coding agents, 51% use them daily, and about 41% of new code is AI-generated. Those figures are presented by the authors as evidence that AI-assisted development has moved from experimentation into routine software work.

The whitepaper’s main claim is that model choice is only a small part of agent performance. It describes a coding agent as the model plus its harness: prompts, rules, tools, context policies, hooks, sandboxes, observability, sub-agents and verification systems. According to the paper, teams often misread agent failures as model failures when the cause is a weak or poorly configured harness.
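
The “model plus harness” framing can be made concrete with a minimal Python sketch. Every name here (`Harness`, `Agent`, `echo_model`) is a hypothetical illustration, not something defined in the whitepaper or in any Google API. The only point is that the same model callable produces different, and differently verified, results depending on the configuration wrapped around it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration; the whitepaper does not define an API.
ModelFn = Callable[[str], str]


@dataclass
class Harness:
    """Everything around the model that the team controls."""
    system_prompt: str
    rules: List[str] = field(default_factory=list)
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    verifiers: List[Callable[[str], bool]] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        # Context assembly: system prompt, rules, available tools, then the task.
        parts = [self.system_prompt]
        parts += [f"RULE: {r}" for r in self.rules]
        parts += [f"TOOL: {name}" for name in sorted(self.tools)]
        parts.append(f"TASK: {task}")
        return "\n".join(parts)


@dataclass
class Agent:
    model: ModelFn
    harness: Harness

    def run(self, task: str) -> dict:
        output = self.model(self.harness.build_prompt(task))
        # Verification belongs to the harness. With no verifiers configured,
        # the run is reported as unverified rather than vacuously passing.
        checks = self.harness.verifiers
        passed = bool(checks) and all(check(output) for check in checks)
        return {"output": output, "verified": passed}


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call: reports how many rules it was given."""
    n_rules = sum(1 for line in prompt.splitlines() if line.startswith("RULE: "))
    return f"rules_seen={n_rules}"


# Same model, two harnesses.
weak = Agent(echo_model, Harness(system_prompt="You are a coding agent."))
strong = Agent(
    echo_model,
    Harness(
        system_prompt="You are a coding agent.",
        rules=["Run the tests before finishing.", "Never edit generated files."],
        tools={"run_tests": lambda cmd: "ok"},
        verifiers=[lambda out: "rules_seen=2" in out],
    ),
)
```

Calling `weak.run(...)` and `strong.run(...)` on the same task uses an identical model function; only the harness differs, which is where the paper says teams should look first when an agent fails.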

The paper also draws a line between casual “vibe coding” and more disciplined agentic engineering. It says quick prompts and informal checks may suit prototypes or disposable scripts, while production systems require specifications, automated tests, evals, CI gates and human judgment over architecture and risk.

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.

The idea worth building your strategy around

Agent = Model + Harness

MODEL ~10%

HARNESS ~90%: prompts · tools · context · hooks · sandboxes · observability

~90% is your surface area, not the provider’s

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding: Low CapEx · High OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering: High CapEx · Low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing (cheap models for the easy work).

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR); verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Agent Design Becomes Strategy

The paper matters because it shifts attention from which AI model a team buys to how the team builds the system around that model. If the authors’ framing is right, companies that treat AI coding as a model procurement problem may miss the bigger source of performance, cost control and reliability.

For engineering leaders, the practical message is that verification now carries more weight. Deterministic tests can check whether software returns expected outputs, while evals are needed for less predictable agent behavior, such as tool choice, reasoning path and quality thresholds. The paper argues that without both, teams are still close to vibe coding even when their prompts are detailed.
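
The difference between a deterministic test and an eval can be sketched in a few lines of Python. The `slugify` function, the `AgentRun` record, the sample data and both thresholds are hypothetical and chosen only for illustration; the whitepaper does not prescribe this shape.

```python
import re
from dataclasses import dataclass
from typing import List


def slugify(title: str) -> str:
    """Deterministic code under test (hypothetical example function)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def test_slugify() -> None:
    # Deterministic test: one input, one exact expected output.
    assert slugify("Hello, World!") == "hello-world"


@dataclass
class AgentRun:
    """One recorded agent trajectory (hypothetical shape)."""
    task: str
    tools_called: List[str]
    quality_score: float  # e.g. from a rubric or grader, in [0, 1]


def run_passes(run: AgentRun, required_tool: str, min_quality: float) -> bool:
    # Eval criterion: did the agent call the required tool AND meet the quality bar?
    return required_tool in run.tools_called and run.quality_score >= min_quality


def eval_pass_rate(runs: List[AgentRun], required_tool: str, min_quality: float) -> float:
    # Evals score a sample of runs and gate on a rate, not on one exact output.
    if not runs:
        return 0.0
    passed = sum(run_passes(r, required_tool, min_quality) for r in runs)
    return passed / len(runs)


# Made-up sample data for illustration only.
runs = [
    AgentRun("fix failing test", ["read_file", "run_tests"], 0.9),
    AgentRun("fix failing test", ["read_file"], 0.95),              # never ran the tests
    AgentRun("fix failing test", ["run_tests", "edit_file"], 0.7),  # below the quality bar
    AgentRun("fix failing test", ["run_tests"], 0.85),
]

test_slugify()
rate = eval_pass_rate(runs, required_tool="run_tests", min_quality=0.8)
gate_ok = rate >= 0.75  # hypothetical CI threshold
```

In this sample the deterministic test passes, but only two of the four recorded runs satisfy the eval criterion, so the hypothetical gate fails. That is the situation the paper describes: correct outputs on fixed inputs say little about whether the agent chose the right tools along the way.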

The cost argument is also material. The source analysis says casual AI coding can appear cheap at first, but may create higher operating costs through repeated fix loops, maintenance burden and security cleanup. By contrast, the whitepaper’s agentic engineering model asks teams to spend more up front on specs, evals, context and routing, then recover savings through higher first-pass success and cheaper model use for simpler work.
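
The trade-off is simple break-even arithmetic, sketched below. The cost figures are invented and in arbitrary units; the only input taken from the source material is that casual AI coding can cost roughly 3 to 10 times more per feature, and the example uses the low end of that range. `CostModel` and `break_even_features` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    upfront: float      # one-time investment (specs, evals, context), arbitrary units
    per_feature: float  # recurring cost per shipped feature, same units

    def total(self, features: int) -> float:
        return self.upfront + self.per_feature * features


def break_even_features(cheap_start: CostModel, cheap_run: CostModel) -> float:
    """Feature count at which both approaches have cost the same in total."""
    delta_run = cheap_start.per_feature - cheap_run.per_feature
    if delta_run <= 0:
        raise ValueError("the upfront-heavy approach never catches up")
    return (cheap_run.upfront - cheap_start.upfront) / delta_run


# Hypothetical numbers: vibe coding costs 3x more per feature,
# the low end of the 3-10x range quoted in the source material.
vibe = CostModel(upfront=0.0, per_feature=6.0)
agentic = CostModel(upfront=40.0, per_feature=2.0)

n_star = break_even_features(vibe, agentic)
```

With these made-up numbers the two approaches break even at 10 features; past that point the approach with the higher upfront spend is cheaper in total. Where the real crossover falls depends entirely on a team's actual upfront and per-feature costs.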

From Vibe Coding To Gates

The whitepaper responds to a term that spread quickly after Andrej Karpathy used “vibe coding” in February 2025 to describe a looser style of prompting AI, accepting outputs and feeding errors back until the result works. The Google paper treats that as one end of a spectrum rather than a label for all AI-assisted development.

At one end, the paper places casual prompting with minimal review. At the other, it describes agentic engineering: AI implementation inside formal constraints, automated checks and human oversight. The difference, according to the authors, is not whether AI is used but how much structure surrounds its output.

The source material also notes that Google’s framing is mostly tool-agnostic while still pointing readers toward Google’s own AI development products and ecosystem. That means the technical argument can be useful beyond Google, but the commercial direction of the paper is part of the story.

“Generation is solved; verification, judgment, and direction are the new craft.”
— Osmani, Saboo and Kartakis, Google whitepaper

Adoption Figures Need Scrutiny

Several details remain dependent on the whitepaper’s own data and cited sources. The reported 85% regular usage rate, 51% daily usage rate and 41% AI-generated code figure are presented in the source material as the paper’s numbers, but the underlying survey design, sample and definitions are not fully detailed in the provided material.

It is also not yet clear how widely the 10% model and 90% harness split applies across teams, languages, codebases and risk levels. The benchmark examples cited in the source material show that harness changes can sharply improve agent performance, but benchmark gains do not automatically translate into production reliability.

The economic claim is also unsettled. The source analysis says casual AI coding can cross into much higher per-feature costs, while agentic engineering can lower operating expense after an upfront investment. Actual results will vary by team maturity, code quality, security needs and the amount of legacy software involved.

Teams Face Verification Decisions

The next step for software teams is likely to be less about adopting AI coding tools in general and more about deciding how to govern them. That means setting rules for when agents can modify production code, what tests and evals must pass, how context is supplied, and when human review is required.
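
Rules of this kind can be written down as an explicit merge gate. The sketch below is one possible shape, not a description of any vendor's product: `ChangeRequest`, `GatePolicy`, the path prefixes and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ChangeRequest:
    """An agent-authored change awaiting merge (hypothetical shape)."""
    paths: Tuple[str, ...]
    tests_passed: bool
    eval_pass_rate: float
    human_approved: bool


@dataclass(frozen=True)
class GatePolicy:
    min_eval_pass_rate: float
    review_required_prefixes: Tuple[str, ...]  # paths where a human must sign off

    def blockers(self, change: ChangeRequest) -> List[str]:
        """Return the reasons a change cannot merge; an empty list means it can."""
        reasons = []
        if not change.tests_passed:
            reasons.append("tests failed")
        if change.eval_pass_rate < self.min_eval_pass_rate:
            reasons.append("eval pass rate below threshold")
        # str.startswith accepts a tuple of prefixes.
        touches_sensitive = any(
            path.startswith(self.review_required_prefixes) for path in change.paths
        )
        if touches_sensitive and not change.human_approved:
            reasons.append("human review required")
        return reasons


policy = GatePolicy(
    min_eval_pass_rate=0.9,
    review_required_prefixes=("src/billing/", "infra/"),
)

docs_change = ChangeRequest(("docs/readme.md",), True, 0.95, False)
billing_change = ChangeRequest(("src/billing/invoice.py",), True, 0.95, False)
broken_change = ChangeRequest(("docs/readme.md",), False, 0.5, False)
```

Under this hypothetical policy the documentation change merges without review, the billing change is held for a human sign-off, and the broken change is blocked on both tests and evals. Returning the list of reasons, rather than a bare yes or no, keeps the gate auditable.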

Vendors, including Google, are expected to keep turning these workflow ideas into product paths for agent tooling, model routing and automated evaluation. Buyers will need to separate broadly useful SDLC practices from vendor-specific on-ramps.

For readers following AI in software development, the key question is whether the industry can make AI-generated code easier to verify than to produce. The whitepaper’s answer is that production use depends on the scaffolding around the model, not the model alone.

Key Questions

What is the news?

Google published a May 2026 whitepaper, “The New SDLC With Vibe Coding,” arguing that AI-assisted software development now requires stronger verification, clearer intent and better agent harnesses.

What does the paper mean by the model being only 10%?

The paper’s framework says agent behavior is shaped largely by the surrounding harness: prompts, tools, context, rules, sandboxes, observability, tests and evals. The 10% figure is a rough framing, not a universal measurement.

Is vibe coding the same as AI-assisted development?

No. In the paper’s framing, vibe coding is the casual end of a spectrum, where prompts and informal checks may be enough for prototypes. Production work requires more structure and verification.

What is confirmed right now?

The confirmed development is the publication of the Google whitepaper and the arguments and figures attributed to it in the source material. Broader claims about costs, productivity and long-term reliability remain dependent on team-specific evidence.

Why should engineering leaders care?

The paper suggests that the largest gains may come from better process design around AI agents rather than simply switching models. That affects budgets, team practices, risk controls and software quality.

Source: Thorsten Meyer AI
