AI-Native Development

Vibe Coding vs Spec-Driven Development: Why Visual Specs Are the Third Way

The vibe coding debate is a false binary. There's a third way — spec-driven development — that keeps the speed of vibe coding while adding the structure production code needs.

4ge Team

The Binary That's Killing Your Codebase

You've seen the debate. It plays out on X every week.

On one side: the vibe coders. Andrej Karpathy coined the term in February 2025 — "vibe coding" — and meant it as a description, not an insult. You describe what you want in natural language, the AI generates code, you accept it with minimal review, you iterate. It's fast, it feels like the future, and for prototypes it works. Karpathy himself said he's "fully vibe coding" — letting the AI drive while he steers with English.

On the other side: the engineers. The ones who've watched vibe-coded software break in production. The AI-refactored legacy codebase that created 127 new bugs and cost $76,800 in lost productivity. The GitHub Copilot suggestion that passed code review and wiped user data — $47,000 in recovery costs. The Claude Code session that ran terraform destroy on production and took down an entire learning platform. These aren't hypotheticals. They're post-mortems.

Both sides are right. And both sides are missing the point.

Vibe coding works beautifully for exploration — for testing ideas, for weekend projects, for answering "what would this look like?" But it fails systematically for production software, where the cost of a bug isn't a frustrated afternoon but a page-on-call at 2am. And the engineering camp is right about the failure mode, but their prescription — more process, more planning, more documentation — creates the exact friction that drove people to vibe coding in the first place. Nobody reaches for AI code generation because they enjoyed writing the spec document.

There's a third way. It keeps the velocity of vibe coding. It adds the structure that production code requires. And it doesn't require you to write a 50-page PRD before touching a keyboard.

127 bugs

introduced by an AI-refactored legacy codebase that appeared to work. Vibed code doesn't announce its failures — it deposits them quietly for later.

What Vibe Coding Actually Is (And What It Isn't)

Let's be precise about this, because the discourse has gotten sloppy.

Karpathy's original definition is specific: you use natural language to describe intent, you accept AI-generated code with minimal review, and you iterate through conversation rather than manual editing. The key word is accept — you're not reviewing the code in the way you'd review a human PR. You're vibing with it. Does it look right? Ship it.

Simon Willison — one of the more thoughtful voices in this space — draws a sharp line between vibe coding and what he calls vibe engineering. Vibe coding is the irresponsible version: accepting AI output without understanding it. Vibe engineering is the professional version: using AI tools to accelerate your work while maintaining quality through review, testing, and your own engineering judgment.

This distinction matters, and Willison is right to make it. But in practice, the line between the two is razor-thin and gets thinner as AI tools improve. When your AI assistant starts running tests, iterating on failures, and proposing fixes autonomously — which is exactly what the latest generation of agentic tools does — the boundary between "I'm reviewing this carefully" and "the AI reviewed it for me" blurs fast. Willison himself noted in May 2026 that vibe coding and agentic engineering are "getting closer than I'd like."

The real insight: it's not about how carefully you review the output. It's about whether the AI had the right input — the right context, the right constraints, the right understanding of what you're building and why. A well-contextualised AI generates code you can trust. A poorly-contextualised AI generates code you have to review line by line — at which point you've lost the velocity that made vibe coding appealing in the first place.

Where Vibe Coding Breaks

Not eventually. Not theoretically. Specifically, structurally, and in ways that compound.

The Abstraction Ceiling

Vibe coding works when you're describing a feature the AI has seen ten thousand times in its training data. "Add user authentication with JWT." "Create a REST API for a todo app." The AI doesn't need context for these — it's generated them before. It can vibe its way to a reasonable implementation because the problem space is well-trodden.

The ceiling appears when you need something that's specific to your system. Not "add authentication" but "add authentication that works with our existing SSO provider, respects the role-based access rules we defined in Q1, and validates against the user state machine at src/auth/state.ts." That's not a vibe. That's a specification. And if you can't provide it — if the context exists only in your head — the AI will generate the generic version and you'll spend hours refactoring the result.

This is the pattern: vibe coding excels at the generic and fails at the specific. Which is exactly backwards for production software, where the specific is where the bugs live and the value lives.

The Consistency Problem

Here's the thing about vibe coding that nobody talks about: you're not vibing once. You're vibing across sessions, across features, across weeks and months. And each vibe session starts from scratch.

Session one: your AI generates an authentication flow with middleware-based token validation. Reasonable! Clean! Ships!

Session two (new session, same project): your AI generates a separate auth guard for a different route — because it doesn't know the middleware exists. Also reasonable! Also clean! Incompatible with session one.

Session three: someone asks for a role check on the admin route. The AI adds yet another auth mechanism. Because it doesn't know about the middleware or the auth guard.

Three authentication flows. Three reasonable AI suggestions. One contradictory mess. This is how cognitive debt compounds — and vibe coding is an engine for producing it, because every session is an amnesiac rerun of the last.
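To make the mess concrete, here is a hypothetical sketch of what those three sessions might leave behind. The function names and token shape are invented for illustration; the point is that each check looks reasonable alone while disagreeing about what "authenticated" even means:

```typescript
// A minimal token shape for illustration.
type Token = { sub: string; exp?: number; role?: string };

// Session one: middleware that requires a non-expired "exp" claim.
function middlewareAuth(token: Token): boolean {
  return token.exp !== undefined && token.exp > Date.now() / 1000;
}

// Session two: a route guard written without knowledge of the middleware.
// It only checks that a subject exists -- no expiry check at all.
function routeGuardAuth(token: Token): boolean {
  return token.sub.length > 0;
}

// Session three: an admin check that quietly assumes a "role" claim
// neither earlier mechanism sets or validates.
function adminAuth(token: Token): boolean {
  return token.role === "admin";
}

// The same expired token gets three different answers:
const expired: Token = { sub: "user-1", exp: 0 };
console.log(middlewareAuth(expired)); // false: middleware rejects it
console.log(routeGuardAuth(expired)); // true: the guard waves it through
console.log(adminAuth(expired));      // false: fails for an unrelated reason
```

No single function is wrong. The system is wrong, and no diff review of any one session would show it.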

The Review Paradox

The most common defence of vibe coding is: "I review the code before I ship it." And sure, if you're reviewing every line — understanding every function, tracing every dependency, validating every architectural assumption — then vibe coding is just... coding with extra steps. You've removed the velocity benefit entirely.

But that's not what most people do. They scan the output. The code looks clean. The names are consistent. The tests pass. Ship it. This is the complacency trap: AI-generated code looks better than it is. The syntax is flawless. The architecture might be hollow. And code review was designed to catch code that looks wrong — not code that looks perfect but contradicts patterns established three sessions ago.

$47,000

in recovery costs from a GitHub Copilot suggestion that passed code review and wiped user data. The code looked right. That's the problem.

Where Traditional Engineering Breaks Too

Here's what the engineering camp doesn't want to admit: the old process doesn't scale to AI-native development either.

The traditional answer to vibe coding is: write a spec first. Plan before you code. Document your architecture. Define your interfaces. Then implement against the spec.

This is correct in principle and broken in practice for three reasons:

1. Specs rot. You write ARCHITECTURE.md in January. By March, the codebase has diverged from it and nobody remembers to update it. The AI reads the stale spec and generates code based on wrong assumptions — which is worse than no spec, because at least no spec is obviously unreliable. A stale spec is reliability theatre.

2. Specs are expensive to write. The 50-page PRD takes a week. The technical design doc takes three days. Nobody has that kind of time in a startup shipping weekly, and the larger the organisation, the more process accumulates around the spec until the spec becomes the bottleneck, not the code.

3. Specs are text. Even well-written specs are linear prose. They describe user flows as paragraphs. They describe system architecture as bullet points. And paragraphs and bullet points are terrible at revealing the things that matter most: the edge cases, the broken paths, the "what happens when the payment fails AND the user's session expires AND the retry logic fires?" moments. Those are invisible in text. They're obvious on a diagram.

The traditionalists are right that you need structure. They're wrong that the structure should look like a Word document.

The Third Way: Spec-Driven Development

What would it look like to keep the velocity of vibe coding and the structure of engineering? Not as a compromise — as a different approach entirely.

Spec-driven development starts from a different premise. Instead of "vibe first, correct later" or "spec first, code later," it's: make the spec the first thing the AI sees, not the last thing you write.

The spec isn't a document. It's a living blueprint that carries your project's intent — architecture decisions, business logic, edge cases, tech stack constraints — into every AI session. It's structured, not prose. It's visual, not textual. It catches the things text misses. And it's available to the AI before it generates a single line of code.

Visual, Not Textual

This matters more than it sounds. A user flow drawn on a canvas reveals broken paths that a requirements.md hides. You can see that the "payment failed" state has no exit path. You can see that the auth flow dead-ends when SSO returns an error. These are the bugs that eat weekends — and they're invisible in bullet points.

Visual planning isn't a cosmetic preference. It's a different cognitive mode. Diagrams expose structural problems the way paragraphs can't. (If you've ever reviewed a perfectly reasonable spec document and then discovered in production that nobody handled the "password reset email bounces" edge case — you know this already.)
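The same point in code: once a flow is structure rather than paragraphs, a dead end stops being something you notice and becomes something you can check. This is a hypothetical sketch; the flow and state names are invented for illustration:

```typescript
// A user flow as a graph: each state lists the states it can reach.
const flow: Record<string, string[]> = {
  start: ["paymentPending"],
  paymentPending: ["paymentSucceeded", "paymentFailed"],
  paymentSucceeded: ["done"],
  paymentFailed: [], // no exit path -- the bug a requirements.md hides
  done: [],
};

// A state with no outgoing transitions is fine only if it was
// meant to be terminal.
const intendedTerminals = new Set(["done"]);

const deadEnds = Object.entries(flow)
  .filter(([state, next]) => next.length === 0 && !intendedTerminals.has(state))
  .map(([state]) => state);

console.log(deadEnds); // ["paymentFailed"]
```

A canvas does this check with your eyes; a few lines of code do it in CI. Prose does it with a production incident.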

Spec Before Code, Not After

The critical difference between SDD and both vibe coding and traditional engineering is when the spec exists.

Vibe coding: code first, spec never (or spec as documentation afterward, which nobody updates).

Traditional engineering: spec first as a Word document, then code against it (until the spec diverges from reality and becomes stale).

Spec-driven development: spec first as a living structure that lives alongside the code, versioned in git, updated when reality changes, and — here's the critical bit — consumed by the AI as input, not generated as output.

When the AI reads your spec before generating code, it doesn't guess. It doesn't vibe. It has the architectural context it needs to generate code that fits your system — not just code that passes tests. The spec becomes the context engineering layer that transforms an AI session from "delegating architecture" into "delegating implementation."
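What does "consumed by the AI as input" look like? 4ge's actual spec format isn't reproduced here, but as a hypothetical sketch, a spec that lives next to the code might be typed data rather than prose, so that drift from reality is mechanically checkable:

```typescript
// Hypothetical shape for a machine-readable spec fragment --
// invented for illustration, not 4ge's actual format.
interface AuthSpec {
  mechanism: "middleware";  // one canonical auth path, not three
  tokenValidation: { claims: string[]; expiryRequired: boolean };
  roles: string[];
  stateMachine: string;     // where the user state machine lives
}

const spec: AuthSpec = {
  mechanism: "middleware",
  tokenValidation: { claims: ["sub", "exp"], expiryRequired: true },
  roles: ["user", "admin"],
  stateMachine: "src/auth/state.ts",
};

// Because the spec is data, a CI step can assert that the constraints
// the AI was given still hold in the codebase, instead of trusting
// a document nobody remembered to update.
console.log(spec.tokenValidation.expiryRequired); // true
```

The format matters less than the property: the spec is versioned with the code, readable by the AI before generation, and cheap enough to update that it actually gets updated.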

Edge Cases as First-Class Citizens

This is where SDD separates from both vibe coding and traditional specs. Neither vibe coding nor text-based specs are good at catching edge cases — vibe coding because the AI doesn't think about them, text specs because edge cases are invisible in paragraphs.

4ge's Adversarial AI Feedback Engine stress-tests your plans before code is written. It surfaces broken flows, identifies gaps in error handling, and challenges assumptions. The vibe coding approach assumes the first version is probably fine. The traditional approach assumes the spec captured everything important. SDD assumes neither — and uses AI to challenge the plan before you commit to it.

Where the Third Way Wins

Not everywhere. Let's be honest about the tradeoffs.

Use vibe coding for: Prototypes. Experiments. "What would this look like?" moments. Weekend projects. Anything where the cost of a bug is a frustrated afternoon, not a prod incident. Vibe coding is excellent for exploration — that's not a flaw, that's a feature.

Use traditional engineering for: Regulated industries. Compliance-first environments. Systems where the cost of failure is measured in millions of dollars and human safety. When you need formal verification, you need formal process. AI doesn't change that.

Use spec-driven development for: Everything in between. Which is most software. Production SaaS products. Growing codebases that need to stay coherent across sprints and team members. AI-native development where the bottleneck has shifted from "can the AI write the code?" to "does the AI have the context to write the right code?"

60%

of AI development time lost to rework from poor requirements. The spec gap isn't a minor inconvenience — it's the single largest source of waste.

The Implication

The vibe coding debate is a false binary. It's not "fast and sloppy" versus "slow and careful." It's "contextualised" versus "decontextualised." And the teams that figure out how to give their AI the right context — not through longer prompts, but through persistent, structured specifications that carry architectural intent — will build faster and better than teams on either extreme.

The question isn't whether to vibe or to spec. The question is: does your AI know what you're building before it starts writing code? If yes, you're doing spec-driven development — whether you call it that or not. If no, you're vibe coding — whether you admit it or not.

Don't vibe what you can't afford to break.


Ready to give your AI the context it needs? 4ge is a visual workspace that generates AI-ready specifications — so your assistant builds what you actually want, not what it guesses.
