AI-Native Development

Why Text Specs Have the Same Problem as Vibe Coding

OpenSpec, CLAUDE.md, and GitHub Spec Kit are genuine improvements over prompting blind. But text specs share a structural flaw with vibe coding: they're built for the happy path. Here's why text-first isn't the destination — it's the middle mile.

4ge Team

The Uncomfortable Realisation

I wrote text specs for 6 years. Markdown files, mostly — a requirements.md in the project root, a design.md next to it, maybe a tasks.md with checkboxes I'd never actually tick. When OpenSpec shipped I was an early adopter. The /opsx:propose workflow felt like a genuine upgrade — structured, fast, integrated with every AI tool I used. I recommended it to everyone.

And I still do. OpenSpec is good. GitHub Spec Kit is good. CLAUDE.md files are good. They're meaningfully better than no spec at all — and we've made the case for spec-driven development over vibe coding clearly enough that I don't need to repeat it here.

But here's the thing I didn't want to admit: my text specs kept missing the same class of bugs that my vibe-coded features missed. The payment failure that nobody specified. The empty state that nobody described. The error path that existed in the system but not in the spec. The text spec said "process payment." It didn't say what happens when payment fails. And the AI — doing exactly what the spec told it to — didn't implement failure handling.
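To make that failure concrete, here's a minimal TypeScript sketch. Every name in it (processOrder, PaymentGateway, and so on) is invented for illustration, not taken from any real codebase or SDK. This is what a spec that says only "process payment" tends to produce:

```typescript
// Hypothetical types, invented for illustration.
interface Order { id: string; customerId: string; totalCents: number; }
interface Charge { id: string; }
interface Receipt { orderId: string; chargeId: string; }
interface PaymentGateway {
  charge(customerId: string, amountCents: number): Promise<Charge>;
}

// The spec said "process payment". This is exactly that, and nothing more.
async function processOrder(order: Order, gateway: PaymentGateway): Promise<Receipt> {
  const charge = await gateway.charge(order.customerId, order.totalCents);
  return { orderId: order.id, chargeId: charge.id };
}

// Everything the spec never mentioned, so the code never handles:
// - charge() rejecting: card declined, gateway timeout, network failure
// - a retry double-charging the customer (no idempotency key)
// - an order stuck half-paid when something fails after the charge succeeds
```

The missing branches aren't a model failure. They're a faithful rendering of a spec that had no branches.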

Same bug. Same root cause. Just with a more respectable document attached.

The text spec was better than no spec. It wasn't enough. And the reason it wasn't enough isn't that I wrote bad specs — it's that the format has a structural limitation that no amount of diligence can overcome.

0: the number of spec-driven development tools in the top 10 search results that use a visual canvas as their primary interface. Every single one is text-first. They all share the same structural limitation.

The Three Problems Text Specs Share With Vibe Coding

I'm not going to pretend text specs and vibe coding are equivalent. They're obviously not — one is structured intent, the other is vibes. But they share three failure modes that matter, and understanding these is the key to understanding why text specs are a step forward, not the destination.

1. Happy-Path Bias

This is the big one. The adversarial AI article covers this in depth at the code level — AI-generated code handles the happy path beautifully and breaks everywhere else. But the problem starts upstream, in the specification.

Text specs describe what should happen. They're written linearly — Step 1, Step 2, Step 3. The user registers, the system validates, the account is created, the welcome email is sent. Clean, sequential, logical.

What's missing: the seventeen things that happen when Step 2 fails for one of five different reasons. The user who registers with an email that already exists. The welcome email that bounces. The registration form submitted three times because the user clicked the button twice. The account that's "created" but never verified because the verification email went to spam.

None of these are exotic. They're the immediately-obvious-in-hindsight failures that every registration flow has. But they're hard to catch in a text spec — not because the writer is careless, but because the format makes them invisible.

A text spec is a line. Software is a graph. The line describes the path from A to B. The graph contains every path from A to everywhere — including the ones nobody wrote down because they weren't on the main path.
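One way to feel the difference is to write the same flow down as the structure it actually is. The sketch below is hypothetical (invented state names, a bare adjacency map), but notice what the format does: in a graph representation, a missing edge is a visible hole rather than an unwritten sentence.

```typescript
// A registration flow as a directed graph: each state lists where it can go.
// State names are invented for illustration.
const registrationFlow: Record<string, string[]> = {
  form_submitted:     ["validating"],
  validating:         ["account_created", "email_exists", "invalid_input"],
  account_created:    ["welcome_email_sent"],
  welcome_email_sent: ["verified"],        // and when the email bounces?
  email_exists:       [],                  // dead end: what does the user see?
  invalid_input:      ["form_submitted"],  // the retry loop the line never showed
  // "verified" has no outgoing edges, and "email_bounced" has no node at
  // all. In a linear spec these gaps are invisible; here they itch.
};
```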

Vibe coding has the same problem, just worse. The developer doesn't specify anything — so the AI implements whatever the training data suggests, which is almost always the happy path. The text spec writer specifies the happy path explicitly — so the AI implements exactly what was specified, which is almost always the happy path. Different mechanism, same result.

The severity is different. The text spec at least forces you to think about the feature before building it. You'll catch some issues during the writing process. But the happy-path bias of the format itself — the fact that text is linear while software branches — means you'll miss the same class of edge cases, just with a more respectable spec document attached.

2. Context Loss Between Sessions

This one sneaks up on you. The text spec is supposed to be the persistent context layer — the thing that survives when your Cursor session evaporates and you have to start fresh the next morning. And it does survive, physically. The requirements.md is still there. The OpenSpec folder is still there. The CLAUDE.md is still there.

But the context that generated those specs — the architectural reasoning, the tradeoffs you considered, the alternatives you rejected, the production incidents that made certain constraints non-negotiable — that context exists only in your fading memory or in the AI's evaporated chat history. The spec says "validate orders before checking inventory." It doesn't say why. It doesn't say what happens if someone reorders the validation. It doesn't mention the 3-day manual-refund incident that made this ordering a hard constraint.

Tomorrow morning, you open Cursor and feed it the spec. The AI reads "validate orders before checking inventory" and treats it as a formatting preference — not as a constraint backed by a production incident. When it later suggests "optimising" the validation sequence, it doesn't know there's a reason not to. The spec captured the what. It lost the why.
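The cheapest partial fix I know is to pin the why to the what, wherever the what lives. A hypothetical sketch, with invented helper names:

```typescript
interface Order { id: string; items: string[]; }

// Stubs, invented for illustration.
declare function validateOrder(order: Order): Promise<void>;
declare function checkInventory(order: Order): Promise<void>;
declare function reserveStock(order: Order): Promise<void>;

// CONSTRAINT: validation runs BEFORE the inventory check.
// Why: reordering these once let unvalidated orders reserve stock, and the
// cleanup took 3 days of manual refunds. The ordering is load-bearing.
// Do not "optimise" this sequence without reading the incident writeup.
async function placeOrder(order: Order): Promise<void> {
  await validateOrder(order);   // must run first (see constraint above)
  await checkInventory(order);
  await reserveStock(order);
}
```

It's not a full answer. The reasoning still lives in a comment rather than in the spec. But it turns "formatting preference" back into a constraint with a reason attached.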

This is the same context-loss problem that makes vibe coding feel like betrayal. The AI that understood your project perfectly at 6pm greets you like a stranger at 9am. The text spec mitigates this — you don't have to re-explain everything from scratch — but it doesn't solve it. The spec document is a shadow of the context that produced it. The reasoning didn't survive the format.

3. Architectural Delegation to the AI

Here's the problem that's hardest to see when you're inside it: a text spec delegates architectural decisions to the AI.

Not explicitly. Not intentionally. But structurally. When your spec says "add Stripe billing to the app," the AI has to make dozens of decisions that weren't in the spec: where to put the webhook handler, which existing utilities to reuse, what error handling pattern to follow, whether to create a new API route or extend an existing one. The spec didn't specify these things because the format doesn't naturally accommodate them — a requirements.md isn't the right place to specify file paths and import statements.

So the AI makes those decisions. It invents a webhook handler from scratch instead of checking whether you already have one. It creates a new config file instead of finding the existing Stripe config. It writes its own error types instead of using your AppError class. The code works. It doesn't fit your architecture.
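Here's the drift in miniature. Both classes below are hypothetical sketches; the point is the shape of the mismatch, not the specific names:

```typescript
// What the codebase already has: one error shape the middleware knows about.
class AppError extends Error {
  constructor(message: string, public code: string, public status: number) {
    super(message);
  }
}

// What the AI invents when the spec says only "add Stripe billing":
class BillingError extends Error {
  constructor(message: string, public httpCode: number) {
    super(message);
  }
}

function rejectBadWebhookSignature(): never {
  // Works, but sails straight past any middleware that catches AppError:
  throw new BillingError("webhook signature invalid", 400);
  // What would have fit the architecture instead:
  // throw new AppError("webhook signature invalid", "BILLING_BAD_SIG", 400);
}
```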

Vibe coding does the same thing, just more obviously. When you prompt "add Stripe billing" with no spec at all, the AI makes every architectural decision from scratch. The text spec constrains some of those decisions — the ones you thought to specify. The ones you didn't think to specify still get delegated.

The difference is degree, not kind. Vibe coding delegates most decisions. Text specs delegate the decisions you didn't think to write down. Both produce code that works but doesn't fit.

Why Text Specs Are Still a Step Forward

I need to be clear about this, because the argument I'm making sounds like "text specs are no better than vibe coding" — and that's not what I'm saying.

Text specs are a real improvement over no specs. They force you to think before you build. They create a record of intent. They give the AI better context than a vague prompt. They catch some issues during the writing process itself — the act of putting something into words reveals gaps in your thinking. They exist in your git history, which means they're versioned, reviewable, and persistent in a way that chat sessions aren't.

OpenSpec's delta system — ADDED/MODIFIED/REMOVED/RENAMED markers — is genuinely clever. It solves a real problem: how do you update a spec without rewriting the whole thing? The three-layer instruction system (Context → Rules → Template) dynamically adjusts AI guidance based on project state. These aren't minor features. They're architectural decisions that make the tool better.

CLAUDE.md files give Claude Code a persistent memory of your project's conventions — tech stack, naming patterns, architectural rules. That's a real Layer 3 context mechanism. It's better than re-explaining your project every morning. Not because it captures everything — it doesn't — but because it captures something, and something beats nothing.
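For concreteness: a CLAUDE.md is just a markdown file that Claude Code picks up as project context. The fragment below is an invented example of the kind of conventions worth writing down; the format itself is freeform.

```markdown
# CLAUDE.md (invented excerpt)

## Conventions
- TypeScript strict mode; no `any` in new code
- All thrown errors use `AppError` from src/lib/errors.ts, never raw `Error`
- API routes live in src/routes/, one file per resource

## Hard constraints
- Order validation runs BEFORE inventory checks (see ADR-014 for why)
```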

The complete guide to context engineering makes this layered model explicit: immediate context (the current prompt), session context (the conversation so far), and project context (the persistent knowledge about your system). Text specs are a Layer 3 mechanism. They persist. They compound. They're a floor you can build on.

But they're a floor. Not the ceiling.

The Linear Thinking Trap

Here's the structural limitation that makes text specs insufficient — and it's the one that's hardest to see if you've been writing prose specifications your whole career.

Text is linear. You read it top to bottom. You write it top to bottom. It describes sequences: first this, then that, then the other thing. The structure is inherent in the format — sentences follow sentences, paragraphs follow paragraphs, headings are arranged in a hierarchy from general to specific.

Software is not linear. Software is a graph.

A user doesn't proceed from registration to onboarding to feature usage in a neat line. They register, get distracted, come back three days later, skip onboarding, try a feature, fail, go back to onboarding, complete half of it, get an error, open a support ticket, and then — finally — get to the feature they wanted in the first place. That's not a line. That's a directed graph with cycles, dead ends, and unexpected path convergences.

When you try to describe a graph in a linear document, you make a choice: you either describe the main path (happy path bias) or you try to enumerate every possible path (which is unreadable and still probably incomplete, because the graph has more paths than you can enumerate in text).

This isn't a diligence problem. It's a representation problem. Some structures are simply better represented visually than textually. A subway map is a good subway map because you can see the connections. Describing the same map in text — "take the red line three stops, transfer to the blue line, go two stops, then either continue on the blue line for one more stop or transfer to the green line depending on whether you're heading east or west" — is technically accurate and practically useless. The information is the same. The comprehension is not.

Software is a subway system. Not a paragraph.

What linear thinking actually costs

I started tracking this after the 4th time in 9 months that I found an edge case in production my spec had missed. The pattern was consistent:

  1. I'd write a text spec describing the feature's main flow. Well-structured, covering the obvious cases.
  2. The AI would implement the spec faithfully. The happy path would work.
  3. Users would find the unhappy paths — the ones that were invisible in the linear document because they branched off the main sequence at points I hadn't thought to enumerate.
  4. I'd add the missing path to the spec. Update the code. Ship the fix.
  5. Repeat. Each cycle, the spec got longer and harder to read, because a linear document describing a graph gets exponentially more complex as you add branching paths.

The compounding effect: specs that start clear and useful become sprawling documents that nobody reads carefully — because reading a tasks.md with fourteen exception cases embedded in it is harder than reading a visual flow where the exception cases are visible as branches.

The text spec didn't fail because I was a bad writer. It failed because the format fights the structure it's trying to represent.

Where Text Specs Are the Right Tool

I'm not making the case that text specs are always wrong. They're often right — for the right job.

Simple, well-understood features. You're adding a dark mode toggle. A settings page. A CSV export. Each has a handful of states (on/off, export/no-export) and no branching flows. A text spec is perfect — fast to write, easy to review, more than sufficient. Dragging boxes on a canvas for this would be overkill, and I'd be the first to say so.

Incremental changes to existing features. You're adding a field to an existing form. Extending an existing API endpoint. Adding a new optional parameter. The architecture is already established. The edge cases are already handled (or not — but adding a text-specified incremental change won't make them worse). Write it in Markdown and move on.

Documentation for humans. Architecture Decision Records. Onboarding guides. Runbooks. These are written for people who need to understand the reasoning, not for AI assistants that need to execute the implementation. Text is the right format for these. Always has been, always will be.

When you already know the edge cases. This is the key distinction. If you can enumerate every important path through the feature — if you know what happens when the payment fails, when the user navigates back, when the session expires — then a text spec captures what you already understand. The format doesn't need to help you discover something you already know.

The problem is: most features are more complex than you think they are when you start writing the spec. And the format doesn't help you discover that.

The Visual Spec as the Completion of the SDD Promise

The vibe coding vs SDD article framed spec-driven development as the "third way" — not the chaos of vibe coding, not the overhead of traditional planning, but something that keeps the speed and adds the structure. That framing is right. But it's incomplete, because it didn't distinguish between text specs and visual specs.

Here's the completion of that argument:

Vibe coding → text specs → visual specs is the progression. Each one is a genuine improvement over the previous one. None of them is invalid. But each one solves problems that the previous one couldn't.

Vibe coding solves the problem of "I need to build something now." It's fast. It's fun. It produces working code. It doesn't produce reliable code, maintainable code, or code that handles edge cases — because there's no specification at all.

Text specs solve the problems of "I need to think before I build" and "I need context that survives the session." They force deliberation. They create a persistent record. They give your AI assistant better input than a raw prompt. They catch some issues during the writing process. They're an unambiguous improvement over vibe coding.

But text specs don't solve the problem of "I need to see the system, not just read about it." They don't solve the representation problem — the fact that software is a graph and text is a line. They don't solve the happy-path bias that's structural to the format. They don't close the gap between what you specify and what you miss.

Visual specs do.

A visual spec isn't a replacement for text — it's a different representation of the same information, one that makes certain properties visible and others less prominent. Just like a subway map makes connections visible while abstracting away exact distances, a visual spec makes branching flows and error states visible while abstracting away implementation details.

When you draw a user flow on a canvas, the gaps are spatial. The missing error state is a blank space between "payment processing" and "order confirmed" where the failure path should branch off. You don't have to remember to write it down — you have to miss it on the diagram for it not to appear in the spec.

When you combine the visual representation with adversarial feedback — an AI that actively probes your spec for gaps — you get something that neither vibe coding nor text specs can produce: a specification that catches its own errors before code is written.

This is the completion of the SDD promise. Not just "write specs before code" — but "write specs that catch what text can't." The movement was right about the direction (specify before you build). It stopped one step short of the format (specify visually, not just textually).

The Honest Trade-off

Visual specs aren't free. They require a different tool than your text editor. They take slightly longer to create for simple features (where they're overkill). And they require learning a canvas interface instead of typing in a terminal — which, for developers who live in the CLI, is a real workflow shift.

But here's the trade-off I keep coming back to. In my experience, an edge case caught at the spec stage costs maybe 30 seconds to fix. Same edge case caught once code exists? 4 hours — you're reading someone else's code, figuring out where the handler goes, writing the fix, updating the tests. Same edge case caught in production? Days. Literally days — the incident, the rollback, the fix, the deployment, the follow-up. The question isn't whether the visual spec is more work upfront. It's whether the work upfront saves you 10x later.

For simple features: text specs are fine. Write them and build.

For complex features — the ones with multiple user flows, error states, conditional logic, and branching paths — the text spec's linear structure is fighting the graph you're trying to represent. The visual spec works with the structure instead of against it. The gaps become visible instead of invisible.

And the spec you can see is the spec that shows you what you missed — before the AI builds it and your users discover it for you.


4ge is a context engineering platform — a visual workspace where specs are built on a canvas, not in a terminal. Edge cases are visible as gaps in the diagram, not invisible omissions in a document. See how visual specs catch what text misses →

Related: Vibe Coding vs Spec-Driven Development: Why Visual Specs Are the Third Way · Edge Case Detection Before Code: How Adversarial AI Works · The Complete Guide to Context Engineering
