The Most Valuable Document in Your Codebase Isn't Code
A lead developer I know spent 3 hours in a code review meeting defending a decision she'd made 4 months ago. Not because the decision was wrong — it was right, and she could prove it. She spent 3 hours because nobody else in the room knew it was right. The decision was embedded in the code. The rationale was in her head. And she was the only person who could connect the two.
Four months earlier, she'd used Cursor to generate the initial implementation of the billing module. The AI suggested validating orders before checking inventory. She accepted — looked reasonable — and moved on. Two weeks later, a production incident revealed that the ordering mattered: validating after inventory reservation created a race condition that double-charged customers. The fix was urgent. The documentation was... nothing. A Slack thread that had been archived. A commit message that said "fix billing validation ordering." No record of why the original ordering was wrong, why the fix was right, or what would happen if someone "optimised" it back.
She wrote an Architecture Decision Record that afternoon. Two paragraphs. Took maybe 4 minutes. And the next time someone in a code review asked "why does validation run before inventory?" — she didn't have to spend 3 hours defending it. She pointed to the ADR. The meeting moved on.
Four minutes to write an Architecture Decision Record that prevents a 3-hour code review debate. The ratio is not subtle — ADRs might be the highest-ROI document in software development.
Here's the thing most teams don't realise: an ADR is worth more than a hundred .cursorrules entries. Your rules file says "validate orders before checking inventory." That's a rule. It tells the AI what to do. The ADR says why, what happens if you reverse it, and the specific incident that made it necessary. That's context — and context is what your AI assistant (and your future self, and the next developer) actually needs.
If you're building with AI, ADRs aren't optional. They're the single most important thing you can do for context engineering — because they capture the one thing AI can't generate and humans forget: the why.
What Is an Architecture Decision Record?
An ADR is a short document that captures a single architectural decision. Not a design document. Not a technical spec. A decision — one specific choice, the context that led to it, and the consequences of that choice.
The format was proposed by Michael Nygard in 2011 and refined by ThoughtWorks into the template most teams use. The standard ADR has 5 sections:
- Title — A short noun phrase describing the decision
- Status — Proposed, Accepted, Deprecated, Superseded
- Context — The situation that motivated the decision
- Decision — What we decided and why
- Consequences — What happens as a result, including tradeoffs
That's it. A good ADR fits on one page. A great one fits in half a page. The constraint is the feature — if you can't explain the decision in a few paragraphs, you either don't understand it or it's not a single decision.
ADRs live in your repository, typically in /docs/adr/ or /adr/, numbered sequentially: 001-repository-pattern-for-data-access.md, 002-validate-orders-before-inventory.md, etc. They're versioned alongside your code. When a PR changes the architecture, it should also create or update the relevant ADR. Git is the source of truth. The ADR is the map of what changed and why.
Most teams that use ADRs report the same thing: the act of writing them is more valuable than the documents themselves. Forcing yourself to articulate "we decided X because Y" reveals assumptions you didn't know you were making. Surfaces alternatives you hadn't considered. Turns a gut feeling into a traceable decision.
Why ADRs Matter MORE for AI-Built Products
Every problem ADRs solve gets amplified when AI generates your code. The three structural gaps:
Gap 1: AI makes decisions without rationale
When a developer chooses to validate orders before checking inventory, the choice carries implicit reasoning. Even if they don't write it down, they know why they made that choice — or at least they did at the time. When AI generates the same code, the choice is statistical, not intentional. The model placed validation before inventory because that ordering is more common in its training data. There's no reasoning to recover. No "I chose this because..." moment to reconstruct later.
The developer who accepted the AI's suggestion might have had a reason. More often, it's: "the AI suggested it and it looked reasonable." That's honest — and it's exactly why you need an ADR. Because "the AI suggested it" is not a rationale that survives 3 months of feature development. When the next developer asks "why is it this way?", "the AI suggested it" tells them nothing — and sends them on a 3-hour archaeology expedition through the codebase.
Gap 2: AI-generated code looks better than it is
Clean structure. Consistent naming. Good type annotations. AI-generated code has a surface quality that masks missing intent. It looks like someone made deliberate choices — the patterns are good patterns. But pattern quality and decision quality are different things. The code follows best practices. It doesn't follow your practices, or account for your constraints, or respect your history.
An ADR cuts through the surface quality. It says: "This code looks like it validates orders before inventory because that's a common pattern. The real reason is [specific incident]. If you reverse the ordering, [specific consequence]." The ADR doesn't care how clean the code looks. It cares why the code is the way it is.
Gap 3: The understanding gap compounds faster with AI
The cognitive debt test asks: can you explain why each architectural decision was made? For a 3-person team building with AI for 6 months, the answer is almost always no — not because the decisions were bad, but because the decisions were never documented at the time they were made. Each day of undocumented decisions adds to the debt. Each week that passes makes the original reasoning harder to recover. The Anthropic study found AI-assisted developers scored 17% lower on comprehension tests — a gap that widens over time, not one that shrinks.
ADRs are the payment plan for cognitive debt. Each one you write closes the gap on one decision. Over time, the decisions you documented accumulate into a body of architectural knowledge that any team member (or any AI session) can reference. Cognitive debt compounds when decisions are invisible. ADRs make decisions visible.
The Enhanced ADR Template for AI-Native Teams
Standard ADRs capture the decision and its context. For AI-built products, you need one extra field. I'll show you the template, then explain why it matters.
# [Number]. [Title]
**Status:** Proposed | Accepted | Deprecated | Superseded by [ADR-XXX]
**Date:** YYYY-MM-DD
**Author:** [Who made this decision]
**Reversal cost:** Low | Medium | High | Very High
## Context
What is the situation that motivates this decision? What problem are
we solving? What constraints exist?
## Decision
What we decided. Not just "we chose X" — but what alternatives we
considered and why we rejected them.
## AI Generation Context
Was this decision influenced by AI-generated output? If yes:
- **What did the AI suggest?** [Describe the AI's initial recommendation]
- **Why did you accept or override it?** [Your reasoning for accepting,
modifying, or rejecting the AI's suggestion]
- **What did the AI miss?** [Context, constraints, or edge cases that
the AI didn't account for — things specific to your system that the
AI couldn't have known]
If this was a human-only decision, write: "Human-only decision. No AI
involvement." (Marking this explicitly is valuable — it tells future
readers that the rationale is fully human and doesn't carry AI's
statistical biases.)
## Consequences
What happens as a result of this decision? Include:
- What becomes easier
- What becomes harder
- What must be maintained going forward
- What happens if this decision is reversed
## Reversal Trigger
Under what conditions should this decision be revisited? Be specific:
"We should revisit this if [condition]" or "This decision stands unless
[condition]."
Why the AI Generation Context field
This is the field that doesn't exist in standard ADR templates. For teams not building with AI, it's unnecessary. For teams that are, it's the most important field in the document.
Here's why: when a developer writes code manually, the reasoning might be recoverable from commit history, the PR discussion, or the developer's memory. When AI generates the code, the reasoning is not recoverable — because the AI's reasoning is statistical, not intentional, and the developer who accepted the suggestion often didn't deeply understand the tradeoffs at the time.
The AI Generation Context field does three things:
1. **It captures provenance.** When you're debugging 6 months from now and the code does something unexpected, knowing the AI suggested it (and why you accepted) is materially different from knowing a teammate designed it with full intent. The debugging strategy changes. AI-suggested code needs to be verified against your system's actual constraints, not just general best practices.
2. **It captures what the AI missed.** The "what did the AI miss?" sub-field is the most valuable part. It documents the constraints, edge cases, and system-specific context the AI couldn't have known about — the things you had to add, fix, or override after accepting the AI's output. This is exactly what context engineering aims to preserve: the local knowledge that makes AI output fit your system instead of just looking correct in isolation.
3. **It prevents the "AI mythology" problem.** Over time, AI-suggested decisions can take on undeserved authority. "The AI said to do it this way" becomes "this is the best way" becomes "we've always done it this way." The AI Generation Context field breaks that chain. It records that the decision was a suggestion, not a directive — and that a human accepted it for specific reasons, which may or may not still be valid.
Why the Reversal Cost field
When AI generates code fast, the cost of writing code feels low. The cost of reversing a decision about that code is not low. If the billing module validates orders before checking inventory because of a specific incident, reversing that decision recreates the conditions for the same incident. The reversal might be a one-line code change. The production impact of that one-line change is measured in days of manual refunds.
Standard ADRs include consequences. They don't include an explicit "how expensive is it to reverse this?" signal. Adding that signal does two things: it forces the author to think about reversibility (which often reveals that a seemingly simple decision is actually expensive to undo), and it gives the reader an immediate sense of how carefully they need to consider changes that conflict with this ADR.
Why the Reversal Trigger field
Decisions have a shelf life. The validation ordering matters today because of a specific incident. If the billing provider changes their API next quarter, maybe the ordering doesn't matter anymore. The ADR should say when it should be revisited, so that future developers (and AI sessions) know whether the decision is permanent or conditional.
Three ADRs, Written Out
Let me make the template concrete. Here are 3 ADRs — the kind you'd write for an AI-built SaaS product — fully filled in.
ADR-001: Repository Pattern for Data Access
# 1. Repository Pattern for Data Access
**Status:** Accepted
**Date:** 2026-02-14
**Author:** Sarah Chen
**Reversal cost:** Very High
## Context
Our data access logic was smeared across 40+ service files. Service
classes directly queried the Prisma client — importing it, calling
it, handling errors locally. When we needed to change how a query
worked (adding caching, switching a table, modifying a where clause),
we had to find and update every service that touched that data.
During Sprint 9, a developer added a raw Prisma query in
`UserStore.findOne()` that bypassed our soft-delete filter. This
caused a production incident: deleted users were appearing in search
results. The fix took 3 hours. Finding all the places where the
same bypass could occur took 2 days.
## Decision
Adopt the repository pattern. Each data entity gets a repository
class that encapsulates all Prisma access. Services call repositories,
not Prisma directly. Prisma imports are restricted to repository files
(enforced by eslint rule).
Alternatives considered:
- **Active record** (Prisma models with business logic): More
convenient but recreates the scattering problem. Business logic
creeps into model files.
- **Query objects** (dedicated query builder classes): More flexible
but doesn't solve the import scattering. Prisma client still
imported everywhere.
- **Data mapper pattern** (separate domain objects from persistence):
More architecturally pure but overkill for our team size. Adds
a mapping layer we don't need yet.
## AI Generation Context
Cursor generated the initial repository implementations for
UserRepository and OrderRepository. The AI suggested using
Prisma's model delegate syntax (`prisma.user.findMany()`) directly
in each repository method.
**Why I modified it:** The AI's approach didn't include our soft-delete
filter as a default. Every repository method would need to manually
add `where: { deletedAt: null }` — which is exactly the kind of
thing someone forgets, which is exactly what caused the Sprint 9
incident. I added a base repository class with soft-delete built
into the default scope.
**What the AI missed:** The soft-delete policy. The AI didn't know
we have a `deletedAt` column on every table, or that forgetting
to filter it caused the incident. This is institutional knowledge
that lives in our sprint retro notes, not in Prisma's documentation.
## Consequences
- What becomes easier: Changing data access logic requires modifying
one file per entity. Adding caching, logging, or soft-delete
enforcement happens in one place.
- What becomes harder: Adding a new entity requires creating a new
repository class. More boilerplate for simple CRUD operations.
- What must be maintained: The eslint rule that prevents direct Prisma
imports outside repository files. If someone bypasses this, the
pattern degrades.
- What happens if reversed: Data access logic re-scatters. The
soft-delete bypass incident recurs. Estimated recovery time: 1 full
sprint to re-centralise.
## Reversal Trigger
Revisit this if we adopt a framework that provides built-in repository
abstraction (e.g., NestJS with TypeORM), or if Prisma adds native
soft-delete scope support that eliminates the need for our base class.
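The base class ADR-001 describes can be sketched in a few lines of TypeScript. This is an illustrative reduction, not the team's real code: the names (SoftDeleteRepository, deletedAt, Delegate) are assumptions, and the real Prisma delegate is replaced with a tiny interface and a stub so the default scope is visible in isolation.

```typescript
// Hypothetical sketch of ADR-001's base repository (all names assumed).
// The Prisma model delegate is modelled as a minimal interface.
interface Delegate<T> {
  findMany(args: { where: Record<string, unknown> }): Promise<T[]>;
}

abstract class SoftDeleteRepository<T> {
  constructor(protected readonly delegate: Delegate<T>) {}

  // Every read goes through this default scope, so callers cannot forget
  // the `deletedAt: null` filter that caused the Sprint 9 incident.
  findMany(where: Record<string, unknown> = {}): Promise<T[]> {
    return this.delegate.findMany({ where: { ...where, deletedAt: null } });
  }
}

interface User { id: number; deletedAt: Date | null }

class UserRepository extends SoftDeleteRepository<User> {}

// Stub delegate that records the query it receives, standing in for Prisma:
const seenWheres: unknown[] = [];
const stubDelegate: Delegate<User> = {
  async findMany(args) { seenWheres.push(args.where); return []; },
};

void new UserRepository(stubDelegate).findMany({ email: "a@example.com" });
console.log(JSON.stringify(seenWheres[0]));
// → {"email":"a@example.com","deletedAt":null}
```

The import restriction the ADR mentions could be enforced with ESLint's no-restricted-imports rule, switched off only for files in the repository directory.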
ADR-002: Auth Middleware Before Rate Limiting
# 2. Auth Middleware Before Rate Limiting
**Status:** Accepted
**Date:** 2026-03-22
**Author:** Marcus Rivera
**Reversal cost:** High
## Context
Our middleware chain runs in this order: CORS → Auth → Rate Limiting
→ Fraud Detection → Route Handler. The ordering matters because
the fraud detection middleware depends on the authenticated user's
ID to flag suspicious patterns. If auth doesn't run first, fraud
detection gets no user context and silently skips its checks.
In Sprint 12, a developer (using AI code generation) added a new
route and accidentally placed rate limiting before auth. The route
worked. Tests passed (no test checked middleware ordering). Fraud
detection silently skipped for that route. It took 6 days to discover
the gap — found during an audit, not during testing.
## Decision
Enforce middleware ordering with a centralised middleware configuration
in `src/middleware/index.ts`. Routes import the configured chain
instead of composing their own. Add an integration test that verifies
middleware execution order on every route.
Alternatives considered:
- **Let each route define its own chain:** More flexible but
recreates the Sprint 12 problem. One missed middleware is a silent
security gap.
- **Enforce ordering via Fastify hooks:** Framework-level enforcement
is attractive but doesn't cover our custom middleware. Mixing
framework hooks with custom middleware creates confusion about
what runs when.
- **Static analysis to check middleware ordering:** Clever but
brittle. Won't catch runtime-only middleware or dynamically
composed chains.
## AI Generation Context
GPT-5.5 suggested adding rate limiting before auth for "performance
reasons" — limiting unauthenticated requests before they reach auth.
**Why I overrode it:** The AI's suggestion is correct for public
APIs where the priority is preventing DDoS. Our API is
authenticated-first: we need the user ID for fraud detection and
per-user rate limiting. Putting rate limiting before auth means
rate limiting can't differentiate between users, and fraud detection
gets no user context.
**What the AI missed:** Our fraud detection middleware depends on
auth context. The AI didn't know about the fraud detection
middleware — it wasn't in the files I @-mentioned. This is a
cross-cutting concern that's invisible if you're only looking at
the route handler and the auth middleware.
## Consequences
- What becomes easier: Adding new routes guarantees correct middleware
ordering by default. No more "did I remember to add fraud detection
to this route?"
- What becomes harder: Custom middleware chains for specific routes
require modifying the centralised configuration. Less route-level
flexibility.
- What must be maintained: The integration test for middleware
ordering. The centralised configuration file.
- What happens if reversed: Any developer can omit or reorder
middleware per route. The Sprint 12 incident recurs. Fraud
detection silently skips on misconfigured routes.
## Reversal Trigger
Revisit this if we adopt a framework with built-in middleware
pipe support that enforces ordering at compile time, or if we move
to a microservices architecture where each service manages its own
middleware independently.
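ADR-002's centralised configuration reduces to a simple idea: the ordering lives in exactly one module, and a test fails if anyone changes it. A minimal sketch, with all names assumed and the real Fastify middleware replaced by toy functions; note that the fraud-detection step fails loudly, rather than silently skipping, if auth hasn't run first.

```typescript
// Hypothetical sketch of ADR-002's centralised middleware chain (names assumed).
interface Ctx { user?: string; trace: string[] }
type Middleware = (ctx: Ctx) => void;

const cors: Middleware = (ctx) => ctx.trace.push("cors");
const auth: Middleware = (ctx) => { ctx.user = "user_123"; ctx.trace.push("auth"); };
const rateLimit: Middleware = (ctx) => ctx.trace.push("rateLimit");
const fraudDetection: Middleware = (ctx) => {
  // Depends on auth having already run — the dependency the ADR protects.
  if (!ctx.user) throw new Error("fraud detection ran without auth context");
  ctx.trace.push("fraud");
};

// Single source of truth: routes import this chain, never compose their own.
const chain: Middleware[] = [cors, auth, rateLimit, fraudDetection];

function runChain(ctx: Ctx): Ctx {
  for (const mw of chain) mw(ctx);
  return ctx;
}

console.log(runChain({ trace: [] }).trace.join(" → "));
// → cors → auth → rateLimit → fraud
```

The integration test the ADR calls for is then one assertion on `runChain`'s trace, which turns a silent security gap into a red build.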
ADR-003: Validate Orders Before Checking Inventory
# 3. Validate Orders Before Checking Inventory
**Status:** Accepted
**Date:** 2026-01-30
**Author:** Jess Nakamura
**Reversal cost:** High
## Context
On January 15, 2026, we had a production incident: inventory was
reserved for an invalid order, the customer was charged for items
that didn't pass validation, and it took 3 days of manual refunds
to sort out 847 affected transactions. The chargeback rate hit 4.2%
— above the Stripe threshold. We received a warning: get this under
control or lose processing capability.
The root cause: the checkout flow checked inventory before validating
the order. If inventory was available and the order was invalid, the
reservation happened anyway. The charge was submitted. Validation
failure came too late to prevent the financial impact.
## Decision
Reorder the checkout flow: validate the order first (check that items
exist, quantities are valid, shipping address is complete), then
check and reserve inventory, then submit the charge. If validation
fails, the flow stops before any financial commitment.
Alternatives considered:
- **Check inventory first, then validate:** Better perceived
performance (fast failure on out-of-stock) but recreates the
incident. Invalid orders reserve inventory and trigger charges.
- **Validate and check inventory in parallel:** Faster overall but
creates a race condition. If inventory check succeeds and
validation fails, we need to release the reservation. The
compensation logic is complex and error-prone.
## AI Generation Context
Cursor generated the original checkout flow with inventory-before-
validation ordering. The AI's suggestion is common in e-commerce
tutorials and sample projects — it's the "standard" ordering in
most reference implementations.
**Why I accepted the original (a mistake), then overrode it:** I
accepted the AI's suggestion initially because it matched the
tutorial examples I'd seen. After the incident, the override was
obvious — but the incident shouldn't have been necessary. If I'd
documented the decision (or its absence) at the time, the team
would have had a chance to catch the ordering issue before it hit
production.
**What the AI missed:** Our specific business rules. The AI's
training data includes many e-commerce implementations, most of
which don't have our validation requirements (address completeness,
item eligibility, promotional code restrictions). The AI generated
a generic checkout flow that worked for the generic case — and
failed for our specific one.
## Consequences
- What becomes easier: Invalid orders are caught before financial
commitment. No more reserves-then-refunds incidents.
- What becomes harder: Out-of-stock items are discovered later in
the flow. Slightly worse UX for the "item unavailable" case —
the user sees this after filling in the form, not before.
- What must be maintained: The validation-before-inventory ordering
must not be "optimised" back to inventory-first. The ADR exists
to prevent that.
- What happens if reversed: The January 15 incident recurs.
Estimated impact during peak: $2K/day in manual refunds, plus
chargeback risk above Stripe's threshold.
## Reversal Trigger
Revisit this if we implement a two-phase reservation system:
reserve inventory optimistically, then release if validation fails
within a timeout. This would give us both fast UX and financial
safety — but it requires significant changes to the reservation
and compensation logic.
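The ordering ADR-003 mandates comes down to a control-flow guarantee: validation runs first, so an invalid order never touches inventory or the payment provider. A hedged TypeScript sketch, with function names and validation rules assumed for illustration:

```typescript
// Hypothetical sketch of ADR-003's checkout ordering (names and rules assumed).
interface Order { items: number; shippingAddress: string }

function validateOrder(order: Order): boolean {
  return order.items > 0 && order.shippingAddress.length > 0;
}

// Toy stand-ins for the reservation and payment steps:
let unitsReserved = 0;
let chargesSubmitted = 0;
const reserveInventory = (order: Order) => { unitsReserved += order.items; };
const submitCharge = (_order: Order) => { chargesSubmitted += 1; };

function checkout(order: Order): "rejected" | "charged" {
  if (!validateOrder(order)) return "rejected"; // stop before any financial commitment
  reserveInventory(order);
  submitCharge(order);
  return "charged";
}

console.log(checkout({ items: 2, shippingAddress: "" }));          // → rejected
console.log(unitsReserved, chargesSubmitted);                      // → 0 0
console.log(checkout({ items: 2, shippingAddress: "1 Main St" })); // → charged
```

Reversing the first two lines of `checkout` is the one-line "optimisation" the ADR exists to prevent: the reservation and charge would fire before validation had a chance to stop them.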
How to Start Tomorrow: Your First ADR in 5 Minutes
The biggest barrier to ADRs is starting. The template looks formal. The numbering system feels like process. The blank page is intimidating. So let me make this as simple as I can.
Step 1: Create the directory
mkdir -p docs/adr
touch docs/adr/001-first-adr.md
Step 2: Pick the most important decision you've made recently
Not the most interesting. Not the most complex. The one that, if someone reversed it, would cause the most damage. For most teams, this is a data access pattern, an auth flow, or a validation ordering. (If the 3 examples above felt familiar — start with one of those.)
Step 3: Write it in 5 minutes using this compressed template
Don't overthink it. Draft it fast. You can refine later.
# 1. [Short Title]
**Status:** Accepted
**Date:** [Today]
**Author:** [You]
**Reversal cost:** Low | Medium | High | Very High
## Context
What happened that made this decision necessary? One paragraph.
## Decision
What we decided. Why we didn't pick the alternatives. Two paragraphs
max.
## AI Generation Context
Did the AI suggest this? Did you accept, modify, or override? What
did the AI miss about your specific system? Two sentences minimum.
## Consequences
What changes. What must be maintained. What happens if reversed.
## Reversal Trigger
When should we revisit this? One sentence.
Step 4: Commit it
git add docs/adr/001-first-adr.md
git commit -m "docs: add ADR-001 — [title]"
That's it. Your first ADR, written and committed in 5 minutes. Doesn't need to be perfect. Needs to exist. The value isn't in the writing — it's in the decision being traceable. Every ADR you write after this one gets easier, because the format becomes familiar and the habit forms.
The first-week goal
3 ADRs in your first week. Then one per week after that. After 3 months, you'll have 12-15 ADRs covering the most important architectural decisions in your codebase. That's 12-15 decisions any team member — or any AI coding session — can reference instead of guessing.
ADRs as the Foundation of Your Specification Layer
Here's the connection most teams miss: ADRs aren't just documentation. They're the foundation of your entire specification layer — the thing that makes context engineering actually work.
When you build a specification for your AI coding assistant — the structured, file-specific, context-rich instructions that make AI output fit your system — the ADRs are what feed it. Your rules file says "use the repository pattern." Your ADR says why, what happens if you don't, and what the AI missed when it suggested the alternative. The rules file is the guardrail. The ADR is the map. Your AI assistant needs both.
The specification layer has four components:
- Rules — Conventions, patterns, tech stack declarations. The "what." (Your .cursor/rules files.)
- Architecture Decision Records — Decisions with rationale, alternatives, and reversal costs. The "why."
- Living Specification — The current state of your system: component relationships, data flows, constraints, what's done and what's in progress. The "how it fits together."
- Atomic Tasks — File-specific, context-rich instructions for individual pieces of work. The "what to do right now."
Most teams have rules (component 1). Some have specs (component 3). Almost none have ADRs (component 2) — and that's the gap that makes the other three less effective. Rules without rationale are brittle — they tell you what to do but not why, so you don't know when they can be safely broken. Specs without rationale are descriptions of current state, not guides to what must be maintained. Tasks without rationale are instructions the AI follows mechanically, without understanding the constraints that make the instructions correct.
ADRs close each of these gaps. They're the connective tissue between "what should the code look like" and "why must it look that way" — which is exactly the information that AI coding assistants don't have and humans forget.
The Honest Trade-offs
ADRs aren't free. They take time to write. They need to be maintained — when a decision is reversed or superseded, the ADR should be updated, not deleted. And they add process, which means some developers will resist them as "documentation overhead."
But the alternative — decisions that exist only in someone's fading memory, or not at all — is more expensive. Not in the abstract. In the specific, measurable cost of 3-hour code review debates, multi-day debugging sessions that should have taken hours, and production incidents caused by reversing decisions nobody knew were intentional. Cognitive debt is real, it's compounding, and ADRs are the most practical tool for paying it down.
For teams building with AI specifically, the trade-off is even clearer. AI generates code faster than you can understand it. Without ADRs, the understanding gap grows with every feature. With ADRs, every decision is captured at the moment it's made — when the reasoning is fresh, the alternatives are visible, and the context is complete. The cost of writing an ADR is maybe 4 minutes. The cost of not writing it is measured in hours, days, and production incidents.
Start with one. Pick the decision that would hurt most if someone reversed it. Write it in 5 minutes. Commit it. You'll know within a week whether it's worth the 4 minutes — because the next time someone asks "why is it this way?", you'll point to the ADR instead of shrugging.
4ge is a context engineering platform — a visual workspace where architecture decisions, business logic constraints, and edge cases are first-class citizens, documented at the point of creation. See how 4ge makes ADRs part of your specification workflow →
Related: Cognitive Debt: The Hidden Cost of AI-Generated Codebases · Don't Build What You Can't Explain: The Cognitive Debt Test · The Complete Guide to Context Engineering