Part: The Framework

Chapter 07

The Spec: Writing for Machines, Not Humans

I built a Frankenstein once.

Not the green kind with the bolts. The software kind — where the product looks right from a distance but falls apart the moment a real user touches it. I was building TravelOS, a platform for starting travel agencies, and I was working with Mauricio, a friend with forty years in the tourism industry. Mauricio knows everything about how travel agencies actually work. I know how to translate that knowledge into software. Between us, we had everything we needed.

Except alignment.

For weeks, I had been building what I understood from our conversations. A system to replace his current tools — Go High Level for funnels, WhatsApp for client communication, a coaching program for beginners. I was pulling it all into one beautiful, integrated platform. The architecture was clean. The features were comprehensive. And it was completely wrong.

"The mistake I made," Mauricio told me during one of our realignment sessions, "was putting that feature there. That feature is for established agencies. I made a Frankenstein between your education program and services for agencies already selling."

Two audiences. One spec. A Frankenstein.

The platform worked. It ran. It didn't crash. And it served nobody well — not because the code was broken, but because the spec never separated who it was for. The code was technically clean but commercially wrong. I had built what I understood, not what the domain required.

This is the chapter about how to never build a Frankenstein again.


The Hardest Skill in the Pipeline

The bottleneck in AI-assisted development has moved from implementation to specification. Models can generate thousands of lines of code in minutes. But ambiguous specs produce ambiguous software. AI agents don't ask clarifying questions — they make assumptions. And those assumptions compound.

This chapter is the most important one in the book. Not because the concepts are the most novel — trust tiers and evaluation architecture are arguably more original — but because this is the skill that everything depends on. If you can write a spec that an autonomous agent can implement without asking clarifying questions, you can build almost anything. If you can't, no amount of harness engineering or evaluation will save you.

I know this because I learned to write specs before I learned to write code. I'm a business administrator with fifteen years in marketing. I didn't know how to ship a backend — but I understood how to define a deliverable. When I encountered spec-driven development through Nate Jones' work in the community, something clicked — not because it was new to me, but because it formalized something I'd been doing my entire career. Every marketing brief, every campaign plan, every client proposal is a specification of sorts. You define the audience. You define the deliverables. You define what success looks like. You define what's out of scope. And then someone else executes.

The difference is that with marketing briefs, the executor is a human who asks clarifying questions. "Hey, did you mean the homepage banner or the sidebar?" "Which audience segment are we targeting here?" Humans fill in the gaps with common sense, context, and the occasional walk to your desk.

AI agents don't walk to your desk. They fill gaps with assumptions. And those assumptions are often plausible, internally consistent, and wrong.


What a Spec Is Not

Before I describe what a spec should contain, let me be direct about what it is not.

A spec is not a PRD. Product requirements documents are designed for humans — they describe the what and why at a high level, leaving the how to the development team. PRDs work when there's a human team that can ask questions, debate tradeoffs, and use professional judgment. They are intentionally incomplete. That incompleteness is a feature when humans execute. It's a catastrophe when agents execute.

A spec is not a user story. "As a building administrator, I want to upload unit data so that I can manage my property" tells a human developer what to build. It tells an AI agent almost nothing. What format is the data? What happens if a field is missing? What if the administrator uploads twice? What if the building adds a new tower later? The user story describes the intent. The spec describes the behavior — exhaustively.

A spec is not documentation. Documentation describes what exists. A spec describes what should exist, including what should not exist. The non-behaviors are as important as the behaviors, because agents are relentlessly helpful — they will build features you didn't ask for if you don't explicitly say "don't."

A spec is a behavioral contract between the person who understands the domain and the agent that will implement it. It's written in a format precise enough that the agent can execute without ambiguity, and structured enough that its completeness can be verified before a single line of code is generated.

Practitioners often distinguish three document types that teams collapse into one:

Document          | Audience              | Purpose
PRD               | Humans / stakeholders | Why we're building, business value
Architecture Doc  | Engineers             | How we're building, design decisions
AI Spec           | Agents                | Execution contract — not for debate, for action

The AI Spec borrows from both — it needs the why to generate the intent contract, and it needs the how to generate implementation constraints — but it exists as a separate artifact with a different purpose. A PRD might say "we need to improve the onboarding experience." An architecture doc might say "we'll use a wizard pattern with three steps." The AI Spec says: "When a new administrator registers, they are redirected to an onboarding wizard. The wizard contains exactly three steps: (1) building details, (2) unit configuration, (3) notification preferences. All three steps are required. The wizard cannot be completed partially. If the administrator abandons the wizard, the system saves their progress and resumes from the last completed step on next login."

Same requirement. Three different levels of precision. The AI Spec is the one that can be implemented without a conversation.
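To make the gap concrete, the AI Spec version of the wizard is precise enough to sketch directly as a state machine. A minimal, hedged sketch; the class and step names are my own, not from any real codebase:

```python
class OnboardingWizard:
    """In-memory sketch of the three-step wizard described in the spec above."""
    STEPS = ("building_details", "unit_configuration", "notification_preferences")

    def __init__(self):
        self.completed = []

    def complete_step(self, step: str) -> None:
        # All three steps are required and strictly ordered;
        # partial progress is preserved, never discarded.
        expected = self.STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected step '{expected}', got '{step}'")
        self.completed.append(step)

    def is_done(self) -> bool:
        return len(self.completed) == len(self.STEPS)

    def resume_point(self) -> str:
        # On next login, resume from the first incomplete step.
        return "done" if self.is_done() else self.STEPS[len(self.completed)]


# Scenario: the administrator abandons the wizard after step 1,
# then resumes from step 2 on next login.
wizard = OnboardingWizard()
wizard.complete_step("building_details")
assert wizard.resume_point() == "unit_configuration"
```

Notice that every line of the sketch traces back to a sentence in the spec: exactly three steps, required, ordered, resumable. The PRD and architecture doc versions leave all of that open.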


The Spec as a Thinking Tool

Here's what surprised me when I wrote my first real spec.

I was building a CRM for a friend's client — a marketing agency whose pipeline was scattered across fifteen different channels. No budget for an off-the-shelf CRM. No interest in buying one. I offered to build it for free as an experiment.

I had discovered a structured questioning approach through the community — a system that walks you through progressively deeper questions about what you're building, for whom, under what constraints, and against what definition of success. The process was confusing at first. Some of the questions I could answer easily. Others made me pause.

"The questions were driving my mental model to be more precise," I realized afterward. "To think deeper about the problem."

I knew marketing CRMs. Fifteen years of experience. I knew what fields mattered, what workflows existed, what the pain points were. But the spec process forced a different kind of precision — not domain precision (I already had that) but implementational precision. Not "we need a pipeline view" but "the pipeline has five stages, a deal moves when this action occurs, these fields are required at each stage, this notification fires at this threshold, and here's what happens when a deal is stuck for more than seven days."

The spec wasn't just a document I was creating for the agent. It was a thinking tool that forced me to resolve ambiguities I didn't know I had. And that's the first thing I want you to understand about specification:

The output isn't the document. The output is the clarity.

Two hours after feeding that spec to Claude Code, I had a half-working MVP. Not a polished product — the infrastructure was still alien to me (Supabase, Clerk, Vercel, none of which I'd used before). But the bones were clean. The data model was right. The workflows made sense. And three weeks later, my friend presented that CRM to her client, and they were blown away. They said it represented their exact workflow — something they'd been asking for "for many years."

That precision — spec precision — is what made the difference. Not the model. Not the harness. The spec.


The Eight Sections

Every specification in the Dark Factory methodology contains eight required sections. The depth of each section scales with the trust tier (a Tier 2 marketing tool needs less than a Tier 4 patient safety system), but the sections themselves are non-negotiable. Skip one, and you leave a gap the agent will fill with assumptions.

1. System Overview

What is this system? Who is it for? What problem does it solve? What doesn't it do?

This sounds obvious, and it is — until it isn't. My Frankenstein with Mauricio failed here. The system overview described "a platform for travel agencies." But there are two kinds of travel agencies in Mauricio's world: beginners who need education, and established agencies who need operational tools. One system overview. Two audiences. A Frankenstein.

The system overview forces you to answer: who, exactly, is this for? If you can't describe the user in one sentence, you don't have a system — you have two.

2. Behavioral Contract

This is the core of the spec. What does the system do? Not what features does it have — what behaviors does it exhibit?

A feature list says: "The system has CSV upload for unit data." A behavioral contract says: "When an administrator uploads a CSV file, the system validates that all required fields are present (unit number, coefficient, owner name, email). If any field is missing, the upload fails with a specific error message identifying the missing fields and the row numbers affected. If the upload succeeds, the system creates one record per row. If records with the same unit number already exist, the system updates the existing records rather than creating duplicates. The administrator receives a confirmation showing the number of records created and updated."

The behavioral contract answers: if I do X, what happens? For every X.

3. Explicit Non-Behaviors

What the system must not do. This is the section most people skip, and it's the one that costs them the most.

AI agents are relentlessly helpful. If your spec describes a building management system, the agent may decide to add a maintenance request feature, a community forum, or an AI chatbot — because those seem helpful for building management. Without explicit non-behaviors, you'll get features you didn't ask for, designed according to the agent's assumptions about what's useful.

"The system does not handle financial accounting. It stores financial records for transparency purposes but does not calculate taxes, generate invoices, or integrate with accounting software."

"The system does not send notifications unless explicitly configured by the administrator. No default notifications exist."

Non-behaviors are boundaries. They tell the agent where the walls are.

4. Integration Boundaries

What external systems does this connect to? What format do they expect? What happens when they're unavailable?

One of the hardest lessons I learned building the SonIA CRM (the marketing agency CRM from earlier in this chapter) was that integrations are where specs break down fastest. The CRM needed to connect to Supabase for data, Clerk for authentication, and Vercel for deployment. Each of those connections had its own API, its own authentication flow, its own failure modes. My spec described the CRM's behavior beautifully — and said almost nothing about what happens when Supabase is down, when a Clerk session expires, or when a Vercel deployment fails.

Integration boundaries force you to think about the edges — the places where your system meets the world.
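One way to keep an integration boundary honest is to make the spec'd failure behavior explicit in code. A hedged sketch, with invented names (`ServiceUnavailable`, the retry count) standing in for whatever the spec actually declares:

```python
import time

# Invented names and defaults: the spec, not this sketch,
# decides the retry budget and the failure types.
class ServiceUnavailable(Exception):
    """Raised when an external dependency cannot be reached."""

def call_with_declared_failure_mode(call, retries: int = 2, backoff_s: float = 0.0):
    """Spec'd edge behavior: retry a bounded number of times, then surface a
    typed failure to the caller. Never silently return partial data."""
    for attempt in range(retries + 1):
        try:
            return call()
        except ServiceUnavailable:
            if attempt == retries:
                raise  # the spec says failures are visible, not swallowed
            time.sleep(backoff_s)
```

The point isn't this particular retry policy; it's that the policy came from the spec, not from whatever the agent felt like generating that day.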

5. Behavioral Scenarios

A minimum of seven scenarios that describe the system's behavior in concrete, testable situations. Think of these as acceptance criteria with context.

"Scenario: New building onboarding. Given an administrator who has just registered, when they access the building configuration page, they see a wizard that walks them through: building name, address, number of units, unit types. When they complete the wizard, the system creates the building record and redirects to the unit upload page."

Scenarios serve two purposes. During specification, they force you to think through real usage. During testing, they become the basis for evaluation — the system either passes the scenario or it doesn't. There's no "sort of works."

The number of scenarios scales with trust tier. Tier 1 (deterministic) needs the minimum seven. Tier 4 (high-stakes) might need thirty or more, each with factorial stress variations.

6. Intent Contract

This is what most specifications lack entirely — and it's the difference between software that does what you asked and software that does what you meant.

An intent contract encodes the organizational context: what are we optimizing for? When two valid behaviors conflict, which one wins? What should the system do when the spec is ambiguous?

Most specifications skip this section. They describe what the system does but not why. Most of the time, that's fine — the behaviors are clear enough that "why" doesn't affect implementation. But agents encounter genuine ambiguity constantly. When two behaviors seem equally valid, when the spec doesn't cover a case, when the user does something unexpected — the agent needs a decision rule. Without an intent contract, it guesses. With one, it resolves consistently.

Intent operates at two layers.

The organizational layer encodes the company's goals, priorities, and tradeoffs at a business level. What are the two or three things that matter most? What is the hierarchy when priorities conflict?

For Edifica, the organizational intent states: "This system serves building administrators operating under Colombian law (Ley 675 de 2001). The highest priority is legal compliance. The second priority is administrative transparency — residents have a legal right to access financial and governance information. The third priority is operational efficiency for administrators."

That hierarchy is the framework for every ambiguous decision in the system. When a resident requests another resident's contact information, privacy and transparency come into conflict. The intent contract resolves it: legal compliance takes precedence, which means the system produces exactly what Ley 675 requires — no more, no less. The agent doesn't have to reason about privacy tradeoffs. The organizational layer already did.

The agent instruction layer translates organizational intent into specific behavioral rules. It answers three questions for every major function:

  1. What is this agent optimizing for when it takes action?
  2. What paths are explicitly prohibited?
  3. When should the agent escalate to a human rather than decide?

For the Edifica financial report generator: optimizing for accuracy and legal compliance (not brevity or readability). Prohibited paths: generating financial summaries that differ from the raw transaction data, even if the administrator requests it. Escalation: when the agent detects a discrepancy between the running balance and the transaction history, it flags for human review rather than resolving the discrepancy itself.
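That three-question structure can be carried as a small record the harness hands to the agent. A hedged sketch; the field names are mine, and the values paraphrase the Edifica example rather than reproduce the actual artifact:

```python
from dataclasses import dataclass

# Illustrative record, not the real Edifica intent contract.
@dataclass
class AgentIntent:
    optimizing_for: list[str]    # 1. what is the agent optimizing for?
    prohibited_paths: list[str]  # 2. which paths are explicitly off-limits?
    escalate_when: list[str]     # 3. when must a human decide instead?

financial_report_intent = AgentIntent(
    optimizing_for=["accuracy", "legal compliance"],
    prohibited_paths=[
        "financial summaries that differ from raw transaction data, even on request",
    ],
    escalate_when=[
        "running balance disagrees with the transaction history",
    ],
)
```

Keeping intent in a structured form like this makes it reviewable on its own, separately from the behavioral contract it backs up.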

Notice what the intent contract does not cover: report format, line item ordering, how the system handles a building that switches from annual to quarterly reporting. Those are behavioral questions that belong in sections 2 and 5. The intent contract answers a different question: when the spec runs out, what principle guides the next decision?

A test for whether you need a detailed intent contract: imagine the agent encounters a case your spec doesn't explicitly cover. Does it matter which plausible behavior it chooses? If yes — you need an intent contract that closes that decision.

For Tier 1 and Tier 2 systems, a minimal intent contract often suffices: two or three lines about the primary optimization target and any hard prohibitions. For Tier 3 and Tier 4 systems, the intent contract can be as substantial as the behavioral contract — because those systems operate in domains where ambiguity resolution has consequences.

7. Ambiguity Warnings

Every spec contains ambiguity. The question is whether you find it before the agent does.

Ambiguity warnings are self-flagged areas where the spec author knows the specification is incomplete or uncertain. "The notification frequency for overdue payments has not been determined. Current default: one notification at 7 days, one at 15 days. This may change based on user testing."

The most reliable way to find ambiguities is to scan the spec for specific words that signal uncertainty: "should," "ideally," "try to," "usually," "when possible." These are spec bugs. Every one of them represents a decision the author didn't make — and a decision the agent will make for them.

In our methodology, the harness runs an automated scan for these words before the BUILD phase can start. If any are found, the spec goes back for resolution. This is a deterministic gate — no human judgment required, no exceptions. The words are either there or they aren't.
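A deterministic gate like that fits in a few lines. This is a sketch of the idea, not the actual harness code; the signal list mirrors the words above:

```python
import re

# Illustrative sketch of the pre-BUILD ambiguity gate, not the real harness.
AMBIGUITY_SIGNALS = (
    r"\bshould\b", r"\bideally\b", r"\btry to\b",
    r"\busually\b", r"\bwhen possible\b", r"\bif applicable\b",
)

def scan_spec(text: str) -> list[tuple[int, str]]:
    """Return (line_number, signal) for every unresolved-decision word found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in AMBIGUITY_SIGNALS:
            match = re.search(pattern, line, re.IGNORECASE)
            if match:
                hits.append((lineno, match.group(0)))
    return hits

# A harness would fail the gate whenever scan_spec() returns any hits.
```

The gate is binary on purpose: either the words are there or they aren't, so there is nothing to argue about in review.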

8. Implementation Constraints

Technical boundaries: what stack, what hosting, what performance requirements, what security requirements, what accessibility standards.

"The system is built with Next.js 16, uses Supabase for the database and authentication, deploys to Vercel, and must load initial page content within 2 seconds on a 3G connection."

Constraints prevent the agent from making architectural decisions it shouldn't make. Without them, the agent might choose a technology it's more familiar with, or optimize for a metric you don't care about, or introduce a dependency you can't maintain.


The Bad Spec vs. The Good Spec

Let me show you what this looks like in practice. Same requirement, two specifications.

The requirement: Building administrators (the property managers responsible for the building) need to upload unit records — one per apartment — for their building.

The bad spec:

The system should allow administrators to upload unit information. They can use a CSV file. The system should validate the data and show errors if something is wrong. Units should be displayed in a list after upload.

This spec has three instances of "should" (ambiguity signals), no error handling specifics, no definition of "unit information," no behavior for duplicates, no behavior for partial failures, and no constraint on file size.

An agent given this spec will make at least six silent decisions: it will choose CSV as the only format (the spec implies it but doesn't require it); it will probably overwrite existing records on duplicate upload (most plausible behavior); it will show a generic error on validation failure rather than row-level feedback; it will not impose a file size limit; it will likely allow upload to proceed if some rows pass and some fail; and it will create a unit list in whatever format it considers standard.

None of those are your decisions. They're the agent's. And you won't know what it decided until you see the result.

The good spec:

Behavior: Unit CSV Upload

When an administrator navigates to Building > Units > Import, the system displays a file upload area and a "Download Template" button.

The CSV template contains columns: unit_number (required, string), coefficient (required, decimal 0.0000-1.0000), owner_name (required, string), owner_email (required, valid email format), phone (optional, string).

When the administrator uploads a CSV file:

  • The system validates all rows before processing any. Validation rules: required fields present, coefficient is a valid decimal, email matches format, unit_number is unique within the file.
  • If validation fails: the upload is rejected entirely. The system displays a table showing row number, field name, and specific error for each validation failure. No records are created or modified.
  • If validation passes and no units exist for this building: all records are created. The system displays "X units created successfully."
  • If validation passes and units already exist (matched by unit_number): existing records are updated with the new values. New unit_numbers are created. The system displays "X units created, Y units updated."

The upload button is disabled while processing. Maximum file size: 5MB. Maximum rows: 2,000.

Non-behavior: The upload does NOT delete existing units that aren't in the CSV file. Deletion is a separate, explicit action requiring confirmation.

The difference isn't length — it's decisions. The good spec contains zero instances of "should." Every behavior is described in terms of what happens, not what should happen. Every edge case (duplicates, partial failures, empty fields) has an explicit resolution. The agent implementing this spec has no decisions to make. It only has instructions to follow.
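Because the good spec closes every decision, the validation pass it implies can be written almost mechanically. A hedged sketch: the column names and rules come from the spec above, but the function shape is an invention, not Edifica's code:

```python
import csv
import io
import re

# Columns and rules from the spec; the error format is illustrative.
REQUIRED = ("unit_number", "coefficient", "owner_name", "owner_email")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_units_csv(raw: str) -> list[dict]:
    """Validate all rows before processing any; return row-level errors.
    An empty list means the whole upload may proceed."""
    errors = []
    seen = set()
    # Row numbers start at 2: row 1 is the header, per the template.
    for row_num, row in enumerate(csv.DictReader(io.StringIO(raw)), start=2):
        for field in REQUIRED:
            if not (row.get(field) or "").strip():
                errors.append({"row": row_num, "field": field, "error": "missing"})
        try:
            if not (0.0 <= float(row.get("coefficient") or "") <= 1.0):
                raise ValueError
        except ValueError:
            errors.append({"row": row_num, "field": "coefficient",
                           "error": "not a decimal in 0.0000-1.0000"})
        email = row.get("owner_email") or ""
        if email and not EMAIL_RE.match(email):
            errors.append({"row": row_num, "field": "owner_email",
                           "error": "invalid format"})
        unit = row.get("unit_number") or ""
        if unit in seen:
            errors.append({"row": row_num, "field": "unit_number",
                           "error": "duplicate within file"})
        seen.add(unit)
    return errors
```

An agent handed the good spec will converge on something close to this, because there is nothing left to decide. An agent handed the bad spec could produce a dozen divergent versions.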


When Specs Drift

A spec is not a one-time document. It's a living contract — and like any contract, it's only useful if it reflects current reality.

I learned this watching Hernan work.

Hernan is a mid-level developer building Edifica with me. He's sharp, methodical, and transitioning from traditional code-first development to AI-augmented work. Over the course of a week, he made about fifteen changes to the codebase — adding a phone field to the unit data model, reorganizing the building page into tabs, implementing CSV upload with coefficient editing. Good work. Solid features.

None of it was in the spec.

When I noticed, I felt the familiar twinge — the one that shows up right before a system starts to drift. I stopped the session. "You're about to work with an outdated spec," I told him. "This is where we start to screw up."

He got it immediately: "I didn't update it with the last changes."

Here's why this matters. The spec serves two functions in AI-augmented development. First, it gives the agent context — the full picture of what the system is and how it behaves. When the spec is current, the agent's suggestions align with the existing architecture. When the spec is stale, the agent works from an outdated mental model and produces changes that conflict with what's already built.

Second — and this is less obvious — the spec gives the developer confidence. Hernan told me he still doesn't fully trust the agent's output. "I still don't trust what I send it," he admitted. "When the result comes out, I have a lot of distrust about what's going to come out." The spec is what bridges that trust gap. When the spec is current and the agent's output matches the spec, the developer can verify alignment. When the spec is stale, there's no reference point — and the developer falls back to manual review of every line.

Spec drift is the silent killer of AI-augmented development. The code moves forward. The spec stays behind. The agent loses its source of truth. The developer loses their safety net. And slowly, incrementally, the system drifts from its intended behavior.

The discipline is simple but counterintuitive for developers trained in the "just ship it" culture: every time the code changes, the spec changes. Not eventually. Not in a documentation sprint. Now. Before the next change starts.
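The discipline can even be enforced mechanically, for example in a pre-commit hook. A hedged sketch; the `src/` and `SPEC.md` layout is my assumption, not the book's actual workflow:

```python
import subprocess

def staged_files() -> list[str]:
    """List files staged for the current commit (requires a git repo)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def spec_moved_with_code(files: list[str]) -> bool:
    """Block the commit when implementation changed but the spec did not."""
    code_changed = any(f.startswith("src/") for f in files)
    spec_changed = "SPEC.md" in files
    return spec_changed or not code_changed
```

Wired into a pre-commit hook, `spec_moved_with_code(staged_files())` returning False would stop the commit with a reminder to update the spec first. Crude, but it turns "now, not eventually" from a habit into a gate.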


Knowledge Extraction: The Hidden Skill

The hardest part of writing a spec isn't the format. It's getting the information.

When I built TravelOS for my friend, the technical challenge was trivial — I knew the stack, I knew the spec structure, I knew how to work with the agent. The challenge was getting my friend to tell me what I needed to know.

He didn't understand why I was asking so many "basic" questions. He'd been running his travel education business for years. He knew every detail intuitively. I realized his expertise was the bottleneck — not because he lacked answers, but because the answers had never had to leave his head before. Intuitive knowledge doesn't transfer into specifications. I needed him to articulate the nitty-gritty — the daily workflows, the edge cases, the exceptions, the things that "everyone just knows."

"He was weird and didn't want to talk to me too much," I remember. Not hostile — just confused. Why was I, someone who knew his business well, asking things like "what happens when a student misses a payment?" and "how many modules before they get their first client?" These felt like obvious questions to him. They felt like essential spec questions to me.

This is the spec architect's hidden skill: patient, precise knowledge extraction from people who don't know why you're asking.

Domain experts carry their knowledge as compressed intuition. They can make decisions in seconds that would take pages to explain. The spec architect's job is to decompress that intuition — to turn "I just know" into "when X happens, do Y, unless Z, in which case do W."

Mauricio taught me something else about knowledge extraction. During our TravelOS sessions, he would push back on my technical proposals — not because he understood the technology, but because he understood his customers. "For $7, I'm competing with ChatGPT," he told me. "It's trained for tourism, it helps them, it's functional. It doesn't cost much — one coffee per month." Every feature I proposed, he filtered through that $7 lens. Could a person paying $7 a month actually use this? Would they?

And then he said something that I now consider one of the most important pieces of feedback I've ever received: "The problem working with you is that when you're working with us, the ideas make sense to you and you tell me 'yes, we can do that.' Out of 30,000 possibilities, but your priority is something else. And that's where we fail in communication."

The builder's enthusiasm becomes a liability. Saying "yes, we can build that" is not the same as "yes, we should build that." The spec is the discipline that turns "can" into "should" — by forcing every feature through the filter of the system overview, the behavioral contract, the integration boundaries, and the trust tier. If it doesn't belong, the spec says no — even when the builder's instinct says yes.


The Brownfield Problem

Not every spec starts from zero. Most real-world projects have existing code, existing behavior, existing users with existing expectations. Changing these systems is where the most expensive bugs live — not because the changes are technically complex, but because the existing behavior is often undocumented, untested, and encoded only in the code itself.

For brownfield projects, the specification adds three sections to the standard eight:

Existing Behavior to Preserve. What does the current system do that must continue working exactly as it does? This is extracted during the discovery phase (Chapter 6) and becomes the most important section of the brownfield spec. Every behavior listed here is a regression scenario — if the new code breaks it, the change fails.

Behavioral Changes. What's different? These are deltas against the existing contracts — not a description of the new system, but a description of what changes in the existing system. This forces precision: you're not rebuilding, you're modifying. The distinction matters because agents will happily rebuild from scratch if you let them.

Regression Scenarios. Tests that verify existing behavior survives the change. These aren't new scenarios — they're existing scenarios that must continue to pass. The agent implements the new behavior; the regression scenarios verify it didn't break the old behavior.

The brownfield spec is harder to write than a greenfield spec. It requires understanding what already exists — which means reading code, talking to users, and sometimes surfacing behaviors that nobody remembers implementing. And it's where the most value lives, because most software already exists. The world doesn't need more MVPs. It needs better changes to the systems already running.


Spec Quality: How to Know When You're Done

Every spec session ends with the same question: how do I know when this is complete enough to hand to the agent?

There's no perfect answer, but there are reliable signals.

The "should" scan. Search the document for "should," "ideally," "try to," "when possible," "if applicable," and "usually." Every instance is an unresolved decision. In our harness, this scan runs as a pre-BUILD gate — the spec literally cannot proceed to implementation until these words are resolved. This is not optional. These words signal that the spec author stated a preference instead of a fact. The agent will convert that preference into an implementation choice, and you won't know what choice it made until you see the output.

The failure question. For every integration listed in section 4, ask: "What happens when this fails?" If the spec doesn't answer, the agent will decide. Failure handling is where the most expensive bugs live — not in the happy path, but in the moments when external systems are unavailable, when users do the unexpected, when the environment doesn't cooperate. A spec that covers every happy path but leaves failure handling implicit is 80% complete and 20% dangerous.

The handoff test. Give the spec to a developer who knows the technology stack but knows nothing about your domain. Ask them to describe, in plain language, what the system does in five different scenarios. If their description diverges from your intent, the spec has a gap. This is most valuable for sections 2 and 3 (behavioral contract and non-behaviors). Technical developers are good at inferring technical behaviors. They are bad at inferring business rules — because business rules aren't logical, they're accumulated organizational experience. The handoff test finds the gaps between "what any competent agent can infer from general knowledge" and "what it needs to know specifically about your domain."

The decision count. A rough heuristic: for every 100 lines of generated code, an agent makes approximately five to ten significant decisions. After generation, trace those decisions back to the spec. Are they explicitly covered? Were they resolved by the intent contract? Or did the agent guess?

When you can say "the spec covers it" or "the intent contract resolves it" for every significant agent decision you find — you're done. Not when the document reaches a certain length. Not when you've stopped thinking of edge cases. When the agent's decision space is closed.


The Spec as Organizational Memory

There is one use for the spec that nobody thinks about until they need it — and by then, it's too late to build it.

Three months after a build session, a bug surfaces in production. Something is behaving unexpectedly. A developer opens the codebase. The code is there — but the intent is gone. Why was this behavior implemented this way? What edge case was it designed to handle? That information lived in the mind of the spec architect during the build session. It was never captured. Now the developer making the fix doesn't know whether the unexpected behavior is a bug or a feature.

This is where specs outlive their original purpose.

The code describes what the system does. The spec describes what the system was meant to do. These are not always the same. Sometimes an agent produces code that diverges slightly from the spec — and nobody catches it because the output looks right. Sometimes a developer makes a change that's locally correct but breaks an unstated assumption. Sometimes the spec was right and the code was wrong, and the bug is the gap between them.

When Hernan made fifteen changes without updating the spec, the immediate problem was drift — the agent working from an outdated mental model. The deeper problem was that Edifica's institutional knowledge was splitting. The spec said one thing. The code did another. The next time someone needed to understand the system's behavior — a new developer, a new build session, a new model upgrade — they had two conflicting sources of truth.

The spec wins. Not because it's more authoritative than the code by definition, but because it encodes intent. The code might be wrong. The spec represents what was decided, why it was decided, and what edge cases it was designed to address. Future agents and future developers should resolve conflicts between spec and code by asking "which one represents the correct intent?" — not by assuming the code is right.

The practical implication: the spec is not done when the build is done. The spec is done when the system is retired. Every change to the system produces a corresponding spec update. This is the only way to keep organizational memory intact.
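One way to enforce that discipline is a CI guard that fails whenever code changes land without a matching spec update. The sketch below is illustrative, assuming a single spec file named `SPEC.md` and source under `src/` (both hypothetical paths):

```python
# Illustrative CI guard: source code changed but the spec did not, so fail.
# The spec filename (SPEC.md) and source prefix (src/) are assumptions.
def spec_updated(changed: list, spec_path: str = "SPEC.md") -> bool:
    """True unless source files changed while the spec stayed untouched."""
    code_changed = any(f.startswith("src/") for f in changed)
    return (spec_path in changed) or not code_changed

assert spec_updated(["src/billing.py", "SPEC.md"])   # code + spec: ok
assert not spec_updated(["src/billing.py"])          # code only: fail build
assert spec_updated(["README.md"])                   # docs only: ok
```

In CI, you would feed it the output of `git diff --name-only origin/main HEAD`. A check this crude cannot judge whether the spec update is correct, only whether it exists; that is enough to stop the Hernan pattern of fifteen silent changes.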

I've shipped two specs that now exceed twelve thousand words — one for the Edifica building management system, one for the Dark Factory ERP. Those documents are not technical artifacts. They're institutional knowledge: the business rules, the governance requirements, the decisions we made and why, the edge cases we handled and how. Anyone who reads them understands the system — not just the code. Future agents building on or modifying these systems will start with the spec, not the code history. The spec is how organizational knowledge becomes machine-readable.


Specification Fatigue

There is a real risk in everything I've described. If the spec is the bottleneck — if every decision must be resolved before the agent starts building — then the spec becomes the new source of delays, frustration, and organizational drag.

I call this specification fatigue, and it's the number one systemic risk in spec-driven development.

It manifests in predictable ways. The spec gets longer and longer as the team tries to anticipate every edge case. Reviews take days instead of hours. The spec author burns out from the cognitive load of resolving ambiguities that may never matter in practice. The team starts treating the spec as a bureaucratic hurdle rather than a thinking tool. And eventually, someone says: "Can we just skip the spec for this one? It's a small change."

The moment you skip the spec is the moment the methodology breaks.

The defense against specification fatigue is trust tiers. Not every system needs a thirty-page spec with sixty behavioral scenarios. A Tier 1 internal tool needs the standard eight sections at minimum depth — maybe two pages total. A Tier 4 patient safety system needs every section at maximum depth, with factorial stress variations on every scenario. The spec scales to the risk.

A second defense is decomposition. Carlos, one of the developers building VZYN Labs, described his approach after working with a large spec for the first time: "I first analyzed the spec. Then from the spec I generated tasks. And then each task I passed to an agent." He wasn't overwhelmed by the spec's length. He used it as a decomposition tool — a source of precision he could slice into implementable units. The spec was too large to hand to an agent whole. But it was exactly the right size to hand to a developer who could generate tasks from it.

This is the correct relationship between developer and spec. The spec is the source of truth. The developer is the granularizer — breaking it into tasks precise enough for the agent, verifying each output against the spec, catching the moments when the implementation drifts from intent. The cognitive load of "knowing everything" shifts from the developer's head to the document. The developer's job becomes navigating the document and directing the agent, not carrying the full context of the system.

The other defense is tooling. The structured questioning approach that Nate Jones introduced to the community — the system that walks you through progressively deeper questions about behavior, intent, and constraints — dramatically reduces the cognitive load of spec writing. You're not staring at a blank page trying to think of everything. You're answering questions. The questions are organized by importance. The system tells you when you're done.

Specification fatigue is real. But so is specification absence. The solution isn't to skip the spec — it's to right-size it.


The Spec Is a Conversation

I want to close with something I noticed during the Mauricio meeting that I think captures the essence of everything in this chapter.

For the middle forty-five minutes of our session, we weren't building software. We weren't writing code. We weren't even writing a spec document. We were having a conversation — about his customers, about their journey, about where friction exists, about what a person paying $7 a month actually needs versus what a person paying $597 expects.

By the end of that conversation, we were aligned. For the first time in weeks, we agreed on what we were building, for whom, and in what order. Mauricio said: "Good that we could align. I was worried because we were completely misaligned."

That conversation was the spec. Not the document that came after — the conversation itself. The document is just the artifact. The real specification happens when the domain expert and the builder sit together and resolve ambiguities in real time, challenge each other's assumptions, and arrive at a shared understanding of what the system should do.

Neither Mauricio nor I could have written the spec alone. He knows everything about travel agencies and nothing about software architecture. I know how to structure systems and nothing about tourism pricing tiers. The spec emerged from the intersection — from sustained, disciplined dialogue between two kinds of expertise.

This is why specification is the hardest skill in the pipeline. It's not a solitary act of documentation. It's a collaborative act of alignment. The spec architect's job isn't to write a perfect document in isolation — it's to extract, structure, and formalize the shared understanding between the people who know the domain and the machines that will implement it.

The document is important. The eight sections are non-negotiable. The ambiguity detection is essential. But underneath all of it, the spec is a conversation — the one most teams skip, and the one that separates software that ships from a Frankenstein.


Next chapter: what happens once the conversation is on the page. How the agent runs on deterministic rails, why the harness carries as much weight as the intelligence, and where the builder stops being the bottleneck.