February 27, 2026
What Is Spec-Driven Development? A Better Way to Build Software with AI
Spec-driven development gives AI agents a structured contract to build from — not just a prompt. Here's how it works, why it matters, and how to get started.
There’s a growing gap in how teams use AI to build software. On one end, you have vibe coding — describing what you want in a chat prompt and hoping the output is close enough. On the other, you have engineers manually translating requirements into tasks, then using AI assistants line by line. Neither approach scales.
Spec-driven development sits in the middle. It’s a structured way to work with AI coding agents where a well-written specification — not a prompt, not a Jira ticket — becomes the source of truth for what gets built.
The idea is simple: if you give an AI agent a clear contract that defines the intended behavior, the acceptance criteria, and the context it needs, the output is dramatically better than if you just ask it to “build a sign-up page.”
Vibe coding took off in early 2025 after Andrej Karpathy coined the term. The appeal is obvious — describe what you want, let the AI write it, ship fast. But the cracks showed up quickly.
A Wiz study found that roughly one in five vibe-coded applications contains serious vulnerabilities. Codebases produced this way tend to be inconsistent, poorly documented, and hard to extend. In a 2025 survey, 16 out of 18 CTOs reported production incidents directly caused by AI-generated code that nobody actually reviewed or understood.
The core issue isn’t that AI writes bad code. It’s that AI writes exactly what you ask for — and most prompts are underspecified. When a human developer gets a vague requirement, they fill in the gaps with experience, context, and judgment. When an AI agent gets a vague prompt, it fills in the gaps with assumptions. Sometimes those assumptions are correct. Often, they’re not.
This is what Thoughtworks identified as a key insight: AI-assisted development dramatically raises the cost of ambiguity. When an AI agent is involved, unclear intent doesn’t just slow things down — it actively creates risk.
Spec-driven development is the practice of writing a detailed specification before any code is generated, then using that spec as the governing contract for an AI agent’s work.
A spec in this context isn’t a 40-page PRD that lives in Confluence and nobody reads. It’s a focused, structured document that contains:
A user story with context. Not just “add authentication” — but who the user is, where they arrive from, what they need to accomplish, and what constraints apply. For example: “As a new user arriving from the homepage, I want to create an account with email and password so I can access the platform. Email must be verified before first login.”
Acceptance criteria. These are the specific, testable conditions that define when the implementation is complete. Good acceptance criteria are binary — they either pass or they don’t. For instance: “Email field validates format on blur,” “Password requires minimum 8 characters,” “Duplicate emails show an inline error message.”
Scope and constraints. What repositories, frameworks, and conventions does the agent need to follow? What’s explicitly out of scope? This prevents the agent from making architectural decisions it shouldn’t.
The spec becomes the single source of truth. The AI agent reads it, plans an implementation, writes the code, and — critically — verifies its own work against the acceptance criteria before declaring itself done.
You might think this sounds like prompt engineering with extra steps. It’s not — the distinction matters.
A prompt is a one-shot instruction. You send it, the AI responds, and the quality depends entirely on how well you articulated the request in that moment. If the output is wrong, you iterate through conversation, which introduces drift and loses context over time.
A spec is a persistent contract. The agent can refer back to it at every step of implementation. When it finishes writing code, it checks the acceptance criteria. When a reviewer leaves feedback on a pull request, the agent can re-read the spec to understand the original intent. The spec survives the entire development lifecycle — it’s not a message in a chat window that scrolls away.
This is why spec-driven development works for teams, not just individuals. A product manager can write the spec (with AI assistance). An engineer can refine the acceptance criteria. The AI agent implements against both of their contributions. Everyone is working from the same document, with the same definition of “done.”
A typical spec-driven development cycle has five steps:
1. Write the spec. The product owner or engineer describes the feature in structured natural language. AI can help here — you can chat with an AI assistant to refine your thinking, flesh out edge cases, and structure the spec. But a human makes the final call on what the spec says.
2. Define acceptance criteria. This is the most important step. Each criterion should be specific enough that a machine can verify it. “The page looks good” is not a criterion. “The form is accessible and keyboard-navigable” is.
3. Assign to an agent. The spec and criteria are handed off to an AI coding agent. The agent reads the spec, analyzes the target codebase, and plans its implementation approach.
4. Agent implements and self-verifies. The agent writes the code, writes tests, and checks each acceptance criterion. If a criterion fails, the agent iterates on its own implementation before surfacing anything for review.
5. Pull request and review. The agent opens a PR with all changes, a summary of what was implemented, and a report on which criteria pass. An engineer reviews the PR like they would any other — but with the confidence that the agent has already verified its work against a shared contract.
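The implement-and-self-verify loop at the heart of steps 3 and 4 can be sketched in a few lines of Python. This is a hypothetical skeleton, not any real agent framework: `generate_code` stands in for an AI coding agent, and criteria are modeled as checks against the produced code:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Spec:
    story: str
    # Each acceptance criterion is a named, binary check on the output.
    criteria: dict[str, Callable[[str], bool]] = field(default_factory=dict)

def run_agent(spec: Spec,
              generate_code: Callable[[Spec, list[str]], str],
              max_iterations: int = 3) -> tuple[str, dict[str, bool]]:
    """Implement, check every criterion, and iterate on failures."""
    code, results, failures = "", {}, []
    for _ in range(max_iterations):
        # Re-attempt with the list of failed criteria as feedback.
        code = generate_code(spec, failures)
        results = {name: check(code) for name, check in spec.criteria.items()}
        failures = [name for name, ok in results.items() if not ok]
        if not failures:
            break  # every criterion passes: ready to open a PR
    return code, results
```

Only when `failures` is empty would the agent surface a pull request, together with the per-criterion report in `results` (step 5).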
This cycle can repeat in parallel. If you have ten specs ready, you can run ten agents simultaneously, each working in an isolated sandbox. The backlog moves at machine speed; the review process stays human.
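Because each spec is self-contained, the fan-out is trivial to express. A minimal sketch, where `run_agent_in_sandbox` is a hypothetical stand-in for whatever executes one agent against one spec in isolation:

```python
from concurrent.futures import ThreadPoolExecutor

def process_backlog(specs, run_agent_in_sandbox, max_workers=10):
    """Dispatch each spec to its own agent; results return in spec order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_agent_in_sandbox, specs))
```

The point is structural: nothing in one spec's run depends on another's, so the only serialized step left is human review.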
One of the most significant shifts in spec-driven development is who gets to contribute to shipping software.
Traditionally, only engineers translate requirements into working code. Product managers write PRDs, designers hand off mockups, and engineers interpret everything into implementation. Every handoff is a potential point of miscommunication.
With spec-driven development, anyone who can describe a feature clearly — including acceptance criteria — can initiate work that results in a pull request. Product managers become direct contributors to the engineering pipeline. Junior engineers can take on more complex work because the spec provides guard rails. The AI agent handles the implementation details; the human handles the intent.
This doesn’t mean engineers become irrelevant — far from it. Engineers review every PR, steer agents when they go off track, make architectural decisions, and handle the problems that agents can’t solve. But they spend less time on translation work and more time on judgment work.
If the spec is the contract, acceptance criteria are the enforcement mechanism.
Without criteria, an AI agent has no way to objectively evaluate its own work. It writes code, and a human has to manually verify that it does what was intended. This is the same review bottleneck that existed before AI — you’ve just replaced “engineer writes code” with “engineer reviews AI code,” and the throughput doesn’t actually improve much.
With criteria, the agent can close the loop on its own. It runs the tests, checks the conditions, and only opens a PR when everything passes. The human reviewer’s job shifts from “does this work?” to “is this well-architected?” — a much more valuable use of their time.
Good acceptance criteria share a few properties. They’re specific enough to test programmatically or verify visually. They’re independent — one criterion’s pass or fail doesn’t depend on another. And they’re complete — together, they define the full scope of “done” for the feature.
Spec-driven development isn’t a niche experiment anymore. GitHub released an open-source spec toolkit. AWS launched Kiro, an IDE built around the specify-plan-execute workflow. Thoughtworks featured it as one of the most important new engineering practices of 2025. The industry is converging on the idea that specifications, not prompts, are the right interface between humans and AI agents.
The underlying insight is straightforward: AI agents are powerful but literal. They’ll build exactly what you describe. The teams that get the best results will be the ones that get better at describing what they want — with structure, precision, and verifiable criteria.
That’s what spec-driven development is. Not a framework, not a methodology to certify in. Just the discipline of writing down what you want, defining how you’ll know it’s done, and letting an agent do the rest while you stay in control of the outcome.
Tervezo is built for exactly this workflow. Write specs with AI assistance, define acceptance criteria, assign to an agent, and get a verified pull request back — without translating requirements into tasks yourself. Start building for free.
Written by Eelco Wiersma, Founder · Tagged: spec-driven-development, ai-agents, engineering