Articles/Claude Code

⟐ Claude Code/2026-04-17Advanced

From Spec to Production: Spec-Driven Development with Claude Code

Write a YAML spec and Claude Code auto-generates tests, implementation, and documentation. A practical guide to Spec-Driven Development covering spec formats, TDD automation, and CI/CD pipeline integration with real code examples.

Claude Code²⁰³ Spec-Driven Development TDD⁴ Test Automation CI/CD¹⁸ Specifications Quality Engineering Workflow⁸

✦ Premium Article

Three months into a project, you open a function you wrote and think: "Wait, what was this supposed to do?" If you've experienced that moment, you're not alone.

The divergence between specs and code is one of software development's oldest problems. Test-Driven Development (TDD) offers one solution — "the test is the spec" — but the psychological overhead of writing tests first often slows teams down, especially in solo or small-team projects.

Over the past six months, I've been experimenting with a workflow built around Claude Code that sidesteps this problem entirely: write a structured specification, and let Claude Code generate the tests, implementation, and documentation from it. I call this Spec-Driven Development.

This article shares that workflow in full, including the code examples and the pitfalls I hit along the way.

Why Specs and Code Always Drift Apart

The typical development flow looks something like this:

Gather requirements (Slack messages, Confluence pages, meetings)
Write the implementation
Open a PR once it "works"
Add tests if there's time
Document it later (which never comes)

The root problem is that specs live outside the code. They exist in chat logs and wiki pages, and keeping them synchronized with the implementation depends entirely on developer discipline.

TDD reframes this: "the test is the spec." But TDD requires you to write the test before you can write the implementation, which means you need to design the test first — another time-consuming task.

Claude Code changes the equation. When you write a structured specification, Claude Code can generate the test from it. You skip the test design step entirely.

The Spec-Driven Development Overview

The core idea is a clear division of labor:

Humans write:

Feature specifications (input → output contracts)
Business rules (edge cases, error handling policies)
Quality requirements (performance, security constraints)

Claude Code generates:

Test code based on the spec
Implementation that passes the tests
API documentation from the implementation
Review comments on diffs

The key is writing specs in a format Claude can read accurately. Structured YAML works better than natural language descriptions — Claude interprets it with higher fidelity.

✦

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN

✦Once you write a YAML spec, Claude Code auto-generates tests, implementation, and documentation — freeing you from manual test writing while maintaining code quality

✦Automating the 'write tests first' TDD principle with AI lets you eliminate bugs from the start. You can run the Red→Green cycle without writing a single line of code manually

✦A fully automated GitHub Actions pipeline—from spec to CI/CD—can be integrated into your project today, ensuring spec changes automatically propagate to tests, implementation, and docs

Secure payment via Stripe · Cancel anytime

✦

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

Unlock all articles with Membership →

Step 1 — Writing Specs Claude Can Read

Here's a YAML spec format I've settled on after several iterations. It captures the function name, inputs, outputs, and test cases in a structure Claude Code can parse reliably.

# specs/user_auth.yaml
feature: User Authentication
version: "1.0"
module: src/auth
 
functions:
  - name: validatePassword
    description: Validates a password and returns a strength score
    inputs:
      - name: password
        type: string
        constraints:
          - min_length: 8
          - max_length: 128
    outputs:
      - name: result
        type: object
        fields:
          - name: valid
            type: boolean
          - name: score
            type: integer
            range: [0, 100]
          - name: errors
            type: array
            items: string
    test_cases:
      - description: Valid password with mixed characters
        input: "Secure@Pass1"
        expected:
          valid: true
          score_min: 80
          errors: []
      - description: Password too short
        input: "abc"
        expected:
          valid: false
          errors: ["Password must be at least 8 characters"]
      - description: Empty string
        input: ""
        expected:
          valid: false
          errors: ["Password is required"]
      - description: Exceeds maximum length
        input: "a" * 129
        expected:
          valid: false
          errors: ["Password must not exceed 128 characters"]
 
  - name: hashPassword
    description: Hash a plaintext password using bcrypt
    inputs:
      - name: plaintext
        type: string
    outputs:
      - name: hash
        type: string
        format: bcrypt
    security:
      - salt_rounds: 12
      - timing_safe: true
    test_cases:
      - description: Same password produces different hashes each time
        assertion: hash1 \!= hash2 for same input
      - description: Hash is in bcrypt format
        assertion: hash.startswith("$2b$")

With this spec written, the Claude Code prompt is simple:

Read @specs/user_auth.yaml and generate:
1. tests/test_user_auth.py — pytest test code covering all test_cases
2. Include explicit assertions with error messages for each case
3. Add boundary value tests for numeric constraints

How specific should specs be?

The more concrete the test cases, the better the generated tests. "Return an error" is vague. "Return errors as an array where each item is a user-facing string" gives Claude Code what it needs to write a meaningful assertion.

On the other hand, don't over-specify the implementation. "Use bcrypt" is a valid constraint. "Loop through characters three times" is an implementation detail — leave that to Claude Code's judgment. The best implementations usually come from giving Claude Code clear contracts, not prescriptive procedures.

Step 2 — Auto-Generating Tests from the Spec

Here's what Claude Code generates from the spec above:

# tests/test_user_auth.py
import pytest
from src.auth import validatePassword, hashPassword
import bcrypt
 
class TestValidatePassword:
    """Tests for password validation"""
 
    def test_valid_password_returns_high_score(self):
        """Valid password with mixed chars: valid=True, score>=80"""
        result = validatePassword("Secure@Pass1")
        assert result["valid"] is True, f"Expected valid=True, got {result['valid']}"
        assert result["score"] >= 80, f"Expected score >= 80, got {result['score']}"
        assert result["errors"] == [], f"Expected no errors, got {result['errors']}"
 
    def test_too_short_password(self):
        """Password under 8 chars returns valid=False with length error"""
        result = validatePassword("abc")
        assert result["valid"] is False
        assert "Password must be at least 8 characters" in result["errors"], \
            f"Expected length error, got {result['errors']}"
 
    def test_empty_password(self):
        """Empty string returns valid=False with required field error"""
        result = validatePassword("")
        assert result["valid"] is False
        assert "Password is required" in result["errors"]
 
    def test_password_at_max_length(self):
        """128-character password is valid (boundary value)"""
        result = validatePassword("A" * 120 + "@1aBcDe\!\!")
        assert result["valid"] is True, f"128-char password should be valid: {result}"
 
    def test_password_exceeds_max_length(self):
        """129-character password returns valid=False"""
        result = validatePassword("a" * 129)
        assert result["valid"] is False
        assert "Password must not exceed 128 characters" in result["errors"]
 
    def test_password_exactly_8_chars(self):
        """8-character password is valid (boundary value)"""
        result = validatePassword("Abcde\!1x")
        assert result["valid"] is True, f"8-char valid password should pass: {result}"
 
    def test_score_is_always_in_range(self):
        """Score is always between 0 and 100"""
        for pwd in ["weak", "Stronger1\!", "V3ryStr0ng@Passw0rd\!"]:
            result = validatePassword(pwd)
            assert 0 <= result.get("score", 0) <= 100, \
                f"Score out of range for '{pwd}': {result.get('score')}"
 
class TestHashPassword:
    """Tests for password hashing"""
 
    def test_same_password_produces_different_hashes(self):
        """Salt ensures different hashes for the same input"""
        hash1 = hashPassword("TestPassword1\!")
        hash2 = hashPassword("TestPassword1\!")
        assert hash1 \!= hash2, "Same password should produce different hashes due to salt"
 
    def test_hash_is_bcrypt_format(self):
        """Generated hash is in bcrypt format"""
        hash_value = hashPassword("TestPassword1\!")
        assert hash_value.startswith("$2b$"), \
            f"Hash should start with $2b$, got: {hash_value[:10]}..."
 
    def test_hash_can_be_verified(self):
        """Original password verifies against generated hash"""
        password = "TestPassword1\!"
        hash_value = hashPassword(password)
        assert bcrypt.checkpw(password.encode(), hash_value.encode()), \
            "Original password should match the generated hash"
 
    def test_wrong_password_fails_verification(self):
        """Wrong password fails hash verification"""
        hash_value = hashPassword("CorrectPassword1\!")
        assert not bcrypt.checkpw("WrongPassword1\!".encode(), hash_value.encode()), \
            "Wrong password should fail hash verification"

Notice that Claude Code added boundary value tests (127, 128, 129 characters) that weren't explicitly listed in the spec — it inferred them from the constraint definitions. The timing_safe: true security constraint also prompted Claude Code to add the bcrypt salt verification test.

Run this test suite now and everything fails. That's the Red phase, and it's intentional.

Step 3 — Auto-Generating Implementation from Failing Tests

Feed the pytest output back to Claude Code:

The following tests are failing:
[paste pytest output]

Following the spec at @specs/user_auth.yaml, generate an implementation in
src/auth.py that passes all tests.

Constraints:
- Use bcrypt with salt_rounds=12
- Use early return for input validation
- Return errors as an array (support multiple simultaneous errors)
- Use Python 3.11+ type hints throughout

The generated implementation:

# src/auth.py
from __future__ import annotations
import re
import bcrypt
 
def validatePassword(password: str) -> dict:
    """
    Validate a password and return a strength score.
 
    Args:
        password: The password string to validate
 
    Returns:
        dict with keys: valid (bool), score (int 0-100), errors (list[str])
    """
    # Empty check — highest priority
    if not password:
        return {"valid": False, "score": 0, "errors": ["Password is required"]}
 
    errors: list[str] = []
 
    # Length constraints
    if len(password) > 128:
        errors.append("Password must not exceed 128 characters")
 
    if len(password) < 8:
        errors.append("Password must be at least 8 characters")
 
    # Early return if validation failed
    if errors:
        return {"valid": False, "score": 0, "errors": errors}
 
    # Strength scoring
    score = min(len(password) * 2, 30)          # Length bonus (max 30)
    if re.search(r'[a-z]', password): score += 15  # Lowercase
    if re.search(r'[A-Z]', password): score += 15  # Uppercase
    if re.search(r'\d', password):    score += 20  # Digits
    if re.search(r'[\!@#$%^&*(),.?":{}|<>]', password): score += 20  # Symbols
    score = min(score, 100)
 
    return {"valid": True, "score": score, "errors": []}
 
def hashPassword(plaintext: str) -> str:
    """
    Hash a plaintext password using bcrypt.
 
    Args:
        plaintext: The password to hash
 
    Returns:
        bcrypt hash string in $2b$ format
 
    Note:
        Uses salt_rounds=12. Each call generates a unique salt,
        so the same password produces a different hash each time.
    """
    salt = bcrypt.gensalt(rounds=12)
    hashed = bcrypt.hashpw(plaintext.encode('utf-8'), salt)
    return hashed.decode('utf-8')

Run pytest again — everything passes. That's the Green phase.

Running the Red → Green cycle autonomously

If you'd rather not copy-paste test results manually:

Repeat the following until all tests pass:
1. Run: pytest tests/
2. Analyze which tests are failing and why
3. Update src/auth.py to fix the failures
4. If a new error appears, treat it as a separate issue to address

Stop when all tests pass and report the final results.

Claude Code will autonomously loop through test → analyze → fix → test. For straightforward features, it typically reaches Green in 3–5 cycles.

You can also pair this with Claude Code's Hooks workflow automation to trigger the cycle automatically when tests fail — eliminating even the manual step of starting the loop.

Step 4 — Auto-Updating Documentation from Implementation

Once tests and implementation are solid, documentation follows automatically:

Analyze the implementation in src/auth.py and update:
1. docs/api/auth.md — API documentation for each function (args, returns, usage example)
2. The "Authentication Module" section in CLAUDE.md
3. Append a summary of changes to CHANGELOG.md in Keep a Changelog format

Example generated API documentation:

# Authentication Module API Reference
 
## `validatePassword(password: str) -> dict`
 
Validates password strength and returns a result object.
 
#### Arguments
 
- **password** (str): The password string to validate
 
#### Return value
 
```json
{
  "valid": true,
  "score": 85,
  "errors": []
}
```
 
- **valid** (bool): Whether the password passed validation
- **score** (int): Strength score from 0 to 100
- **errors** (list[str]): Error messages when valid is False
 
#### Usage example
 
```python
result = validatePassword("Secure@Pass1")
if not result["valid"]:
    for error in result["errors"]:
        print(f"Error: {error}")
```

Step 5 — Full Automation with GitHub Actions

Here's a workflow that triggers whenever a spec file changes. Pairing it with a Claude Code × GitHub Actions guide gives you fine-grained control over each pipeline stage.

# .github/workflows/spec-driven.yml
name: Spec-Driven Development Pipeline
 
on:
  push:
    paths:
      - 'specs/**/*.yaml'
 
jobs:
  generate-and-test:
    runs-on: ubuntu-latest
 
    steps:
      - uses: actions/checkout@v4
 
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
 
      - name: Install dependencies
        run: pip install -r requirements.txt
 
      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code
 
      - name: Generate tests from changed specs
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          CHANGED_SPECS=$(git diff --name-only HEAD~1 HEAD -- 'specs/**/*.yaml')
 
          for spec in $CHANGED_SPECS; do
            echo "Generating from spec: $spec"
            claude --print "
              Read $spec and generate pytest test code.
              - Output path: tests/$(basename $spec .yaml)_test.py
              - Cover all test_cases in the spec
              - Add boundary value tests for numeric constraints
              - Include error handling tests
            "
          done
 
      - name: Run tests (Red phase — expected to fail)
        run: pytest tests/ -v --tb=short 2>&1 | tee /tmp/test_results.txt || true
 
      - name: Generate implementation from failing tests
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          TEST_OUTPUT=$(cat /tmp/test_results.txt)
          claude --print "
            The following tests are failing. Generate implementation
            following the spec to make all tests pass.
 
            Test output:
            $TEST_OUTPUT
 
            Constraints:
            - Do not break existing code
            - Use type hints
            - Include error handling
          "
 
      - name: Verify Green phase
        run: pytest tests/ -v --tb=long
 
      - name: Update documentation
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          CHANGED=$(git diff --name-only HEAD~1 HEAD -- 'specs/**/*.yaml')
          claude --print "
            Analyze the updated implementation and update docs/ API documentation.
            Changed specs: $CHANGED
          "
 
      - name: Commit generated files
        run: |
          git config --local user.email "github-actions[bot]@users.noreply.github.com"
          git config --local user.name "GitHub Actions"
          git add tests/ src/ docs/
          git diff --staged --quiet || git commit -m "chore: auto-update tests, implementation, and docs from spec change"
          git push

Once this is in place, a spec change automatically flows through tests → implementation → documentation, all committed back to the repo.

Five Pitfalls I Hit in Production

Pitfall 1: YAML escaping breaks on special characters

Error messages containing : or # need quoting in YAML, or the parser will fail silently.

# ❌ This causes a YAML parse error
errors: ["Password is required: enter a value"]
 
# ✅ Single-quote the string
errors: ['Password is required: enter a value']

Pitfall 2: Claude Code "improves" the spec without being asked

Sometimes Claude Code adds constraints that aren't in the spec — for example, generating a test that requires at least one uppercase letter, even though the spec doesn't require that. To prevent this, be explicit in the prompt:

Do not add constraints that aren't in the spec.
If you add extra tests beyond the spec, mark them with:
# EXTRA: not required by spec

Pitfall 3: "Cheat" implementations that game the tests

When Claude Code runs the Red → Green cycle autonomously, it occasionally generates implementations that hardcode test input values instead of implementing the actual logic.

# ❌ What Claude Code sometimes generates
def validatePassword(password: str) -> dict:
    if password == "Secure@Pass1":
        return {"valid": True, "score": 85, "errors": []}
    if password == "abc":
        return {"valid": False, "score": 0, "errors": ["..."]}
    # ...

The fix is to include boundary value tests in the spec. Hardcoding can't cover all boundary values, so Claude Code is forced to implement the actual logic.

Pitfall 4: Generated files get manually edited and fall out of sync

Once developers realize generated code "almost works," the temptation to edit it directly is strong. This defeats the whole system.

The solution: add a header comment to every generated file.

# AUTO-GENERATED from specs/user_auth.yaml v1.1
# Last updated: 2026-04-17
# DO NOT EDIT MANUALLY — update the spec and regenerate

Add a CI check that verifies this header is present and the version matches the spec.

Pitfall 5: External dependencies require mocks in CI

If a function calls a database or external API, Claude Code will generate tests that make real connections — which fail in CI. Add a testing section to the spec:

testing:
  environment_constraints:
    - database: "mock required (no real DB in CI)"
    - external_apis: "mock with pytest-mock"
  mock_library: "pytest-mock"

With this, Claude Code automatically generates mock-based tests instead of relying on real connections.

Where to Start Today

You don't need to build the full pipeline from day one. The simplest starting point is writing a YAML spec for a single function you're about to implement, then asking Claude Code to generate the tests.

# Start small — a single function
feature: Currency Formatting
functions:
  - name: formatCurrency
    inputs:
      - name: amount
        type: number
    outputs:
      - name: result
        type: string
    test_cases:
      - input: 1000
        expected: "$1,000"
      - input: 0
        expected: "$0"
      - input: -500
        expected: "-$500"

Even a 20-line spec like this is enough to see Claude Code generate a useful test suite. Start there, get a feel for how precise you need to be, and iterate.

The CI/CD integration can wait until you're comfortable with the local workflow. Begin with the three-step loop: write spec → generate tests → generate implementation. The full automation follows naturally once the workflow clicks.

The real value of Spec-Driven Development isn't any individual tool — it's the structural guarantee that specs and code stay in sync. When a spec changes, the tests and implementation change with it. That single property eliminates an entire class of bugs and documentation debt that most projects quietly carry for years.

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.