●WWDC — WWDC 2026 confirms Siri runs on Google Gemini; third-party handoff to ChatGPT is dropped, and Siri AI won't ship in the EU under the DMA at iOS 27●BILLING — 6 days until the Jun 15 change: Agent SDK, headless Claude Code, GitHub Actions, and third-party agents move to API-rate monthly credit●OUTAGE — claude.ai, Claude Code, and Cowork saw an outage (Jun). Scheduled runs are safest when built around fallbackModel and retries●DYNAMIC-WORKFLOWS — Dynamic workflows are on by default on Max/Team and the API, for codebase-wide bug hunts and independent verification●ULTRACODE — Claude Code's new ultracode setting sits in the effort menu, fixing effort to xhigh while Claude decides when to run a workflow●OPUS4.8 — Claude Opus 4.8 is settled in as the default across major plans, with stronger coding, agentic, and reasoning skills●WWDC — WWDC 2026 confirms Siri runs on Google Gemini; third-party handoff to ChatGPT is dropped, and Siri AI won't ship in the EU under the DMA at iOS 27●BILLING — 6 days until the Jun 15 change: Agent SDK, headless Claude Code, GitHub Actions, and third-party agents move to API-rate monthly credit●OUTAGE — claude.ai, Claude Code, and Cowork saw an outage (Jun). Scheduled runs are safest when built around fallbackModel and retries●DYNAMIC-WORKFLOWS — Dynamic workflows are on by default on Max/Team and the API, for codebase-wide bug hunts and independent verification●ULTRACODE — Claude Code's new ultracode setting sits in the effort menu, fixing effort to xhigh while Claude decides when to run a workflow●OPUS4.8 — Claude Opus 4.8 is settled in as the default across major plans, with stronger coding, agentic, and reasoning skills
From Spec to Production: Spec-Driven Development with Claude Code
Write a YAML spec and Claude Code auto-generates tests, implementation, and documentation. A practical guide to Spec-Driven Development covering spec formats, TDD automation, and CI/CD pipeline integration with real code examples.
Three months into a project, you open a function you wrote and think: "Wait, what was this supposed to do?" If you've experienced that moment, you're not alone.
The divergence between specs and code is one of software development's oldest problems. Test-Driven Development (TDD) offers one solution — "the test is the spec" — but the psychological overhead of writing tests first often slows teams down, especially in solo or small-team projects.
Over the past six months, I've been experimenting with a workflow built around Claude Code that sidesteps this problem entirely: write a structured specification, and let Claude Code generate the tests, implementation, and documentation from it. I call this Spec-Driven Development.
This article shares that workflow in full, including the code examples and the pitfalls I hit along the way.
Why Specs and Code Always Drift Apart
The typical development flow looks something like this:
The root problem is that specs live outside the code. They exist in chat logs and wiki pages, and keeping them synchronized with the implementation depends entirely on developer discipline.
TDD reframes this: "the test is the spec." But TDD requires you to write the test before you can write the implementation, which means you need to design the test first — another time-consuming task.
Claude Code changes the equation. When you write a structured specification, Claude Code can generate the test from it. You skip the test design step entirely.
The Spec-Driven Development Overview
The core idea is a clear division of labor:
Humans write:
Feature specifications (input → output contracts)
Business rules (edge cases, error handling policies)
The key is writing specs in a format Claude can read accurately. Structured YAML works better than natural language descriptions — Claude interprets it with higher fidelity.
✦
Thank you for reading this far.
Continue Reading
What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.
WHAT YOU'LL LEARN
✦Once you write a YAML spec, Claude Code auto-generates tests, implementation, and documentation — freeing you from manual test writing while maintaining code quality
✦Automating the 'write tests first' TDD principle with AI lets you eliminate bugs from the start. You can run the Red→Green cycle without writing a single line of code manually
✦A fully automated GitHub Actions pipeline—from spec to CI/CD—can be integrated into your project today, ensuring spec changes automatically propagate to tests, implementation, and docs
Secure payment via Stripe · Cancel anytime
Step 1 — Writing Specs Claude Can Read
Here's a YAML spec format I've settled on after several iterations. It captures the function name, inputs, outputs, and test cases in a structure Claude Code can parse reliably.
# specs/user_auth.yamlfeature: User Authenticationversion: "1.0"module: src/authfunctions: - name: validatePassword description: Validates a password and returns a strength score inputs: - name: password type: string constraints: - min_length: 8 - max_length: 128 outputs: - name: result type: object fields: - name: valid type: boolean - name: score type: integer range: [0, 100] - name: errors type: array items: string test_cases: - description: Valid password with mixed characters input: "Secure@Pass1" expected: valid: true score_min: 80 errors: [] - description: Password too short input: "abc" expected: valid: false errors: ["Password must be at least 8 characters"] - description: Empty string input: "" expected: valid: false errors: ["Password is required"] - description: Exceeds maximum length input: "a" * 129 expected: valid: false errors: ["Password must not exceed 128 characters"] - name: hashPassword description: Hash a plaintext password using bcrypt inputs: - name: plaintext type: string outputs: - name: hash type: string format: bcrypt security: - salt_rounds: 12 - timing_safe: true test_cases: - description: Same password produces different hashes each time assertion: hash1 \!= hash2 for same input - description: Hash is in bcrypt format assertion: hash.startswith("$2b$")
With this spec written, the Claude Code prompt is simple:
Read @specs/user_auth.yaml and generate:
1. tests/test_user_auth.py — pytest test code covering all test_cases
2. Include explicit assertions with error messages for each case
3. Add boundary value tests for numeric constraints
How specific should specs be?
The more concrete the test cases, the better the generated tests. "Return an error" is vague. "Return errors as an array where each item is a user-facing string" gives Claude Code what it needs to write a meaningful assertion.
On the other hand, don't over-specify the implementation. "Use bcrypt" is a valid constraint. "Loop through characters three times" is an implementation detail — leave that to Claude Code's judgment. The best implementations usually come from giving Claude Code clear contracts, not prescriptive procedures.
Step 2 — Auto-Generating Tests from the Spec
Here's what Claude Code generates from the spec above:
# tests/test_user_auth.pyimport pytestfrom src.auth import validatePassword, hashPasswordimport bcryptclass TestValidatePassword: """Tests for password validation""" def test_valid_password_returns_high_score(self): """Valid password with mixed chars: valid=True, score>=80""" result = validatePassword("Secure@Pass1") assert result["valid"] is True, f"Expected valid=True, got {result['valid']}" assert result["score"] >= 80, f"Expected score >= 80, got {result['score']}" assert result["errors"] == [], f"Expected no errors, got {result['errors']}" def test_too_short_password(self): """Password under 8 chars returns valid=False with length error""" result = validatePassword("abc") assert result["valid"] is False assert "Password must be at least 8 characters" in result["errors"], \ f"Expected length error, got {result['errors']}" def test_empty_password(self): """Empty string returns valid=False with required field error""" result = validatePassword("") assert result["valid"] is False assert "Password is required" in result["errors"] def test_password_at_max_length(self): """128-character password is valid (boundary value)""" result = validatePassword("A" * 120 + "@1aBcDe\!\!") assert result["valid"] is True, f"128-char password should be valid: {result}" def test_password_exceeds_max_length(self): """129-character password returns valid=False""" result = validatePassword("a" * 129) assert result["valid"] is False assert "Password must not exceed 128 characters" in result["errors"] def test_password_exactly_8_chars(self): """8-character password is valid (boundary value)""" result = validatePassword("Abcde\!1x") assert result["valid"] is True, f"8-char valid password should pass: {result}" def test_score_is_always_in_range(self): """Score is always between 0 and 100""" for pwd in ["weak", "Stronger1\!", "V3ryStr0ng@Passw0rd\!"]: result = validatePassword(pwd) assert 0 <= result.get("score", 0) <= 100, \ f"Score out of range for '{pwd}': {result.get('score')}"class TestHashPassword: """Tests for password hashing""" def test_same_password_produces_different_hashes(self): """Salt ensures different hashes for the same input""" hash1 = hashPassword("TestPassword1\!") hash2 = hashPassword("TestPassword1\!") assert hash1 \!= hash2, "Same password should produce different hashes due to salt" def test_hash_is_bcrypt_format(self): """Generated hash is in bcrypt format""" hash_value = hashPassword("TestPassword1\!") assert hash_value.startswith("$2b$"), \ f"Hash should start with $2b$, got: {hash_value[:10]}..." def test_hash_can_be_verified(self): """Original password verifies against generated hash""" password = "TestPassword1\!" hash_value = hashPassword(password) assert bcrypt.checkpw(password.encode(), hash_value.encode()), \ "Original password should match the generated hash" def test_wrong_password_fails_verification(self): """Wrong password fails hash verification""" hash_value = hashPassword("CorrectPassword1\!") assert not bcrypt.checkpw("WrongPassword1\!".encode(), hash_value.encode()), \ "Wrong password should fail hash verification"
Notice that Claude Code added boundary value tests (127, 128, 129 characters) that weren't explicitly listed in the spec — it inferred them from the constraint definitions. The timing_safe: true security constraint also prompted Claude Code to add the bcrypt salt verification test.
Run this test suite now and everything fails. That's the Red phase, and it's intentional.
Step 3 — Auto-Generating Implementation from Failing Tests
Feed the pytest output back to Claude Code:
The following tests are failing:
[paste pytest output]
Following the spec at @specs/user_auth.yaml, generate an implementation in
src/auth.py that passes all tests.
Constraints:
- Use bcrypt with salt_rounds=12
- Use early return for input validation
- Return errors as an array (support multiple simultaneous errors)
- Use Python 3.11+ type hints throughout
The generated implementation:
# src/auth.pyfrom __future__ import annotationsimport reimport bcryptdef validatePassword(password: str) -> dict: """ Validate a password and return a strength score. Args: password: The password string to validate Returns: dict with keys: valid (bool), score (int 0-100), errors (list[str]) """ # Empty check — highest priority if not password: return {"valid": False, "score": 0, "errors": ["Password is required"]} errors: list[str] = [] # Length constraints if len(password) > 128: errors.append("Password must not exceed 128 characters") if len(password) < 8: errors.append("Password must be at least 8 characters") # Early return if validation failed if errors: return {"valid": False, "score": 0, "errors": errors} # Strength scoring score = min(len(password) * 2, 30) # Length bonus (max 30) if re.search(r'[a-z]', password): score += 15 # Lowercase if re.search(r'[A-Z]', password): score += 15 # Uppercase if re.search(r'\d', password): score += 20 # Digits if re.search(r'[\!@#$%^&*(),.?":{}|<>]', password): score += 20 # Symbols score = min(score, 100) return {"valid": True, "score": score, "errors": []}def hashPassword(plaintext: str) -> str: """ Hash a plaintext password using bcrypt. Args: plaintext: The password to hash Returns: bcrypt hash string in $2b$ format Note: Uses salt_rounds=12. Each call generates a unique salt, so the same password produces a different hash each time. """ salt = bcrypt.gensalt(rounds=12) hashed = bcrypt.hashpw(plaintext.encode('utf-8'), salt) return hashed.decode('utf-8')
Run pytest again — everything passes. That's the Green phase.
Running the Red → Green cycle autonomously
If you'd rather not copy-paste test results manually:
Repeat the following until all tests pass:
1. Run: pytest tests/
2. Analyze which tests are failing and why
3. Update src/auth.py to fix the failures
4. If a new error appears, treat it as a separate issue to address
Stop when all tests pass and report the final results.
Claude Code will autonomously loop through test → analyze → fix → test. For straightforward features, it typically reaches Green in 3–5 cycles.
You can also pair this with Claude Code's Hooks workflow automation to trigger the cycle automatically when tests fail — eliminating even the manual step of starting the loop.
Step 4 — Auto-Updating Documentation from Implementation
Once tests and implementation are solid, documentation follows automatically:
Analyze the implementation in src/auth.py and update:
1. docs/api/auth.md — API documentation for each function (args, returns, usage example)
2. The "Authentication Module" section in CLAUDE.md
3. Append a summary of changes to CHANGELOG.md in Keep a Changelog format
Example generated API documentation:
# Authentication Module API Reference## `validatePassword(password: str) -> dict`Validates password strength and returns a result object.#### Arguments- **password** (str): The password string to validate#### Return value```json{ "valid": true, "score": 85, "errors": []}```- **valid** (bool): Whether the password passed validation- **score** (int): Strength score from 0 to 100- **errors** (list[str]): Error messages when valid is False#### Usage example```pythonresult = validatePassword("Secure@Pass1")if not result["valid"]: for error in result["errors"]: print(f"Error: {error}")```
Step 5 — Full Automation with GitHub Actions
Here's a workflow that triggers whenever a spec file changes. Pairing it with a Claude Code × GitHub Actions guide gives you fine-grained control over each pipeline stage.
# .github/workflows/spec-driven.ymlname: Spec-Driven Development Pipelineon: push: paths: - 'specs/**/*.yaml'jobs: generate-and-test: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Set up Python uses: actions/setup-python@v5 with: python-version: '3.11' cache: 'pip' - name: Install dependencies run: pip install -r requirements.txt - name: Install Claude Code run: npm install -g @anthropic-ai/claude-code - name: Generate tests from changed specs env: ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} run: | CHANGED_SPECS=$(git diff --name-only HEAD~1 HEAD -- 'specs/**/*.yaml') for spec in $CHANGED_SPECS; do echo "Generating from spec: $spec" claude --print " Read $spec and generate pytest test code. - Output path: tests/$(basename $spec .yaml)_test.py - Cover all test_cases in the spec - Add boundary value tests for numeric constraints - Include error handling tests " done - name: Run tests (Red phase — expected to fail) run: pytest tests/ -v --tb=short 2>&1 | tee /tmp/test_results.txt || true - name: Generate implementation from failing tests env: ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} run: | TEST_OUTPUT=$(cat /tmp/test_results.txt) claude --print " The following tests are failing. Generate implementation following the spec to make all tests pass. Test output: $TEST_OUTPUT Constraints: - Do not break existing code - Use type hints - Include error handling " - name: Verify Green phase run: pytest tests/ -v --tb=long - name: Update documentation env: ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} run: | CHANGED=$(git diff --name-only HEAD~1 HEAD -- 'specs/**/*.yaml') claude --print " Analyze the updated implementation and update docs/ API documentation. Changed specs: $CHANGED " - name: Commit generated files run: | git config --local user.email "github-actions[bot]@users.noreply.github.com" git config --local user.name "GitHub Actions" git add tests/ src/ docs/ git diff --staged --quiet || git commit -m "chore: auto-update tests, implementation, and docs from spec change" git push
Once this is in place, a spec change automatically flows through tests → implementation → documentation, all committed back to the repo.
Five Pitfalls I Hit in Production
Pitfall 1: YAML escaping breaks on special characters
Error messages containing : or # need quoting in YAML, or the parser will fail silently.
# ❌ This causes a YAML parse errorerrors: ["Password is required: enter a value"]# ✅ Single-quote the stringerrors: ['Password is required: enter a value']
Pitfall 2: Claude Code "improves" the spec without being asked
Sometimes Claude Code adds constraints that aren't in the spec — for example, generating a test that requires at least one uppercase letter, even though the spec doesn't require that. To prevent this, be explicit in the prompt:
Do not add constraints that aren't in the spec.
If you add extra tests beyond the spec, mark them with:
# EXTRA: not required by spec
Pitfall 3: "Cheat" implementations that game the tests
When Claude Code runs the Red → Green cycle autonomously, it occasionally generates implementations that hardcode test input values instead of implementing the actual logic.
# ❌ What Claude Code sometimes generatesdef validatePassword(password: str) -> dict: if password == "Secure@Pass1": return {"valid": True, "score": 85, "errors": []} if password == "abc": return {"valid": False, "score": 0, "errors": ["..."]} # ...
The fix is to include boundary value tests in the spec. Hardcoding can't cover all boundary values, so Claude Code is forced to implement the actual logic.
Pitfall 4: Generated files get manually edited and fall out of sync
Once developers realize generated code "almost works," the temptation to edit it directly is strong. This defeats the whole system.
The solution: add a header comment to every generated file.
# AUTO-GENERATED from specs/user_auth.yaml v1.1# Last updated: 2026-04-17# DO NOT EDIT MANUALLY — update the spec and regenerate
Add a CI check that verifies this header is present and the version matches the spec.
Pitfall 5: External dependencies require mocks in CI
If a function calls a database or external API, Claude Code will generate tests that make real connections — which fail in CI. Add a testing section to the spec:
testing: environment_constraints: - database: "mock required (no real DB in CI)" - external_apis: "mock with pytest-mock" mock_library: "pytest-mock"
With this, Claude Code automatically generates mock-based tests instead of relying on real connections.
Where to Start Today
You don't need to build the full pipeline from day one. The simplest starting point is writing a YAML spec for a single function you're about to implement, then asking Claude Code to generate the tests.
# Start small — a single functionfeature: Currency Formattingfunctions: - name: formatCurrency inputs: - name: amount type: number outputs: - name: result type: string test_cases: - input: 1000 expected: "$1,000" - input: 0 expected: "$0" - input: -500 expected: "-$500"
Even a 20-line spec like this is enough to see Claude Code generate a useful test suite. Start there, get a feel for how precise you need to be, and iterate.
The CI/CD integration can wait until you're comfortable with the local workflow. Begin with the three-step loop: write spec → generate tests → generate implementation. The full automation follows naturally once the workflow clicks.
The real value of Spec-Driven Development isn't any individual tool — it's the structural guarantee that specs and code stay in sync. When a spec changes, the tests and implementation change with it. That single property eliminates an entire class of bugs and documentation debt that most projects quietly carry for years.
Share
Thank You for Reading
Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.