BDD with Claude Code in Production: From Gherkin Scenario Generation to Cross-Team Test Culture
A production-ready guide to Behavior-Driven Development with Claude Code. Learn how to auto-generate Gherkin scenarios, implement step definitions, integrate with Playwright/Cucumber, and build a cross-team test culture — all with working code examples.
Have you ever tried introducing BDD (Behavior-Driven Development), only to hit a wall — scenarios you couldn't write, an explosion of step definitions to maintain, and a framework that only engineers ended up touching?
I've been there. Across multiple projects, I've attempted and then abandoned BDD. Learning Gherkin syntax was manageable, but writing scenarios that genuinely reflected business value from scratch was harder than expected, and the maintenance cost never felt worth it.
That changed when I started using Claude Code seriously. When you delegate scenario generation, step definitions, and test code automation to Claude Code, BDD transforms from something you write to something you cultivate. This guide walks you through that implementation with working, production-tested code.
What BDD Is — and Why Claude Code Changes Everything
BDD (Behavior-Driven Development) is a development methodology that describes application behavior in natural language, then uses that language as the specification for test code. Where TDD verifies code correctness, BDD documents business intent.
Using a DSL called Gherkin, scenarios look like this:
Feature: User Login
  Value: Only authenticated users can access the dashboard

  Scenario: Login succeeds with valid credentials
    Given the user has a registered account
    When they enter email "test@example.com" and password "SecurePass123"
    And they click the Login button
    Then they are redirected to the dashboard
    And the message "Welcome, Test User" is displayed

  Scenario: Login fails with incorrect password
    Given the user has a registered account
    When they enter email "test@example.com" and the wrong password "WrongPass"
    And they click the Login button
    Then the error message "Incorrect email or password" is displayed
    And they are not redirected to the dashboard
Writing these scenarios by hand becomes impractical as features grow in complexity. With Claude Code, you can auto-generate scenarios from requirements documents or user stories, and then automate the step definitions as well.
WHAT YOU'LL LEARN
✦ Developers stuck on BDD scenario design will be able to auto-generate Gherkin with Claude Code and start integrating it into real projects today
✦ You'll gain a production-ready quality assurance system that integrates step definitions, E2E tests, and CI/CD pipelines in a single workflow
✦ You'll learn how to design a test culture where non-engineers (POs, QA, designers) can actively participate, plus real team operating patterns
Auto-Generating Gherkin Scenarios with Claude Code
This is where Claude Code really earns its keep. Pass requirements documents or user stories to Claude Code, and it generates BDD-appropriate scenarios automatically.
Writing BDD Guidelines in CLAUDE.md
<!-- .claude/CLAUDE.md -->
## BDD Scenario Generation Guidelines

When implementing new features or modifying existing ones,
always create BDD scenarios first using the following process.

### Gherkin Writing Rules
- Feature descriptions must include a one-sentence "underlying value"
- Scenario titles should make the condition and outcome immediately clear
- Each Given/When/Then step should describe only one action per line
- Use data tables and Examples to keep scenarios concise
- Use business terminology; never include technical implementation details

### Generation Commands
- `bdd generate <feature-name>` to generate scenarios
- `bdd steps <feature-file>` to generate step definitions
- `bdd review` for quality review of existing scenarios
A sample Claude Code session for scenario generation:
# Example Claude Code interaction
You: Generate BDD scenarios for a shopping cart feature.
Requirements: add/remove items, change quantity, calculate total price.
Claude Code: [Generates features/cart/cart.feature]
Feature: Shopping Cart
  Value: Users can temporarily store desired products and review the total price

  Background:
    Given the user is logged in
    And product "Wireless Earbuds Pro" is in stock with quantity 10 at price $128.00
    And product "USB Cable" is in stock with quantity 50 at price $8.80

  Scenario: Adding a product to the cart
    When the user opens the product page for "Wireless Earbuds Pro"
    And they click the "Add to Cart" button
    Then the cart icon shows the number "1"
    And opening the cart page shows 1 "Wireless Earbuds Pro"
    And the cart total shows $128.00

  Scenario: Changing the quantity of a cart item
    Given "Wireless Earbuds Pro" is in the cart with quantity 1
    When they change the quantity to "3"
    Then the cart total shows $384.00

  Scenario Outline: Adding multiple products to the cart
    Given the cart is empty
    When they add the following products to the cart:
      | Product Name         | Qty    |
      | Wireless Earbuds Pro | <qty1> |
      | USB Cable            | <qty2> |
    Then the cart total shows $<expected>

    Examples:
      | qty1 | qty2 | expected |
      | 1    | 1    | 136.80   |
      | 2    | 3    | 282.40   |
      | 0    | 5    | 44.00    |

  Scenario: Error is shown when quantity exceeds stock
    Given "Wireless Earbuds Pro" is in the cart with quantity 9
    When they try to change the quantity to "11"
    Then the error message "Cannot exceed available stock (Available: 10)" is displayed
    And the cart quantity remains at 9
Writing scenarios this thorough by hand is slow and easy to get wrong. When you delegate to Claude Code, edge cases you'd typically overlook (insufficient stock, zero quantities, and so on) tend to be included from the start.
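Generated Examples tables are still worth a quick arithmetic check before they are treated as a specification. The sketch below recomputes the totals from the Background prices, working in integer cents to avoid floating-point rounding. The names `PRICES_IN_CENTS` and `expectedTotalInCents` are illustrative assumptions, not part of the generated project.

```typescript
// Unit prices taken from the Background section, expressed in cents.
const PRICES_IN_CENTS: Record<string, number> = {
  "Wireless Earbuds Pro": 12800,
  "USB Cable": 880,
};

// Hypothetical helper: total in cents for a list of [product, quantity] pairs.
function expectedTotalInCents(items: Array<[string, number]>): number {
  let total = 0;
  for (const [product, qty] of items) {
    const price = PRICES_IN_CENTS[product];
    if (price === undefined) {
      throw new Error(`Unknown product: ${product}`);
    }
    total += price * qty;
  }
  return total;
}

// Rows of the Examples table: qty1, qty2, expected total in cents.
const exampleRows: Array<[number, number, number]> = [
  [1, 1, 13680],
  [2, 3, 28240],
  [0, 5, 4400],
];

for (const [qty1, qty2, expected] of exampleRows) {
  const actual = expectedTotalInCents([
    ["Wireless Earbuds Pro", qty1],
    ["USB Cable", qty2],
  ]);
  const verdict = actual === expected ? "ok" : `mismatch (got ${actual})`;
  console.log(`qty1=${qty1}, qty2=${qty2}: ${verdict}`);
}
```

All three rows in the table above check out, as does the $384.00 total for three pairs of earbuds in the quantity-change scenario.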
Auto-Generating Step Definitions
Once scenarios are in place, step definitions come next. Pass the .feature file to Claude Code and it generates TypeScript step definitions automatically.
// steps/cart/cart.steps.ts
// Auto-generated step definitions by Claude Code (with error handling)
import { Given, When, Then, DataTable } from '@cucumber/cucumber';
import { expect } from '@playwright/test';
import type { CustomWorld } from '../support/world';

Given('the cart is empty', async function (this: CustomWorld) {
  // Clear the cart via API
  const response = await this.page.request.post('/api/cart/clear', {
    headers: { Authorization: `Bearer ${this.authToken}` }
  });
  if (!response.ok()) {
    throw new Error(`Failed to clear cart: ${response.status()} ${await response.text()}`);
  }

  // Navigate to cart page and verify it's empty
  await this.page.goto('/cart');
  await expect(this.page.getByTestId('cart-empty-message')).toBeVisible();
});

When('the user opens the product page for {string}', async function (this: CustomWorld, productName: string) {
  await this.page.goto('/products');
  const productLink = this.page.getByRole('link', { name: productName });
  await expect(productLink).toBeVisible({ timeout: 5000 });
  await productLink.click();
  await this.page.waitForLoadState('networkidle');
  this.currentProductName = productName;
});

When('they click the {string} button', async function (this: CustomWorld, buttonText: string) {
  const button = this.page.getByRole('button', { name: buttonText });
  await expect(button).toBeEnabled({ timeout: 3000 });

  // Start waiting for the cart API response before clicking,
  // so a response that arrives quickly is not missed
  const cartResponse = this.page.waitForResponse(
    response => response.url().includes('/api/cart') && response.status() === 200,
    { timeout: 5000 }
  );
  await button.click();
  await cartResponse;
});

Then('the cart icon shows the number {string}', async function (this: CustomWorld, count: string) {
  const cartBadge = this.page.getByTestId('cart-badge');
  await expect(cartBadge).toBeVisible();
  await expect(cartBadge).toHaveText(count);
});

When('they add the following products to the cart:', async function (this: CustomWorld, dataTable: DataTable) {
  const items = dataTable.hashes();
  for (const item of items) {
    const qty = parseInt(item['Qty'], 10);
    if (qty === 0) continue; // Nothing to add for this row

    await this.page.goto('/products');
    const productLink = this.page.getByRole('link', { name: item['Product Name'] });
    await productLink.click();
    await this.page.waitForLoadState('networkidle');

    if (qty > 1) {
      const qtyInput = this.page.getByRole('spinbutton', { name: 'Quantity' });
      await qtyInput.fill(String(qty));
    }

    // Same ordering as above: register the wait, then click
    const cartResponse = this.page.waitForResponse(
      response => response.url().includes('/api/cart') && response.status() === 200,
      { timeout: 5000 }
    );
    await this.page.getByRole('button', { name: 'Add to Cart' }).click();
    await cartResponse;
  }
});

// {float} rather than {int}: the totals in the scenarios carry decimals (e.g. $128.00)
Then('the cart total shows ${float}', async function (this: CustomWorld, expectedPrice: number) {
  await this.page.goto('/cart');
  const totalElement = this.page.getByTestId('cart-total-price');
  await expect(totalElement).toBeVisible();

  // Strip the currency symbol and separators before comparing
  const priceText = await totalElement.textContent();
  const actualPrice = parseFloat(priceText?.replace(/[^0-9.]/g, '') || '0');
  expect(actualPrice).toBeCloseTo(expectedPrice, 2);
});
Error handling is easy to overlook in step definitions. When you let Claude Code generate them, it tends to include waitForResponse calls, timeout settings, and explicit error messages, which are exactly the parts I skip when writing by hand.
Designing the World Object
Cucumber's World is the mechanism for sharing state between steps. Designing it well dramatically improves test readability and reusability.
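As a rough illustration, the state that the step definitions above read and write (`authToken`, `currentProductName`) could be gathered in one place. The sketch below is a plain TypeScript class so that it runs on its own; in a real suite the same fields would sit on a class that extends Cucumber's `World` and also carries the Playwright `page`. The class name `ScenarioState` and the `requireAuthToken` helper are assumptions made for this example.

```typescript
// Hypothetical per-scenario state. A real CustomWorld would extend Cucumber's
// World and additionally hold the Playwright page object.
class ScenarioState {
  authToken: string | null = null;
  currentProductName: string | null = null;

  // Fail with a readable message instead of sending "Bearer null" to the API.
  requireAuthToken(): string {
    if (this.authToken === null) {
      throw new Error("No auth token: run the login step before calling the API");
    }
    return this.authToken;
  }
}

// Cucumber creates a fresh World for every scenario, so state set in one
// scenario is not visible in the next. The two instances below mimic that.
const firstScenario = new ScenarioState();
firstScenario.authToken = "token-for-scenario-1";
firstScenario.currentProductName = "Wireless Earbuds Pro";

const secondScenario = new ScenarioState();
console.log(secondScenario.currentProductName); // null
```

Keeping the fields typed and nullable makes it obvious in a step definition when a value depends on an earlier step having run.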
Here are four places where BDD adoption typically goes wrong.
1. Scenarios become too technical
# ❌ Too technical (avoid this)
When the user sends a POST request to /api/auth/login
And the response returns 200 OK with a JWT token

# ✅ Business-oriented scenario
When the user enters valid credentials and clicks Login
Then the dashboard is displayed
Claude Code can generate implementation-heavy scenarios if you don't explicitly tell it to use a business perspective. Make your writing rules clear in CLAUDE.md.
2. Background sections become bloated
When the Background section contains 10+ lines of preconditions, readability drops sharply. Asking Claude Code to "refactor this Background to three lines or fewer" typically produces a proposal to extract test data into fixtures.
3. Inconsistent step granularity
Writing "Login" as one step in some scenarios and three steps in others makes maintenance painful. Organizing shared steps in a common/ folder and periodically asking Claude Code to "identify what can be extracted as shared steps" is an effective practice.
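Candidates for shared steps can also be listed mechanically before asking for a review. The sketch below is one possible approach under simple assumptions: it treats any line starting with a Gherkin keyword as a step and compares step text literally. `findSharedSteps` is a hypothetical helper, not a Cucumber API, and the feature contents are passed in as strings so the example does not touch the file system.

```typescript
// Hypothetical helper: find step texts that appear in more than one feature file.
// Keys of `features` are file paths, values are the file contents.
function findSharedSteps(features: Record<string, string>): Map<string, string[]> {
  const stepPattern = /^\s*(?:Given|When|Then|And|But)\s+(.+?)\s*$/;
  const filesByStep = new Map<string, Set<string>>();

  for (const [path, content] of Object.entries(features)) {
    for (const line of content.split(/\r?\n/)) {
      const match = stepPattern.exec(line);
      if (!match) continue;
      const stepText = match[1];
      const files = filesByStep.get(stepText) ?? new Set<string>();
      files.add(path);
      filesByStep.set(stepText, files);
    }
  }

  // Keep only steps used by at least two files.
  const shared = new Map<string, string[]>();
  for (const [stepText, files] of filesByStep) {
    if (files.size > 1) shared.set(stepText, [...files].sort());
  }
  return shared;
}

const sharedSteps = findSharedSteps({
  "features/cart/cart.feature":
    'Given the user is logged in\nWhen they click the "Add to Cart" button',
  "features/profile/profile.feature":
    "Given the user is logged in\nWhen they upload a 2MB JPEG file",
});
// Only "the user is logged in" appears in both files.
console.log([...sharedSteps.keys()]);
```

Literal matching misses steps that differ only in a quoted value, so the output is a starting list for the review rather than a complete answer.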
4. Dealing with flaky tests
In modern web apps with heavy async processing, timing-dependent test failures are common. Here's the pattern Claude Code recommends:
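A common shape for this is to poll for the expected condition with a bounded timeout rather than sleeping for a fixed interval. The sketch below illustrates that idea under stated assumptions: `retryUntil` and its options are hypothetical names, not part of Cucumber or Playwright. Playwright's own `expect(locator)` assertions already retry until their timeout, so a helper like this is mainly useful for checks outside the browser, such as polling an API.

```typescript
// Hypothetical helper: run `check` until it stops throwing or the timeout elapses.
async function retryUntil<T>(
  check: () => Promise<T>,
  options: { timeoutMs: number; intervalMs: number },
): Promise<T> {
  const deadline = Date.now() + options.timeoutMs;
  let lastError: unknown;

  while (true) {
    try {
      return await check();
    } catch (error) {
      lastError = error;
    }
    // Give up if another wait would run past the deadline.
    if (Date.now() + options.intervalMs > deadline) {
      const reason = lastError instanceof Error ? lastError.message : String(lastError);
      throw new Error(`Condition not met within ${options.timeoutMs}ms: ${reason}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
}

// Usage: the check fails twice, then passes on the third attempt.
let attempts = 0;
retryUntil(
  async () => {
    attempts += 1;
    if (attempts < 3) throw new Error("cart badge not updated yet");
    return attempts;
  },
  { timeoutMs: 2000, intervalMs: 50 },
).then((n) => console.log(`passed after ${n} attempts`)); // passed after 3 attempts
```

The timeout keeps a genuinely broken feature from hanging the run, and the last error message is preserved so the failure report says what was being waited for.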
The original purpose of BDD is for engineers and business stakeholders to share the same scenarios as a common language. Here's a workflow that brings non-engineers into scenario writing with Claude Code's help.
Scenario Creation Flow for POs and Product Teams
1. Have the PO write requirements as bullet points (natural language is fine)
2. Use Claude Code to convert them to Gherkin
3. Have the PO review and revise the scenarios (they shouldn't need to touch the technical parts)
4. Engineers implement the step definitions
# Requirements written by PO (natural language)
"Users should be able to change their profile photo.
Only JPEG and PNG formats. Under 5MB. Changes reflected immediately."
# Gherkin generated by Claude Code
Feature: Profile Photo Update

  Scenario: Profile photo changes successfully with a valid image
    Given the user is on the profile settings page
    When they upload a 2MB JPEG file
    Then the profile photo updates to the new image
    And the change is immediately reflected in the header icon

  Scenario: Images over 5MB cannot be uploaded
    Given the user is on the profile settings page
    When they try to upload a 6MB PNG file
    Then the error message "File size must be 5MB or less" is displayed
    And the profile photo is not changed

  Scenario: Unsupported file formats cannot be uploaded
    Given the user is on the profile settings page
    When they try to upload a GIF file
    Then the error message "Please select a JPEG or PNG file" is displayed
The key here is maintaining a state where POs can read and modify scenarios. Keeping implementation details out of scenarios — which you can explicitly instruct Claude Code to do — makes it much easier for non-engineers to understand what's being tested.
Periodic Quality Reviews with Claude Code
Running periodic quality checks on existing scenarios prevents drift and bloat:
# Example Claude Code review command
# .claude/commands/bdd-review.md

## BDD Scenario Quality Review

Please review the entire features/ directory with the following criteria:

1. Do any scenarios contain technical implementation details?
2. Are any Background sections overloaded with preconditions?
3. Are there repeated steps across multiple scenarios? (consolidation opportunity)
4. Are there missing edge cases? (boundary values, error cases)
5. Are scenario names written in the "condition → outcome" format?
6. Is any single scenario testing more than one behavior? (single responsibility)

Present improvement suggestions with rewritten scenario examples.
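Criteria 1 and 2 are mechanical enough that a small script can flag the obvious cases between reviews. The sketch below is a rough heuristic under stated assumptions: `lintFeature`, the list of technical terms, and the limit of three Background steps (borrowed from the refactoring prompt earlier in this guide) are all illustrative choices, not an existing tool.

```typescript
// Hypothetical lint for two review criteria: technical wording in steps
// and oversized Background sections. Term list and limit are assumptions.
const TECHNICAL_TERMS = [/\bPOST\b/, /\bGET\b/, /\/api\//, /\bJWT\b/, /\b200 OK\b/];
const MAX_BACKGROUND_STEPS = 3;

function lintFeature(featureText: string): string[] {
  const warnings: string[] = [];
  const stepPattern = /^\s*(Given|When|Then|And|But)\s/;
  let inBackground = false;
  let backgroundSteps = 0;

  featureText.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    // Track whether the current line belongs to the Background section.
    if (/^Background:/.test(trimmed)) {
      inBackground = true;
    } else if (/^(Scenario|Scenario Outline|Example|Rule):/.test(trimmed)) {
      inBackground = false;
    }
    if (!stepPattern.test(line)) return;
    if (inBackground) backgroundSteps += 1;
    if (TECHNICAL_TERMS.some((term) => term.test(trimmed))) {
      warnings.push(`line ${index + 1}: technical detail in step: ${trimmed}`);
    }
  });

  if (backgroundSteps > MAX_BACKGROUND_STEPS) {
    warnings.push(`Background has ${backgroundSteps} steps (limit ${MAX_BACKGROUND_STEPS})`);
  }
  return warnings;
}

const sampleWarnings = lintFeature(
  [
    "Scenario: Login",
    "  When the user sends a POST request to /api/auth/login",
    "  Then the dashboard is displayed",
  ].join("\n"),
);
console.log(sampleWarnings.length); // 1
```

A check like this only catches wording; the remaining criteria (missing edge cases, single responsibility) still need the kind of judgment the review prompt asks for.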
Long-Term Tips for Sustaining BDD
A few things that help BDD stick over time.
Manage test execution time: As scenarios multiply, execution time grows. Tagging scenarios with @smoke, @regression, and @critical — then running only smoke tests per PR and full regression on a schedule — is a practical approach.
@smoke @auth
Scenario: Login succeeds with valid credentials
  ...

@regression @cart
Scenario: Error is shown when quantity exceeds stock
  ...
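One way to wire the tags into CI is to derive a Cucumber tag expression from the event that triggered the run. The sketch below assumes two triggers and assumes the scheduled run covers every tagged scenario; `CiTrigger` and `tagExpressionFor` are hypothetical names.

```typescript
// Hypothetical mapping from CI trigger to a Cucumber tag expression.
type CiTrigger = "pull_request" | "schedule";

function tagExpressionFor(trigger: CiTrigger): string {
  switch (trigger) {
    case "pull_request":
      // Fast feedback on every PR: smoke scenarios only.
      return "@smoke";
    case "schedule":
      // Scheduled run: everything that carries one of the three tags.
      return "@smoke or @regression or @critical";
  }
}

console.log(tagExpressionFor("pull_request")); // @smoke
```

The returned string can be passed to cucumber-js through its `--tags` option. Note that a tag filter skips untagged scenarios, so a scheduled run that should cover everything could simply omit the filter instead.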
Don't let scenario debt accumulate: Deferring scenario updates during feature changes quickly creates drift from reality. Including "update related scenarios" as a PR merge requirement keeps scenarios accurate.
Set performance targets: In my projects, I aim for smoke tests to finish within 3 minutes and full regression within 20 minutes. When those thresholds look threatened, it's the signal to revisit parallel execution settings.
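These targets are easy to turn into an automated warning. The sketch below compares a measured duration against the 3-minute and 20-minute budgets; the 80% "at-risk" ratio and the names `BUDGET_SECONDS` and `budgetStatus` are assumptions chosen for illustration.

```typescript
// Budgets from the targets above, in seconds. The warning ratio is an assumption.
const BUDGET_SECONDS = { smoke: 3 * 60, regression: 20 * 60 } as const;
const WARNING_RATIO = 0.8;

type Suite = keyof typeof BUDGET_SECONDS;
type BudgetStatus = "ok" | "at-risk" | "over";

function budgetStatus(suite: Suite, durationSeconds: number): BudgetStatus {
  const budget = BUDGET_SECONDS[suite];
  if (durationSeconds > budget) return "over";
  if (durationSeconds >= budget * WARNING_RATIO) return "at-risk";
  return "ok";
}

console.log(budgetStatus("smoke", 150)); // at-risk (150s is above 80% of 180s)
```

Reporting "at-risk" before the budget is actually exceeded leaves time to adjust parallel execution settings without blocking anyone's PR.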
Your First Step
BDD isn't about "building the framework and calling it done" — it's something a team grows together over time. The best starting point is writing a single scenario for the feature you're currently working on. Just ask Claude Code "please write a Gherkin scenario for this feature" and you'll get a solid result to start from.
Step definitions, CI integration — all of that can come later. The first step is simply getting a scenario to exist. Five minutes is enough to get started. Give it a try in your current project today.