All Articles
Giving Claude Agents Long-Term Memory in Production — Seven Pitfalls and the Patterns That Fix Them
A production playbook for Claude agents with long-term memory — seven pitfalls that break memory agents live, and the design patterns that fix each one.
Shadow Mode with Claude Agent SDK — Measuring Agent Accuracy on Live Traffic Without Touching Users
You want to ship an AI agent to production, but you can't measure its real accuracy without exposing real users. Shadow mode solves that paradox. This guide shows how to run a Claude Agent SDK agent alongside your existing workflow, log the deltas, and promote it step by step.
High-Availability Patterns for the Claude API — Making Sonnet/Haiku/Opus Fallback Work in Production
A single-model Claude API integration will fall over the first time a rate limit or a regional hiccup lands at peak hours. This is the production pattern for a Sonnet → Opus → Haiku fallback chain, with circuit breakers, streaming coverage, and the pitfalls you only learn the hard way.
Using tool_choice to Cut Wasted Inference: Four Modes and Cost Patterns for Production
tool_choice is one of the most underused parameters in the Claude API. The four modes — auto, any, tool, and none — each change both behavior and token cost. Here are the patterns I reach for in production, with runnable code.
Running Claude API Parallel Tool Use in Production — Controlling Concurrency, Designing for Partial Failure, and Cutting Latency
Claude API's parallel tool use can cut agent latency in half — but partial failures and state conflicts show up fast in production. Here's how to control concurrency, design error handling, and add observability.
Production Prompt-Injection Defense for the Claude API — Detection, Sanitization, and Layered Guardrails
A practical, code-first design guide for defending Claude API applications against prompt injection — covering input sanitization, channel separation, output validation, and red-teaming for long-term safety.
Implementing Progressive Delivery with the Claude Agent SDK: Canary, Feature Flags, and Automatic Rollback Patterns for Production
Production-grade patterns for safely rolling out AI agents built with the Claude Agent SDK. Combines canary traffic splitting, feature flags, and SLO-driven automatic rollback with runnable TypeScript/Hono implementation code.
Production-Grade Resilience Patterns for Claude API Streaming
Streaming with the Claude API looks easy until you run it in production. This is a battle-tested collection of patterns — disconnection recovery, deduplication, partial tool_use handling — with code you can drop into your codebase today.
Building Fault-Tolerant Long-Running AI Workflows with Claude Agent SDK × Temporal.io — A Production Design Guide to Durable Execution and Saga Patterns
A complete production guide to combining Claude Agent SDK with Temporal.io to build AI workflows that survive crashes, restarts, and multi-day human approval gates. Durable Execution, retry policies, saga compensation, and signal integration patterns.
Handling Frequent 529 Overloaded Errors from the Claude API — A Practical Playbook
A 529 Overloaded response from the Claude API is a very different animal from a 429 rate limit. Here is the retry, fallback, and circuit breaker playbook I actually use in production to keep services responsive when Anthropic's platform is temporarily saturated.
Designing Idempotency in the Claude Agent SDK: Production Patterns for Safe Retries
How to prevent double-charged customers, duplicate emails, and inventory drift when your Claude Agent SDK retries or resumes. Covers idempotency keys, outbox patterns, and wrapper decorators with working code.
How to Set budget_tokens for Claude Extended Thinking: A Practical Guide Based on Cost, Quality, and Latency
Are you setting budget_tokens to 'something generous and hoping for the best'? Here is a practical framework for choosing the right value per task type, grounded in real measurements.