API & SDK / 2026-04-08 / Advanced

Claude API Webhooks & Async Processing: Error Patterns and Recovery Strategies

A practical guide to handling errors when integrating Claude API with webhooks and async pipelines. Covers timeouts, duplicate processing, idempotency, dead-letter queues, circuit breakers, and graceful degradation with full Python examples.

Integrating Claude API into webhook-driven or asynchronous processing pipelines introduces a different class of failure modes than synchronous calls do. A webhook may arrive but never trigger processing, the same job may execute twice, or Claude's response time may exceed a worker's deadline. Each of these has a clear solution, but you need to design for it deliberately.

This guide walks through the error categories you'll encounter in production, the patterns that prevent them, and full Python implementations for each.

Classifying Errors in Async Claude API Workflows

Before writing code, it helps to understand what kind of errors you're dealing with.

Transient errors resolve with automatic retry: 429 Too Many Requests (rate limits), 500/502/503/529 server errors, and network timeouts all fall into this category.

Permanent errors won't resolve with retry — you need to fix the request: 400 Bad Request (malformed payload), 401 Unauthorized (invalid API key), and 404 Not Found (wrong model name).

Business logic errors require application-level handling: responses that don't match your expected structure, content policy refusals, and incomplete generation (truncated output).

Infrastructure errors require system-level fixes: webhook delivery failures, queue overflow, and workers timing out before Claude finishes.

Keeping these categories distinct tells you where to intervene at each layer.
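As a minimal sketch of the first two categories, the retry decision can be reduced to a lookup on the HTTP status code. The names `ErrorCategory` and `classify_status` are illustrative and not part of any SDK, and the handling of status codes the article does not list is an assumption.

```python
from enum import Enum


class ErrorCategory(Enum):
    TRANSIENT = "transient"   # retry with backoff
    PERMANENT = "permanent"   # fix the request; do not retry


# Status codes listed above as retryable.
TRANSIENT_STATUS = {429, 500, 502, 503, 529}
# Status codes listed above as non-retryable.
PERMANENT_STATUS = {400, 401, 404}


def classify_status(status_code: int) -> ErrorCategory:
    """Map an HTTP status code to a retry decision."""
    if status_code in TRANSIENT_STATUS:
        return ErrorCategory.TRANSIENT
    if status_code in PERMANENT_STATUS:
        return ErrorCategory.PERMANENT
    # Assumption: unlisted 5xx codes are treated as transient,
    # everything else as permanent.
    if 500 <= status_code < 600:
        return ErrorCategory.TRANSIENT
    return ErrorCategory.PERMANENT
```

Business logic and infrastructure errors do not show up as status codes, so they need separate checks in the worker and in your queue monitoring.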

Webhook Delivery Errors: Three Patterns and Fixes

Pattern 1: Delivery Timeout and Duplicate Delivery

Most webhook providers guarantee at-least-once delivery. If your endpoint doesn't respond within a deadline (often 5–30 seconds), they retry. When Claude API calls are the bottleneck, this creates duplicate processing.

The wrong approach, a blocking Claude API call in the handler:

from flask import Flask, request
 
app = Flask(__name__)
 
@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.json
    # ❌ This can take 30+ seconds, causing the webhook provider to retry
    result = call_claude_api(data['message'])
    return {'status': 'ok', 'result': result}

The right approach — accept immediately, process asynchronously:

from flask import Flask, request
import redis
import json
import uuid
 
app = Flask(__name__)
redis_client = redis.Redis(host='localhost', port=6379)
 
@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.json
    job_id = str(uuid.uuid4())
 
    # Idempotency check: claim the key atomically (SET NX) so two concurrent
    # deliveries of the same event cannot both pass the check and enqueue.
    idempotency_key = data.get('idempotency_key') or data.get('event_id')
    if idempotency_key:
        claimed = redis_client.set(
            f'processed:{idempotency_key}', job_id, nx=True, ex=86400
        )
        if not claimed:
            existing = redis_client.get(f'processed:{idempotency_key}')
            return {
                'status': 'already_processed',
                'job_id': existing.decode() if existing else None
            }
 
    # ✅ Push to queue and return 202 immediately
    redis_client.rpush('claude_jobs', json.dumps({
        'job_id': job_id,
        'idempotency_key': idempotency_key,
        'payload': data
    }))
 
    return {'status': 'accepted', 'job_id': job_id}, 202
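The handler above only deduplicates when the payload carries `idempotency_key` or `event_id`. One possible fallback, sketched here under the assumption that identical payloads really do represent the same event, is to hash a canonical serialization of the body. The helper name `derive_idempotency_key` is hypothetical.

```python
import hashlib
import json
from typing import Optional


def derive_idempotency_key(data: dict) -> str:
    """Return the provider's key if present, else a hash of the payload."""
    explicit: Optional[str] = data.get('idempotency_key') or data.get('event_id')
    if explicit:
        return str(explicit)
    # sort_keys makes the serialization independent of key order,
    # so the same payload always hashes to the same key.
    canonical = json.dumps(data, sort_keys=True, separators=(',', ':'))
    return 'sha256:' + hashlib.sha256(canonical.encode('utf-8')).hexdigest()
```

This only makes sense when the payload includes something that distinguishes separate events, such as a timestamp or message ID. Otherwise two legitimate events with the same body would be collapsed into one.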

Pattern 2: Duplicate Processing and Idempotency

Network instability and retry policies mean the same webhook can arrive multiple times. Given Claude API's per-token cost, duplicate processing is both expensive and potentially incorrect.

Idempotency-safe worker:

import anthropic
import redis
import json
import logging
 
client = anthropic.Anthropic()
redis_client = redis.Redis(host='localhost', port=6379)
logger = logging.getLogger(__name__)
 
def process_job(job_data: dict) -> dict:
    job_id = job_data['job_id']
    idempotency_key = job_data.get('idempotency_key')
 
    # Check for cached result
    result_key = f'result:{idempotency_key or job_id}'
    cached_result = redis_client.get(result_key)
    if cached_result:
        logger.info(f"Returning cached result for: {job_id}")
        return json.loads(cached_result)
 
    # Acquire processing lock
    lock_key = f'lock:{idempotency_key or job_id}'
    lock_acquired = redis_client.set(lock_key, '1', ex=300, nx=True)
    if not lock_acquired:
        logger.warning(f"Job already in progress: {job_id}")
        return {'status': 'in_progress'}
 
    try:
        response = call_claude_with_retry(job_data['payload'])
        result = {'status': 'success', 'job_id': job_id, 'response': response}
        # Cache result for 24 hours
        redis_client.setex(result_key, 86400, json.dumps(result))
        return result
    except Exception as e:
        logger.error(f"Job failed: {job_id}, error: {e}")
        raise
    finally:
        redis_client.delete(lock_key)

Pattern 3: Backpressure Under High Webhook Volume

When webhooks arrive faster than Claude API can handle them, you'll hit 429 errors frequently. Throttle consumption with a token bucket. Note that a "token" here is one request slot in the bucket, not a model token.

import time
import threading
 
class TokenBucket:
    def __init__(self, tokens_per_minute: int):
        self.capacity = tokens_per_minute
        self.tokens = tokens_per_minute
        self.last_refill = time.time()
        self.lock = threading.Lock()
 
    def consume(self, tokens: int = 1) -> bool:
        with self.lock:
            now = time.time()
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * (self.capacity / 60))
            self.last_refill = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False
 
    def wait_and_consume(self, timeout: float = 300) -> bool:
        start = time.time()
        while time.time() - start < timeout:
            if self.consume():
                return True
            time.sleep(0.1)
        return False
 
rate_limiter = TokenBucket(tokens_per_minute=60)
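One way to wire a limiter like this into a worker is sketched below. The function `drain_with_rate_limit` is hypothetical; it assumes only that the limiter exposes `wait_and_consume(timeout)` as the class above does, and that the handler is something like `process_job`.

```python
from typing import Callable, Iterable, List


def drain_with_rate_limit(
    jobs: Iterable[dict],
    limiter,                         # any object with wait_and_consume(timeout) -> bool
    handler: Callable[[dict], dict],
    wait_timeout: float = 300.0,
) -> List[dict]:
    """Process jobs one at a time, taking one limiter token per Claude call.

    Returns the jobs that could not get a token in time so the caller can
    put them back on the queue instead of dropping them.
    """
    deferred: List[dict] = []
    for job in jobs:
        if limiter.wait_and_consume(timeout=wait_timeout):
            handler(job)
        else:
            deferred.append(job)
    return deferred
```

A call such as `drain_with_rate_limit(batch, rate_limiter, process_job)` would then hand back any jobs that timed out waiting for capacity.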

Retry Logic with Exponential Backoff

Transient errors (429, 5xx) should trigger automatic retries with exponential backoff and jitter.

import anthropic
import time
import random
import logging
from typing import Optional
 
logger = logging.getLogger(__name__)
 
def call_claude_with_retry(
    payload: dict,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0
) -> Optional[str]:
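    # The SDK retries some errors on its own by default; disable that here
    # so this loop is the only retry layer and the attempt counts stay accurate.
    client = anthropic.Anthropic(max_retries=0)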
    PERMANENT_ERROR_CODES = {400, 401, 403, 404}
 
    for attempt in range(max_retries + 1):
        try:
            response = client.messages.create(
                model=payload.get('model', 'claude-sonnet-4-6'),
                max_tokens=payload.get('max_tokens', 2048),
                messages=payload['messages']
            )
            return response.content[0].text
 
        except anthropic.RateLimitError as e:
            if attempt >= max_retries:
                raise
            retry_after = e.response.headers.get('retry-after')
            wait_time = float(retry_after) if retry_after else min(
                base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay
            )
            logger.warning(f"Rate limited: retrying in {wait_time:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(wait_time)
 
        except anthropic.APIStatusError as e:
            if e.status_code in PERMANENT_ERROR_CODES:
                logger.error(f"Permanent error {e.status_code}: not retrying")
                raise
            if attempt >= max_retries:
                raise
            wait_time = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
            logger.warning(f"Server error {e.status_code}: retrying in {wait_time:.1f}s")
            time.sleep(wait_time)
 
        except anthropic.APIConnectionError:
            if attempt >= max_retries:
                raise
            wait_time = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
            logger.warning(f"Connection error: retrying in {wait_time:.1f}s")
            time.sleep(wait_time)
 
    return None
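To make the schedule concrete, the delay formula used in the function above can be pulled out on its own. This is a sketch: `backoff_delay` and its `rng` parameter are illustrative names added here so the jitter can be seeded in tests.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before retry `attempt` (0-based): exponential growth plus
    up to one second of jitter, capped at max_delay."""
    rng = rng or random
    jitter = rng.uniform(0, 1)
    return min(base_delay * (2 ** attempt) + jitter, max_delay)


# Deterministic part of the schedule for the defaults
# (jitter adds between 0 and 1 second on top, before the cap):
schedule = [min(1.0 * (2 ** n), 60.0) for n in range(7)]
# -> [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

The jitter matters most when many workers are rate limited at the same moment, because it keeps them from all retrying in the same instant.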

Dead-Letter Queue Design

Jobs that exhaust all retries should move to a dead-letter queue (DLQ) for manual inspection and recovery.

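# Assumes `redis_client`, `logger`, `json`, and `time` from the earlier examples.
# `send_alert` is a placeholder for your own alerting hook (Slack, PagerDuty, email).
def handle_job_failure(job_data: dict, error: Exception, max_retries: int = 3):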
    retry_count = job_data.get('retry_count', 0)
 
    if retry_count < max_retries:
        delay = 2 ** retry_count
        retry_job = {
            **job_data,
            'retry_count': retry_count + 1,
            'last_error': str(error),
            'retry_after': time.time() + delay
        }
        redis_client.zadd('claude_jobs_delayed',
                          {json.dumps(retry_job): retry_job['retry_after']})
        logger.info(f"Queued for retry: {job_data['job_id']} (attempt {retry_count + 1})")
    else:
        dead_job = {**job_data, 'final_error': str(error), 'failed_at': time.time()}
        redis_client.rpush('claude_jobs_dead', json.dumps(dead_job))
        send_alert(
            f"[Claude API] Job exhausted all retries: {job_data['job_id']}\n"
            f"Error: {error}\nAttempts: {retry_count + 1}"
        )
        logger.error(f"Moved to DLQ: {job_data['job_id']}")
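The function above parks retries in the `claude_jobs_delayed` sorted set, so something has to move them back to `claude_jobs` once their `retry_after` time has passed. A minimal sketch of that step follows. The function name `promote_due_jobs` and the `batch_size` parameter are assumptions; the client is expected to behave like `redis.Redis`.

```python
import time
from typing import Optional


def promote_due_jobs(
    client,                          # a redis.Redis-like client
    delayed_key: str = 'claude_jobs_delayed',
    ready_key: str = 'claude_jobs',
    now: Optional[float] = None,
    batch_size: int = 100,
) -> int:
    """Move jobs whose retry_after score has passed back onto the main queue."""
    now = time.time() if now is None else now
    due = client.zrangebyscore(delayed_key, 0, now, start=0, num=batch_size)
    moved = 0
    for raw in due:
        # zrem returns the number of members removed. Only the caller that
        # actually removed the entry re-queues it, so two schedulers running
        # at once do not enqueue the same retry twice.
        if client.zrem(delayed_key, raw):
            client.rpush(ready_key, raw)
            moved += 1
    return moved
```

Running this every second or so from a small scheduler loop is usually enough. A crash between `zrem` and `rpush` would lose the job, so a stricter setup might wrap the two steps in a Lua script or a transaction.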

Circuit Breaker Pattern

When Claude API is genuinely down, hammering it with retries makes recovery slower. A circuit breaker detects sustained failures and stops requests temporarily.

import logging
import threading
import time
from enum import Enum
 
logger = logging.getLogger(__name__)
 
class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"
 
class ClaudeAPICircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60.0, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None
        self.lock = threading.Lock()
 
    def call(self, func, *args, **kwargs):
        with self.lock:
            if self.state == CircuitState.OPEN:
                if time.time() - self.last_failure_time > self.recovery_timeout:
                    self.state = CircuitState.HALF_OPEN
                    self.success_count = 0
                else:
                    raise Exception("Circuit breaker OPEN: Claude API calls blocked")
 
        try:
            result = func(*args, **kwargs)
            with self.lock:
                if self.state == CircuitState.HALF_OPEN:
                    self.success_count += 1
                    if self.success_count >= self.success_threshold:
                        self.state = CircuitState.CLOSED
                        self.failure_count = 0
                elif self.state == CircuitState.CLOSED:
                    self.failure_count = 0
            return result
        except Exception as e:
            with self.lock:
                self.failure_count += 1
                self.last_failure_time = time.time()
                if (self.state == CircuitState.CLOSED and
                        self.failure_count >= self.failure_threshold):
                    self.state = CircuitState.OPEN
                    logger.error(f"Circuit opened after {self.failure_count} consecutive failures")
                elif self.state == CircuitState.HALF_OPEN:
                    self.state = CircuitState.OPEN
            raise
 
circuit_breaker = ClaudeAPICircuitBreaker(failure_threshold=5, recovery_timeout=120.0)

Graceful Degradation

When Claude API is unavailable, graceful degradation keeps your service partially functional. The handler below checks the response cache first, then calls Claude API through the circuit breaker, and finally returns a predefined fallback if the call fails.

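import hashlib
import json
# Assumes redis_client, logger, ClaudeAPICircuitBreaker, and
# call_claude_with_retry from the earlier examples.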
 
class GracefulClaudeHandler:
    def __init__(self):
        self.circuit_breaker = ClaudeAPICircuitBreaker()
        self.fallback_responses = {
            'summarize': 'Summarization is temporarily unavailable. Please try again later.',
            'analyze': 'Analysis is currently undergoing maintenance.',
            'default': 'We apologize for the inconvenience — the service is temporarily busy.'
        }
 
    def process(self, task_type: str, payload: dict) -> dict:
        # 1. Check cache
        cache_key = hashlib.md5(json.dumps(payload, sort_keys=True).encode()).hexdigest()
        cached = redis_client.get(f'response_cache:{cache_key}')
        if cached:
            return {'source': 'cache', 'result': json.loads(cached)}
 
        # 2. Try Claude API via circuit breaker
        try:
            result = self.circuit_breaker.call(call_claude_with_retry, payload)
            redis_client.setex(f'response_cache:{cache_key}', 3600, json.dumps(result))
            return {'source': 'claude_api', 'result': result}
 
        except Exception as e:
            logger.error(f"Claude API unavailable: {e}")
            fallback = self.fallback_responses.get(task_type, self.fallback_responses['default'])
            return {'source': 'fallback', 'result': fallback, 'degraded': True}

Looking back: Seven Patterns for Production-Ready Async Claude API

Here's the complete checklist for integrating Claude API into production async workflows.

Immediate priority: First, implement instant webhook acceptance — queue the work and return 202 immediately, never block in the handler. Second, ensure idempotency — check a result cache before processing to prevent duplicate execution.

Infrastructure: Add exponential backoff with jitter for transient error retries, and a dead-letter queue to capture jobs that exhaust all attempts.

High availability: Implement a circuit breaker to stop cascading failures, and graceful degradation to serve fallback responses when Claude API is down.

Observability: Monitor error rates, processing time, and DLQ depth continuously with appropriate alerting thresholds.
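For the observability item, a rough sketch of a queue-depth snapshot is shown below. The queue names match the earlier examples; the function names, metric names, and thresholds are assumptions, and the client is expected to behave like `redis.Redis`.

```python
from typing import Dict, List


def collect_queue_metrics(client) -> Dict[str, int]:
    """Snapshot the depth of each queue used in this article."""
    return {
        'ready_depth': client.llen('claude_jobs'),
        'delayed_depth': client.zcard('claude_jobs_delayed'),
        'dlq_depth': client.llen('claude_jobs_dead'),
    }


def breached_thresholds(metrics: Dict[str, int], thresholds: Dict[str, int]) -> List[str]:
    """Return the names of metrics that exceed their alert threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0) > limit
    )
```

Feeding the snapshot into whatever metrics system you already use, and alerting when `dlq_depth` rises above zero, is a reasonable starting point.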

Implementing these in order gives you a system that handles production failures without requiring a late-night intervention. We hope this helps you ship Claude-powered features with confidence.
