API & SDK · 2026-06-14 · Advanced

Making Claude Agent SDK Tools Idempotent — Stopping Double Execution with Deterministic Keys and an Outbox

An implementation log for stopping a Claude Agent SDK retry or session resume from processing the same payment twice. Three patterns — deterministic idempotency keys, an outbox, and a lightweight wrapper — with runnable code and production metrics.

Tags: claude-agent-sdk, idempotency, outbox, reliability, production


One morning I found the same invoice ID printed twice in a payment agent's log, and my stomach dropped.

The amount was small and Stripe's Idempotency-Key had blocked the second charge, so there was no real damage. But tracing it back, the SDK's timeout retry had called the same tool twice, and in between, two rows had landed in our own database. If Stripe hadn't caught it, the customer would have been billed twice.

As agents start handling side effects, this class of incident quietly multiplies. Claude Agent SDK has session resume and tool retries built in, which is exactly what makes it robust for long tasks and transient failures. The flip side is that irreversible side effects — payments, emails, inventory decrements — stay exposed to double execution unless you design idempotency. The docs cover the per-API idempotency-key header, but rarely how to build idempotency for the agent as a whole.

What follows is the idempotency layer I rebuilt for a payment agent I ran as an indie developer, written up so you can use it directly. Three patterns: a deterministic idempotency key, an outbox, and a lightweight wrapper — with runnable code and what to measure in production.

Why idempotency is one notch harder in agents

Plain API clients need idempotency too, but agents add a wrinkle: the line between failure and success is blurry. After causing the incident myself, I came to see that three things bite at once.

First, model nondeterminism: the same intent can produce subtly different tool arguments, so you can't let the model generate the key that decides "is this the same operation." Second, partial success: if the loop crashes right after a tool succeeds, the restarted model has no record of the call, treats it as "not called yet," and runs it again. Third, resume: session resume and checkpoints rewind state, so unless the side-effect layer itself detects duplicates, they slip right through.

A mature API like Stripe's accepts an Idempotency-Key, but your own DB INSERTs, email sends, and internal APIs need idempotency you build yourself. I make it a rule to get every tool idempotent before shipping; reverse that order and something breaks eventually.

Pattern 1: Derive a deterministic key from the inputs

An idempotency key needs exactly one property: the same intent must produce the same value no matter how many times you generate it. Minting a UUID on the spot is useless, so derive it deterministically by hashing the operation's inputs.

# idempotency_key.py — derive the same key deterministically from the same intent
from __future__ import annotations
import hashlib
import json
from typing import Any
 
 
def stable_idempotency_key(
    session_id: str,
    tool_name: str,
    logical_args: dict[str, Any],
    *,
    version: str = "v1",
) -> str:
    """Generate a deterministic idempotency key.
 
    Pass only the minimal args that represent intent into logical_args.
    Mixing in timestamp or retry_count makes the key change on every retry.
    """
    # Canonicalize with fixed key order (same key regardless of arg order)
    canonical = json.dumps(logical_args, sort_keys=True, separators=(",", ":"), default=str)
    raw = f"{version}|{session_id}|{tool_name}|{canonical}"
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return f"idem_{version}_{digest[:32]}"
 
 
if __name__ == "__main__":
    k1 = stable_idempotency_key(
        "sess_abc", "charge_payment",
        {"customer_id": "cus_001", "amount_jpy": 2480, "invoice_id": "inv_555"},
    )
    k2 = stable_idempotency_key(
        "sess_abc", "charge_payment",
        {"invoice_id": "inv_555", "amount_jpy": 2480, "customer_id": "cus_001"},
    )
    assert k1 == k2  # same key even with different arg order

Mixing time.time() or randomness into key generation breaks idempotency, because the key changes on every retry. Internal metadata like a retry count is just as guilty. Extract intent only — that's the rule.
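
One way to enforce "intent only" is a per-tool allow-list applied before hashing. The following is a minimal sketch under stated assumptions: `INTENT_FIELDS`, `extract_intent`, `key_for`, and the normalization rules (trimmed strings, integer amounts) are illustrative names and choices, not part of the SDK, and the hashing simply mirrors stable_idempotency_key above.

```python
import hashlib
import json
from typing import Any

# Hypothetical allow-list: the only fields that define "the same operation".
INTENT_FIELDS: dict[str, tuple[str, ...]] = {
    "charge_payment": ("customer_id", "invoice_id", "amount_jpy"),
}


def extract_intent(tool_name: str, raw_args: dict[str, Any]) -> dict[str, Any]:
    """Keep allow-listed fields only and normalize how they are represented."""
    intent: dict[str, Any] = {}
    for field in INTENT_FIELDS[tool_name]:
        value = raw_args[field]  # a KeyError here means the call is malformed
        if field.startswith("amount"):
            value = int(value)  # "2480" and 2480 are the same amount
        else:
            value = str(value).strip()  # stray whitespace is not a new intent
        intent[field] = value
    return intent


def key_for(session_id: str, tool_name: str, raw_args: dict[str, Any]) -> str:
    """Hash the normalized intent, never the raw model output."""
    canonical = json.dumps(
        extract_intent(tool_name, raw_args), sort_keys=True, separators=(",", ":")
    )
    raw = f"v1|{session_id}|{tool_name}|{canonical}"
    return "idem_v1_" + hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


first_try = {"customer_id": "cus_001", "invoice_id": "inv_555", "amount_jpy": 2480}
# The retry carries metadata and slightly different formatting, but the same intent.
retry = {
    "customer_id": " cus_001", "invoice_id": "inv_555", "amount_jpy": "2480",
    "retry_count": 1, "requested_at": "2026-06-14T09:00:00Z",
}
other_invoice = {"customer_id": "cus_001", "invoice_id": "inv_556", "amount_jpy": 2480}

assert key_for("sess_abc", "charge_payment", first_try) == key_for("sess_abc", "charge_payment", retry)
assert key_for("sess_abc", "charge_payment", first_try) != key_for("sess_abc", "charge_payment", other_invoice)
```

An allow-list fails closed: a field the model adds on a retry is ignored instead of silently changing the key.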

The version field is a safety valve for the day you want to change key generation without colliding with old keys. I once skipped it and burned a night on DB cleanup when the key format changed. Adding it up front is cheap insurance.
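
To see the safety valve at work, here is a small sketch that assumes the same hashing scheme as above. `make_key` and `candidate_keys` are stand-in helpers for illustration only.

```python
import hashlib
import json
from typing import Any


def make_key(version: str, session_id: str, tool_name: str, logical_args: dict[str, Any]) -> str:
    """Same construction as the article's key function, with version passed explicitly."""
    canonical = json.dumps(logical_args, sort_keys=True, separators=(",", ":"), default=str)
    raw = f"{version}|{session_id}|{tool_name}|{canonical}"
    return f"idem_{version}_{hashlib.sha256(raw.encode('utf-8')).hexdigest()[:32]}"


def candidate_keys(session_id: str, tool_name: str, logical_args: dict[str, Any]) -> list[str]:
    """During a migration window, look up the new key first, then the old one."""
    return [make_key(v, session_id, tool_name, logical_args) for v in ("v2", "v1")]


args = {"customer_id": "cus_001", "amount_jpy": 2480, "invoice_id": "inv_555"}
old_key = make_key("v1", "sess_abc", "charge_payment", args)
new_key = make_key("v2", "sess_abc", "charge_payment", args)

# Bumping the version moves every key into a disjoint namespace.
assert old_key != new_key
assert old_key.startswith("idem_v1_") and new_key.startswith("idem_v2_")
assert candidate_keys("sess_abc", "charge_payment", args) == [new_key, old_key]
```

Checking both generations for a while matters because an operation recorded under the old key would otherwise look brand new under the new one.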

Issue the session ID at the caller that launches the agent and pass it in. ClaudeSDKClient manages sessions internally, but to share with an external persistence layer you need to hold an explicit ID.

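import uuid
from claude_agent_sdk import ClaudeAgentOptions
 
# Issue the ID once per logical job and persist it with the job record.
# A resume must reuse it: a fresh UUID would change every idempotency key.
session_id = f"sess_{uuid.uuid4().hex}"
options = ClaudeAgentOptions(
    system_prompt="You are a payment-processing agent.",
)
# Tools reach session_id through a closure (see build_payment_tools in the wiring section).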
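
Because the key includes the session ID, a resumed run has to get the same ID back. A minimal sketch of that, assuming a hypothetical `agent_jobs` table keyed by your own job ID, with sqlite3 standing in for whatever store you already use:

```python
import sqlite3
import uuid


def session_id_for_job(db: sqlite3.Connection, job_id: str) -> str:
    """Return the session ID bound to job_id, creating it on first use."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS agent_jobs ("
        "job_id TEXT PRIMARY KEY, session_id TEXT NOT NULL)"
    )
    # INSERT OR IGNORE keeps the first ID if the job was already registered.
    db.execute(
        "INSERT OR IGNORE INTO agent_jobs (job_id, session_id) VALUES (?, ?)",
        (job_id, f"sess_{uuid.uuid4().hex}"),
    )
    db.commit()
    row = db.execute(
        "SELECT session_id FROM agent_jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0]


db = sqlite3.connect(":memory:")
first_launch = session_id_for_job(db, "job_2026_06_invoices")
after_restart = session_id_for_job(db, "job_2026_06_invoices")
assert first_launch == after_restart  # a resume sees the original session ID
```

The job ID here is whatever your launcher already uses to name a unit of work; the point is only that the session ID is looked up, not regenerated.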

Pattern 2: Align transaction boundaries with an outbox

Even with a key, you're left with partial failure: "the DB write succeeded but it crashed before the external API call." The outbox pattern solves this structurally.

The idea is simple: separate "the side effect itself" from "the intent to cause it." The agent's tool only writes a "please do this" record into an outbox table. The actual external call is made by a separate worker reading the outbox.

# outbox_tool.py — idempotent side-effect enqueue via outbox
from __future__ import annotations
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any
import asyncpg
 
 
@dataclass
class OutboxEntry:
    idempotency_key: str
    operation: str
    payload: dict[str, Any]
    status: str  # 'pending' | 'completed' | 'failed'
 
 
def _to_entry(row: asyncpg.Record) -> OutboxEntry:
    """Build an OutboxEntry, decoding the JSONB payload if it arrives as text."""
    data = dict(row)
    # Without a custom type codec, asyncpg returns JSONB columns as str
    if isinstance(data["payload"], str):
        data["payload"] = json.loads(data["payload"])
    return OutboxEntry(**data)
 
 
async def enqueue_operation(
    conn: asyncpg.Connection,
    idempotency_key: str,
    operation: str,
    payload: dict[str, Any],
) -> tuple[OutboxEntry, bool]:
    """Write to outbox idempotently. Returns (entry, was_created)."""
    # ON CONFLICT DO NOTHING blocks duplicate inserts in one shot
    row = await conn.fetchrow(
        """
        INSERT INTO outbox (idempotency_key, operation, payload, status, created_at)
        VALUES ($1, $2, $3, 'pending', $4)
        ON CONFLICT (idempotency_key) DO NOTHING
        RETURNING idempotency_key, operation, payload, status
        """,
        # asyncpg expects JSONB parameters as text unless a codec is registered
        idempotency_key, operation, json.dumps(payload), datetime.now(timezone.utc),
    )
    if row is not None:
        return _to_entry(row), True
 
    # Conflict: the row already exists, so return its current state
    existing = await conn.fetchrow(
        "SELECT idempotency_key, operation, payload, status FROM outbox WHERE idempotency_key=$1",
        idempotency_key,
    )
    return _to_entry(existing), False
 
 
async def charge_payment_tool(conn, idempotency_key, customer_id, amount_jpy, invoice_id):
    entry, created = await enqueue_operation(
        conn, idempotency_key, "stripe_charge",
        {"customer_id": customer_id, "amount_jpy": amount_jpy, "invoice_id": invoice_id},
    )
    if not created:
        # Already exists = retry. Return state and let the tool finish cleanly.
        return {"status": "already_enqueued", "state": entry.status, "key": idempotency_key}
    return {"status": "enqueued", "key": idempotency_key}
 
 
# DDL (PostgreSQL)
# CREATE TABLE outbox (
#   idempotency_key TEXT PRIMARY KEY,
#   operation TEXT NOT NULL,
#   payload JSONB NOT NULL,
#   status TEXT NOT NULL,
#   attempts INT NOT NULL DEFAULT 0,
#   last_error TEXT,
#   created_at TIMESTAMPTZ NOT NULL,
#   completed_at TIMESTAMPTZ
# );

The outbox works because the tool completes by recording intent, and the external API's success or failure never touches the tool's return value. The model sees either enqueued or already_enqueued and moves on. That aligns the agent loop's and the external API's transaction boundaries, and the room for a double charge structurally disappears.

The downstream worker grabs pending rows with SELECT ... FOR UPDATE SKIP LOCKED, processes them, and flips them to completed. Pass outbox.idempotency_key straight through to Stripe as the Idempotency-Key and you're double-protected against network resends, too. I always shape anything payment-related this way.
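
The worker described above could look roughly like this. It is a sketch, not the article's production worker: `process_one`, `run_worker`, the `send` callback signature, and `MAX_ATTEMPTS` are assumptions, and `conn` is expected to behave like an asyncpg connection (`transaction()`, `fetchrow()`, `execute()`). The SQL targets the outbox DDL shown earlier.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical callback: send(operation, payload, idempotency_key)
Sender = Callable[[str, Any, str], Awaitable[None]]

CLAIM_SQL = """
    SELECT idempotency_key, operation, payload
    FROM outbox
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
"""
MARK_DONE_SQL = """
    UPDATE outbox
    SET status = 'completed', completed_at = now(), attempts = attempts + 1
    WHERE idempotency_key = $1
"""
MARK_ERROR_SQL = """
    UPDATE outbox
    SET attempts = attempts + 1, last_error = $2,
        status = CASE WHEN attempts + 1 >= $3 THEN 'failed' ELSE 'pending' END
    WHERE idempotency_key = $1
"""
MAX_ATTEMPTS = 5  # assumed retry budget before a row is parked as 'failed'


async def process_one(conn: Any, send: Sender) -> bool:
    """Claim and process one pending row. Returns False when nothing is pending."""
    # The row lock lasts for the whole transaction, so other workers skip this row.
    async with conn.transaction():
        row = await conn.fetchrow(CLAIM_SQL)
        if row is None:
            return False
        key = row["idempotency_key"]
        try:
            # payload arrives as JSON text unless a JSONB codec is registered
            await send(row["operation"], row["payload"], key)
        except Exception as exc:
            await conn.execute(MARK_ERROR_SQL, key, str(exc), MAX_ATTEMPTS)
        else:
            await conn.execute(MARK_DONE_SQL, key)
        return True


async def run_worker(conn: Any, send: Sender, poll_seconds: float = 1.0) -> None:
    """Drain the outbox, sleeping briefly whenever it is empty."""
    while True:
        if not await process_one(conn, send):
            await asyncio.sleep(poll_seconds)
```

A `send` implementation for `stripe_charge` would forward the key as Stripe's Idempotency-Key, so a worker crash between the charge and the `completed` update results in a harmless replay instead of a second charge. Holding the transaction open across the external call keeps the sketch short but ties up a connection for the duration; flipping the row to an intermediate status and committing before the call is the usual alternative.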

Pattern 3: A wrapper when you want it light

Sometimes an outbox is heavier than you need and you just want an existing DB write to be idempotent. A decorator does it quickly.

# idempotent_wrapper.py — make an existing async function idempotent
from __future__ import annotations
import functools
import json
from typing import Awaitable, Callable
import redis.asyncio as redis
 
 
def idempotent(store: redis.Redis, *, ttl_seconds: int = 86400, prefix: str = "idem"):
    """Treat the first arg as the idempotency key and cache the success result."""
    def decorator(func: Callable[..., Awaitable[dict]]):
        @functools.wraps(func)
        async def wrapper(idempotency_key: str, *args, **kwargs) -> dict:
            cache_key = f"{prefix}:{idempotency_key}"
            inflight = f"{cache_key}:inflight"
 
            cached = await store.get(cache_key)
            if cached is not None:
                return {"cached": True, **json.loads(cached)}
 
            # In-flight lock (SET NX EX) prevents concurrent double execution
            locked = await store.set(inflight, "1", nx=True, ex=300)
            if not locked:
                return {"status": "in_progress", "key": idempotency_key}
 
            try:
                result = await func(idempotency_key, *args, **kwargs)
                await store.set(cache_key, json.dumps(result), ex=ttl_seconds)
                return {"cached": False, **result}
            finally:
                # Don't store failures (keep them retryable); always release the lock
                await store.delete(inflight)
        return wrapper
    return decorator

It's light, but it comes with trade-offs. If Redis goes down you can't take the lock and the app jams, so add a Sentinel setup or a fallback to a local dict on error (bearing in mind that a local dict only deduplicates within a single process). After the TTL expires, the same key is judged "not cached" and becomes re-runnable, so for payments that must never run twice, drop the TTL and store in a durable DB. And the in-flight lock's TTL must exceed the maximum processing time: too short and a lock expiry starts a concurrent run. I draw the line simply: Redis for plain emails, outbox for payments.
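
For the "drop the TTL and store in a durable DB" case, the same decorator shape can sit on a table with a primary key. This is a sketch under assumptions: `durable_idempotent` and the `idem_results` table are hypothetical, and sqlite3 is used only to keep the example self-contained (its calls block, so a real async service would use an async driver).

```python
import asyncio
import functools
import json
import sqlite3
from typing import Awaitable, Callable


def durable_idempotent(db: sqlite3.Connection):
    """Like the Redis wrapper, but stored results never expire."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS idem_results ("
        "idempotency_key TEXT PRIMARY KEY, result TEXT)"
    )

    def decorator(func: Callable[..., Awaitable[dict]]):
        @functools.wraps(func)
        async def wrapper(idempotency_key: str, *args, **kwargs) -> dict:
            # Claim the key first; the PRIMARY KEY turns a second claim into a no-op.
            cur = db.execute(
                "INSERT OR IGNORE INTO idem_results (idempotency_key, result) VALUES (?, NULL)",
                (idempotency_key,),
            )
            db.commit()
            if cur.rowcount == 0:
                row = db.execute(
                    "SELECT result FROM idem_results WHERE idempotency_key = ?",
                    (idempotency_key,),
                ).fetchone()
                if row[0] is None:
                    # Claimed but not finished: another call is running, or one crashed.
                    return {"status": "in_progress", "key": idempotency_key}
                return {"cached": True, **json.loads(row[0])}
            try:
                result = await func(idempotency_key, *args, **kwargs)
            except Exception:
                # Release the claim so a clean failure stays retryable.
                db.execute(
                    "DELETE FROM idem_results WHERE idempotency_key = ?", (idempotency_key,)
                )
                db.commit()
                raise
            db.execute(
                "UPDATE idem_results SET result = ? WHERE idempotency_key = ?",
                (json.dumps(result), idempotency_key),
            )
            db.commit()
            return {"cached": False, **result}
        return wrapper
    return decorator


db = sqlite3.connect(":memory:")
calls: list[str] = []


@durable_idempotent(db)
async def send_receipt(idempotency_key: str, email: str) -> dict:
    calls.append(email)
    return {"status": "sent", "to": email}


first = asyncio.run(send_receipt("idem_v1_abc", "a@example.com"))
second = asyncio.run(send_receipt("idem_v1_abc", "a@example.com"))
assert first["cached"] is False and second["cached"] is True
assert calls == ["a@example.com"]  # the wrapped function ran once
```

One deliberate consequence: if the process dies mid-call, the claim stays behind with no result and later calls report `in_progress`. For a payment that is arguably the safer default, since the outcome is unknown and should be reconciled rather than blindly retried.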

Wiring it into Claude Agent SDK

Drop the three parts into the SDK's @tool and MCP-server tool registration. Hold session_id in a closure and derive the key inside the tool.

# payment_agent.py
from typing import Any
from claude_agent_sdk import (
    ClaudeAgentOptions, ClaudeSDKClient, tool, create_sdk_mcp_server,
)
import asyncpg
from idempotency_key import stable_idempotency_key
from outbox_tool import charge_payment_tool
 
 
def build_payment_tools(session_id: str, conn: asyncpg.Connection):
    @tool(
        "charge_invoice",
        "Charge a customer's invoice. Retries never double-charge.",
        {"invoice_id": str, "customer_id": str, "amount_jpy": int},
    )
    async def charge_invoice(args: dict[str, Any]) -> dict[str, Any]:
        # The key is derived here, from session_id in the closure, never by the model
        key = stable_idempotency_key(
            session_id=session_id,
            tool_name="charge_invoice",
            logical_args={
                "invoice_id": args["invoice_id"],
                "customer_id": args["customer_id"],
                "amount_jpy": args["amount_jpy"],
            },
        )
        result = await charge_payment_tool(
            conn, key, args["customer_id"], args["amount_jpy"], args["invoice_id"],
        )
        return {"content": [{"type": "text",
                "text": f"charge registered: {result['status']} (key={key[:16]}...)"}]}
 
    return [charge_invoice]
 
 
async def run_payment_agent(session_id: str, instruction: str):
    conn = await asyncpg.connect("postgresql://localhost/myapp")
    try:
        tools = build_payment_tools(session_id, conn)
        server = create_sdk_mcp_server(name="payments", version="1.0.0", tools=tools)
        options = ClaudeAgentOptions(
            mcp_servers={"payments": server},
            allowed_tools=["mcp__payments__charge_invoice"],
            system_prompt="You handle payments. Process invoices as instructed.",
        )
        # ClaudeSDKClient keeps a streaming session open, the mode the SDK's
        # own custom-tool examples use for in-process MCP servers.
        async with ClaudeSDKClient(options=options) as client:
            await client.query(instruction)
            async for message in client.receive_response():
                print(message)
    finally:
        await conn.close()

Even if the model emits the same tool call twice during a network failure, the outbox's primary-key constraint rejects the duplicate insert. The worker claims each pending row under a row lock, and because it forwards the same key to Stripe, even a crash and retry on the worker side produces at most one charge. Each layer of defense holds independently, so if one tier slips, the whole still stands. On the complementary side (circuit breakers and fallback), the companion article "Claude API production resilience patterns" pairs well with this.

Five things people trip on

Here are the places I kept getting caught, written up as lessons.

First, mixing a timestamp into the key. Write hash(f"{invoice_id}|{datetime.now()}") and every retry produces a different key, so idempotency never holds. A timestamp may go in the payload, but never in key generation.

Second, a key TTL that's too short. For a payment where a "resend two hours later" can happen, a one-hour expiry opens a window for double execution. I recommend tiering by the weight of the side effect: at least 30 days for payments, 7 days for important notifications, 24 hours for emails that are fine to resend.

Third, an in-flight lock TTL shorter than the processing time. If the external API takes 30 seconds but the in-flight lock is 10, a lock expiry starts a concurrent run. Aim for three times the maximum processing time, and always release it in finally.

Fourth, nudging retries too casually in the prompt. Tell the model "if it fails, redo the same tool" and it may retry with different arguments. Insert a check instead: "if it fails, look at the current state with a status tool before deciding."

Fifth, logging keys in the clear. If a customer ID or amount can be reverse-engineered from the key, a log leak becomes a PII risk. I keep only the first 32 chars of the SHA256 and mask the original args before logging.
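
Several of these lessons reduce to numbers and rules that are easy to pin down in one small policy module. The sketch below encodes the TTL tiers, the three-times rule for the in-flight lock, and argument masking for logs; the category names and helper functions are illustrative assumptions.

```python
import math
from datetime import timedelta
from typing import Any

# TTL tiers by weight of the side effect (values from the text above).
KEY_TTL = {
    "payment": timedelta(days=30),
    "important_notification": timedelta(days=7),
    "resendable_email": timedelta(hours=24),
}


def key_ttl_seconds(category: str) -> int:
    """How long an idempotency key must be remembered for this category."""
    return int(KEY_TTL[category].total_seconds())


def inflight_lock_ttl_seconds(max_processing_seconds: float, safety_factor: int = 3) -> int:
    """In-flight lock TTL: a multiple of the slowest expected run."""
    return math.ceil(max_processing_seconds * safety_factor)


def mask_args(args: dict[str, Any], visible: tuple[str, ...] = ("invoice_id",)) -> dict[str, Any]:
    """Hide everything except explicitly allowed fields before logging."""
    return {k: (v if k in visible else "***") for k, v in args.items()}


assert key_ttl_seconds("payment") == 30 * 24 * 60 * 60
assert inflight_lock_ttl_seconds(30) == 90  # 30 s API call, 90 s lock
assert mask_args({"invoice_id": "inv_555", "customer_id": "cus_001", "amount_jpy": 2480}) == {
    "invoice_id": "inv_555", "customer_id": "***", "amount_jpy": "***",
}
```

Keeping these in one place means a new tool has to pick a category explicitly instead of inheriting whatever TTL the last tool happened to use.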

What to measure in production

If you can't see idempotency working, you can't judge it. Track at least these three metrics. The duplicate rate (duplicate_detected / tool_call) shows how often the agent repeats an operation; a spike points to prompt design or shaky infrastructure. The outbox backlog is the count of rows stuck in pending past some age threshold, and it catches a stalled worker immediately. The key collision rate counts cases where different operations produce the same key, which means a bug in your normalization; it should always be zero.

# metrics.py (Prometheus example)
from prometheus_client import Counter, Gauge
 
tool_calls = Counter("agent_tool_calls_total", "tool calls", ["tool"])
duplicates = Counter("agent_duplicate_detected_total", "duplicates", ["tool"])
outbox_pending = Gauge("outbox_pending_entries", "pending outbox entries")
 
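def record_tool_call(tool: str, created: bool) -> None:
    """Call once per tool invocation; created is the flag from enqueue_operation."""
    tool_calls.labels(tool=tool).inc()
    if not created:
        duplicates.labels(tool=tool).inc()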
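
The collision metric needs a way to tell a legitimate duplicate (same key, same intent) from a collision (same key, different intent). One possible approach is to compare the canonical arguments stored under the key. The `CollisionDetector` class below is a hypothetical in-memory sketch; against the outbox table the same comparison would run on the stored payload when the insert conflicts.

```python
import json
from typing import Any


class CollisionDetector:
    """Classify each observed key as new, duplicate, or collision."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}
        self.duplicates = 0
        self.collisions = 0

    def observe(self, idempotency_key: str, logical_args: dict[str, Any]) -> str:
        canonical = json.dumps(logical_args, sort_keys=True, separators=(",", ":"), default=str)
        if idempotency_key not in self._seen:
            self._seen[idempotency_key] = canonical
            return "new"
        if self._seen[idempotency_key] == canonical:
            self.duplicates += 1  # a retry of the same operation
            return "duplicate"
        self.collisions += 1  # same key, different intent: a normalization bug
        return "collision"


detector = CollisionDetector()
charge = {"invoice_id": "inv_555", "amount_jpy": 2480}
assert detector.observe("idem_v1_aaa", charge) == "new"
assert detector.observe("idem_v1_aaa", charge) == "duplicate"
assert detector.observe("idem_v1_aaa", {"invoice_id": "inv_999", "amount_jpy": 100}) == "collision"
assert (detector.duplicates, detector.collisions) == (1, 1)
```

Feeding `collisions` into a counter gives the "should always be zero" metric something concrete to alert on.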

In my own ops I run a two-tier alert: a Slack ping when the duplicate rate tops 3× normal, and PagerDuty when the outbox backlog passes 1,000. One agent showed a steady 5-10% duplicate rate; digging in, its system prompt said "always retry on failure," and the model was retrying for no reason. Measuring idempotency doubles as early detection for design mistakes like that.
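
The two-tier rule can be written down as a small pure function, which makes the thresholds easy to test. This is a sketch: `evaluate_alerts` and its parameter names are assumptions, and in practice the same logic would more likely live in your alerting system's rule language.

```python
def duplicate_rate(duplicates: int, tool_calls: int) -> float:
    """duplicate_detected / tool_call, guarding against an empty window."""
    return duplicates / tool_calls if tool_calls else 0.0


def evaluate_alerts(
    *,
    duplicates: int,
    tool_calls: int,
    baseline_rate: float,
    outbox_backlog: int,
    rate_multiplier: float = 3.0,
    backlog_limit: int = 1000,
) -> list[str]:
    """Return the channels to notify for the current measurement window."""
    alerts: list[str] = []
    if duplicate_rate(duplicates, tool_calls) > baseline_rate * rate_multiplier:
        alerts.append("slack")  # duplicate rate above 3x normal
    if outbox_backlog > backlog_limit:
        alerts.append("pagerduty")  # worker is likely stalled
    return alerts


# 8% duplicates against a 2% baseline exceeds 3x, so Slack fires.
assert evaluate_alerts(duplicates=8, tool_calls=100, baseline_rate=0.02, outbox_backlog=40) == ["slack"]
# Normal duplicate rate, but a backlog of 1,500 pages the on-call.
assert evaluate_alerts(duplicates=2, tool_calls=100, baseline_rate=0.02, outbox_backlog=1500) == ["pagerduty"]
```

The baseline is whatever your own agents show in steady state; the function only encodes the comparison, not what "normal" is.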

The first step

To put this straight into production, start by inventorying just one side-effect tool in an existing agent and replacing it with the outbox. Trying to make every tool idempotent at once outruns your tests and stalls the release. Of payments, email, and inventory, pick the single highest-risk one and apply Pattern 2 there; that is a realistic one-week scope. For the async picture overall, the companion article "Claude API webhook async processing and error recovery" reads well alongside this.

Idempotency is unglamorous, but it's the base fitness that decides whether an agent survives long in production. Build the layer once and you can add new tools safely on top. The upfront cost always pays for itself.
