Did That Post Actually Go Through? Safely Retrying an Interrupted MCP Write Without Double-Executing
When an MCP write tool call is interrupted by a dropped connection, you can't tell whether the server ran it. Here's why naive retries cause double-execution, and a working wrapper that uses idempotency keys and a reconcile read to retry safely — with examples from an unattended pipeline.
One of my unattended publishing jobs once got its connection cut mid-request while posting to X. All I got back was a timeout error, with no way to know whether the post had landed. The log recorded a "failure" — yet a few minutes later the same post was sitting on the timeline. The server had succeeded; only the result never reached me.
If I had naively decided "it failed, so retry," two identical posts would have ended up side by side. For an indie developer automating announcements across several sites, that is an incident that has nothing to do with content quality. To a reader, the same post appearing twice just signals sloppy operations. If you are going to run write-type tool calls unattended, you have to design for this "did it go through?" state head-on.
"Failed" and "uncertain" are not the same thing
There are two kinds of errors. One is a clear rejection — the server says "I will not accept your request." The other is unknown — the connection dropped before any response came back. The first is safe to retry, because you know the server did nothing.
The troublesome one is the second. The request may have reached the server and been processed, or it may have been cut off before arrival. From your side, you cannot tell. This is the well-known problem in distributed systems: the sender can never be certain the receiver executed. Timeouts, connection resets, and mid-stream disconnects all belong in this "uncertain" bucket.
The June 27, 2026 update improved MCP resilience in Claude Code so that partial responses are preserved even when a stream is cut mid-flight. The receiving side is genuinely more robust now. Even so, the uncertainty that remains the moment a write tool call is interrupted — "did the server execute it?" — is not something a more resilient receiver alone can remove. That part lives in your application.
A common implementation trap is error classification. An HTTP 5xx can mean "the server failed to process" or "it processed but only the response was lost," so pushing it straight to failed is dangerous. I treat every ambiguous error as uncertain. Since uncertain operations are settled by reconciliation during recovery, over-classifying as uncertain never causes double-execution — whereas mislabeling a truly uncertain call as failed leads to it immediately. When in doubt, fall to the safe side: uncertain.
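As a rough illustration of that rule, here is a minimal classifier sketch. The names Outcome, ToolRejected, and classify are hypothetical, not part of any MCP client library, and it assumes your server's 4xx-style responses mean the request was refused before any side effect occurred. Everything else, including 5xx and unrecognized exceptions, falls through to uncertain.

```python
from enum import Enum


class Outcome(Enum):
    COMMITTED = "committed"
    FAILED = "failed"        # server clearly did nothing; safe to retry
    UNCERTAIN = "uncertain"  # may or may not have run; reconcile first


class ToolRejected(Exception):
    """Hypothetical error raised when the server explicitly refuses a request."""

    def __init__(self, status: int):
        super().__init__(f"rejected with HTTP {status}")
        self.status = status


def classify(exc: Exception) -> Outcome:
    # Assumption: only an explicit 4xx-style rejection is a definite failure.
    if isinstance(exc, ToolRejected) and 400 <= exc.status < 500:
        return Outcome.FAILED
    # Timeouts, resets, 5xx, and anything unrecognized default to uncertain.
    return Outcome.UNCERTAIN
```

The default branch is the important part: an exception type nobody anticipated lands in uncertain rather than failed, so it can never trigger a blind resend.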
Why naive retries cause double-execution
When people write retry logic, they usually think in two states: success or failure, and retry on failure. That design is the breeding ground for double-execution.
If you collapse "uncertain" into "failed," you re-run even the cases that actually succeeded on the server. A post becomes a double post, a charge becomes a double charge, an email becomes a second copy. That is exactly the trap I fell into first: I wrapped my retry logic in a sloppy except Exception: and unconditionally resent inside it. It never reproduced in testing, and the first double post showed up on a night when the production connection got flaky.
The correct approach splits state into three: committed, failed, and uncertain. Only failed is safe to retry directly. For uncertain, you always insert one extra step — "check before you redo."
Idempotency key: attach a unique key to each request and have the server guarantee "the same key executes only once." Stripe's Idempotency-Key header is the canonical example. If the server supports it, resending with the same key has an effect exactly once.
Reconcile read: before retrying, ask the server whether the effect already exists. For a post, search for "a post containing my correlation token" — skip if found, resend if not. This does not depend on server-side idempotency.
Idempotency keys are ideal, but most real MCP tools still don't accept one. So you build a two-tier defense: attach a key where the tool supports it, and protect the rest with reconcile reads.
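To make the "same key executes only once" guarantee concrete, here is a toy in-memory server. FakePostServer and create_post are invented for illustration and do not correspond to any real MCP tool or to Stripe's implementation. The sketch only shows the dedupe behavior a key-aware server is assumed to provide.

```python
class FakePostServer:
    """Hypothetical in-memory server that dedupes writes by idempotency key."""

    def __init__(self) -> None:
        self.posts: list[str] = []
        self._results: dict[str, dict] = {}

    def create_post(self, text: str, idempotency_key: str) -> dict:
        # A key seen before returns the stored result; nothing runs again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.posts.append(text)
        result = {"id": len(self.posts), "text": text}
        self._results[idempotency_key] = result
        return result


server = FakePostServer()
first = server.create_post("v1.2 is out", idempotency_key="op-123")
# Simulate the client resending after the first response was lost.
second = server.create_post("v1.2 is out", idempotency_key="op-123")
```

Both calls return the same result and only one post exists, which is why a resend is harmless when the server really does honor the key.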
Attach an idempotency key where the tool supports it
First, generate a stable operation_id per logical operation and record it to a local ledger before the call. This ordering is critical: record before, not after. If you record after the call, a crash mid-call leaves an operation that was "sent but never recorded," and you lose the ability to track it during recovery.
import json, os, time, uuid

LEDGER = os.path.expanduser("~/.cache/mcp_ledger.jsonl")

def _append(rec: dict) -> None:
    # Persist with fsync: crash recovery depends on durability
    os.makedirs(os.path.dirname(LEDGER), exist_ok=True)
    with open(LEDGER, "a", encoding="utf-8") as f:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
        f.flush()
        os.fsync(f.fileno())

def begin_operation(action: str, args: dict) -> str:
    op_id = str(uuid.uuid4())
    # Write the "pending" record before the call
    _append({"op_id": op_id, "action": action, "args": args,
             "status": "pending", "ts": time.time()})
    return op_id

def mark(op_id: str, status: str) -> None:
    _append({"op_id": op_id, "status": status, "ts": time.time()})
On the call side, pass the operation ID (op_id in the code) as the idempotency key in the tool arguments. If the MCP server interprets the key, this alone makes resends safe.
TRANSIENT = (TimeoutError, ConnectionError, ConnectionResetError)

def call_with_idempotency(client, action: str, args: dict):
    op_id = begin_operation(action, args)
    payload = {**args, "idempotency_key": op_id}
    try:
        result = client.call_tool(action, payload)  # MCP tool call
        mark(op_id, "committed")
        return result
    except TRANSIENT:
        # Connection dropped: execution unknown. Do NOT resend here
        mark(op_id, "uncertain")
        raise
    except Exception:
        # Treated as a clear server rejection, so marked failed.
        # If your client can raise ambiguous errors here (for example a
        # wrapped 5xx), mark those uncertain instead.
        mark(op_id, "failed")
        raise
The point is that the spot where you catch uncertain must not resend. The decision to retry is funneled entirely into the recovery phase described below. If you catch and immediately retry in the same place, you end up "resending while still uncertain," which defeats the purpose of splitting into three states.
Protect tools without idempotency keys via a reconcile read
When an MCP tool ignores idempotency keys, you have no choice but to verify the effect yourself. To do that, embed a correlation token in the very content you write — mix an identifier invisibly into the post body, put op_id in an external-id field, or attach it as metadata. As long as you can later search by that identifier, a reconcile read becomes possible.
def reconcile(client, op_id: str, action: str, args: dict) -> bool:
    """Check whether this operation's effect exists on the server.
    True -> it exists (no resend), False -> resend is safe."""
    if action == "post_status":
        # Search your own posts for the correlation token
        found = client.call_tool("search_own_posts", {"contains": op_id})
        return len(found.get("items", [])) > 0
    if action == "create_issue":
        found = client.call_tool("search_issues", {"external_id": op_id})
        return len(found.get("items", [])) > 0
    # Actions you can't reconcile must surface "can't verify" to the caller
    raise LookupError(f"no reconcile strategy for action={action}")

def recover_uncertain(client) -> None:
    """Inspect operations that ended 'uncertain' last run and settle them."""
    # load_uncertain (not shown) yields (op_id, action, args) for ledger
    # entries whose latest status is "uncertain"
    for op_id, action, args in load_uncertain(LEDGER):
        # Mixing the correlation token into the write content is a prerequisite
        args = {**args, "correlation_token": op_id}
        try:
            if reconcile(client, op_id, action, args):
                mark(op_id, "committed")  # it had gone through
            else:
                client.call_tool(action, args)  # it hadn't -> resend
                mark(op_id, "committed")
        except LookupError:
            # Don't auto-resend unreconcilable actions; route to human review
            mark(op_id, "needs_review")
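recover_uncertain relies on a load_uncertain helper that is not shown above. One possible shape for it, assuming the append-only JSONL format written by _append, is to replay the ledger and keep the latest status per op_id. Treating a leftover "pending" record as unsettled is my assumption here: it would mean the process died mid-call, so execution is just as unknown as after a timeout.

```python
import json
import os
from typing import Iterator


def load_uncertain(path: str) -> Iterator[tuple[str, str, dict]]:
    """Yield (op_id, action, args) for operations whose latest status is unsettled."""
    if not os.path.exists(path):
        return
    ops: dict[str, dict] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a half-written line left behind by a crash
            op_id = rec.get("op_id")
            if op_id is None:
                continue
            # Later records override earlier fields, so the last status wins
            ops.setdefault(op_id, {}).update(rec)
    for op_id, rec in ops.items():
        # Assumption: "pending" with no follow-up is as unknown as "uncertain"
        if rec.get("status") in ("uncertain", "pending") and "action" in rec:
            yield op_id, rec["action"], rec.get("args", {})
```

Because the ledger is append-only, operations already marked committed, failed, or needs_review are skipped automatically: their most recent record carries a settled status.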
Run this recovery step right at the start of every unattended pipeline run. If an operation ended uncertain the previous night, the next startup always inspects it and settles it without double-execution. In my Dolice Labs publishing flow, after moving the post-push social announcements onto this scheme, the duplicate posts stopped recurring.
What to do with actions you can't reconcile
Not every action is reconcilable. For something like sending email, where you can't search for what you sent, there is no way to verify the effect without an idempotency key. Here you decide by which is worse: the harm of sending twice, or the harm of never sending at all.
| Strategy | Requires | Good for | Risk |
|---|---|---|---|
| Idempotency key | Server dedupes by key | Charges, inventory (heavy side effects) | Depends on server implementation |
| Reconcile read | Effect is searchable later | Posts, issue creation, record inserts | Useless if you forget the correlation token |
| Resend, tolerate dups | Duplicates are cheap or removable | Idempotent aggregates, overwrites | Not OK if the recipient sees the dup |
| Don't resend | A miss is acceptable | Unverifiable email sends | Occasionally nothing gets sent |
Personally, I make reconcile reads the first choice for actions whose side effects are visible to readers (posts, notifications), and I default to not auto-resending anything I can't reconcile. The discomfort of a duplicate arriving does more operational damage than the rare miss, in my judgment. Conversely, for idempotent updates that overwrite on the server side, I resend without hesitation.
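The table and the preferences above can be written down as a small decision function. This is only one way to encode them: choose_strategy and its three boolean inputs are hypothetical, and the ordering reflects my reading of the priorities described here rather than a universal rule.

```python
def choose_strategy(supports_key: bool, searchable: bool,
                    dup_is_harmless: bool) -> str:
    """Pick a recovery strategy for one write action."""
    if supports_key:
        return "idempotency_key"  # server dedupes; resending is safe
    if dup_is_harmless:
        return "resend"           # e.g. an overwrite that converges
    if searchable:
        return "reconcile_read"   # check for the correlation token first
    return "no_resend"            # can't verify; leave it for human review


# A social post: no key support, searchable, duplicate is visible to readers
post_strategy = choose_strategy(False, True, False)
# An unverifiable email send: nothing to search, duplicate is visible
email_strategy = choose_strategy(False, False, False)
```

Writing the policy as a function makes you answer the three questions for every write tool before it runs unattended, instead of discovering the answer during an incident.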
Where to start
Pick one write-type MCP tool you have and check whether its effect is searchable afterward. If it is, embed op_id as a correlation token in the written content and add a single recover_uncertain call at startup — that alone prevents most double-executions from dropped connections. If the tool turns out not to be searchable, the starting point is simply to articulate whether it is "an action that is safe to resend."