Five Minutes of Silence, and Something Retries on Your Behalf — Rethinking Retry Ownership After the Streaming Idle Watchdog Became a Default
Claude Code's streaming idle watchdog is now on by default, quietly adding another retrying layer to your stack. This article inventories the four layers (SDK, wrapper, watchdog, scheduler), computes worst-case attempt amplification, and shows how to collapse retry ownership into a single layer.
The Claude Code release notes for July 1 contained one short line: the streaming idle watchdog is now enabled by default, aborting and retrying any stream that stays silent for five minutes.
It should have been a welcome change. But what came to mind first was not gratitude — it was a cross-section of my own pipeline. The SDK retries. My wrapper retries. The scheduler re-runs failed jobs. And now a layer I never asked for had joined them. The number of retrying layers had quietly grown to four.
More layers are not inherently bad. The trouble is that each layer retries "helpfully" without knowing the others exist. This article walks through counting the places where retries can originate, estimating the worst-case attempt count mechanically, and collapsing responsibility into a single layer — based on how I reorganized my own overnight jobs.
Counting the layers that retry
Start with an inventory. A typical unattended Claude stack has at least four sources of retries.
| Layer | What it retries | Default behavior | Why it hides |
| --- | --- | --- | --- |
| 1. Anthropic SDK | Connection errors, 429s, 5xx | maxRetries: 2 (up to 3 attempts including the first) | Never appears in your code, so it escapes inventories |
| 2. Your wrapper | Application-level failures | Your own exponential backoff (say, 3 attempts) | Known, but its overlap with the SDK is easy to forget |
| 3. Streaming idle watchdog | Streams silent for 5 minutes | Abort + retry (on by default since 7/1) | Behavior changed with no config change on your side |
| 4. Scheduler | The whole job | Re-run on failure, or re-kick on the next tick | The outermost layer, invisible from inside the job |
Notice that only one of these four layers is code you wrote. Layer 1 is an SDK default, layer 3 is a platform default that just changed, layer 4 is operational configuration. Most of your retry design lives outside your own repository.
The watchdog's exact retry count and interval are assumptions worth verifying against the release notes and observed behavior for your version. Defaults can change without warning — as this one just did — which is why they belong in the assumptions log described below.
Worst cases multiply, they do not add
Retries across layers compose by multiplication, not addition: each attempt of an outer layer can consume every attempt of the inner layers.
SDK: 3 attempts
Wrapper: 3 attempts
Watchdog: 2 attempts (assuming 1 retry)
Scheduler: 2 attempts
The worst case for this configuration is 3 × 3 × 2 × 2 = 36 attempts. A task you wrote as "one call" can hit the API 36 times on a bad night.
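The multiplication can be checked with a small simulation. This is a sketch, assuming every layer simply re-invokes the layer beneath it on any error and that the watchdog retries once (the same assumption as in the list above). `withRetries` and `alwaysFailingCall` are hypothetical names used only for illustration.

```typescript
// Wrap a function in a retry layer that tries up to `attempts` times.
function withRetries(attempts: number, fn: () => void): () => void {
  return () => {
    let lastError: unknown;
    for (let i = 0; i < attempts; i++) {
      try {
        fn();
        return; // success: stop retrying
      } catch (err) {
        lastError = err; // remember the failure and try again
      }
    }
    throw lastError; // this layer gave up; escalate to the layer outside
  };
}

// Simulated API call that fails every time, as on a bad night.
let apiCalls = 0;
const alwaysFailingCall: () => void = () => {
  apiCalls++;
  throw new Error("simulated outage");
};

// Stack the layers from the inside out: SDK (3), wrapper (3), watchdog (2), scheduler (2).
const stacked = [3, 3, 2, 2].reduce<() => void>(
  (inner, attempts) => withRetries(attempts, inner),
  alwaysFailingCall,
);

try {
  stacked();
} catch {
  // every layer exhausted its attempts, as expected
}
console.log(apiCalls); // 36
```

No layer in this sketch knows the others exist, which is exactly why the count multiplies.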
Translate that into money. A task with roughly 12,000 input and 3,000 output tokens at Sonnet 5's introductory pricing ($2/$10 per MTok) costs about $0.054 per attempt. Thirty-six attempts is about $1.94. Run 90 such tasks overnight and the theoretical worst case is about $175 in a single night. The budget you built around cheap unit prices evaporates through amplification.
| Configuration | Worst-case attempts | Worst-case cost per task | 90-task overnight batch |
| --- | --- | --- | --- |
| All four layers at defaults | 36 | ≈ $1.94 | ≈ $175 |
| Single-owner (4 attempts, below) | 4 | ≈ $0.22 | ≈ $19 |
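The figures in the table follow from plain arithmetic, sketched below so you can substitute your own token counts. The prices and token counts are the example values from this article, not a statement of current list prices, and the function names are illustrative.

```typescript
// Example figures from the article: $2 per million input tokens, $10 per million output tokens.
const INPUT_USD_PER_MTOK = 2;
const OUTPUT_USD_PER_MTOK = 10;

// Cost of a single API attempt in USD.
function costPerAttemptUsd(inputTokens: number, outputTokens: number): number {
  return (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) / 1_000_000;
}

// Worst case: every task burns every attempt.
function worstCaseCostUsd(perAttemptUsd: number, attempts: number, tasks: number): number {
  return perAttemptUsd * attempts * tasks;
}

const perAttempt = costPerAttemptUsd(12_000, 3_000);
console.log(perAttempt.toFixed(3)); // 0.054
console.log(worstCaseCostUsd(perAttempt, 36, 1).toFixed(2)); // 1.94
console.log(worstCaseCostUsd(perAttempt, 36, 90).toFixed(0)); // 175
console.log(worstCaseCostUsd(perAttempt, 4, 1).toFixed(2)); // 0.22
console.log(worstCaseCostUsd(perAttempt, 4, 90).toFixed(0)); // 19
```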
The interaction with 429s is even more serious. A 429 is a response to overload; if four layers each dutifully retry it, you become an amplifier applying 36× pressure on an already congested night. Honoring server guidance — the approach I described in the Retry-After backoff strategy notes — only works once exactly one layer is doing the retrying.
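Server guidance on a 429 typically arrives as an HTTP Retry-After header, which is either a delay in seconds or an HTTP date. Below is a minimal sketch of reading it. `parseRetryAfterSec` is a hypothetical helper, and how you pull the header value off an SDK error depends on your SDK version, so that part is deliberately left out.

```typescript
// Convert a Retry-After header value into a wait in seconds.
// Returns undefined when the header is absent or unparseable, so the caller
// can fall back to its own backoff.
function parseRetryAfterSec(
  value: string | null | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (value == null) return undefined;
  const trimmed = value.trim();

  // Form 1: a non-negative integer number of seconds, e.g. "30".
  if (/^\d+$/.test(trimmed)) return Number(trimmed);

  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;

  // A date in the past means "retry now", never a negative wait.
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}

console.log(parseRetryAfterSec("30")); // 30
console.log(parseRetryAfterSec(undefined)); // undefined
```

Only the single layer that owns retries should act on this value; if several layers each wait and retry, the server's guidance is honored several times over.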
Putting numbers on the amplification in TypeScript
Here is a small calculator: declare your layer configuration and it mechanically reports worst-case attempts and wall-clock time. Running it once before wiring a job lets you judge risk with numbers instead of intuition.
```typescript
// retry-budget.ts: estimate the worst case from a layer configuration
type RetryLayer = {
  name: string;
  attempts: number; // max attempts including the first
  perAttemptTimeoutSec: number; // what this layer allows a single attempt
  backoffSec: (retryIndex: number) => number; // wait before retry number retryIndex (0-based)
};

// Order the array outermost -> innermost
const layers: RetryLayer[] = [
  { name: "scheduler", attempts: 2, perAttemptTimeoutSec: 3600, backoffSec: () => 300 },
  { name: "watchdog", attempts: 2, perAttemptTimeoutSec: 900, backoffSec: () => 0 },
  { name: "wrapper", attempts: 3, perAttemptTimeoutSec: 300, backoffSec: (i) => 10 * 2 ** i },
  { name: "sdk", attempts: 3, perAttemptTimeoutSec: 120, backoffSec: (i) => 1 + i },
];

// Worst-case attempts: the product of every layer's attempt count.
export function worstCaseAttempts(ls: RetryLayer[]): number {
  return ls.reduce((acc, l) => acc * l.attempts, 1);
}

export function worstCaseWallClockSec(ls: RetryLayer[]): number {
  // Fold from the inside out: one outer attempt = min(inner total, own timeout)
  return ls.reduceRight((innerCost, layer) => {
    const perAttempt =
      innerCost > 0 ? Math.min(innerCost, layer.perAttemptTimeoutSec) : layer.perAttemptTimeoutSec;
    let total = 0;
    for (let i = 0; i < layer.attempts; i++) {
      total += perAttempt;
      if (i < layer.attempts - 1) total += layer.backoffSec(i); // no wait after the final attempt
    }
    return total;
  }, 0);
}

console.log(worstCaseAttempts(layers)); // => 36
console.log((worstCaseWallClockSec(layers) / 60).toFixed(1), "min"); // => 65.0 min
```
When I first wrote my own stack into this shape, I discovered the worst-case wall clock exceeded two hours. A task designed to finish in 15 minutes could, during an incident, stretch long enough to collide with the next scheduled cycle. Attempt amplification breaks not just your cost model but your concurrency assumptions.
Collapsing responsibility — the single-owner retry
The principle is simple: only the layer that understands "one unit of business work" is allowed to retry.
A "unit of business work" is the granularity at which you can issue an idempotency key and verify evidence of completion. Inner layers do not know that unit. To the SDK, a failure is a failed HTTP request; it cannot judge whether "tonight's task already wrote its artifact." So the inner layers fail fast and escalate the decision outward.
Three steps:
1. Silence the inside: construct the SDK client with maxRetries: 0 and an explicit, short timeout
2. Pick one owner: write the retry loop only in the outermost layer that can hold an idempotency key (in practice, either your wrapper or your scheduler, not both)
3. Check completion before retrying: before redoing work, always ask whether the previous attempt actually succeeded
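```typescript
import Anthropic from "@anthropic-ai/sdk";

// Application-specific helpers, not shown here; these signatures are assumptions.
declare function loadCompletionRecord(taskId: string): Promise<{ result: unknown } | null>;
declare function saveCompletionRecord(taskId: string, result: unknown): Promise<void>;
declare function isRetryable(err: unknown): boolean;
declare function retryAfterSecFrom(err: unknown): number | undefined;

// 1) Stop retries on the inside; fn below is expected to call the API through this client
export const client = new Anthropic({ maxRetries: 0, timeout: 120_000 });

// 2) This function is the only retry owner
export async function withSingleOwnerRetry<T>(
  taskId: string, // idempotency key, e.g. "site-a/2026-07-03/article-x"
  fn: () => Promise<T>,
): Promise<T> {
  const MAX_ATTEMPTS = 4;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    // 3) Before redoing anything, check whether the last attempt completed
    const done = await loadCompletionRecord(taskId);
    if (done) return done.result as T;
    try {
      const result = await fn();
      await saveCompletionRecord(taskId, result); // persist evidence first
      return result;
    } catch (err) {
      if (!isRetryable(err) || attempt === MAX_ATTEMPTS) throw err;
      const retryAfter = retryAfterSecFrom(err); // for 429s, server guidance wins
      // Otherwise: capped exponential backoff (in seconds) plus a little jitter
      const backoff = retryAfter ?? (Math.min(60, 5 * 2 ** attempt) + Math.random() * 3);
      await new Promise((r) => setTimeout(r, backoff * 1000));
    }
  }
  throw new Error("unreachable");
}
```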
The completion-record technique itself continues the thinking from the idempotent tools and outbox notes for the Agent SDK. What this article adds is the restriction: designate exactly one owner and explicitly disable everyone else.
One caution about the watchdog. Its abort-and-retry regenerates from scratch, which makes it another retry layer, not a resume. That is a different role from resume-style implementations that detect a silent stall and continue from the text already received, which I covered in my field notes on detecting silent stalls and resuming mid-stream. If you keep the default watchdog, the practical move is to reduce your own wrapper's attempts so the total worst case stays constant.
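Because layers multiply, keeping the total constant means dividing your attempt budget by the attempts of the layers you cannot disable. A minimal sketch follows, assuming the watchdog makes 2 attempts (an unverified assumption, as noted earlier). `ownerAttemptsWithinBudget` is a hypothetical helper name.

```typescript
// How many attempts may the retry owner make so that the product across
// all layers stays within a total attempt budget?
function ownerAttemptsWithinBudget(totalBudget: number, otherLayerAttempts: number[]): number {
  // Layers compose by multiplication, so the other layers act as a multiplier.
  const multiplier = otherLayerAttempts.reduce((acc, n) => acc * n, 1);
  // The owner always gets at least one attempt, even if that exceeds the budget.
  return Math.max(1, Math.floor(totalBudget / multiplier));
}

console.log(ownerAttemptsWithinBudget(4, [])); // 4 (no other layer retries)
console.log(ownerAttemptsWithinBudget(4, [2])); // 2 (watchdog assumed to make 2 attempts)
console.log(ownerAttemptsWithinBudget(4, [2, 2])); // 1 (watchdog and scheduler both retry)
```

With a budget of 4 and a watchdog that doubles every attempt, the wrapper drops from 4 attempts to 2.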
An assumptions log — catching default changes before the incident
The lesson that mattered most in practice, and that no release note will tell you: platform defaults are premises of your design, yet their changes never show up in your repository. No code review or diff will surface them.
So my jobs now write out "the assumptions this run depends on" at the very top, every time.
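A minimal sketch of such a header, assuming four layers and a plain-text log. The field names, the run ID, and every value are illustrative placeholders; the watchdog's retry count in particular is an unverified assumption, not a documented value.

```typescript
// One line per layer: what this run assumes about who retries, and how.
type RunAssumption = { layer: string; detail: string };

const assumptions: RunAssumption[] = [
  { layer: "sdk", detail: "maxRetries=0 timeout=120s (set explicitly in code)" },
  { layer: "wrapper", detail: "retry owner, max 4 attempts, idempotency key per task" },
  { layer: "watchdog", detail: "idle abort after 5 min, on by default, retries=1 (assumed, unverified)" },
  { layer: "scheduler", detail: "re-run on failure, max 2 runs per job (assumed)" },
];

// Format the assumptions as greppable log lines tagged with the run ID.
function formatAssumptionLines(runId: string, items: RunAssumption[]): string[] {
  return items.map((a) => `[assumption] run=${runId} layer=${a.layer} ${a.detail}`);
}

// Emit the header before any work starts.
const headerLines = formatAssumptionLines("nightly-2026-07-03", assumptions);
for (const line of headerLines) console.log(line);
```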
Four unglamorous lines, but their value appears after an incident: you can reconstruct, from logs alone, which night's run operated under which premises. When a premise changed — July 1, in this case — the boundary shows up mechanically in the sequence of logs.
As a checklist:
Enumerate every layer that can retry, feed them to worstCaseAttempts, and look at the number (a value like 36 means the design needs revisiting)
Explicitly disable retries in every non-owner layer; for layers you cannot disable (watchdog, scheduler), record them in the assumptions log
When release notes mention a default change, update the corresponding assumption line before the next run
Where to place ownership — recommendations by execution mode
Where the owner lives depends on how the code runs. My own split:
| Execution mode | Retry owner | Other layers |
| --- | --- | --- |
| Interactive CLI use | The human (you) | SDK defaults are fine; a person is the outermost layer, so nothing amplifies |
| Headless batch in your own script | Wrapper (with idempotency keys) | SDK at maxRetries: 0; watchdog goes into the assumptions log |
| Scheduled tasks | The scheduler's re-run | Fail fast inside the job; always check completion at the start of a re-run |
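For the scheduled-task row, the completion check at the start of a re-run can be as simple as a marker file. The sketch below assumes one JSON marker per task ID on local disk; `runOncePerTask`, the marker layout, and the task ID are hypothetical, and the work is synchronous only to keep the example short.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical layout: one JSON marker file per task ID under markerDir.
function markerPath(markerDir: string, taskId: string): string {
  return join(markerDir, encodeURIComponent(taskId) + ".done.json");
}

// Run `work` unless a previous run already recorded completion for this task.
function runOncePerTask<T>(
  markerDir: string,
  taskId: string,
  work: () => T,
): { result: T; skipped: boolean } {
  const path = markerPath(markerDir, taskId);
  if (existsSync(path)) {
    // A previous run finished: return its recorded result instead of redoing the work.
    const recorded = JSON.parse(readFileSync(path, "utf8")) as { result: T };
    return { result: recorded.result, skipped: true };
  }
  const result = work(); // fail fast: any exception propagates to the scheduler
  mkdirSync(markerDir, { recursive: true });
  writeFileSync(path, JSON.stringify({ result })); // record evidence of completion
  return { result, skipped: false };
}

// Usage: simulate the scheduler kicking the same job twice.
const markerDir = mkdtempSync(join(tmpdir(), "markers-"));
let workRuns = 0;
const job = () => {
  workRuns++;
  return "artifact-v1";
};
const firstRun = runOncePerTask(markerDir, "site-a/2026-07-03/article-x", job);
const secondRun = runOncePerTask(markerDir, "site-a/2026-07-03/article-x", job);
console.log(firstRun.skipped, secondRun.skipped, workRuns); // false true 1
```

This sketch is not safe against two runs of the same task executing at the same moment; it only covers the sequential re-run case described in the table.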
As an indie developer running generation jobs for several sites overnight, I once watched a failed night job nearly write the same artifact twice before morning. The investigation showed my in-job retry and the scheduler's re-run overlapping, each unaware of the other. Since then, every new job starts with one written line: who is allowed to redo this work. Once the owner is fixed, a newcomer like the watchdog costs only one extra line in the assumptions log.
Wrapping up — look at the number before tonight's run
There is exactly one next action: list every layer in your stack that can retry, feed the list to worstCaseAttempts, and look at the number. If it matches your intent, you are done. If something like 36 appears, pick one owner and silence the rest. That alone brings the cost and concurrency estimates for your worst night back to reality.
If you run overnight jobs of your own, I hope this gives you a thread to start pulling on.