
Claude Code · 2026-07-03 · Advanced

Five Minutes of Silence, and Something Retries on Your Behalf — Rethinking Retry Ownership After the Streaming Idle Watchdog Became a Default

Claude Code's streaming idle watchdog is now on by default, quietly adding another retrying layer to your stack. This article inventories the four layers (SDK, wrapper, watchdog, scheduler), computes worst-case attempt amplification, and shows how to collapse retry ownership into a single layer.

Tags: claude-code, reliability, retry, automation, production, typescript


The Claude Code release notes for July 1 contained one short line: the streaming idle watchdog is now enabled by default, aborting and retrying any stream that stays silent for five minutes.

It should have been a welcome change. But what came to mind first was not gratitude — it was a cross-section of my own pipeline. The SDK retries. My wrapper retries. The scheduler re-runs failed jobs. And now a layer I never asked for had joined them. The number of retrying layers had quietly grown to four.

More layers are not inherently bad. The trouble is that each layer retries "helpfully" without knowing the others exist. This article walks through counting the places where retries can originate, estimating the worst-case attempt count mechanically, and collapsing responsibility into a single layer — based on how I reorganized my own overnight jobs.

Counting the layers that retry

Start with an inventory. A typical unattended Claude stack has at least four sources of retries.

Layer	What it retries	Default behavior	Why it hides
1. Anthropic SDK	Connection errors, 408, 409, 429, 5xx	`maxRetries: 2` (up to 3 attempts including the first)	Never appears in your code, so it escapes inventories
2. Your wrapper	Application-level failures	Your own exponential backoff (say, 3 attempts)	Known, but its overlap with the SDK is easy to forget
3. Streaming idle watchdog	Streams silent for 5 minutes	Abort + retry (on by default since July 1)	Behavior changed with no config change on your side
4. Scheduler	The whole job	Re-run on failure, or re-kick on the next tick	The outermost layer, invisible from inside the job

Notice that only one of these four layers is code you wrote. Layer 1 is an SDK default, layer 3 is a platform default that just changed, layer 4 is operational configuration. Most of your retry design lives outside your own repository.

The watchdog's exact retry count and interval are assumptions worth verifying against the release notes and observed behavior for your version. Defaults can change without warning — as this one just did — which is why they belong in the assumptions log described below.

Worst cases multiply, they do not add

Retries across layers compose by multiplication, not addition: each attempt of an outer layer can consume every attempt of the inner layers.

SDK: 3 attempts
Wrapper: 3 attempts
Watchdog: 2 attempts (assuming 1 retry)
Scheduler: 2 attempts

The worst case for this configuration is 3 × 3 × 2 × 2 = 36 attempts. A task you wrote as "one call" can hit the API 36 times on a bad night.

Translate that into money. A task with roughly 12,000 input and 3,000 output tokens at Sonnet 5's introductory pricing ($2/$10 per MTok) costs about $0.054 per attempt. Thirty-six attempts is about $1.94. Run 90 such tasks overnight and the theoretical worst case is about $175 in a single night. The budget you built around cheap unit prices evaporates through amplification.

Configuration	Worst-case attempts	Worst-case cost per task	90-task overnight batch
All four layers at defaults	36	≈ $1.94	≈ $175
Single-owner (4 attempts, below)	4	≈ $0.22	≈ $19
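
The cost figures above are plain multiplication, so they are easy to re-run whenever prices or token counts change. A minimal sketch using the token counts and the $2/$10 per MTok prices quoted in this section; `Pricing`, `costPerAttemptUsd`, and `worstCaseBatchCostUsd` are illustrative names, not part of any SDK.

```typescript
// Illustrative cost model; prices are USD per million tokens (MTok).
type Pricing = { inputPerMTok: number; outputPerMTok: number };

// Cost of a single API attempt with the given token counts.
export function costPerAttemptUsd(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
}

// Worst case for a batch: every task consumes every attempt the layers allow.
export function worstCaseBatchCostUsd(perAttemptUsd: number, attempts: number, tasks: number): number {
  return perAttemptUsd * attempts * tasks;
}

// Inputs quoted in the text: 12,000 input / 3,000 output tokens at $2/$10 per MTok.
const pricing: Pricing = { inputPerMTok: 2, outputPerMTok: 10 };
const perAttempt = costPerAttemptUsd(12_000, 3_000, pricing);
console.log(perAttempt.toFixed(3));                                // 0.054
console.log(worstCaseBatchCostUsd(perAttempt, 36, 90).toFixed(2)); // 174.96 (all four layers at defaults)
console.log(worstCaseBatchCostUsd(perAttempt, 4, 90).toFixed(2));  // 19.44 (single owner, 4 attempts)
```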

The interaction with 429s is even more serious. A 429 is a response to overload; if four layers each dutifully retry it, you become an amplifier applying 36× pressure on an already congested night. Honoring server guidance — the approach I described in the Retry-After backoff strategy notes — only works once exactly one layer is doing the retrying.
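
Honoring server guidance on a 429 usually means reading a `Retry-After` header. RFC 9110 allows its value to be either a number of seconds or an HTTP-date, so a parser has to handle both. A minimal sketch; `parseRetryAfterSec` is a hypothetical helper, and how you get the raw header value out of an error object depends on your SDK and its version.

```typescript
// Parse a Retry-After header value into whole seconds to wait.
// RFC 9110 allows two forms: delay-seconds ("120") or an HTTP-date
// ("Fri, 03 Jul 2026 00:00:30 GMT"). Returns undefined when the value is
// absent or unparseable, so the single retry owner can fall back to its own backoff.
export function parseRetryAfterSec(
  value: string | null | undefined,
  now: Date = new Date(),
): number | undefined {
  if (value == null) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed);

  const retryAt = Date.parse(trimmed);
  if (Number.isNaN(retryAt)) return undefined;
  // A date already in the past means "retry now", never a negative wait.
  return Math.max(0, Math.ceil((retryAt - now.getTime()) / 1000));
}

const now = new Date("2026-07-03T00:00:00Z");
console.log(parseRetryAfterSec("120", now));                           // 120
console.log(parseRetryAfterSec("Fri, 03 Jul 2026 00:00:30 GMT", now)); // 30
```

Returning `undefined` instead of a default keeps the fallback decision with the caller, which should be the single retry owner.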


Putting numbers on the amplification in TypeScript

Here is a small calculator: declare your layer configuration and it mechanically reports worst-case attempts and wall-clock time. Running it once before wiring a job lets you judge risk with numbers instead of intuition.

// retry-budget.ts — estimate the worst case from a layer configuration
type RetryLayer = {
  name: string;
  attempts: number; // max attempts including the first
  perAttemptTimeoutSec: number; // what this layer allows a single attempt
  backoffSec: (retryIndex: number) => number; // wait before the i-th retry
};
 
// Order the array outermost -> innermost
const layers: RetryLayer[] = [
  { name: "scheduler", attempts: 2, perAttemptTimeoutSec: 3600, backoffSec: () => 300 },
  { name: "watchdog", attempts: 2, perAttemptTimeoutSec: 900, backoffSec: () => 0 },
  { name: "wrapper", attempts: 3, perAttemptTimeoutSec: 300, backoffSec: (i) => 10 * 2 ** i },
  { name: "sdk", attempts: 3, perAttemptTimeoutSec: 120, backoffSec: (i) => 1 + i },
];
 
export function worstCaseAttempts(ls: RetryLayer[]): number {
  return ls.reduce((acc, l) => acc * l.attempts, 1);
}
 
export function worstCaseWallClockSec(ls: RetryLayer[]): number {
  // Fold from the inside out: one outer attempt = min(inner total, own timeout)
  return ls.reduceRight((innerCost, layer) => {
    const perAttempt =
      innerCost > 0 ? Math.min(innerCost, layer.perAttemptTimeoutSec) : layer.perAttemptTimeoutSec;
    let total = 0;
    for (let i = 0; i < layer.attempts; i++) {
      total += perAttempt;
      if (i < layer.attempts - 1) total += layer.backoffSec(i);
    }
    return total;
  }, 0);
}
 
console.log(worstCaseAttempts(layers)); // => 36
console.log((worstCaseWallClockSec(layers) / 60).toFixed(1), "min"); // => 65.0 min (3,900 s for this configuration)

When I first wrote my own stack into this shape, I discovered the worst-case wall clock exceeded two hours. A task designed to finish in 15 minutes could, during an incident, stretch long enough to collide with the next scheduled cycle. Attempt amplification breaks not just your cost model but your concurrency assumptions.
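
The collision risk can be put into numbers the same way: divide the worst-case wall clock by the schedule interval. A minimal sketch, assuming the scheduler starts a new run every interval even while the previous one is still alive (many schedulers have an overlap or concurrency setting worth checking); `maxOverlappingRuns` is a hypothetical helper and the 130-minute figure is illustrative.

```typescript
// Worst-case number of runs of one scheduled job alive at the same time, assuming
// a new run starts every intervalSec regardless of whether the previous one finished,
// and each run can last up to worstCaseSec.
export function maxOverlappingRuns(worstCaseSec: number, intervalSec: number): number {
  if (!(intervalSec > 0)) throw new RangeError("intervalSec must be positive");
  return Math.max(1, Math.ceil(worstCaseSec / intervalSec));
}

const hourly = 60 * 60;
console.log(maxOverlappingRuns(15 * 60, hourly));  // 1: the designed 15-minute run fits
console.log(maxOverlappingRuns(130 * 60, hourly)); // 3: a 130-minute worst case overlaps the next two runs
```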

Collapsing responsibility — the single-owner retry

The principle is simple: only the layer that understands "one unit of business work" is allowed to retry.

A "unit of business work" is the granularity at which you can issue an idempotency key and verify evidence of completion. Inner layers do not know that unit. To the SDK, a failure is a failed HTTP request; it cannot judge whether "tonight's task already wrote its artifact." So the inner layers fail fast and escalate the decision outward.
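
Concretely, the idempotency key must be computed from the business unit, never from the attempt, or each retry looks like new work. A minimal sketch, assuming the unit is identified by site, scheduled run date, and article slug, matching the `site-a/2026-07-03/article-x` key format used by `withSingleOwnerRetry`; `makeTaskId` is a hypothetical helper.

```typescript
// Hypothetical helper: build the idempotency key from business identifiers only.
// It must come out identical on every retry of the same work, so it never
// includes an attempt counter, the current time, or random values.
export function makeTaskId(site: string, runDate: string, articleSlug: string): string {
  for (const part of [site, runDate, articleSlug]) {
    if (part === "" || part.includes("/")) {
      throw new Error(`invalid key part: "${part}"`); // keep the three segments unambiguous
    }
  }
  return `${site}/${runDate}/${articleSlug}`;
}

const firstAttempt = makeTaskId("site-a", "2026-07-03", "article-x");
const retryAttempt = makeTaskId("site-a", "2026-07-03", "article-x");
console.log(firstAttempt, firstAttempt === retryAttempt); // site-a/2026-07-03/article-x true
```

Pass the scheduled run date rather than the current date at attempt time: a retry that crosses midnight would otherwise produce a different key and redo work that already completed.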

Three steps:

Silence the inside: construct the SDK client with maxRetries: 0 and an explicit, short timeout
Pick one owner: write the retry loop only in the outermost layer that can hold an idempotency key (in practice, either your wrapper or your scheduler — not both)
Check completion before retrying: before redoing work, always ask whether the previous attempt actually succeeded

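import Anthropic from "@anthropic-ai/sdk";

// Application-provided helpers (assumed signatures; implement them for your storage and error types)
declare function loadCompletionRecord(taskId: string): Promise<{ result: unknown } | undefined>;
declare function saveCompletionRecord(taskId: string, result: unknown): Promise<void>;
declare function isRetryable(err: unknown): boolean;
declare function retryAfterSecFrom(err: unknown): number | undefined;

// 1) Stop retries on the inside: fn() should make its API calls through this client
const client = new Anthropic({ maxRetries: 0, timeout: 120_000 });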
 
// 2) This function is the only retry owner
export async function withSingleOwnerRetry<T>(
  taskId: string, // idempotency key, e.g. "site-a/2026-07-03/article-x"
  fn: () => Promise<T>,
): Promise<T> {
  const MAX_ATTEMPTS = 4;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    // 3) Before redoing anything, check whether the last attempt completed
    const done = await loadCompletionRecord(taskId);
    if (done) return done.result as T;
 
    try {
      const result = await fn();
      await saveCompletionRecord(taskId, result); // persist evidence first
      return result;
    } catch (err) {
      if (!isRetryable(err) || attempt === MAX_ATTEMPTS) throw err;
      const retryAfter = retryAfterSecFrom(err); // for 429s, server guidance wins
      const backoff = retryAfter ?? Math.min(60, 5 * 2 ** attempt) + Math.random() * 3;
      await new Promise((r) => setTimeout(r, backoff * 1000));
    }
  }
  throw new Error("unreachable");
}
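
`withSingleOwnerRetry` leaves four helpers to the application. One possible shape for them, as a sketch: the in-memory `Map` stands in for durable storage, and `TransportError` is an assumed, SDK-agnostic error shape that you would populate from your SDK's error objects, whose exact properties vary by version. Everything here is hypothetical.

```typescript
// Hypothetical implementations of the helpers withSingleOwnerRetry calls.
// The Map is an in-memory stand-in: a real job needs durable storage so a
// scheduler re-run in a fresh process can see earlier completions.
type CompletionRecord = { result: unknown; completedAt: string };
const completionStore = new Map<string, CompletionRecord>();

export async function loadCompletionRecord(taskId: string): Promise<CompletionRecord | undefined> {
  return completionStore.get(taskId);
}

export async function saveCompletionRecord(taskId: string, result: unknown): Promise<void> {
  completionStore.set(taskId, { result, completedAt: new Date().toISOString() });
}

// Assumed, SDK-agnostic error shape: map your SDK's error objects onto it.
// status is the HTTP status; network marks failures that never got an HTTP response.
type TransportError = { status?: number; network?: boolean; retryAfter?: string };

export function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as TransportError;
  if (e.network === true) return true; // connection reset, timeout, DNS failure
  if (typeof e.status !== "number") return false; // unknown failure (e.g. a bug): do not retry
  return e.status === 408 || e.status === 429 || e.status >= 500;
}

export function retryAfterSecFrom(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const raw = (err as TransportError).retryAfter?.trim();
  // Only the delay-seconds form is handled here; anything else falls back to backoff.
  return raw !== undefined && /^\d+$/.test(raw) ? Number(raw) : undefined;
}
```

The classification is deliberately conservative: an error with neither an HTTP status nor an explicit network flag is treated as a bug and escalated, not retried.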

The completion-record technique itself builds on my earlier notes on idempotent tools and the outbox pattern for the Agent SDK. What this article adds is the restriction: designate exactly one owner and explicitly disable everyone else.

One caution about the watchdog. Its abort-and-retry regenerates from scratch, which makes it another retry layer, not a resume. That is a different role from resume-style implementations that detect a silent stall and continue from the text already received (my field notes on detecting silent stalls and resuming mid-stream). If you keep the default watchdog, the practical move is to cut attempts from your own wrapper so the overall worst case stays constant. Because layers multiply, that means dividing: a watchdog that allows 2 attempts halves what the wrapper may spend.
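
Because layers multiply, keeping the total constant means keeping the product constant: the owner gets the budget divided by the attempts of every layer you cannot switch off. A minimal sketch, using the 4-attempt budget from the single-owner row of the cost table and the watchdog's assumed 2 attempts (1 retry); `ownerAttemptsWithin` is a hypothetical helper.

```typescript
// Keep the worst-case total constant when layers you cannot disable also retry:
// the owner's attempts are the budget divided by the product of the fixed layers.
export function ownerAttemptsWithin(totalBudget: number, fixedLayerAttempts: number[]): number {
  const fixed = fixedLayerAttempts.reduce((acc, a) => acc * a, 1);
  const owner = Math.floor(totalBudget / fixed);
  if (owner < 1) {
    throw new Error(`budget ${totalBudget} cannot cover fixed layers (x${fixed})`);
  }
  return owner;
}

console.log(ownerAttemptsWithin(4, []));  // 4: no other layer retries
console.log(ownerAttemptsWithin(4, [2])); // 2: watchdog with 2 attempts (1 retry, assumed)
```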

An assumptions log — catching default changes before the incident

The lesson that mattered most in practice, and that no release note will tell you: platform defaults are premises of your design, yet their changes never show up in your repository. No code review or diff will surface them.

So my jobs now write out "the assumptions this run depends on" at the very top, every time.

const RUNTIME_ASSUMPTIONS = {
  "anthropic-sdk.maxRetries": 0,
  "claude-code.streaming-idle-watchdog": "default-on / 300s idle -> abort+retry",
  "scheduler.rerunsOnFailure": 1,
  "retry.owner": "wrapper.withSingleOwnerRetry",
} as const;
 
console.log(`[assumptions] ${JSON.stringify(RUNTIME_ASSUMPTIONS)}`);

Four unglamorous lines, but their value appears after an incident: you can reconstruct, from logs alone, which night's run operated under which premises. When a premise changed — July 1, in this case — the boundary shows up mechanically in the sequence of logs.
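
Finding that boundary can itself be mechanical: parse the `[assumptions]` line from consecutive runs and diff them. A minimal sketch, assuming the one-line JSON format printed above; `diffAssumptions` and `parseAssumptionsLine` are hypothetical helpers, and the two log lines are made-up examples.

```typescript
// Hypothetical helper: diff the assumptions two runs logged, so the run where a
// premise changed stands out when scanning consecutive "[assumptions]" lines.
type Assumptions = Record<string, string | number | boolean>;

const has = (o: object, k: string): boolean => Object.prototype.hasOwnProperty.call(o, k);

export function diffAssumptions(prev: Assumptions, curr: Assumptions): string[] {
  // Union of keys, sorted so the report is stable from run to run.
  const keys = Object.keys(prev)
    .concat(Object.keys(curr).filter((k) => !has(prev, k)))
    .sort();
  const changes: string[] = [];
  for (const key of keys) {
    if (!has(curr, key)) changes.push(`removed ${key}`);
    else if (!has(prev, key)) changes.push(`added ${key}=${curr[key]}`);
    else if (prev[key] !== curr[key]) changes.push(`changed ${key}: ${prev[key]} -> ${curr[key]}`);
  }
  return changes;
}

// Parse the JSON payload of a "[assumptions] {...}" log line.
export const parseAssumptionsLine = (line: string): Assumptions =>
  JSON.parse(line.slice(line.indexOf("{")));

// Made-up log lines from two consecutive nights.
const before = parseAssumptionsLine('[assumptions] {"anthropic-sdk.maxRetries":0}');
const after = parseAssumptionsLine(
  '[assumptions] {"anthropic-sdk.maxRetries":0,"claude-code.streaming-idle-watchdog":"default-on / 300s idle -> abort+retry"}',
);
console.log(diffAssumptions(before, after));
```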

As a checklist:

Enumerate every layer that can retry, feed them to worstCaseAttempts, and look at the number (a value like 36 means the design needs revisiting)
Explicitly disable retries in every non-owner layer; for layers you cannot disable (watchdog, scheduler), record them in the assumptions log
When release notes mention a default change, update the corresponding assumption line before the next run
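
The first checklist item can also be enforced rather than eyeballed: a pre-flight guard that refuses to start when the product exceeds a budget. A minimal sketch; `assertRetryBudget`, the budget of 4, and the layer values are illustrative assumptions, with the watchdog at the 2 attempts assumed earlier in this article.

```typescript
// Hypothetical pre-flight guard: refuse to start when the worst-case attempt
// count (the same product worstCaseAttempts computes) exceeds an explicit budget.
type LayerAttempts = { name: string; attempts: number };

export function assertRetryBudget(layers: LayerAttempts[], maxAttempts: number): number {
  const worst = layers.reduce((acc, l) => acc * l.attempts, 1);
  if (worst > maxAttempts) {
    const detail = layers.map((l) => `${l.name}=${l.attempts}`).join(" x ");
    throw new Error(`worst case ${worst} attempts (${detail}) exceeds budget ${maxAttempts}`);
  }
  return worst;
}

// Illustrative single-owner layout (assumed values): the watchdog cannot be
// switched off, so the wrapper gets 2 attempts and every other layer is held at 1.
const singleOwner: LayerAttempts[] = [
  { name: "scheduler", attempts: 1 },
  { name: "watchdog", attempts: 2 },
  { name: "wrapper", attempts: 2 },
  { name: "sdk", attempts: 1 },
];
console.log(assertRetryBudget(singleOwner, 4)); // 4
```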

Where to place ownership — recommendations by execution mode

Where the owner lives depends on how the code runs. My own split:

Execution mode	Retry owner	Other layers
Interactive CLI use	The human (you)	SDK defaults are fine; a person is the outermost layer, so nothing amplifies
Headless batch in your own script	Wrapper (with idempotency keys)	SDK at `maxRetries: 0`; watchdog goes into the assumptions log
Scheduled tasks	The scheduler's re-run	Fail fast inside the job; always check completion at the start of a re-run
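
Recording the table as code makes the choice explicit at job start and easy to log alongside the other assumptions. A minimal sketch that mirrors the table; the mode names, `RetryPolicy`, and `retryPolicyFor` are hypothetical, and `sdkMaxRetries: 2` for interactive use simply restates the SDK default mentioned earlier.

```typescript
// Hypothetical encoding of the execution-mode table.
type ExecutionMode = "interactive" | "headless-batch" | "scheduled";
type RetryPolicy = {
  owner: "human" | "wrapper" | "scheduler";
  sdkMaxRetries: number; // value to pass as the SDK client's maxRetries
  checkCompletionFirst: boolean; // consult the completion record before redoing work
};

export function retryPolicyFor(mode: ExecutionMode): RetryPolicy {
  switch (mode) {
    case "interactive":
      // A person is the outermost layer, so the SDK default (2 retries) does not amplify.
      return { owner: "human", sdkMaxRetries: 2, checkCompletionFirst: false };
    case "headless-batch":
      // The wrapper owns retries (with idempotency keys); the SDK fails fast.
      return { owner: "wrapper", sdkMaxRetries: 0, checkCompletionFirst: true };
    case "scheduled":
      // Fail fast inside the job; the scheduler's re-run is the only retry.
      return { owner: "scheduler", sdkMaxRetries: 0, checkCompletionFirst: true };
    default: {
      const unreachable: never = mode; // compile-time exhaustiveness check
      throw new Error(`unknown mode: ${String(unreachable)}`);
    }
  }
}

console.log(retryPolicyFor("headless-batch"));
```

A headless job would then pass `retryPolicyFor("headless-batch").sdkMaxRetries` as `maxRetries` when constructing its SDK client.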

As an indie developer running generation jobs for several sites overnight, I once watched a failed night job nearly write the same artifact twice before morning. The investigation showed my in-job retry and the scheduler's re-run overlapping, each unaware of the other. Since then, every new job starts with one written line: who is allowed to redo this work. Once the owner is fixed, a newcomer like the watchdog costs only one extra line in the assumptions log.

Wrapping up — look at the number before tonight's run

There is exactly one next action: list every layer in your stack that can retry, feed the list to worstCaseAttempts, and look at the number. If it matches your intent, you are done. If something like 36 appears, pick one owner and silence the rest. That alone brings the cost and concurrency estimates for your worst night back to reality.

If you run overnight jobs of your own, I hope this gives you a thread to start pulling on.

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.