API & SDK · 2026-06-20 · Advanced

Running Subagents in Parallel Without One Failure Sinking the Whole Run

A fan-out / fan-in design for running several subagents in parallel, covering token budgeting, a result contract, and partial-failure handling. Includes an implementation where one branch can fail without stopping the rest, plus measured numbers.



For a long time, as an indie developer running four sites solo at Dolice Labs, I collected nightly article candidates one site at a time. That took roughly 40 seconds per site, or about two and a half minutes for all four in a row. It worked, but on a night when the network dropped midway through the third site, everything after it went down with it. Every morning that I opened the log and found it stalled on site three, I was reminded how naive the design was.

There was never any real reason to run them in order. Candidate collection for each site is independent. So fire them in parallel, take results as they arrive, and pick up only the ones that failed afterward. That is fan-out / fan-in. Here I will build out the skeleton, plus the three things that are easy to overlook: budgets, pinning the result shape, and partial failure.

What breaks the moment you stop running serially

Parallelizing itself is not hard. The hard part is the three problems that surface the instant you do.

The first is budget. Serially you could naively count "up to N tokens overall," but in parallel several branches eat tokens at once. Hit the rate limit and every branch starts returning 429 together.

The second is the shape of results. Serially you handled one item at a time, almost by eye; in parallel the return order scatters, and a single piece of malformed JSON quietly collapses the aggregation step.

The third is partial failure. This one matters most. When one of four branches fails, throwing away the work of the other three defeats the point of parallelizing at all.

The fan-out / fan-in skeleton

First, define one worker: a plain function that takes a single site and returns a candidate list.

import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
 
type Site = { id: string; domain: string; maxTokens: number };
 
async function collectCandidates(site: Site): Promise<string> {
  const res = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: site.maxTokens,
    system: "You are a technical blog editor. Return only a JSON array.",
    messages: [
      { role: "user", content: `Five article candidates for ${site.domain}, as a JSON array. Each element is {title, angle}.` },
    ],
  });
  const block = res.content.find((b) => b.type === "text");
  return block && block.type === "text" ? block.text : "[]";
}

The fan-out side launches this worker against every site at once and waits with Promise.allSettled. Choosing allSettled over Promise.all is the key. The latter rejects the whole set if even one branch rejects; the former returns every result while keeping success and failure distinct.

const sites: Site[] = [
  { id: "cl", domain: "claudelab.net", maxTokens: 1024 },
  { id: "gl", domain: "gemilab.net", maxTokens: 1024 },
  { id: "ag", domain: "antigravitylab.net", maxTokens: 1024 },
  { id: "rl", domain: "rorklab.net", maxTokens: 1024 },
];
 
const settled = await Promise.allSettled(
  sites.map((s) => collectCandidates(s))
);

This alone shrinks the wait from the serial sum down to roughly the slowest single branch. But as written, malformed responses still pass as successes. The next two sections tighten that.
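The difference between the two combinators can be seen without touching the API. The sketch below uses stand-in workers (`fakeWorker` is a hypothetical name, and the error message is made up) where one of four branches rejects.

```typescript
// Stand-in workers: three resolve, one rejects. No API calls are involved.
const fakeWorker = (id: string, fail = false): Promise<string> =>
  fail ? Promise.reject(new Error(`${id}: connection reset`)) : Promise.resolve(`${id}: []`);

const launch = () => [
  fakeWorker("cl"),
  fakeWorker("gl"),
  fakeWorker("ag", true),
  fakeWorker("rl"),
];

// Promise.all: the single rejection becomes the outcome of the whole set.
const allResult: Promise<string> = Promise.all(launch()).then(
  (values) => `all fulfilled: ${values.length}`,
  (err: Error) => `all rejected: ${err.message}`,
);

// Promise.allSettled: always fulfills, with one status per branch in input order.
const settledStatuses: Promise<string[]> = Promise.allSettled(launch()).then((results) =>
  results.map((r) => r.status),
);

allResult.then(console.log);
settledStatuses.then(console.log);
```

With `Promise.all` the three successful values are unreachable once the set rejects. With `Promise.allSettled` they sit next to the failure and can be read by index.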

Allocate a token budget per branch

The first accident in parallelizing is usually contention over budget. The account-wide rate limit is a shared resource, so the more branches you add, the less room each one gets. In my case, pinning max_tokens on the branch side and capping concurrency made things stable.

Here is the allocation I use as a rough guide.

Concurrent branches	max_tokens per branch	Perceived stability
2	2048	Stable, plenty of headroom
4	1024	Everyday range; 429s almost never occur
8	512	Close to the limit; watch it
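Each row of the table multiplies out to the same 4,096 tokens requested at once (2 × 2048, 4 × 1024, 8 × 512), so one way to read it is as a fixed ceiling split evenly across branches. A minimal sketch of that reading follows; `perBranchMaxTokens` and the 4,096 default are illustrative, inferred from the table rather than taken from any published limit.

```typescript
// Hypothetical helper: split a fixed ceiling of requested output tokens evenly across branches.
// The 4096 default is inferred from the table above (2×2048 = 4×1024 = 8×512).
function perBranchMaxTokens(concurrency: number, totalInFlight = 4096): number {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError("concurrency must be a positive integer");
  }
  return Math.floor(totalInFlight / concurrency);
}

console.log(perBranchMaxTokens(4));
```

The ceiling itself is something to tune against your own account's limits; the point is only that the per-branch number should fall as the branch count rises.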

To cap concurrency itself, slot in a lightweight semaphore. It is small enough to write without pulling in a library.

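function pLimit(concurrency: number) {
  let active = 0;
  const queue: (() => void)[] = [];
  // Release a slot. If someone is waiting, hand the slot over directly and leave
  // `active` unchanged, so a caller arriving in between cannot slip past the cap.
  const release = () => {
    const wake = queue.shift();
    if (wake) wake();
    else active--;
  };
  return async function <T>(fn: () => Promise<T>): Promise<T> {
    if (active >= concurrency) {
      // Wait for a running task to hand its slot over
      await new Promise<void>((r) => queue.push(r));
    } else {
      active++;
    }
    try {
      return await fn();
    } finally {
      release();
    }
  };
}

const limit = pLimit(4);
// Replaces the uncapped allSettled call shown earlier
const settled = await Promise.allSettled(
  sites.map((s) => limit(() => collectCandidates(s)))
);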

Grow to eight branches and at most four ever run at once; the rest wait their turn. Treating the rate limit as a shared resource is the heart of it.

A result contract: pin down what the child returns

Most of what quietly breaks parallel aggregation is variation in the JSON a child returns. One missing key or a non-array, and the downstream for loop throws, dragging your successful branches down too. Insert schema validation on the parent side and isolate broken results explicitly as failures.

import { z } from "zod";
 
const Candidate = z.object({
  title: z.string().min(1),
  angle: z.string().min(1),
});
const CandidateList = z.array(Candidate).min(1);
 
function parseCandidates(raw: string): z.infer<typeof CandidateList> {
  const json = JSON.parse(raw); // let the caller catch the throw
  return CandidateList.parse(json); // throws if the shape is off
}

The point is to validate in the parent's aggregation phase, not inside the child. Keep the child returning plain text, and keep the trust boundary in one place: the parent. Then "which branch was rejected, and why" ends up in a single log, and isolating causes gets far easier.

One branch can fail without stopping the rest

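This is the center of the design. Sort the allSettled results into success, schema violation, and transport failure, with an unknown bucket for rejections that fit neither failure type.

type Outcome =
  | { site: string; ok: true; items: z.infer<typeof CandidateList> }
  | { site: string; ok: false; reason: "schema" | "network" | "unknown"; detail: string };

// Assumption: a rejection counts as transport-level when it is an SDK APIError with
// no HTTP status (connection error or abort), a 429, or a 5xx. Anything else is "unknown".
// Uses the Anthropic import from the worker file.
function classifyRejection(err: unknown): "network" | "unknown" {
  if (!(err instanceof Anthropic.APIError)) return "unknown";
  const status = err.status;
  return status === undefined || status === 429 || status >= 500 ? "network" : "unknown";
}

const outcomes: Outcome[] = settled.map((r, i) => {
  const site = sites[i].id; // allSettled preserves input order
  if (r.status === "rejected") {
    return { site, ok: false, reason: classifyRejection(r.reason), detail: String(r.reason) };
  }
  try {
    return { site, ok: true, items: parseCandidates(r.value) };
  } catch (e) {
    // JSON.parse and zod both throw here; either way the payload is unusable
    return { site, ok: false, reason: "schema", detail: String(e) };
  }
});

const fulfilled = outcomes.filter((o): o is Extract<Outcome, { ok: true }> => o.ok);
const failed = outcomes.filter((o): o is Extract<Outcome, { ok: false }> => !o.ok);
console.log(`ok ${fulfilled.length} / failed ${failed.length}`);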

At this point the candidates for the three successful sites are already in hand. The one that failed sits isolated in failed, and the run moves forward. The serial-era failure mode, where site three takes everyone down, is now structurally impossible.

Where to route a failure

Retrying every failure the same way mixes failures that heal if you wait, like a rate-limit overrun, with failures that never heal on their own, like a schema violation. The pointless retries then squeeze the limit further. Routing by reason is safer.

Failure reason	Nature	Destination
network (incl. 429)	Transient; may recover if you wait	Exponential backoff, up to 2 retries
schema	Prompt-driven; immediate retry is wasted	Regenerate once, then dead-letter
unknown	Unclassified	No retry; record to dead-letter
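The table can be written down as a small pure function, which keeps the routing decision testable apart from the retry loop. This is a sketch under assumptions: `routeFailure`, `Action`, and the 1-second base delay are hypothetical, and sending a network failure to dead-letter once its two retries are used up is my reading of "up to 2 retries", not something the table states.

```typescript
type FailureReason = "schema" | "network" | "unknown";

type Action =
  | { kind: "retry"; delayMs: number } // wait, then call the worker again
  | { kind: "regenerate" }             // ask again right away, once
  | { kind: "dead-letter" };           // record and move on

// attempt = how many times this branch has already been re-run (0 on the first failure).
function routeFailure(reason: FailureReason, attempt: number, baseDelayMs = 1000): Action {
  switch (reason) {
    case "network":
      // Exponential backoff: baseDelayMs, then 2 × baseDelayMs; then give up (assumption).
      return attempt < 2
        ? { kind: "retry", delayMs: baseDelayMs * 2 ** attempt }
        : { kind: "dead-letter" };
    case "schema":
      return attempt < 1 ? { kind: "regenerate" } : { kind: "dead-letter" };
    case "unknown":
      return { kind: "dead-letter" };
  }
}

console.log(routeFailure("network", 1));
```

Because the function only returns a description of what to do, the caller decides how to sleep, re-invoke the worker, or write the dead-letter line.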

A dead-letter need not be elaborate. In my setup I append the failed site ID, reason, and raw response as one line of JSON to a log file, and keep a separate morning batch that picks up only those. Rather than trying to recover everything perfectly on the spot, deciding to "lock in what you got and record the misses for later" is, in the end, what breaks least in unattended nightly runs.
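A file-based dead-letter of this kind fits in a few lines. The sketch below assumes a JSON Lines file; the function names and the `DeadLetter` field names are hypothetical, chosen to match the three fields described above.

```typescript
import { appendFileSync, readFileSync } from "node:fs";

type DeadLetter = { site: string; reason: string; raw: string };

// Append one JSON object per line. JSON.stringify escapes any newlines inside `raw`,
// so each entry stays on a single line.
function recordDeadLetter(path: string, entry: DeadLetter): void {
  appendFileSync(path, JSON.stringify(entry) + "\n", "utf8");
}

// The follow-up batch reads the file back and gets only the branches that were missed.
function readDeadLetters(path: string): DeadLetter[] {
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as DeadLetter);
}
```

A nightly run would call `recordDeadLetter` once per failed branch, and the morning batch would call `readDeadLetters` on the same path and re-run only those site IDs.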

How much it actually changed

Here are rough numbers from running the same candidate collection 20 times each, serially and in parallel, on my four-site setup. Environments differ, so treat these as a trend rather than a benchmark.

Method	Avg duration	Items kept on 1 failure	429 rate
Serial	~152 s	Lost everything past the failure	Low
Parallel (uncapped)	~44 s	Kept 3/4	Somewhat high
Parallel (cap 4, fixed budget)	~48 s	Kept 3/4	Near zero

What stands out is that the capped row is about 4 seconds slower than the uncapped one (roughly 3.2× faster than serial instead of 3.5×) yet drops the 429 rate to near zero. For a handful of seconds, you erase the time lost to retries and the risk of overrunning the limit. In a nightly batch, a "speed that does not crash" is worth more than peak speed.

Operational notes the docs do not cover

A few small things that only became clear after running this for a few months.

First. Promise.allSettled waits for every branch to finish, so a single pathologically slow branch drags the whole thing. Wrap each worker in an AbortController timeout and fail slow branches into the network bucket yourself, so the slowest single branch cannot hold the whole run hostage.
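One way to add that deadline is a wrapper that owns the AbortController and rejects when the timer fires. This is a sketch, not the SDK's own mechanism: `withTimeout` is a hypothetical helper, and the worker is assumed to accept an AbortSignal and pass it on to whatever I/O it performs.

```typescript
// Hypothetical wrapper: run a worker with a deadline.
async function withTimeout<T>(
  worker: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // tell the worker to stop
      reject(new Error(`timeout after ${ms} ms`)); // reject even if the worker ignores the signal
    }, ms);
  });
  try {
    return await Promise.race([worker(controller.signal), deadline]);
  } finally {
    clearTimeout(timer); // no stray timer once either side has settled
  }
}

// Usage: a stuck branch becomes an ordinary rejection that allSettled reports.
const demo = Promise.allSettled([
  withTimeout(() => Promise.resolve("fast"), 100),
  withTimeout(() => new Promise<string>(() => {}), 20), // never settles on its own
]).then((results) => results.map((r) => r.status));

demo.then(console.log);
```

If your SDK version accepts an abort signal or a per-request timeout as a request option, passing the signal through lets the HTTP call itself be cancelled; check its documentation before relying on that.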

Second. Emit logs in "the same one-line format for both success and failure." If successes are verbose and failures terse, the formats do not line up exactly when you need failure analysis most.
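A single formatter used for both outcomes is the simplest way to enforce that. In this sketch the field names are hypothetical and the values in the two calls are made-up examples, not measurements.

```typescript
type LogEntry = {
  site: string;
  ok: boolean;
  reason: string | null; // null on success
  count: number;         // candidates kept; 0 on failure
  ms: number;            // wall-clock duration of the branch
};

// Same keys, same order, one line, whether the branch succeeded or failed.
function formatLogLine(e: LogEntry): string {
  return JSON.stringify({ site: e.site, ok: e.ok, reason: e.reason, count: e.count, ms: e.ms });
}

console.log(formatLogLine({ site: "cl", ok: true, reason: null, count: 5, ms: 38000 }));
console.log(formatLogLine({ site: "ag", ok: false, reason: "network", count: 0, ms: 30000 }));
```

Because every line has the same keys, filtering failures out of a night's log is a plain text search on `"ok":false`.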

Third. Keep the concurrency number out of the code and in an environment variable. Rate limits move with your plan and account state. Making it "change one number when the limit changes" saves you from scrambling at midnight. I had this value hardcoded once and missed out on a limit increase for a while.
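Reading the value defensively matters as much as moving it, since a typo in the environment should not turn into a cap of zero or NaN. A minimal sketch; the variable name `SUBAGENT_CONCURRENCY` and the fallback of 4 are assumptions for illustration.

```typescript
// Falls back to a default when the variable is unset or is not a positive integer.
function readConcurrency(raw: string | undefined, fallback = 4): number {
  const n = Number(raw);
  return raw !== undefined && Number.isInteger(n) && n >= 1 ? n : fallback;
}

// Hypothetical variable name
const concurrency = readConcurrency(process.env.SUBAGENT_CONCURRENCY);
console.log(`concurrency = ${concurrency}`);
```

The result can be passed straight to the concurrency limiter in place of a literal.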

The next step

Pick just one independent process you currently run serially, and swap it to Promise.allSettled with concurrency two. Fixed budgets and schema validation can wait until that one is stable. Parallelize small, get used to handling partial failure, then raise the count. That order is the calmest way to grow an automated pipeline.

If you have been wrestling with the same naivety in a nightly batch, I hope this gives you a thread to pull on for your own design.
