When 41 of 20,000 Message Batches Requests Quietly Vanished — Field Notes on Reconciling and Requeuing Partial Failures
processing_status: ended does not mean every request succeeded. How errored and expired results hide inside a finished batch, and how a custom_id ledger catches every gap and requeues safely — with real cost and timing numbers.
I watched the batch flip to ended, closed the laptop, and slept well. The problem didn't surface until three days later.
Of the 20,000 requests I had submitted to the Message Batches API for a tag-reclassification job, only 19,959 rows had landed in my aggregation table. Forty-one were simply gone — no exception, no log line, nothing. As an indie developer running several technical blogs in parallel, I lean on batches constantly for bulk metadata work, and this was the first time I had met a failure this quiet.
The cause turned out to be my own assumption, not an API bug. processing_status: "ended" only means the box finished processing. Whether each item inside the box succeeded is a separate question the API expects you to ask.
These notes cover where the counts leak, a custom_id ledger that makes every gap visible, and a requeue design that never double-processes — with the numbers I measured along the way.
"ended" Is Not "All Succeeded" — the Four Ways a Request Finishes
Every request in a batch carries its own result.type. There are four outcomes.
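| result.type | Meaning | Typical trigger | Correct response |
|---|---|---|---|
| succeeded | Completed normally; contains a message | (none) | Aggregate as usual |
| errored | Individual request failed; carries an error object | Overload-class errors, or an invalid request such as an empty body | Check the error type; requeue transient failures, fix or drop permanent ones |
| canceled | The batch was canceled before this request was processed | Batch cancellation | Requeue if the result is still needed |
| expired | The batch hit its 24-hour window before this request was processed | Busy periods; the tail of a large batch | Requeue |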
The crucial part: a batch reaches ended normally even when errored and expired results are mixed in. Nothing throws. The SDK doesn't warn you. If your consumer loop only picks up successes, the failures disappear without a witness.
My 41 missing items broke down as 28 errored (19 overload-class, 9 invalid_request) and 13 expired. The nine invalid_request failures were empty article bodies I had passed through unfiltered — failures that no retry will ever fix. Requeue everything indiscriminately and those nine fail forever, on your budget.
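A minimal sketch of a consumer that counts every outcome instead of picking up successes only. The `fake` helper is a hypothetical stand-in for entries of the results stream, so the example runs without an API key; the only thing it assumes about real entries is the `result.type` attribute described above.

```python
from collections import Counter
from types import SimpleNamespace


def tally_result_types(results) -> Counter:
    """Count entries by result.type so no outcome is dropped silently."""
    counts = Counter()
    for entry in results:
        counts[entry.result.type] += 1
    return counts


def fake(custom_id: str, rtype: str) -> SimpleNamespace:
    # Hypothetical stand-in for one entry of the results stream.
    return SimpleNamespace(custom_id=custom_id, result=SimpleNamespace(type=rtype))


sample = [
    fake("a__a1", "succeeded"),
    fake("b__a1", "errored"),
    fake("c__a1", "expired"),
    fake("d__a1", "succeeded"),
]
counts = tally_result_types(sample)
print(dict(counts))  # {'succeeded': 2, 'errored': 1, 'expired': 1}
```

If the tally does not add up to the number of requests submitted, something has leaked before aggregation even starts.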
Build the Ledger First — Designing a custom_id Manifest
Reconciliation requires a record of what you sent that lives outside the API. Without it there is nothing to diff against.
At submission time, write each custom_id and a hash of its input to a JSONL manifest.
```python
import anthropic
import hashlib
import json
from pathlib import Path

client = anthropic.Anthropic()  # ANTHROPIC_API_KEY from the environment
MANIFEST = Path("batch_manifest.jsonl")


def build_requests(items: list[dict]) -> list[dict]:
    """items: [{"id": "article-0001", "text": "..."}]"""
    requests = []
    with MANIFEST.open("a", encoding="utf-8") as mf:
        for item in items:
            # attempt=1 embedded in the custom_id (bumped to 2, 3 on requeue)
            custom_id = f"{item['id']}__a1"
            requests.append({
                "custom_id": custom_id,
                "params": {
                    "model": "claude-sonnet-5",
                    "max_tokens": 300,
                    "messages": [{
                        "role": "user",
                        "content": f"Return three fitting tags for this article as a JSON array:\n\n{item['text']}"
                    }],
                },
            })
            mf.write(json.dumps({
                "custom_id": custom_id,
                "source_id": item["id"],
                "input_sha": hashlib.sha256(item["text"].encode()).hexdigest()[:16],
                "attempt": 1,
            }) + "\n")
    return requests


batch = client.messages.batches.create(requests=build_requests(items))
print(f"batch_id={batch.id} submitted={len(items)}")
```
The __a1 suffix is the load-bearing detail. Anthropic rejects duplicate custom_ids within one batch, but does nothing about duplicates across batches. Resubmit a failed item under its original ID and you can no longer tell which response belongs to which attempt when you aggregate. With the attempt number in the ID, the ledger always resolves "which attempt is current for this source_id" unambiguously.
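One way to keep the suffix convention in a single place is a pair of small helpers. This is a sketch, not part of the original pipeline: `make_custom_id`, `parse_custom_id`, and `current_attempts` are hypothetical names, and the only assumption is the `<source_id>__a<attempt>` format used above.

```python
def make_custom_id(source_id: str, attempt: int) -> str:
    """Build an attempt-numbered ID such as 'article-0001__a2'."""
    return f"{source_id}__a{attempt}"


def parse_custom_id(custom_id: str) -> tuple[str, int]:
    """Split an ID back into (source_id, attempt); reject anything malformed."""
    source_id, sep, suffix = custom_id.rpartition("__a")
    if not sep or not source_id or not suffix.isdigit():
        raise ValueError(f"not an attempt-numbered custom_id: {custom_id!r}")
    return source_id, int(suffix)


def current_attempts(custom_ids) -> dict[str, int]:
    """Map each source_id to the highest attempt number seen in the ledger."""
    latest: dict[str, int] = {}
    for cid in custom_ids:
        source_id, attempt = parse_custom_id(cid)
        latest[source_id] = max(attempt, latest.get(source_id, 0))
    return latest


ledger_ids = ["article-0001__a1", "article-0002__a1", "article-0002__a2"]
print(current_attempts(ledger_ids))  # {'article-0001': 1, 'article-0002': 2}
```

Splitting on the last `__a` keeps the parse unambiguous even when a source ID happens to contain the same characters earlier in the string.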
After being burned by those empty bodies, I also added a gate at the top of build_requests — if not item["text"].strip(): continue — and log what it rejects. The cheapest place to fix an invalid_request is before it enters the batch.
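The same gate can also be written as a standalone pre-filter, which makes the rejects easy to log and count. A sketch under the assumption that items are dicts with `id` and `text` keys as in `build_requests`; `split_valid` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("batch_gate")  # hypothetical logger name


def split_valid(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate items with a usable body from items that would fail as invalid requests."""
    accepted, rejected = [], []
    for item in items:
        text = item.get("text") or ""
        if text.strip():
            accepted.append(item)
        else:
            rejected.append(item)
            logger.warning("skipping %s: empty body", item.get("id"))
    return accepted, rejected


items_in = [
    {"id": "article-0001", "text": "A real article body."},
    {"id": "article-0002", "text": "   "},
    {"id": "article-0003", "text": ""},
]
accepted, rejected = split_valid(items_in)
print(len(accepted), len(rejected))  # 1 2
```

Only the accepted list would be passed on to `build_requests`, so the manifest never records an item that cannot succeed.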
Poll with Exponential Backoff — the 24-Hour Window in Practice
It's tempting to poll every five seconds. For large batches that's wasted traffic. In my logs, the same 20,000-request batch (≈900 input tokens each) finished in 62 minutes on a quiet day and 4 hours 11 minutes on a busy Monday afternoon. Treat completion time as unknowable and widen the interval as you wait.
```python
import time


def wait_for_batch(batch_id: str, base: float = 30.0, cap: float = 600.0) -> None:
    interval = base
    while True:
        status = client.messages.batches.retrieve(batch_id)
        counts = status.request_counts
        print(f"processing={counts.processing} succeeded={counts.succeeded} "
              f"errored={counts.errored} expired={counts.expired}")
        if status.processing_status == "ended":
            return
        time.sleep(interval)
        interval = min(interval * 1.5, cap)  # 30s → 45s → ... → 10 min max
```
request_counts accumulates errored and expired counts while the batch is still running. Watch them mid-flight and you catch the leak during processing, not three days later. I page myself when errored crosses 1% of submitted.
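To see what the widening interval in the polling loop buys, the schedule can be simulated without touching the API. This is a sketch: `backoff_intervals` and `polls_needed` are hypothetical helpers that mirror the loop's defaults (30 s base, factor 1.5, 600 s cap), with the first status check made immediately.

```python
from itertools import islice, repeat
from typing import Iterator


def backoff_intervals(base: float = 30.0, factor: float = 1.5,
                      cap: float = 600.0) -> Iterator[float]:
    """Yield sleep intervals that grow by `factor` until they reach `cap`."""
    interval = base
    while True:
        yield interval
        interval = min(interval * factor, cap)


def polls_needed(duration_s: float, intervals: Iterator[float]) -> int:
    """Count status checks until one lands at or after `duration_s`."""
    elapsed, polls = 0.0, 1  # the first check happens at t=0
    while elapsed < duration_s:
        elapsed += next(intervals)
        polls += 1
    return polls


first_nine = list(islice(backoff_intervals(), 9))
backoff_polls = polls_needed(62 * 60, backoff_intervals())
fixed_polls = polls_needed(62 * 60, repeat(5.0))
print(first_nine)
print(backoff_polls, fixed_polls)
```

For a batch that ends after 62 minutes, this schedule makes 13 status calls where a fixed five-second poll makes 745. The price is that the final check can land up to one capped interval (10 minutes) after the batch actually ended.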
Reconcile — Diff the Ledger Against the Results Stream
Once the batch has ended, sort every result into succeeded, retryable, or permanent, then diff the IDs you saw against the manifest to surface anything missing.
```python
from collections import defaultdict


def reconcile(batch_id: str) -> dict:
    # 1. What we believe we submitted
    expected = {}
    with MANIFEST.open(encoding="utf-8") as mf:
        for line in mf:
            rec = json.loads(line)
            expected[rec["custom_id"]] = rec

    buckets = defaultdict(list)
    seen = set()

    # 2. Classify the results stream
    for result in client.messages.batches.results(batch_id):
        seen.add(result.custom_id)
        rtype = result.result.type
        if rtype == "succeeded":
            buckets["succeeded"].append(result)
        elif rtype == "errored":
            etype = result.result.error.error.type
            key = "retryable" if etype in (
                "overloaded_error", "api_error", "rate_limit_error"
            ) else "permanent"
            buckets[key].append(result)
        else:  # canceled / expired
            buckets["retryable"].append(result)

    # 3. IDs in the ledger that never appeared in results (should be zero; verify it)
    missing = [cid for cid in expected if cid not in seen]

    print(f"succeeded={len(buckets['succeeded'])} "
          f"retryable={len(buckets['retryable'])} "
          f"permanent={len(buckets['permanent'])} missing={len(missing)}")
    return {"buckets": buckets, "missing": missing, "expected": expected}
```
The dividing line is the error type. overloaded_error, api_error, and rate_limit_error heal with time; invalid_request_error means the request itself is broken and will fail identically every time. Treating both as one generic "error" was my original mistake.
One more trap worth naming: the results stream is not guaranteed to preserve submission order. Join results to inputs by index and your data corrupts silently the first time the order shifts. Always join on custom_id. Also note that results remain downloadable for 29 days from batch creation — relevant if your aggregation runs monthly.
Requeue — Bump the Attempt, Cap the Attempts
Only the retryable bucket goes back out, in a fresh batch, under attempt-bumped IDs.
```python
MAX_ATTEMPTS = 3


def requeue(recon: dict, items_by_id: dict) -> list[dict]:
    retry_requests = []
    with MANIFEST.open("a", encoding="utf-8") as mf:
        for result in recon["buckets"]["retryable"]:
            rec = recon["expected"][result.custom_id]
            next_attempt = rec["attempt"] + 1
            if next_attempt > MAX_ATTEMPTS:
                print(f"give up: {rec['source_id']} (attempt {rec['attempt']})")
                continue
            new_cid = f"{rec['source_id']}__a{next_attempt}"
            src = items_by_id[rec["source_id"]]
            retry_requests.append({
                "custom_id": new_cid,
                "params": {
                    # rebuild params from source data, not from the result object
                    "model": "claude-sonnet-5",
                    "max_tokens": 300,
                    "messages": [{
                        "role": "user",
                        "content": f"Return three fitting tags for this article as a JSON array:\n\n{src['text']}"
                    }],
                },
            })
            mf.write(json.dumps({
                "custom_id": new_cid,
                "source_id": rec["source_id"],
                "input_sha": rec["input_sha"],
                "attempt": next_attempt,
            }) + "\n")
    return retry_requests
```
Always set a ceiling (three here). An uncapped auto-requeue plus one misclassified permanent failure equals a machine that converts your budget into identical errors. Items that exhaust their attempts stay in the ledger for a human to read the next morning. In the 20,000-request run, 31 of 32 retryable items succeeded on the second attempt; one needed a third.
Protect the aggregation side too. Because multiple attempts of the same source_id can theoretically both succeed, my aggregation query keeps only the highest-attempt success per source_id — double-counting becomes structurally impossible.
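The aggregation query itself is not shown here, so the following is only a Python sketch of the same rule, assuming each stored row carries `source_id`, `attempt`, and `status` fields (hypothetical names).

```python
def latest_success(rows: list[dict]) -> dict[str, dict]:
    """Keep one row per source_id: the succeeded row with the highest attempt."""
    best: dict[str, dict] = {}
    for row in rows:
        if row["status"] != "succeeded":
            continue
        current = best.get(row["source_id"])
        if current is None or row["attempt"] > current["attempt"]:
            best[row["source_id"]] = row
    return best


rows = [
    {"source_id": "article-0001", "attempt": 1, "status": "succeeded", "tags": ["a"]},
    {"source_id": "article-0002", "attempt": 1, "status": "errored", "tags": None},
    {"source_id": "article-0002", "attempt": 2, "status": "succeeded", "tags": ["b"]},
    {"source_id": "article-0002", "attempt": 3, "status": "succeeded", "tags": ["b2"]},
]
final = latest_success(rows)
print({sid: row["attempt"] for sid, row in final.items()})
# {'article-0001': 1, 'article-0002': 3}
```

Because the output is keyed by `source_id`, two successful attempts for the same article can never both reach the aggregate.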
Claude Sonnet 5, released June 30, 2026, carries introductory pricing of $2 input / $10 output per million tokens through August 31, 2026 ($3 / $15 after). Batches take a further 50% off, so during the introductory window the effective rate is $1 input / $5 output.
For the 20,000-request run (≈900 input, ≈250 output tokens each):
| Path | Input (18M tokens) | Output (5M tokens) | Total |
|---|---|---|---|
| Real-time API (intro pricing) | $36.00 | $50.00 | $86.00 |
| Batch (intro × 50%) | $18.00 | $25.00 | $43.00 |
| Requeue overhead (32 items) | $0.03 | $0.04 | $0.07 |
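The figures in the table can be reproduced with a few lines of arithmetic. A sketch using the prices and token counts quoted above; `cost_usd` is a hypothetical helper, and the per-request token counts are the article's approximations, not exact billing data.

```python
def cost_usd(requests: int, in_tok: int, out_tok: int,
             in_price: float, out_price: float,
             discount: float = 0.0) -> tuple[float, float, float]:
    """Return (input, output, total) cost in USD; prices are per million tokens."""
    factor = 1.0 - discount
    input_cost = requests * in_tok / 1_000_000 * in_price * factor
    output_cost = requests * out_tok / 1_000_000 * out_price * factor
    return input_cost, output_cost, input_cost + output_cost


INTRO_IN, INTRO_OUT = 2.0, 10.0  # introductory prices quoted above, USD per million tokens

realtime = cost_usd(20_000, 900, 250, INTRO_IN, INTRO_OUT)
batch_run = cost_usd(20_000, 900, 250, INTRO_IN, INTRO_OUT, discount=0.5)
requeue_run = cost_usd(32, 900, 250, INTRO_IN, INTRO_OUT, discount=0.5)

for label, (i, o, t) in [("real-time", realtime), ("batch", batch_run), ("requeue", requeue_run)]:
    print(f"{label:<10} input=${i:.2f} output=${o:.2f} total=${t:.2f}")
```

The requeue row works out to 28,800 input tokens and 8,000 output tokens, which is why it rounds to a few cents.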
The whole reconcile-and-requeue apparatus costs rounding error. The hour I spent manually investigating 41 missing rows cost far more. Since moving to submit-at-night, read-one-report-in-the-morning, batches stopped being a submit-and-pray exercise.
A batch tops out at 100,000 requests or 256MB, but I split jobs at 20,000–30,000 anyway. Expirations cluster in the tail of large batches, and smaller batches keep both the blast radius and the requeue small.
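Splitting a job by request count can be sketched as a simple chunker. `chunked` is a hypothetical helper, the 20,000 default follows the range mentioned above, and the sketch does not check the 256MB payload limit, which would need a separate size estimate per chunk.

```python
from typing import Iterator


def chunked(items: list, size: int = 20_000) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items, one per batch submission."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


job = [{"id": f"article-{n:05d}"} for n in range(45_000)]
sizes = [len(chunk) for chunk in chunked(job)]
print(sizes)  # [20000, 20000, 5000]
```

Each chunk would go through its own submit, reconcile, and requeue cycle, so a bad day affects one slice of the job and not all of it.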
Make Reconciliation a Road You Always Travel
A safety net you can skip is a safety net you will skip. My cron pipeline runs reconciliation unconditionally as its final step and logs one summary line.
Submission appends to the manifest (submission and ledger write treated as one transaction)
After ended, reconcile and always log the four counts: succeeded / retryable / permanent / missing
Alert a human if missing > 0 or permanent > 0.5% of submitted
Auto-requeue retryables with the attempt cap at 3
With those four lines of process, discovering a gap went from "a vague unease three days later" to "one log line the same day."
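The summary line and the alert rule from the steps above can be sketched as one pure function, which keeps the thresholds testable outside the cron job. `summarize` is a hypothetical name; the thresholds are the ones stated in the list (any missing ID, or permanent failures above 0.5% of submitted).

```python
def summarize(succeeded: int, retryable: int, permanent: int,
              missing: int, submitted: int,
              permanent_threshold: float = 0.005) -> tuple[str, bool]:
    """Return the one-line summary and whether a human should be alerted."""
    line = (f"submitted={submitted} succeeded={succeeded} retryable={retryable} "
            f"permanent={permanent} missing={missing}")
    permanent_rate = permanent / submitted if submitted else 0.0
    alert = missing > 0 or permanent_rate > permanent_threshold
    return line, alert


# Counts from the 20,000-request run described earlier.
line, alert = summarize(succeeded=19_959, retryable=32, permanent=9,
                        missing=0, submitted=20_000)
print(line)
print("alert" if alert else "ok")  # ok: 9 of 20,000 is below the 0.5% threshold
```

With these thresholds the original run would have produced a quiet log line showing 32 retryable and 9 permanent items on the same day, with no page sent.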
The Next Step
Open your existing batch consumer and look for the spot where it picks up successes without branching on result.type. That is the single most valuable thing to fix today. The ledger and reconciliation above drop into most pipelines in under an hour.
If you run unattended batch jobs like I do, I hope these notes save you the three confusing days they cost me.