Claude Lab
API & SDK · 2026-07-02 · Advanced

When 41 of 20,000 Message Batches Requests Quietly Vanished — Field Notes on Reconciling and Requeuing Partial Failures

processing_status: ended does not mean every request succeeded. How errored and expired results hide inside a finished batch, and how a custom_id ledger catches every gap and requeues safely — with real cost and timing numbers.

Tags: Claude API, Message Batches, batch processing, error handling, production operations

Premium Article

I watched the batch flip to ended, closed the laptop, and slept well. The problem didn't surface until three days later.

Of the 20,000 requests I had submitted to the Message Batches API for a tag-reclassification job, only 19,959 rows had landed in my aggregation table. Forty-one were simply gone — no exception, no log line, nothing. As an indie developer running several technical blogs in parallel, I lean on batches constantly for bulk metadata work, and this was the first time I had met a failure this quiet.

The cause turned out to be my own assumption, not an API bug. processing_status: "ended" only means the batch as a whole has finished processing. Whether each request inside it succeeded is a separate question, and the API expects you to ask it.
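One way to ask that question without reading a single result is to use the batch object returned by client.messages.batches.retrieve(). It carries request_counts with succeeded, errored, canceled, and expired tallies. The following is a minimal sketch that assumes that shape. check_batch_outcome and IncompleteBatchError are hypothetical names, and the submitted count comes from your own records rather than from the API.

```python
# Sketch: "ended" is necessary but not sufficient.
# Assumes a MessageBatch-like object with .processing_status and
# .request_counts (.succeeded, .errored, .canceled, .expired), the shape
# returned by client.messages.batches.retrieve(batch_id) in the Python SDK.


class IncompleteBatchError(RuntimeError):
    """Hypothetical: the batch ended, but not every request succeeded."""


def check_batch_outcome(batch, submitted: int) -> None:
    """Raise unless the batch ended AND every submitted request succeeded."""
    if batch.processing_status != "ended":
        raise RuntimeError(f"batch is still {batch.processing_status!r}; poll again later")

    counts = batch.request_counts
    non_success = {
        "errored": counts.errored,
        "canceled": counts.canceled,
        "expired": counts.expired,
    }
    accounted = counts.succeeded + sum(non_success.values())
    if accounted != submitted:
        # Your own submission count disagrees with what the API reports
        raise IncompleteBatchError(f"API accounts for {accounted} of {submitted} submitted requests")
    if any(non_success.values()):
        raise IncompleteBatchError(f"batch ended with non-success results: {non_success}")
```

Call it once polling reports ended, for example check_batch_outcome(client.messages.batches.retrieve(batch_id), submitted=20000). With my run's counts (19,959 succeeded, 28 errored, 13 expired), it would have raised instead of letting me close the laptop.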

These notes cover where the counts leak, a custom_id ledger that makes every gap visible, and a requeue design that never double-processes — with the numbers I measured along the way.

"ended" Is Not "All Succeeded" — the Four Ways a Request Finishes

Every request in a batch carries its own result.type. There are four outcomes.

| result.type | Meaning | Typical trigger | Correct response |
|---|---|---|---|
| succeeded | Completed normally; contains a message | n/a | Aggregate as usual |
| errored | The individual request failed; carries an error object | invalid_request_error (bad params), api_error, overloaded_error | Branch on error type (below) |
| canceled | Caught in a batch cancellation | Requests still unprocessed when the batch was canceled manually | Requeue |
| expired | Missed the 24-hour processing window | Large batches submitted during busy periods | Requeue |

The crucial part: a batch reaches ended normally even when errored and expired results are mixed in. Nothing throws. The SDK doesn't warn you. If your consumer loop only picks up successes, the failures disappear without a witness.
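The fix on the consumer side is to count every entry, not just the successes. The following is a minimal sketch that assumes each entry exposes custom_id and result.type, as the Python SDK's client.messages.batches.results(batch_id) iterator does. Results are not guaranteed to arrive in submission order, so everything is keyed by custom_id. tally_results is a hypothetical helper name.

```python
from collections import Counter


def tally_results(results):
    """Count result.type across all entries and collect custom_ids that did not succeed.

    `results` is any iterable of objects with .custom_id and .result.type,
    e.g. client.messages.batches.results(batch_id).
    """
    counts = Counter()
    not_succeeded = {}  # custom_id -> result type ("errored", "canceled", "expired")
    for entry in results:
        kind = entry.result.type
        counts[kind] += 1
        if kind != "succeeded":
            not_succeeded[entry.custom_id] = kind
    return counts, not_succeeded
```

Summing the Counter against the number of requests you submitted is the cheapest sanity check. The not_succeeded mapping is what you later branch on.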

My 41 missing items broke down as 28 errored (19 overload-class, 9 invalid_request) and 13 expired. The nine invalid_request failures were empty article bodies I had passed through unfiltered, and no retry will ever fix them. Requeue everything indiscriminately and those nine fail on every attempt, eating your retry budget each time.
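That split suggests a small policy table that separates "try again" from "fix the input first." The following is a minimal sketch. It assumes errored results expose the API error type at result.error.error.type, which is the nesting the Python SDK uses as far as I know. The retryable set and the "hold" outcome are my own labels, not something the API defines.

```python
# Hypothetical requeue policy. The error type strings are the API's error types;
# which of them count as retryable is an assumption, not an API rule.
RETRYABLE_ERROR_TYPES = {"api_error", "overloaded_error"}


def classify(result) -> str:
    """Return 'ok', 'requeue', or 'hold' for one batch result.

    Assumes errored results expose the error type at result.error.error.type,
    the nesting used by the Python SDK's errored-result model.
    """
    if result.type == "succeeded":
        return "ok"
    if result.type in ("canceled", "expired"):
        return "requeue"  # never processed, so resubmitting is safe
    if result.type == "errored":
        if result.error.error.type in RETRYABLE_ERROR_TYPES:
            return "requeue"
        # invalid_request_error (and anything unrecognized) would fail the same
        # way again, so hold it for an input fix instead of requeuing
        return "hold"
    raise ValueError(f"unexpected result type: {result.type!r}")
```

Suppose the 19 overload-class errors arrive as overloaded_error or api_error. This policy would then requeue 32 of my 41 (19 errored plus 13 expired) and hold the 9 empty-body failures.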

Build the Ledger First — Designing a custom_id Manifest

Reconciliation requires a record of what you sent that lives outside the API. Without it there is nothing to diff against.
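Once that record exists, the diff itself comes down to a few set operations. The following is a minimal sketch that assumes a JSONL manifest with one object per line containing a custom_id field, which is the format the submission code that follows writes. The results mapping ({custom_id: result_type}) would come from the results stream. load_ledger and reconcile are hypothetical names.

```python
import json
from pathlib import Path


def load_ledger(path: Path) -> set[str]:
    """Collect every custom_id recorded in the JSONL manifest at submission time."""
    with path.open(encoding="utf-8") as f:
        return {json.loads(line)["custom_id"] for line in f if line.strip()}


def reconcile(ledger: set[str], results: dict[str, str]) -> dict[str, set[str]]:
    """Diff the ledger against {custom_id: result_type} taken from the results stream."""
    seen = set(results)
    return {
        "succeeded": {cid for cid in seen & ledger if results[cid] == "succeeded"},
        "failed": {cid for cid in seen & ledger if results[cid] != "succeeded"},
        "missing": ledger - seen,      # sent, but no result came back at all
        "unexpected": seen - ledger,   # a result the ledger never recorded
    }
```

The invariant worth asserting is that succeeded, failed, and missing together cover the whole ledger. In my run, that would have been 19,959 + 41 + 0 = 20,000, with the 41 sitting in failed instead of vanishing. If one manifest accumulates several batches, filter it down to the batch being checked before diffing.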

At submission time, write each custom_id and a hash of its input to a JSONL manifest.

import anthropic
import hashlib
import json
from pathlib import Path
 
client = anthropic.Anthropic()  # ANTHROPIC_API_KEY from the environment
 
MANIFEST = Path("batch_manifest.jsonl")
 
def build_requests(items: list[dict]) -> list[dict]:
    """items: [{"id": "article-0001", "text": "..."}]"""
    requests = []
    with MANIFEST.open("a", encoding="utf-8") as mf:
        for item in items:
            # attempt=1 embedded in the custom_id (bumped to 2, 3 on requeue)
            custom_id = f"{item['id']}__a1"
            requests.append({
                "custom_id": custom_id,
                "params": {
                    "model": "claude-sonnet-5",
                    "max_tokens": 300,
                    "messages": [{
                        "role": "user",
                        "content": f"Return three fitting tags for this article as a JSON array:\n\n{item['text']}"
                    }],
                },
            })
            mf.write(json.dumps({
                "custom_id": custom_id,
                "source_id": item["id"],
                "input_sha": hashlib.sha256(item["text"].encode()).hexdigest()[:16],
                "attempt": 1,
            }) + "\n")
    return requests
 
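# items normally comes from your own data source; this single entry is a
# hypothetical placeholder so the snippet runs end to end
items = [{"id": "article-0001", "text": "..."}]
batch = client.messages.batches.create(requests=build_requests(items))
print(f"batch_id={batch.id} submitted={len(items)}")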

The __a1 suffix is the load-bearing detail. Anthropic rejects duplicate custom_ids within one batch, but does nothing about duplicates across batches. Resubmit a failed item under its original ID and you can no longer tell which response belongs to which attempt when you aggregate. With the attempt number in the ID, the ledger always resolves "which attempt is current for this source_id" unambiguously.
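Resolving the current attempt and bumping it for a requeue only takes a couple of small helpers. The following is a minimal sketch that assumes the "<source_id>__a<attempt>" format used in build_requests. parse_custom_id, next_custom_id, and current_attempts are hypothetical names.

```python
import re

# Matches the "<source_id>__a<attempt>" scheme; the greedy source group means a
# source_id that itself contains "__a" still splits on the LAST suffix.
_ID_PATTERN = re.compile(r"^(?P<source>.+)__a(?P<attempt>\d+)$")


def parse_custom_id(custom_id: str) -> tuple[str, int]:
    m = _ID_PATTERN.match(custom_id)
    if m is None:
        raise ValueError(f"not an attempt-numbered custom_id: {custom_id!r}")
    return m.group("source"), int(m.group("attempt"))


def next_custom_id(custom_id: str) -> str:
    """The ID to use when requeuing: same source_id, attempt + 1."""
    source_id, attempt = parse_custom_id(custom_id)
    return f"{source_id}__a{attempt + 1}"


def current_attempts(custom_ids) -> dict[str, str]:
    """Map each source_id to its highest-numbered (current) custom_id."""
    latest: dict[str, tuple[int, str]] = {}
    for cid in custom_ids:
        source_id, attempt = parse_custom_id(cid)
        if source_id not in latest or attempt > latest[source_id][0]:
            latest[source_id] = (attempt, cid)
    return {src: cid for src, (_, cid) in latest.items()}
```

When aggregating, keep only results whose custom_id matches current_attempts(...) for its source. A late answer from a superseded attempt is then ignored instead of overwriting the newer one.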

After being burned by those empty bodies, I also added a gate at the top of the loop in build_requests (if not item["text"].strip(): continue), and I log every item it rejects. The cheapest place to fix an invalid_request is before it enters the batch.
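The gate can also live in its own function, so the rejects are counted and logged in one place before build_requests ever sees them. The following is a minimal sketch. gate_items and the "batch.gate" logger name are hypothetical. The sketch also treats a missing or None text field as empty, which is an assumption beyond the one-line check above.

```python
import logging

log = logging.getLogger("batch.gate")


def gate_items(items: list[dict]) -> tuple[list[dict], list[str]]:
    """Drop items whose text is missing, None, or whitespace-only.

    Returns (kept, rejected_ids); every rejection is logged so it can be fixed upstream.
    """
    kept, rejected = [], []
    for item in items:
        if not (item.get("text") or "").strip():
            log.warning("gate: rejected %s (empty body)", item["id"])
            rejected.append(item["id"])
            continue
        kept.append(item)
    return kept, rejected
```

Usage would be kept, rejected = gate_items(items) followed by build_requests(kept). Because rejected items never reach the manifest, the ledger only contains requests that had a chance to succeed.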


WHAT YOU'LL LEARN
A reconciliation harness that diffs your custom_id ledger against the results stream and mechanically surfaces succeeded / errored / canceled / expired counts
An attempt-numbered custom_id scheme that prevents double-processing on requeue, plus a decision table separating retryable failures from permanent ones
Measured costs stacking the Sonnet 5 introductory pricing with the 50% batch discount, and practical numbers for polling intervals and the 24-hour window