API & SDK | 2026-04-11 | Intermediate

Claude API Batch Processing: Cut API Costs by 50% with Asynchronous Batch Implementation

Master Claude API batch processing for efficient large-scale requests. Learn async batch patterns to reduce costs and avoid rate limits.


Do you need to process large volumes of text or run extensive data analysis with the Claude API?

Standard real-time API calls are billed at full token prices, and those costs multiply quickly at scale. The solution: the Claude API's Message Batches feature lets you submit hundreds or thousands of requests asynchronously and collect the results in bulk, at 50% of standard token prices.

Batch Processing vs Real-Time API Calls

Real-Time API Calls

  • Execute immediately when triggered
  • Results return within seconds
  • Standard (full-price) token costs
  • Best for interactive applications

Batch Processing

  • Submit many requests in a single API call
  • Results available within minutes to hours (at most 24 hours)
  • 50% lower token cost than standard calls
  • Ideal for scheduled, non-urgent tasks

Use batch processing when results can wait and cost efficiency matters.

Real-World Use Cases for Batch Processing

1. Large-Scale Content Analysis

Example: Analyze 1,000 customer reviews (illustrative per-request price)
Real-time: 1,000 × $0.003 = $3.00
Batch: 1,000 × $0.0015 = $1.50 (50% savings)
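The arithmetic above reduces to a single discount factor. Here is a minimal Python sketch. It assumes the illustrative $0.003-per-request figure from the example and a flat 50% batch discount. estimate_costs is a hypothetical helper, and a real bill depends on the actual input and output token counts of each request.

```python
def estimate_costs(n_requests: int, cost_per_request: float,
                   batch_discount: float = 0.5) -> tuple[float, float, float]:
    """Return (real_time_total, batch_total, savings_fraction).

    cost_per_request is an illustrative standard-price figure you supply;
    batch_discount=0.5 models batch tokens costing half the standard price.
    """
    real_time_total = n_requests * cost_per_request
    batch_total = real_time_total * (1 - batch_discount)
    savings = 1 - batch_total / real_time_total if real_time_total else 0.0
    return real_time_total, batch_total, savings


real_time, batch, saved = estimate_costs(1_000, 0.003)
print(f"Real-time: ${real_time:.2f}  Batch: ${batch:.2f}  ({saved:.0%} savings)")
# Real-time: $3.00  Batch: $1.50  (50% savings)
```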

2. Scheduled Data Processing

Run nightly analysis of thousands of logs, daily sentiment analysis of social media posts, or weekly report generation. Perfect for jobs that don't require immediate results.

3. Academic Research & Paper Analysis

Students and researchers can combine batch processing with the academic plan to analyze hundreds of papers, datasets, or experiment results at minimal cost.

4. Multi-Language Translation

Translate 500+ documents into multiple languages in a single batch, maintaining quality while maximizing cost efficiency.

Implementing Batch Processing: Step-by-Step

Step 1: Understanding the JSONL Format

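Each batch request pairs a custom_id with the usual Messages API parameters. The Message Batches API accepts these as a requests array in a single JSON call, not as an uploaded file. Staging them locally as JSONL (JSON Lines: one JSON object per line) is still convenient for large jobs, and batch results come back in the same one-result-per-line format.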

{"custom_id": "req-1", "params": {"model": "claude-haiku-4-5-20251001", "max_tokens": 100, "messages": [{"role": "user", "content": "Explain batch processing briefly"}]}}
{"custom_id": "req-2", "params": {"model": "claude-haiku-4-5-20251001", "max_tokens": 100, "messages": [{"role": "user", "content": "Write a short story about API design"}]}}

Key fields:

  • custom_id: Identifier for each request; must be unique within the batch (you choose the naming)
  • params: Standard Messages API parameters (model, max_tokens, messages, and so on)
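If you stage requests as JSONL, it's worth confirming that the file round-trips before you submit. Here is a minimal Python sketch. to_jsonl and from_jsonl are hypothetical helper names. The parsed list is what you would pass as requests= to client.messages.batches.create in the Python SDK.

```python
import json


def to_jsonl(requests: list[dict]) -> str:
    """Serialize request dicts as JSONL: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in requests)


def from_jsonl(text: str) -> list[dict]:
    """Parse JSONL text back into dicts, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


requests = [
    {
        "custom_id": "req-1",
        "params": {
            "model": "claude-haiku-4-5-20251001",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": "Explain batch processing briefly"}],
        },
    },
    {
        "custom_id": "req-2",
        "params": {
            "model": "claude-haiku-4-5-20251001",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": "Write a short story about API design"}],
        },
    },
]

staged = to_jsonl(requests)
assert from_jsonl(staged) == requests  # lossless round trip
print(staged.count("\n"), "lines staged")  # 2 lines staged
```

json.dumps without indent never emits a raw newline (newlines inside strings are escaped), so each request is guaranteed to occupy exactly one line.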

Step 2: Generate JSONL File (Node.js Example)

const fs = require('fs');

// Sample requests to process
const requests = [
  { id: 'review-1', text: 'Analyze sentiment: "Great product, but shipping was slow"' },
  { id: 'review-2', text: 'Analyze sentiment: "Price is fair, quality exceeds expectations"' },
  { id: 'review-3', text: 'Analyze sentiment: "Best purchase I\'ve made this year!"' },
];

// Convert to JSONL: one request object per line
const jsonl = requests
  .map(req => JSON.stringify({
    custom_id: req.id,
    params: {
      model: 'claude-haiku-4-5-20251001',
      max_tokens: 150,
      messages: [{
        role: 'user',
        content: req.text
      }]
    }
  }))
  .join('\n') + '\n';

// Save to file
fs.writeFileSync('batch_requests.jsonl', jsonl);
console.log('✅ JSONL file created successfully');

Step 3: Submit Batch Using Anthropic SDK

const Anthropic = require('@anthropic-ai/sdk');
const fs = require('fs');

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY
});

async function submitBatch() {
  // Parse the local JSONL staging file: one request object per line
  const requests = fs.readFileSync('batch_requests.jsonl', 'utf8')
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line));

  // The Message Batches API takes the requests inline in one call (no file upload)
  const batch = await client.messages.batches.create({ requests });

  console.log(`📋 Batch created: ${batch.id}`);
  console.log(`⏳ Status: ${batch.processing_status}`);

  return batch.id;
}

submitBatch().catch(console.error);

Step 4: Retrieve and Process Results

// Reuses the `client` created in Step 3
async function checkBatchStatus(batchId) {
  const batch = await client.messages.batches.retrieve(batchId);

  console.log(`📊 Status: ${batch.processing_status}`);
  console.log(`✅ Succeeded: ${batch.request_counts.succeeded}`);
  console.log(`❌ Errored: ${batch.request_counts.errored}`);

  // Results are only available once processing has ended
  if (batch.processing_status !== 'ended') {
    console.log('⏳ Still processing. Check again in a few minutes.');
    return;
  }

  // Stream the results (decoded from JSONL) and key them by custom_id,
  // since results are not guaranteed to arrive in submission order
  const results = {};
  for await (const entry of await client.messages.batches.results(batchId)) {
    if (entry.result.type === 'succeeded') {
      results[entry.custom_id] = entry.result.message.content[0].text;
    } else {
      console.log(`⚠️ ${entry.custom_id}: ${entry.result.type}`);
    }
  }

  console.log(`✅ Results downloaded: ${Object.keys(results).length}`);
  return results;
}

// Check status by batch ID
checkBatchStatus('batch_xxxxx').catch(console.error);

Optimization Techniques

1. Optimize Request Granularity

// ❌ Low efficiency: 1 request = 1 review
{ custom_id: 'review-1', params: { messages: [{ role: 'user', content: 'Analyze review A' }] } }
 
// ✅ Better efficiency: 1 request = 10 reviews
{ custom_id: 'batch-1-10', params: { messages: [{ role: 'user', content: 'Analyze these reviews:\nA: ...\nB: ...' }] } }

Packing several items into one request means the shared instructions are sent once instead of once per item, which cuts input tokens and the total number of requests.
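Here is a minimal Python sketch of that packing step. It assumes the reviews arrive as an ID-to-text dict. pack_reviews, the prompt wording, the "first-to-last" custom_id scheme, and the 100-token per-item output budget are all illustrative choices, not something the API requires.

```python
def pack_reviews(reviews: dict[str, str], per_request: int = 10,
                 model: str = "claude-haiku-4-5-20251001") -> list[dict]:
    """Group reviews into batch requests of up to per_request items each."""
    ids = sorted(reviews)  # stable order so each custom_id names a clear ID range
    packed = []
    for start in range(0, len(ids), per_request):
        group = ids[start:start + per_request]
        # Label every review with its ID so the combined answer can be split back out
        listing = "\n".join(f"{rid}: {reviews[rid]}" for rid in group)
        packed.append({
            "custom_id": f"{group[0]}-to-{group[-1]}",
            "params": {
                "model": model,
                "max_tokens": 100 * len(group),  # rough per-item output budget
                "messages": [{
                    "role": "user",
                    "content": "Give the sentiment of each review, one per line, "
                               "as '<id>: <label>'.\n\n" + listing,
                }],
            },
        })
    return packed


reviews = {f"review-{i:02d}": f"Sample review text {i}" for i in range(1, 26)}
requests = pack_reviews(reviews)
print(len(requests), [r["custom_id"] for r in requests])
# 3 ['review-01-to-review-10', 'review-11-to-review-20', 'review-21-to-review-25']
```

The trade-off is that one failed or truncated request now affects up to ten reviews, and you have to parse the combined answer back out by ID. Keep groups small enough that re-running one is cheap.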

2. Test with Small Batches First

// Test with 10 requests, verify results, then scale to 1,000
const testBatch = requests.slice(0, 10);
// Submit test → Review results → If OK, submit full batch

3. Pre-Validate Request Format

Catch formatting errors, token limit violations, and other issues before submitting large batches.
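Structural problems are cheap to catch locally. Here is a minimal Python sketch of a pre-flight check. It assumes custom_id may only contain letters, digits, underscores, and hyphens (1 to 64 characters), so confirm that rule against the current Message Batches API reference. validate_requests is a hypothetical helper. It checks shape only: it does not count tokens, so it won't catch context-window overruns.

```python
import re

# Assumption: custom_id allows 1-64 letters, digits, "_" or "-".
# Confirm against the current Message Batches API reference.
CUSTOM_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_requests(requests: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the batch passed these checks."""
    problems: list[str] = []
    seen: set[str] = set()
    for i, req in enumerate(requests):
        cid = req.get("custom_id")
        if not isinstance(cid, str) or not CUSTOM_ID.fullmatch(cid):
            problems.append(f"request {i}: invalid custom_id {cid!r}")
        elif cid in seen:
            problems.append(f"request {i}: duplicate custom_id {cid!r}")
        else:
            seen.add(cid)

        params = req.get("params") or {}
        if not params.get("model"):
            problems.append(f"request {i}: missing model")
        max_tokens = params.get("max_tokens")
        if not isinstance(max_tokens, int) or max_tokens < 1:
            problems.append(f"request {i}: max_tokens must be a positive integer")
        if not isinstance(params.get("messages"), list) or not params["messages"]:
            problems.append(f"request {i}: messages must be a non-empty list")
    return problems


ok = {"custom_id": "req-1", "params": {"model": "claude-haiku-4-5-20251001",
      "max_tokens": 100, "messages": [{"role": "user", "content": "Hi"}]}}
bad = {"custom_id": "photo.jpg", "params": {"model": "claude-haiku-4-5-20251001",
       "max_tokens": 0, "messages": []}}
for problem in validate_requests([ok, ok, bad]):
    print(problem)
```

Run it on the full request list and refuse to submit unless the result is empty.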

Summary and Next Steps

Claude API batch processing enables:

  • 50% lower token costs on compatible workloads
  • Efficient handling of massive request volumes
  • Rate limits separate from the real-time Messages API, so bulk jobs don't crowd out interactive traffic

Batch processing excels for non-urgent, periodic tasks—daily data analysis, scheduled content generation, academic research, and bulk translations.

To go deeper, check out the official Anthropic documentation on the Message Batches API, or see the Claude API Cost Optimization Guide for broader efficiency strategies.

Start small with test batches, validate results, then scale confidently to production workloads!

It started with captioning a few thousand images

While organizing assets for a wallpaper app, I had a few thousand images waiting to be sorted. I could send each one to Claude individually to get a caption and a category — but I didn't need real-time answers. If a job ran overnight and the results were ready by morning, that was plenty.

This kind of "not urgent, but high volume" work is exactly what the Claude API Message Batches feature is built for. You submit requests in bulk and they're processed asynchronously, and the input and output tokens cost 50% less than standard API calls. In my case, queuing everything at night meant the full set of results was waiting for me the next morning.

Rather than just covering the API surface, this article focuses on the things that actually tripped me up when I ran a real batch of meaningful size — and how I got around them.

Start with the smallest possible run

I always confirm the shape with two or three requests first. The custom_id is the key you'll use later to map results back to your source data, so always give it a meaningful value. I used the image filenames minus the extension: custom_id only accepts letters, digits, underscores, and hyphens (up to 64 characters), so the "." in "wallpaper_0001.jpg" would be rejected.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "wallpaper_0001",
            "params": {
                "model": "claude-haiku-4-5-20251001",
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": "Classify the mood of this wallpaper in one word: a faint ring of light in the night sky"}
                ],
            },
        },
        {
            "custom_id": "wallpaper_0002",
            "params": {
                "model": "claude-haiku-4-5-20251001",
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": "Classify the mood of this wallpaper in one word: a mountain range in morning mist"}
                ],
            },
        },
    ]
)
 
print(f"Batch ID: {batch.id}")

For a simple task like classification, there's no need to reach for a top-tier model — Haiku was more than enough. Combined with the batch discount, that drops the cost another notch.

Check status — but don't poll too aggressively

Right after creation the batch is still processing. Results can come back in a few minutes, or, when things are busy, on the order of hours. Hammering the endpoint every few seconds is wasteful, so I wait a few minutes first and then widen the interval.

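import time

time.sleep(180)  # give the batch a few minutes before the first check
delay = 60       # seconds between checks; widened after each poll

while True:
    batch = client.messages.batches.retrieve(batch.id)
    if batch.processing_status == "ended":
        break
    counts = batch.request_counts
    print(f"Processing... ok {counts.succeeded} / err {counts.errored} / in-flight {counts.processing}")
    time.sleep(delay)
    delay = min(delay * 2, 600)  # back off, capped at 10 minutes

print("Batch ended")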

What you're checking is not "did everything succeed" but "has processing reached ended." Even once it's ended, the contents are a mix of successes and failures. Confuse the two and you'll silently drop the failed ones.

Result order isn't guaranteed — which is why custom_id matters

This is what caught me first. Results don't necessarily come back in the order you submitted them. The safe approach is to stream them one at a time and map each back to your source data by custom_id.

results = {}
retry_ids = []  # custom_ids to re-submit as a smaller batch

for item in client.messages.batches.results(batch.id):
    cid = item.custom_id
    if item.result.type == "succeeded":
        results[cid] = item.result.message.content[0].text
    elif item.result.type == "errored":
        # Don't swallow it: log it and queue it for a retry
        print(f"Failed: {cid} -> {item.result.error}")
        results[cid] = None
        retry_ids.append(cid)
    else:
        # "expired" (not processed within 24 hours) or "canceled"
        print(f"{item.result.type.capitalize()}: {cid}")
        results[cid] = None
        retry_ids.append(cid)

# Since custom_id was derived from the filename, results map straight back to the source image
ok = sum(1 for v in results.values() if v is not None)
print(f"Retrieved {ok} / {len(results)}, {len(retry_ids)} queued for retry")

Besides succeeded, result.type can be errored, expired, or canceled. An expired result means that request wasn't processed within 24 hours. In production, collect just the failed and expired custom_ids and re-submit them as a smaller batch, so nothing slips through. An errored request caused by an invalid request will fail the same way again, so fix those before re-submitting.
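Here is a minimal Python sketch of building that smaller retry batch. It assumes you kept the original request dicts and have collected the failed and expired custom_ids while reading results. build_retry_requests is a hypothetical helper. Reusing the exact original params keeps the retry comparable to the first run.

```python
def build_retry_requests(original: list[dict], retry_ids: list[str]) -> list[dict]:
    """Return the original request dicts whose custom_id is in retry_ids."""
    wanted = set(retry_ids)
    known = {req["custom_id"] for req in original}
    missing = wanted - known
    if missing:
        # A typo here would otherwise silently drop work from the retry
        raise KeyError(f"custom_ids not found in the original batch: {sorted(missing)}")
    return [req for req in original if req["custom_id"] in wanted]


original = [
    {"custom_id": f"wallpaper_{n:04d}", "params": {"model": "claude-haiku-4-5-20251001",
     "max_tokens": 512, "messages": [{"role": "user", "content": f"Image {n}"}]}}
    for n in range(1, 6)
]
retry = build_retry_requests(original, ["wallpaper_0004", "wallpaper_0002"])
print([r["custom_id"] for r in retry])  # ['wallpaper_0002', 'wallpaper_0004']
# With the Python SDK, re-submit as a smaller batch:
# client.messages.batches.create(requests=retry)
```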

Split the work when there's a lot of it

A single batch can hold up to 100,000 requests (or 256 MB, whichever limit is reached first). Rather than packing each one to the limit, I submitted a few thousand at a time. The reason is simple: when you need to re-run part of it, a smaller rollback unit is easier to handle. I chunked by the numeric part of the filename and logged a batch ID per chunk, so I could see at a glance how far things had gotten.

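# all_requests: the full list of request dicts, assumed sorted by filename
# so that each chunk covers a contiguous range of image numbers
def chunk(items, size=2000):
    """Yield consecutive slices of items, each at most size long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

for n, group in enumerate(chunk(all_requests)):
    b = client.messages.batches.create(requests=group)
    print(f"chunk {n}: {b.id} ({len(group)} requests)")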
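Printing batch IDs works until the script dies halfway. Here is a minimal Python sketch that writes the chunk-to-batch-ID log to a JSON file instead, so a rerun skips chunks that were already submitted. submit_in_chunks and the log file are hypothetical. The submit callable is injected so the logic can be tested without the network. With the real SDK you would pass something like lambda group: client.messages.batches.create(requests=group).id.

```python
import json
from pathlib import Path
from typing import Callable


def submit_in_chunks(all_requests: list[dict], submit: Callable[[list[dict]], str],
                     log_path: Path, size: int = 2000) -> dict[str, str]:
    """Submit requests chunk by chunk, recording chunk number -> batch ID.

    Chunks already in the log are skipped, so rerunning after a crash resumes
    instead of submitting the same requests twice. This relies on all_requests
    and size being identical on every run.
    """
    log = json.loads(log_path.read_text()) if log_path.exists() else {}
    for n, start in enumerate(range(0, len(all_requests), size)):
        key = str(n)  # JSON object keys are strings
        if key in log:
            continue  # submitted on an earlier run
        log[key] = submit(all_requests[start:start + size])
        log_path.write_text(json.dumps(log, indent=2))  # persist after every chunk
    return log
```

Saving after every chunk, rather than once at the end, is what makes the log trustworthy after a crash: at worst, the one chunk in flight when the script died is unrecorded and gets submitted again.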

How to think about cost

The batch discount applies to both input and output tokens. What actually moved the needle for me was stacking three things.

First, pushing routine tasks like classification and summarization onto Haiku. Second, trimming max_tokens down to what's truly needed. You're billed for the tokens actually generated, so the cap mainly guards against runaway output, but if all you want back is a one-word category, 512 is plenty and there's no reason to set it higher. Third, routing every non-urgent job through batches. The more carefully I identified work that could give up real-time responses, the quieter the end-of-month bill became.

Conversely, pushing work that needs to respond instantly to a user's action into a batch will hurt the experience. My own line is to reserve batches for "work no one is waiting on."

My grandfather, a temple carpenter, would measure all his timber together before cutting it in one focused pass. Lining up the prep and then running it through beats cutting one piece at a time, ad hoc — it ends up cleaner and faster. Building batch pipelines gives me that same feeling.

As a next step, pick just one piece of "not urgent" work you already have and confirm the round trip with a two- or three-request batch. Once your custom_id lines up with a key in your source data, you can scale the count up from there.
