Claude API Batch Processing: Reduce API Costs by 50% with Asynchronous Batch Implementation

Do you need to process large volumes of text or run extensive data analysis with the Claude API?

Traditional real-time API calls incur per-request costs that multiply quickly at scale. The Claude API's Message Batches feature lets you submit hundreds or thousands of requests asynchronously and collect the results in bulk, with input and output tokens billed at 50% of the standard price.

Batch Processing vs Real-Time API Calls

Real-Time API Calls

Execute immediately when triggered
Results return within seconds
Higher per-request cost structure
Best for interactive applications

Batch Processing

Submit many requests in a single API call
Results usually available within minutes to hours (requests not processed within 24 hours expire)
Input and output tokens billed at 50% of the standard price
Ideal for scheduled, non-urgent tasks

Use batch processing when results can wait and cost efficiency matters.

Real-World Use Cases for Batch Processing

1. Large-Scale Content Analysis

Example: Analyze 1,000 customer reviews (assuming an illustrative $0.003 per request at standard pricing)
Real-time: 1,000 × $0.003 = $3.00
Batch: 1,000 × $0.0015 = $1.50 (50% savings)
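
The same arithmetic as a small Python sketch. The $0.003 per-request figure is only the illustrative number from the example, the 0.5 factor reflects the batch discount on input and output tokens, and estimate_costs is a hypothetical helper name, not part of any SDK.

```python
BATCH_DISCOUNT = 0.5  # batch tokens are billed at half the standard price


def estimate_costs(num_requests, cost_per_request):
    """Return (standard_cost, batch_cost) for a job of num_requests requests."""
    standard = num_requests * cost_per_request
    batch = standard * (1 - BATCH_DISCOUNT)
    return standard, batch


# Illustrative per-request cost taken from the example above
standard, batch = estimate_costs(1000, 0.003)
print(f"standard ${standard:.2f}, batch ${batch:.2f}")
```

In practice the per-request cost depends on the model and on how many tokens each request actually uses, so plug in a figure measured from a small test run.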

2. Scheduled Data Processing

Run nightly analysis of thousands of logs, daily sentiment analysis of social media posts, or weekly report generation. Perfect for jobs that don't require immediate results.

3. Academic Research & Paper Analysis

Students and researchers can combine batch processing with the academic plan to analyze hundreds of papers, datasets, or experiment results at minimal cost.

4. Multi-Language Translation

Translate 500+ documents into multiple languages in a single batch, maintaining quality while maximizing cost efficiency.

Implementing Batch Processing: Step-by-Step

Step 1: Understanding the JSONL Format

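Each batch request is a JSON object with a custom_id and a params block. The Message Batches API takes these objects as a requests array in a single call and returns results as JSONL (JSON Lines: one JSON object per line). Staging your requests in a JSONL file, as this walkthrough does, keeps large jobs easy to inspect and re-run.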

{"custom_id": "req-1", "params": {"model": "claude-3-5-sonnet-20241022", "max_tokens": 100, "messages": [{"role": "user", "content": "Explain batch processing briefly"}]}}
{"custom_id": "req-2", "params": {"model": "claude-3-5-sonnet-20241022", "max_tokens": 100, "messages": [{"role": "user", "content": "Write a short story about API design"}]}}

Key fields:

custom_id: Unique identifier for each request (you choose the naming)
params: Standard Claude API message parameters

Step 2: Generate JSONL File (Node.js Example)

const fs = require('fs');
 
// Sample requests to process
const requests = [
  { id: 'review-1', text: 'Analyze sentiment: "Great product, but shipping was slow"' },
  { id: 'review-2', text: 'Analyze: "Price is fair, quality exceeds expectations"' },
  { id: 'review-3', text: 'Analyze: "Best purchase I\'ve made this year!"' },
];
 
// Convert to JSONL
const jsonl = requests
  .map(req => JSON.stringify({
    custom_id: req.id,
    params: {
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 150,
      messages: [{
        role: 'user',
        content: req.text
      }]
    }
  }))
  .join('\n');
 
// Save to file
fs.writeFileSync('batch_requests.jsonl', jsonl);
console.log('✅ JSONL file created successfully');

Step 3: Submit Batch Using Anthropic SDK

const Anthropic = require('@anthropic-ai/sdk');
const fs = require('fs');
 
const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY
});
 
async function submitBatch() {
  // Read the staged JSONL file and parse one request object per line
  const requests = fs
    .readFileSync('batch_requests.jsonl', 'utf8')
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line));
  
  // Submit all requests in a single Message Batches call
  // (the API takes a requests array; no file upload is involved)
  const batch = await client.messages.batches.create({ requests });
  
  console.log(`📋 Batch created: ${batch.id}`);
  console.log(`⏳ Status: ${batch.processing_status}`);
  
  return batch.id;
}
 
submitBatch().catch(console.error);

Step 4: Retrieve and Process Results

async function checkBatchStatus(batchId) {
  const batch = await client.messages.batches.retrieve(batchId);
  
  console.log(`📊 Status: ${batch.processing_status}`);
  console.log(`✅ Succeeded: ${batch.request_counts.succeeded}`);
  console.log(`❌ Errored: ${batch.request_counts.errored}`);
  
  // Results are only available once processing has ended
  if (batch.processing_status !== 'ended') {
    console.log('⏳ Still processing. Check again in a few minutes.');
    return;
  }
  
  // Stream results and match each one to its request by custom_id
  const results = await client.messages.batches.results(batchId);
  for await (const entry of results) {
    if (entry.result.type === 'succeeded') {
      console.log(`${entry.custom_id}: ${entry.result.message.content[0].text}`);
    } else {
      // errored, expired, or canceled
      console.log(`${entry.custom_id}: ${entry.result.type}`);
    }
  }
}
 
// Check status by batch ID
checkBatchStatus('batch_xxxxx').catch(console.error);

Optimization Techniques

1. Optimize Request Granularity

// ❌ Low efficiency: 1 request = 1 review
{ custom_id: 'review-1', params: { messages: [{ role: 'user', content: 'Analyze review A' }] } }
 
// ✅ Better efficiency: 1 request = 10 reviews
{ custom_id: 'batch-1-10', params: { messages: [{ role: 'user', content: 'Analyze these reviews:\nA: ...\nB: ...' }] } }

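Grouping several items into one request means the shared instructions are sent once rather than once per item, which reduces both the request count and the input tokens. Keep each group small enough that the full answer fits within max_tokens.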
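
The grouping idea can be sketched in Python as follows. This is a minimal sketch under stated assumptions: build_grouped_requests is a hypothetical helper, the group size of 10 and the prompt wording are illustrative, and the model ID is simply the one used in the examples above.

```python
def build_grouped_requests(reviews, group_size=10,
                           model="claude-3-5-sonnet-20241022", max_tokens=1024):
    """Pack several reviews into each batch request (hypothetical helper)."""
    requests = []
    for start in range(0, len(reviews), group_size):
        group = reviews[start:start + group_size]
        # Number each review so the answers can be matched back to the inputs
        numbered = "\n".join(
            f"{start + i + 1}: {text}" for i, text in enumerate(group)
        )
        requests.append({
            "custom_id": f"reviews-{start + 1}-{start + len(group)}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{
                    "role": "user",
                    "content": "Analyze the sentiment of each numbered review:\n" + numbered,
                }],
            },
        })
    return requests


reviews = [f"Review text {n}" for n in range(1, 26)]
grouped = build_grouped_requests(reviews)
print([r["custom_id"] for r in grouped])
# prints ['reviews-1-10', 'reviews-11-20', 'reviews-21-25']
```

Encoding the covered range in custom_id keeps it possible to trace a grouped answer back to the individual reviews.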

2. Test with Small Batches First

// Test with 10 requests, verify results, then scale to 1,000
const testBatch = requests.slice(0, 10);
// Submit test → Review results → If OK, submit full batch

3. Pre-Validate Request Format

Catch formatting errors, token limit violations, and other issues before submitting large batches.
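
A pre-submission check might look like the sketch below. It only covers structural problems (missing fields, duplicate custom_id values, an oversized batch); validate_requests is a hypothetical helper, and the max_requests value is an assumption you should set from the current API limits. Token limit checks are not covered here.

```python
def validate_requests(requests, max_requests):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(requests) > max_requests:
        problems.append(f"too many requests: {len(requests)} > {max_requests}")

    seen = set()
    for index, req in enumerate(requests):
        cid = req.get("custom_id")
        if not cid:
            problems.append(f"request {index}: missing custom_id")
        elif cid in seen:
            problems.append(f"request {index}: duplicate custom_id {cid!r}")
        else:
            seen.add(cid)

        # Every request needs the core Messages parameters
        params = req.get("params", {})
        for field in ("model", "max_tokens", "messages"):
            if field not in params:
                problems.append(f"request {index}: params.{field} is missing")
    return problems


sample = [
    {"custom_id": "req-1",
     "params": {"model": "claude-3-5-sonnet-20241022", "max_tokens": 100,
                "messages": [{"role": "user", "content": "Hi"}]}},
    {"custom_id": "req-1",  # duplicate ID, and max_tokens is missing
     "params": {"model": "claude-3-5-sonnet-20241022",
                "messages": [{"role": "user", "content": "Hi again"}]}},
]
for problem in validate_requests(sample, max_requests=1000):
    print(problem)
```

Running a check like this before every submission is cheap compared with discovering a malformed request after the batch has been queued.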

Summary and Next Steps

Claude API batch processing enables:

50% cost reduction on input and output tokens
Efficient handling of massive request volumes
Less rate-limit pressure on your real-time traffic, though batches are subject to limits of their own

Batch processing excels for non-urgent, periodic tasks—daily data analysis, scheduled content generation, academic research, and bulk translations.

To go deeper, check out the official Anthropic documentation on Batch APIs, or see the Claude API Cost Optimization Guide for broader efficiency strategies.

Start small with test batches, validate results, then scale confidently to production workloads!

It started with captioning a few thousand images

While organizing assets for a wallpaper app, I had a few thousand images waiting to be sorted. I could send each one to Claude individually to get a caption and a category — but I didn't need real-time answers. If a job ran overnight and the results were ready by morning, that was plenty.

This kind of "not urgent, but high volume" work is exactly what the Claude API Message Batches feature is built for. You submit requests in bulk and they're processed asynchronously, and the input and output tokens cost 50% less than standard API calls. In my case, queuing everything at night meant the full set of results was waiting for me the next morning.

Rather than just covering the API surface, this article focuses on the things that actually tripped me up when I ran a real batch of meaningful size — and how I got around them.

Start with the smallest possible run

I always confirm the shape with two or three requests first. The custom_id is the key you'll use later to map results back to your source data, so always give it a meaningful value. I used the image filenames directly.

import anthropic
 
client = anthropic.Anthropic()
 
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "wallpaper_0001.jpg",
            "params": {
                "model": "claude-haiku-4-5-20251001",
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": "Classify the mood of this wallpaper in one word: a faint ring of light in the night sky"}
                ],
            },
        },
        {
            "custom_id": "wallpaper_0002.jpg",
            "params": {
                "model": "claude-haiku-4-5-20251001",
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": "Classify the mood of this wallpaper in one word: a mountain range in morning mist"}
                ],
            },
        },
    ]
)
 
print(f"Batch ID: {batch.id}")

For a simple task like classification, there's no need to reach for a top-tier model — Haiku was more than enough. Combined with the batch discount, that drops the cost another notch.

Check status — but don't poll too aggressively

Right after creation the batch is still processing. Results can come back in a few minutes, or, when things are busy, on the order of hours. Hammering the endpoint every few seconds is wasteful, so I wait a few minutes first and then widen the interval.

import time
 
delay = 60  # seconds between checks; widened after every poll
while True:
    batch = client.messages.batches.retrieve(batch.id)
    if batch.processing_status == "ended":
        break
    counts = batch.request_counts
    print(f"Processing... ok {counts.succeeded} / err {counts.errored} / in-flight {counts.processing}")
    time.sleep(delay)
    delay = min(delay * 2, 600)  # back off, capped at 10 minutes
 
print("Batch ended")

What you're checking is not "did everything succeed" but "has processing reached ended." Even once it's ended, the contents are a mix of successes and failures. Confuse the two and you'll silently drop the failed ones.

Result order isn't guaranteed — which is why custom_id matters

This is what caught me first. Results don't necessarily come back in the order you submitted them. The safe approach is to stream them one at a time and map each back to your source data by custom_id.

results = {}
 
for item in client.messages.batches.results(batch.id):
    cid = item.custom_id
    if item.result.type == "succeeded":
        results[cid] = item.result.message.content[0].text
    elif item.result.type == "errored":
        # Don't swallow it — push onto a retry list
        print(f"Failed: {cid} -> {item.result.error}")
        results[cid] = None
    elif item.result.type == "expired":
        # Not processed within 24 hours
        print(f"Expired: {cid}")
        results[cid] = None
 
# Since custom_id was the filename, results map straight back to the source image
ok = sum(1 for v in results.values() if v is not None)
print(f"Retrieved {ok} / {len(results)}")

Besides succeeded, result.type can be errored, expired, or canceled. expired means processing didn't finish within 24 hours. In production, collect just the failed and expired custom_ids and re-submit them as a smaller batch — that way nothing slips through.
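
One way to collect the IDs that need another attempt is sketched below. split_results and fake_item are hypothetical names: fake_item only builds stand-in objects with the same attributes the loop above reads (custom_id, result.type, result.message.content[0].text), so the sketch runs without calling the API.

```python
from types import SimpleNamespace


def split_results(items):
    """Separate successful texts from custom_ids that need another attempt."""
    succeeded, retry_ids = {}, []
    for item in items:
        if item.result.type == "succeeded":
            succeeded[item.custom_id] = item.result.message.content[0].text
        else:
            # errored, expired, and canceled all go on the retry list
            retry_ids.append(item.custom_id)
    return succeeded, retry_ids


def fake_item(custom_id, result_type, text=None):
    """Stand-in for one batch result entry, used only for this demo."""
    message = SimpleNamespace(content=[SimpleNamespace(text=text)])
    result = SimpleNamespace(type=result_type, message=message)
    return SimpleNamespace(custom_id=custom_id, result=result)


items = [
    fake_item("wallpaper_0001.jpg", "succeeded", "calm"),
    fake_item("wallpaper_0002.jpg", "errored"),
    fake_item("wallpaper_0003.jpg", "expired"),
]
succeeded, retry_ids = split_results(items)
print(succeeded)
print(retry_ids)
```

In real use you would pass client.messages.batches.results(batch.id) as items, then rebuild the next batch from the source requests whose custom_id appears in retry_ids. A request that failed because it was malformed will fail again, so look at item.result.error before re-submitting.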

Split the work when there's a lot of it

A single batch can hold up to 100,000 requests or 256 MB of data, whichever limit is reached first. Rather than packing each one to the limit, I submitted a few thousand at a time. The reason is simple: when you need to re-run part of it, a smaller rollback unit is easier to handle. I chunked by the numeric part of the filename and logged a batch ID per chunk, so I could see at a glance how far things had gotten.

def chunk(items, size=2000):
    for i in range(0, len(items), size):
        yield items[i:i + size]
 
for n, group in enumerate(chunk(all_requests)):
    b = client.messages.batches.create(requests=group)
    print(f"chunk {n}: {b.id} ({len(group)} requests)")
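
Printing the batch IDs works for a single session, but a log on disk survives a closed terminal. The sketch below is one possible way to keep that log. record_chunk and load_chunk_log are hypothetical helpers, the JSON Lines layout is an assumption, and the batch IDs in the demo are placeholders rather than real IDs.

```python
import json
import os
import tempfile
from pathlib import Path


def record_chunk(log_path, chunk_index, batch_id, request_count):
    """Append one chunk's batch ID to a JSON Lines log."""
    entry = {"chunk": chunk_index, "batch_id": batch_id, "requests": request_count}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_chunk_log(log_path):
    """Read the log back; a missing file means nothing was submitted yet."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo with placeholder IDs, written to a temporary directory
log_path = os.path.join(tempfile.mkdtemp(), "batches.jsonl")
record_chunk(log_path, 0, "batch-id-placeholder-0", 2000)
record_chunk(log_path, 1, "batch-id-placeholder-1", 1500)
for entry in load_chunk_log(log_path):
    print(entry)
```

Inside the submission loop you would call record_chunk(log_path, n, b.id, len(group)) right after each create call, so an interrupted run can pick up from the last logged chunk.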

How to think about cost

The batch discount applies to both input and output tokens. What actually moved the needle for me was stacking three things.

First, pushing routine tasks like classification and summarization onto Haiku. Second, trimming max_tokens down to what's truly needed — if all you want back is a one-word category, 512 is plenty and there's no reason to set it higher. Third, routing every non-urgent job through batches. The more carefully I identified work that could give up real-time responses, the quieter the end-of-month bill became.

Conversely, pushing work that needs to respond instantly to a user's action into a batch will hurt the experience. My own line is to reserve batches for "work no one is waiting on."

My grandfather, a temple carpenter, would measure all his timber together before cutting it in one focused pass. Lining up the prep and then running it through beats cutting one piece at a time, ad hoc — it ends up cleaner and faster. Building batch pipelines gives me that same feeling.

As a next step, pick just one piece of "not urgent" work you already have and confirm the round trip with a two- or three-request batch. Once your custom_id lines up with a key in your source data, you can scale the count up from there.