Higher Rate Limits Don't Mean Tighter Schedules — Spend the Headroom on 429 Recovery

When the June update raised the rate limits, my first thought was: now I can pack the posting schedule tighter. As an indie developer, I run several sites on my own, with a number of jobs that hit the API at fixed times. If there was more ceiling to work with, surely I could shrink the intervals I'd been carefully spacing out and cram more into the same window.

Here's the conclusion up front: I didn't tighten the intervals. Instead, I took all of that new headroom and routed it into retry budget for when things fail. After running it this way for a couple of weeks, I'm convinced this is the right call for anything that runs unattended. Let me walk through why.

Rate-limit headroom isn't there to be used up

When the ceiling goes up, it's tempting to design as if you'll run right against it. But in automation, the thing that actually hurts you isn't steady-state throughput — it's the moment something jams. A deploy cutover, a brief network wobble, a slow upstream service. When those moments cause jobs to pile up, it doesn't matter how high your limit is: the number of requests running at that exact instant is what trips the 429 (rate exceeded).

If you fill the extra ceiling with steady-state volume, you've left no slack to absorb the wobble when it comes. So I decided to treat the headroom not as "permission to run more things at once" but as inventory for pushing back with retries when something shakes loose.

Rate limits and credit spend are different axes

Conflating these two leads to the wrong design. A rate limit is about throughput (how many calls you can make per unit of time), and exceeding it returns a 429. Monthly credit consumption is about cost: how quickly you burn through your total for the month. The former is a seconds-and-minutes problem; the latter is a days-and-months problem, and they call for completely different remedies.

Spreading credit consumption so you don't exhaust it early in the cycle is a separate topic I covered in "Don't Drain Your Non-Rollover Monthly Credits Early or Leave Them on the Table — Designing a Burn-Rate Pacing Scheduler." This article stays purely on the throughput side: how you live with 429s. Even if the limit doubles, if your behavior when you hit a 429 is sloppy, your stability barely improves.

Fix concurrency independent of the limit

The first thing I did was re-lay the schedule so jobs never collide at the same instant — without touching the limit numbers at all. I staggered each task's start time and added jitter (a few minutes of randomness) so that everything doesn't happen to bunch up in the same minute.
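As a rough illustration of the stagger-plus-jitter idea, here is a minimal sketch. The helper name planStartOffsets, the job names, and the 15-minute spacing and 3-minute jitter defaults are all assumptions made up for this example; they are not taken from my actual scheduler.

```javascript
// Hypothetical helper: give each job a fixed stagger plus a few minutes of random jitter.
// All names and default values here are illustrative assumptions.
function planStartOffsets(jobs, { spacingMinutes = 15, jitterMinutes = 3, random = Math.random } = {}) {
  return jobs.map((name, index) => {
    const base = index * spacingMinutes;     // job i starts i * spacing after the window opens
    const jitter = random() * jitterMinutes; // somewhere in [0, jitterMinutes)
    return { name, offsetMinutes: base + jitter };
  });
}

const plan = planStartOffsets(["post-site-a", "post-site-b", "post-site-c"]);
for (const { name, offsetMinutes } of plan) {
  console.log(`${name}: start ${offsetMinutes.toFixed(1)} min after the window opens`);
}
```

As long as the jitter stays smaller than the spacing, two jobs can never be assigned the same start minute, whatever the limit happens to be.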

Rather than reasoning from the limit that "in theory I can run N at once," it's far easier to reason about unattended runs when you simply decide that only one or two ever run concurrently. The limit is the safety margin for that decision, not a target to fill.
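One way to enforce "only one or two at a time" in code is a small concurrency gate. The sketch below is a generic pattern, not the scheduler I actually run; createLimiter and the job names are hypothetical.

```javascript
// Minimal concurrency gate: at most `limit` tasks run at once, the rest wait in a FIFO queue.
// `createLimiter` is a hypothetical name for this sketch.
function createLimiter(limit = 2) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active++; // claim the slot synchronously so the cap is never exceeded
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task) // also turns a synchronous throw into a rejection
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // hand the freed slot to the next queued task
      });
  };

  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}

// Example usage: four jobs are queued, but never more than two are in flight.
const runLimited = createLimiter(2);
["job-a", "job-b", "job-c", "job-d"].forEach((name) => {
  runLimited(() => new Promise((done) => setTimeout(() => done(name), 10)))
    .then((finished) => console.log(`${finished} finished`));
});
```

The cap is a design decision you pick once; raising the API limit does not change it.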

Route the headroom into retries and backoff

This is the heart of it. Spend the freed-up throughput on exponential-backoff retries for when you hit a 429 or a transient error. The key is to respect the retry-after header the server sends back: its value sets the minimum wait, regardless of what you computed yourself.

// Retry on 429 and transient errors with exponential backoff that honors retry-after
async function callWithBackoff(doRequest, { maxRetries = 5, baseMs = 1000 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let res;
    try {
      res = await doRequest();
    } catch (err) {
      // Network-level failure (fetch rejects): treat as transient until the budget runs out
      if (attempt === maxRetries) throw err;
    }

    if (res) {
      // Success, or a client error that won't improve on retry: return as-is
      if (res.status !== 429 && res.status < 500) return res;
      if (attempt === maxRetries) return res; // out of budget: return the last result
    }

    // If the server sent retry-after, obey it first. The header is either a number
    // of seconds or an HTTP date; anything unparseable is treated as "no hint" (0).
    const retryAfter = res?.headers.get("retry-after");
    let serverWait = 0;
    if (retryAfter) {
      const seconds = Number(retryAfter);
      serverWait = Number.isFinite(seconds) ? seconds * 1000 : Date.parse(retryAfter) - Date.now();
      if (!Number.isFinite(serverWait) || serverWait < 0) serverWait = 0;
    }

    // Exponential backoff + full jitter (desynchronize concurrent retries)
    const backoff = baseMs * 2 ** attempt;
    const jittered = Math.random() * backoff;

    const waitMs = Math.max(serverWait, jittered);
    await new Promise((r) => setTimeout(r, waitMs));
  }
}
 
// Example usage
const res = await callWithBackoff(() =>
  fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": "YOUR_API_KEY",
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-opus-4-8",
      max_tokens: 1024,
      messages: [{ role: "user", content: "..." }],
    }),
  })
);

Two things matter here. First, it compares retry-after against its own backoff and takes the longer of the two. If the server is telling you to wait longer and you re-fire early on your own math, you'll just hit another 429. Second is full jitter. When several jobs hit a 429 at once, a fixed wait lines their retry timing right back up, so they all stampede again at the same moment. Spreading the wait across 0 to backoff breaks that synchronization.
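To get a feel for how much time this retry budget can consume, here is a small worked calculation using the same defaults as callWithBackoff (maxRetries = 5, baseMs = 1000). It assumes the server sends no retry-after at all, so it only covers the jittered part; backoffCeilings is a helper name invented for this sketch.

```javascript
// Upper bound of the jittered wait before each retry (a wait happens after attempts 0..maxRetries-1).
// Assumes no retry-after from the server; `backoffCeilings` is a hypothetical helper.
function backoffCeilings({ maxRetries = 5, baseMs = 1000 } = {}) {
  const ceilings = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    ceilings.push(baseMs * 2 ** attempt);
  }
  return ceilings;
}

const ceilings = backoffCeilings();
const worstCaseMs = ceilings.reduce((sum, ms) => sum + ms, 0);

console.log(ceilings);        // [ 1000, 2000, 4000, 8000, 16000 ]
console.log(worstCaseMs);     // 31000
console.log(worstCaseMs / 2); // 15500 (each wait is uniform on 0..ceiling, so the mean is half)
```

So a job that exhausts all five retries waits at most about 31 seconds on jitter alone, and roughly half that on average. Any retry-after from the server can only lengthen those figures.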

Back when the limit was low, adding retries created a vicious cycle where the retries themselves triggered the next overage. Now that there's headroom, the retries have room to run without causing that overage. That, precisely, is why I left the headroom alone instead of tightening intervals.

Watch retry counts, not limit hits

One operational note to close. After the limit increase, I changed the metric I watch on the dashboard. I used to track the raw number of 429s; now I track how many times each job had to retry, across all my Dolice Labs jobs.

A 429 gets absorbed by backoff, so the occasional one isn't a problem. What's dangerous is retry counts creeping upward over time. That's the sign that steady-state volume is approaching the ceiling — even with a high limit, it suggests too many things are running concurrently. When retry counts start climbing, the move isn't to tighten intervals; it's to spread the start times out even further. That's the rule of thumb I settled on after two weeks.
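A simple way to spot that creep is to compare the average retry count over the most recent runs with the average over the runs before them. This is only a sketch of the idea, not my dashboard's actual logic; retryTrend, the 7-run window, and the sample history are assumptions for illustration.

```javascript
// Compare mean retries per run in the latest window against the window before it.
// `retryTrend` and the window size are hypothetical choices for this sketch.
function retryTrend(retryCounts, windowSize = 7) {
  if (retryCounts.length < windowSize * 2) return null; // not enough history to compare
  const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
  const recent = mean(retryCounts.slice(-windowSize));
  const previous = mean(retryCounts.slice(-windowSize * 2, -windowSize));
  return { previous, recent, rising: recent > previous };
}

// Made-up history: retries needed by one job on each of its last 14 runs.
const history = [0, 0, 1, 0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 3];
const trend = retryTrend(history);
if (trend && trend.rising) {
  console.log("Retry counts are climbing: spread the start times out further.");
}
```

A single noisy run barely moves the average, which matches the goal here: ignore the occasional 429 and react only to a sustained rise.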

As you add more unattended jobs, "can it recover on its own when it jams" ends up mattering far more than the limit number itself. If you're juggling several automated jobs the way I am, now — right after the limit went up — is the moment to set aside the urge to pack the schedule and instead revisit your retry and jitter design. The next time the limits wobble, that headroom is what carries you through.