API & SDK · 2026-06-24 · Advanced

What I Decided the Day the Ceiling Doubled: A Headroom Budget for Scheduled Jobs on One Shared API Key

Why I did not compress my intervals when the rate limit doubled, and how to design a headroom budget for running several scheduled jobs on one shared API key, with measurement and working code.

Claude API · rate limits · scheduled jobs · capacity planning · operations


When I read that the ceiling had been doubled, my first thought was simple: now I can halve the spacing between jobs. I push articles to several sites on a fixed daily schedule, so any room to tighten is room I can use.

Then I opened my logs, watched them for a few minutes, and stopped. What doubled was the ceiling, not the amount I was actually consuming. Whether tightening is safe depends not on the ceiling itself but on how much space sits between the floor (what you actually consume) and that ceiling right now.

This article is about the line I drew for myself that day. Once you treat a rate limit as something to allocate rather than something to spend down, your operations stay remarkably quiet even when the limit moves. For anyone running several scheduled jobs on one shared API key, here is how I measure headroom, hand it out, and decide whether to hold it steady, with the code I actually use.

What changes when you treat headroom as a budget

Conversations about rate limits tend to drift toward what happens after a 429: retries and backoff. That is reactive defense. What I want to cover here is proactive allocation, deciding ahead of time how much to use at steady state.

The two are easy to confuse but are not the same. Cost pacing is about money, "how much will I spend this month," and spending it raises your bill. A headroom budget is about speed, "what fraction of the per-window ceiling will I use," and spending it does not change your bill, but exhausting it stalls the jobs and retries that come after.

I chose the word budget because headroom is a shared resource. Under one shared key, the content-generation job and the Stripe event handler draw from the same ceiling. If one runs up to the edge, the other behaves as if its own limit had quietly dropped. That is exactly why it helps to decide in advance who may use how much.

Measure where you stand from the headers

Before budgeting, you need to know your current consumption. The Claude API returns remaining quota in response headers, so you can start from measurement instead of guesswork.

These are the headers I watch. Note that the requests dimension and the tokens dimension apply independently.

Header	Meaning
anthropic-ratelimit-requests-limit	Request ceiling for the window
anthropic-ratelimit-requests-remaining	Requests left
anthropic-ratelimit-requests-reset	When the request quota is fully replenished (RFC 3339 timestamp)
anthropic-ratelimit-tokens-limit	Token ceiling for the window
anthropic-ratelimit-tokens-remaining	Tokens left
anthropic-ratelimit-tokens-reset	When the token quota is fully replenished (RFC 3339 timestamp)
retry-after	On a 429, the number of seconds to wait before retrying

Start by adding a thin layer, right after every call, that records these. It is just a function that pulls the headers from the response and returns them in a structured form.

// ratelimit.ts — read remaining quota from response headers
type RateSnapshot = {
  at: string;          // measurement time (ISO)
  job: string;         // which job
  reqLimit: number;
  reqRemaining: number;
  reqResetSec: number; // seconds until reset
  tokLimit: number;
  tokRemaining: number;
  tokResetSec: number;
};
 
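function secUntil(iso: string | null): number {
  if (!iso) return 0;
  const ms = new Date(iso).getTime() - Date.now();
  // An unparseable timestamp yields NaN; report "no known wait" instead of propagating NaN.
  if (Number.isNaN(ms)) return 0;
  return Math.max(0, Math.round(ms / 1000));
}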
 
export function readSnapshot(job: string, headers: Headers): RateSnapshot {
  const num = (k: string) => Number(headers.get(k) ?? "0");
  return {
    at: new Date().toISOString(),
    job,
    reqLimit: num("anthropic-ratelimit-requests-limit"),
    reqRemaining: num("anthropic-ratelimit-requests-remaining"),
    reqResetSec: secUntil(headers.get("anthropic-ratelimit-requests-reset")),
    tokLimit: num("anthropic-ratelimit-tokens-limit"),
    tokRemaining: num("anthropic-ratelimit-tokens-remaining"),
    tokResetSec: secUntil(headers.get("anthropic-ratelimit-tokens-reset")),
  };
}

How you reach the headers depends on your client. The reliable path is to make the call in a form that also hands you the raw HTTP response and then read the headers from it. In the official TypeScript SDK, that means appending .withResponse() to the request, which resolves to both the parsed data and the raw response. The RateSnapshot you get feeds both the per-job accounting and the budget decision below.
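The wrapper that runs right after every call can stay very small. Here is a minimal sketch. It assumes your client resolves to { data, response } the way the official TypeScript SDK's .withResponse() does. callWithSnapshot is a hypothetical name. The header reader and the save callback are passed in, so readSnapshot from ratelimit.ts can be plugged in directly.

```typescript
// snapshotCall.ts: hypothetical wrapper that records rate-limit headers after each call.
// Assumes the call resolves to { data, response } like the SDK's .withResponse(), e.g.
//   callWithSnapshot("contentGen", () => client.messages.create(params).withResponse(), readSnapshot, save)
type WithRaw<T> = { data: T; response: Response };

export async function callWithSnapshot<T, S>(
  job: string,
  call: () => Promise<WithRaw<T>>,
  read: (job: string, headers: Headers) => S, // e.g. readSnapshot from ratelimit.ts
  save: (snap: S) => void, // e.g. append to a log or table
): Promise<T> {
  const { data, response } = await call();
  try {
    save(read(job, response.headers));
  } catch (err) {
    // Recording is observability only: a failed snapshot must not fail the job itself.
    console.warn(`snapshot for ${job} failed`, err);
  }
  return data;
}
```

Because the reader and the sink are parameters, every job can use the same wrapper. Only the job name changes.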


Attribute consumption per job

The awkward part of a shared key is that the remaining quota in the headers is the total across all jobs. "10% left" does not tell you who used the rest. So I keep a small separate ledger that tracks consumption by job name.

The idea is plain. On every call, each job adds the tokens it used (available from the usage field of the response, for example input_tokens plus output_tokens) under its own name. When the window rolls over, you close out that window. That alone makes "who used how much" visible.

// ledger.ts — record per-job consumption per window
type WindowKey = string; // e.g. "2026-06-24T13:00" (rounded to the minute)
 
const ledger = new Map<WindowKey, Map<string, number>>(); // window -> job -> tokens
 
function windowKey(d = new Date()): WindowKey {
  const z = (n: number) => String(n).padStart(2, "0");
  return `${d.getUTCFullYear()}-${z(d.getUTCMonth() + 1)}-${z(d.getUTCDate())}T${z(d.getUTCHours())}:${z(d.getUTCMinutes())}`;
}
 
export function record(job: string, usedTokens: number) {
  const wk = windowKey();
  if (!ledger.has(wk)) ledger.set(wk, new Map());
  const w = ledger.get(wk)!;
  w.set(job, (w.get(job) ?? 0) + usedTokens);
}
 
export function shareOfWindow(job: string): { tokens: number; pct: number } {
  const w = ledger.get(windowKey());
  if (!w) return { tokens: 0, pct: 0 };
  const total = [...w.values()].reduce((a, b) => a + b, 0);
  const mine = w.get(job) ?? 0;
  return { tokens: mine, pct: total ? Math.round((mine / total) * 100) : 0 };
}

In production you would write this to KV or a small table keyed by window rather than a Map. My jobs run as separate processes, so I keep the ledger in shared storage and run the closing aggregation as its own job. If you leave it in memory while your jobs run in different processes, the ledger fragments per process and the totals lose their meaning, so watch for that.
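One way to make that move without touching the jobs is to put the ledger behind a small interface. This is a sketch under assumptions. LedgerStore, MemoryLedgerStore, and closeWindow are hypothetical names. A real shared backend needs an atomic increment so that two jobs writing to the same window do not overwrite each other. The in-memory class is only a stand-in for tests.

```typescript
// ledgerStore.ts: hypothetical storage interface for the per-window, per-job ledger.
export interface LedgerStore {
  // Add tokens to (window, job); a real shared backend must do this atomically.
  incr(windowKey: string, job: string, tokens: number): Promise<void>;
  // Job totals recorded for one window.
  read(windowKey: string): Promise<Map<string, number>>;
  // Remove a window once it has been closed out.
  drop(windowKey: string): Promise<void>;
}

// In-memory stand-in: fine for tests, wrong for jobs that run in separate processes.
export class MemoryLedgerStore implements LedgerStore {
  private data = new Map<string, Map<string, number>>();

  async incr(windowKey: string, job: string, tokens: number): Promise<void> {
    const w = this.data.get(windowKey) ?? new Map<string, number>();
    w.set(job, (w.get(job) ?? 0) + tokens);
    this.data.set(windowKey, w);
  }

  async read(windowKey: string): Promise<Map<string, number>> {
    // Return a copy so callers cannot mutate the stored window.
    return new Map<string, number>(this.data.get(windowKey) ?? []);
  }

  async drop(windowKey: string): Promise<void> {
    this.data.delete(windowKey);
  }
}

export type WindowSummary = { windowKey: string; total: number; byJob: Record<string, number> };

// The closing aggregation, run as its own job: summarize a finished window, then drop it.
export async function closeWindow(store: LedgerStore, windowKey: string): Promise<WindowSummary> {
  const w = await store.read(windowKey);
  const byJob: Record<string, number> = {};
  let total = 0;
  for (const [job, tokens] of w) {
    byJob[job] = tokens;
    total += tokens;
  }
  await store.drop(windowKey);
  return { windowKey, total, byJob };
}
```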

How to size the headroom budget

This is the heart of it. Once accounting shows who uses how much, you decide who may use how much. I do not hand out the ceiling directly. I split it into three shares first.

Steady-state share: what the scheduled jobs may use normally. I aim for 70% of the ceiling.
Retry and burst share: slack to absorb retries of transient failures and unexpected concurrency. About 15%.
Manual share: room for the calls I make by hand during the day. The remaining 15%.

As a formula, the per-window ceiling you may hand to each scheduled job looks like this.

// budget.ts — compute the headroom budget
type BudgetInput = {
  tokLimit: number;      // per-window token ceiling from the headers
  steadyRatio: number;   // share for steady state (e.g. 0.70)
  jobWeights: Record<string, number>; // split among scheduled jobs
};
 
export function perJobBudget(input: BudgetInput): Record<string, number> {
  const steady = Math.floor(input.tokLimit * input.steadyRatio);
  const totalWeight = Object.values(input.jobWeights).reduce((a, b) => a + b, 0);
  const out: Record<string, number> = {};
  for (const [job, w] of Object.entries(input.jobWeights)) {
    out[job] = Math.floor((steady * w) / totalWeight);
  }
  return out;
}
 
// Example: steady share is 70% of the ceiling, split across 3 jobs by weight
const budgets = perJobBudget({
  tokLimit: 400_000,
  steadyRatio: 0.70,
  jobWeights: { contentGen: 3, stripeSync: 1, integrityCheck: 1 },
});
// contentGen gets the larger budget: { contentGen: 168000, stripeSync: 56000, integrityCheck: 56000 }
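perJobBudget hands out only the steady share. Here is a sketch that computes all three shares for one window, using the ratios from the text. It also refuses ratios that would hand out more than the ceiling. splitShares and its types are hypothetical names.

```typescript
// shares.ts: hypothetical helper that splits one window's ceiling into the three shares.
export type ShareRatios = { steady: number; retryBurst: number; manual: number };
export type Shares = { steady: number; retryBurst: number; manual: number };

export function splitShares(limit: number, r: ShareRatios): Shares {
  const sum = r.steady + r.retryBurst + r.manual;
  // The shares must not exceed the ceiling; the epsilon absorbs floating-point noise.
  if (sum > 1 + 1e-9) throw new Error(`ratios sum to ${sum}, more than the ceiling`);
  return {
    steady: Math.floor(limit * r.steady), // what scheduled jobs may use normally
    retryBurst: Math.floor(limit * r.retryBurst), // slack for retries and unexpected concurrency
    manual: Math.floor(limit * r.manual), // room for calls made by hand
  };
}
```

With this helper, moving to a different split later means changing the ratios, not the code.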

Before running, each job compares its budget against its consumption this window from the ledger. If it would exceed the budget, it skips this window and defers to the next. This is the first brake, and it engages before backoff ever does.

// guard.ts — check the budget before running
import { shareOfWindow } from "./ledger";
 
export function shouldRun(job: string, budgetTokens: number, estTokens: number): boolean {
  const used = shareOfWindow(job).tokens;
  return used + estTokens <= budgetTokens;
}

For estTokens, a typical recent consumption for the job is plenty. I use the median of the last 20 runs rather than the mean, so that the occasional huge input does not drag the estimate around.
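Here is a sketch of that estimate. It assumes each job keeps a list of its recent per-run token counts. The estimateTokens name and the rounding up on an even count are my assumptions, not part of the setup described above.

```typescript
// estimate.ts: hypothetical estTokens helper, the median of the last N runs (N = 20 in the text).
export function estimateTokens(history: number[], lastN = 20, fallback = 0): number {
  const recent = history.slice(-lastN);
  if (recent.length === 0) return fallback; // no history yet: use a caller-chosen default
  const sorted = [...recent].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even count: average the two middle values, rounded up so the estimate errs high.
  return sorted.length % 2 === 1 ? sorted[mid] : Math.ceil((sorted[mid - 1] + sorted[mid]) / 2);
}
```

Take the history [1000, 1100, 1200, 50000]. The median-based estimate is 1150, while the mean would be 13325. That gap is exactly the drag the median avoids.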

Why I held the budget steady after the ceiling doubled

Back to the decision from the opening. When the ceiling doubled, I could have re-pinned the 70% steady share to "70% of the new ceiling," which would double my spendable amount. Tightening was on the table.

I held it because what was stalling my operations was never the ceiling. The ledger showed that even at steady-state peaks I was using only around 40% of the old limit. The cause of stalls was not speed; it was the rare coincidence of a retry and a manual run firing at once. Raising the ceiling does nothing to that coincidence.

What helps here is redirecting the freed ceiling into the retry and burst share instead of into consumption. Under the new limit, recomposing the split from 70/15/15 to 60/25/15 leaves steady consumption unchanged while thickening only the slack that absorbs collisions.
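Here is the arithmetic behind that recomposition. It is written relative to the old ceiling L and uses the roughly 40% steady peak the ledger showed.

```latex
\begin{aligned}
\text{old ceiling } L:&\quad 0.70L + 0.15L + 0.15L = L\\
\text{new ceiling } 2L:&\quad 0.60(2L) + 0.25(2L) + 0.15(2L) = 1.20L + 0.50L + 0.30L\\
\text{retry/burst share:}&\quad 0.15L \;\rightarrow\; 0.50L \quad (\approx 3.3\times)\\
\text{measured steady peak:}&\quad 0.40L = 0.20\,(2L)
\end{aligned}
```

In ratio terms, the steady share shrinks while the retry and burst share grows. Measured consumption stays at 0.40L in absolute terms because the job intervals do not change.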

In my case, that recomposition alone made daytime 429s nearly disappear. I spent the increase on running quieter, not faster. When you are an indie developer carrying operations alone, one fewer alert in the middle of the night feels worth more than a throughput number.

As a rule of thumb, I think of it this way. If your steady consumption was above half of the old limit, the increase is worth spending on tightening. If it was below half, you will run more stably by routing the increase into headroom.
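The rule of thumb is small enough to encode, which ties the decision to measured numbers rather than to the headline. spendIncreaseOn is a hypothetical helper. Treating exactly half as "headroom" is my own tie-break.

```typescript
// rule.ts: hypothetical encoding of the rule of thumb for spending a limit increase.
export type Spend = "tighten" | "headroom";

export function spendIncreaseOn(peakSteadyTokens: number, oldLimitTokens: number): Spend {
  if (oldLimitTokens <= 0) throw new Error("old limit must be positive");
  // Above half of the old limit, speed was the real constraint, so tightening pays off.
  // At or below half, route the increase into headroom instead.
  return peakSteadyTokens / oldLimitTokens > 0.5 ? "tighten" : "headroom";
}
```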

Traps I hit in production

Running this as designed, I tripped at a few seams between the headers and the budget. Here are the ones worth knowing before you start.

Reset is not the window length. requests-reset and tokens-reset point to "when the quota next recovers," not the boundary of a fixed-length window. Call continuously and the reset time keeps sliding forward. I first mistook this for a fixed window and closed the ledger at the wrong moments. Round the ledger window to a minute span you choose yourself, and treat the header reset separately as a hint for "how long until there is room."
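Here is a sketch that uses the reset headers only as that hint and keeps them apart from the ledger's own minute windows. It assumes the reset timestamp marks when the quota is fully replenished, so waiting until then is a conservative upper bound. secondsUntilRoom and the Snap type are hypothetical; the field names mirror RateSnapshot.

```typescript
// roomHint.ts: hypothetical helper; Snap mirrors the relevant RateSnapshot fields.
type Snap = { reqRemaining: number; reqResetSec: number; tokRemaining: number; tokResetSec: number };

// Seconds to wait before a call of this size should fit, using reset only as an upper-bound hint.
export function secondsUntilRoom(s: Snap, estTokens: number, estRequests = 1): number {
  let wait = 0;
  // Each dimension is checked on its own; the longer wait wins.
  if (s.reqRemaining < estRequests) wait = Math.max(wait, s.reqResetSec);
  if (s.tokRemaining < estTokens) wait = Math.max(wait, s.tokResetSec);
  return wait;
}
```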

The token limit hits first. Even with requests to spare, you can cap out on tokens, which is routine for jobs handling long text. Hold the budget in tokens, not request counts. I originally budgeted by request count and got confused when the remaining graph seemed to lie.
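Comparing raw counts across the two dimensions tells you little, because a token limit is usually far larger than a request limit. This sketch compares the remaining fraction of each ceiling instead. bindingDimension is a hypothetical name, and a missing limit is treated as unconstrained.

```typescript
// binding.ts: hypothetical helper; which dimension is closer to running out right now?
type Snap = { reqLimit: number; reqRemaining: number; tokLimit: number; tokRemaining: number };

export function bindingDimension(s: Snap): "requests" | "tokens" {
  // Fraction of the ceiling still available; a missing (zero) limit counts as unconstrained.
  const left = (remaining: number, limit: number) => (limit > 0 ? remaining / limit : 1);
  // Ties go to tokens, since that is the dimension the budget is held in.
  return left(s.tokRemaining, s.tokLimit) <= left(s.reqRemaining, s.reqLimit) ? "tokens" : "requests";
}
```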

Do not misattribute the 429. When a shared key returns a 429, another job may have caused it. If you penalize only the job that received the 429, innocent jobs get punished repeatedly. Follow retry-after faithfully, but identify the cause by looking at the top consumers of that window in the ledger.
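That advice has two halves: honor retry-after, then read the suspects from the ledger instead of blaming the caller. Here is a sketch of both. retryAfterMs and topConsumers are hypothetical helpers, and the fallback delay is an arbitrary assumption. The window is passed in as the same job-to-tokens Map that ledger.ts keeps per window.

```typescript
// attribution.ts: hypothetical helpers. Timing comes from retry-after; blame comes from the ledger.
export function retryAfterMs(headers: Headers, fallbackMs = 1000): number {
  const raw = headers.get("retry-after");
  const sec = raw === null || raw.trim() === "" ? NaN : Number(raw);
  // retry-after carries seconds on a 429; anything unparseable falls back to a fixed delay.
  return Number.isFinite(sec) && sec >= 0 ? sec * 1000 : fallbackMs;
}

// Largest consumers in one ledger window (job -> tokens), biggest first: the likely cause.
export function topConsumers(byJob: Map<string, number>, n = 2): Array<[string, number]> {
  return [...byJob.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}
```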

Input and output token limits stand apart. Depending on the plan, input and output tokens carry separate ceilings, reported in their own headers (anthropic-ratelimit-input-tokens-* and anthropic-ratelimit-output-tokens-*). Make anthropic-ratelimit-output-tokens-remaining the primary signal for output-heavy jobs and anthropic-ratelimit-input-tokens-remaining for input-heavy ones. Watch only one graph and you will stall on the side you were not watching.
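This sketch reads both sides and reports the primary signal alongside whichever side is actually tighter. That way the side you are not watching still surfaces. tokenSignals is a hypothetical name. It reads the anthropic-ratelimit-input-tokens-* and anthropic-ratelimit-output-tokens-* headers and returns null for a side whose headers are missing.

```typescript
// ioSignal.ts: hypothetical helper for the separate input and output token ceilings.
type Side = "input" | "output";

function remainingFraction(h: Headers, side: Side): number | null {
  const lim = h.get(`anthropic-ratelimit-${side}-tokens-limit`);
  const rem = h.get(`anthropic-ratelimit-${side}-tokens-remaining`);
  // Missing headers or a zero limit: no usable signal for this side.
  if (lim === null || rem === null || Number(lim) <= 0) return null;
  return Number(rem) / Number(lim);
}

export function tokenSignals(h: Headers, profile: "input-heavy" | "output-heavy") {
  const input = remainingFraction(h, "input");
  const output = remainingFraction(h, "output");
  const primary: Side = profile === "output-heavy" ? "output" : "input";
  // The side to worry about right now: whichever has the smaller remaining fraction.
  let tightest: Side | null = null;
  if (input !== null && output !== null) tightest = input <= output ? "input" : "output";
  else if (input !== null) tightest = "input";
  else if (output !== null) tightest = "output";
  return { primary, input, output, tightest };
}
```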

The next step

If you run several scheduled jobs on one shared key, start by adding just the single layer that records the headers. Without that measurement, both the budget and the accounting are only guesswork. Collect a week of RateSnapshot rows and you will see, without guessing, where in the ceiling your operations actually run.
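Once a week of rows is collected, one number answers the question: how much of the ceiling was in use at the moments you measured. Here is a sketch. It assumes rows shaped like RateSnapshot, of which only the token fields are used. Remember that remaining quota is shared across the whole key, so this measures the key, not one job. tokenUtilization is a hypothetical name, and the nearest-rank p95 is my choice of percentile.

```typescript
// peak.ts: hypothetical summary of collected snapshots; RateSnapshot rows fit this shape.
type Row = { tokLimit: number; tokRemaining: number };

export function tokenUtilization(rows: Row[]): { samples: number; peak: number; p95: number } {
  const used = rows
    .filter((r) => r.tokLimit > 0) // skip rows where the headers were missing
    .map((r) => (r.tokLimit - r.tokRemaining) / r.tokLimit)
    .sort((a, b) => a - b);
  if (used.length === 0) return { samples: 0, peak: 0, p95: 0 };
  const rank = Math.ceil(0.95 * used.length) - 1; // nearest-rank percentile index
  return { samples: used.length, peak: used[used.length - 1], p95: used[rank] };
}
```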

With those numbers in hand, decide whether to spend the increase on speed or on headroom. Measure first, decide second: I have come to believe that order should never be reversed.

Thank you for reading. If it spares even one nighttime alert for someone else carrying their operations alone, I will be glad.


Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.