Measuring a Week of Headless Usage the Night Before the Billing Change

Starting tomorrow (June 15), headless claude -p, the Claude Agent SDK, and GitHub Actions runs move to a separate pool of monthly credits, no longer drawn from your subscription limits. The thing that tripped me up was simple: I had nothing on hand to decide whether the new plan would be enough. Would Pro's monthly credits cover me, or did I need Max? I genuinely couldn't tell by feel. As an indie developer juggling several unattended jobs, I wanted that answer in numbers, not vibes. A week isn't enough for a precise forecast, but I at least wanted a number for roughly how many tokens my unattended runs burn. So I spent this past week logging it.

The short version: only after measuring did I learn that my nightly batches were heavier than I assumed, while the one-off daytime runs were basically rounding error. Picking a plan is far easier once you can see that breakdown.

Why a single "total tokens" number misleads you

When billing comes up, it's tempting to look only at the grand total for the month. But once you move to monthly credits, what matters is less the total and more which jobs are eating the credits. Two months with the same total behave completely differently if one is "a few heavy runs" and the other is "many light runs" — the place you'd cut is not the same.

Headless runs in particular hide cost in output tokens and in prompt-cache reads and writes. Claude's usage object separates input_tokens / output_tokens from cache_creation_input_tokens and cache_read_input_tokens, and there's an asymmetry: cache reads are cheap, cache creation is pricey. If you only watch the total, you miss that structure entirely, so I decided to record everything split by job and by token type.
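To make the asymmetry concrete, here is a minimal sketch that prices the same block of tokens three ways. The multipliers (cache creation at 1.25x base input, cache read at 0.1x) are assumptions matching the ratios in the example rates used later in this post, and the preamble size is made up; check current pricing before relying on either.

```javascript
// Assumed example rates in $/MTok; only the ratios matter here.
const BASE_INPUT = 10;
const RATE = {
  input: BASE_INPUT,              // uncached input
  cacheCreate: BASE_INPUT * 1.25, // writing the prefix into the cache (assumed multiplier)
  cacheRead: BASE_INPUT * 0.1,    // reading it back on a later run (assumed multiplier)
};

// Dollar cost of `tokens` tokens billed as the given usage type.
function usd(tokens, type) {
  return (tokens * RATE[type]) / 1_000_000;
}

const PREAMBLE_TOKENS = 200_000; // hypothetical shared preamble size

for (const type of Object.keys(RATE)) {
  console.log(type.padEnd(12), "$" + usd(PREAMBLE_TOKENS, type).toFixed(2));
}
```

Under these assumed rates the same preamble costs $2.00 as plain input, $2.50 to write into the cache, and $0.20 to read back, which is why a single total hides so much.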

Wrap each run and append usage one line at a time

What I did is almost embarrassingly plain: wrap each headless run in a thin shim and append one JSONL line of usage when it finishes. I swapped the Agent SDK jobs over to call through this wrapper.

// usage-logger.mjs
import { appendFileSync } from "node:fs";
import { query } from "@anthropic-ai/claude-agent-sdk";
 
const LOG_PATH = process.env.USAGE_LOG ?? "./headless-usage.jsonl";
 
// job: a label for what this run is ("nightly-build", etc.)
export async function runWithUsage(job, prompt, options = {}) {
  const startedAt = new Date().toISOString();
  let usage = null;
 
  for await (const message of query({ prompt, options })) {
    // The SDK emits a final result message that carries usage
    if (message.type === "result" && message.usage) {
      usage = message.usage;
    }
  }
 
  if (usage) {
    const row = {
      job,
      startedAt,
      finishedAt: new Date().toISOString(),
      input: usage.input_tokens ?? 0,
      output: usage.output_tokens ?? 0,
      cacheCreate: usage.cache_creation_input_tokens ?? 0,
      cacheRead: usage.cache_read_input_tokens ?? 0,
    };
    appendFileSync(LOG_PATH, JSON.stringify(row) + "\n");
  }
  return usage;
}

Two things matter here. First, always attach a job label — without it you can't split the breakdown later. I just used my real job names (nightly build, integrity check, article generation). Second, capture the usage from the result message. Usage-like figures stream by mid-flight, but the settled value lives in the final result. Summing the intermediate values double-counts, so record only the last one.
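The double-counting risk can be illustrated with a simulated stream. This is only a sketch: the message shapes below are hypothetical stand-ins rather than captured SDK output, and it assumes mid-flight messages report running figures that the final result already includes.

```javascript
// Simulated message stream (hypothetical shapes, not real SDK output):
// two mid-flight messages carrying running usage, then a final result
// with the settled totals.
const fakeStream = [
  { type: "assistant", message: { usage: { input_tokens: 1200, output_tokens: 300 } } },
  { type: "assistant", message: { usage: { input_tokens: 1200, output_tokens: 650 } } },
  { type: "result", usage: { input_tokens: 1200, output_tokens: 900 } },
];

// Naive approach: add up every usage figure that goes by.
async function summedOutput(stream) {
  let total = 0;
  for await (const m of stream) {
    const u = m.usage ?? m.message?.usage;
    if (u) total += u.output_tokens ?? 0;
  }
  return total;
}

// Approach used by the wrapper: keep only the usage on the result message.
async function settledOutput(stream) {
  let usage = null;
  for await (const m of stream) {
    if (m.type === "result" && m.usage) usage = m.usage;
  }
  return usage?.output_tokens ?? 0;
}

summedOutput(fakeStream).then((n) => console.log("summed :", n));
settledOutput(fakeStream).then((n) => console.log("settled:", n));
```

With this made-up stream the naive sum reports 1850 output tokens while the settled value is 900, so a log built from intermediate figures would overstate the job by about a factor of two.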

I use JSONL (one record per line) because appending is safe and hard to corrupt. Even if several cron jobs write at once, line-oriented writes rarely interleave, and the aggregation step can read it line by line. A CSV that loses a column mid-file becomes unreadable; with JSONL you drop the one broken line and the rest survives.

Fold a week down by job

After a week, sum by job and estimate the cost. Pricing is quoted per MTok (million tokens), so you multiply each token count by its rate and divide by a million (the rates below are a rough example based on API prices of $10 input / $50 output per MTok; your actual charge follows your plan's credit conversion).

// summarize-usage.mjs
import { readFileSync } from "node:fs";
 
const PRICE = { input: 10, output: 50, cacheCreate: 12.5, cacheRead: 1 }; // $/MTok (example)
 
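const rows = readFileSync(process.env.USAGE_LOG ?? "./headless-usage.jsonl", "utf8")
  .split("\n")
  .filter(Boolean)
  // Skip a truncated or corrupted line instead of aborting the whole summary
  .flatMap((line) => {
    try {
      return [JSON.parse(line)];
    } catch {
      return [];
    }
  });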
 
const byJob = {};
for (const r of rows) {
  const j = (byJob[r.job] ??= { runs: 0, input: 0, output: 0, cacheCreate: 0, cacheRead: 0 });
  j.runs += 1;
  j.input += r.input;
  j.output += r.output;
  j.cacheCreate += r.cacheCreate;
  j.cacheRead += r.cacheRead;
}
 
const cost = (t) =>
  (t.input * PRICE.input +
    t.output * PRICE.output +
    t.cacheCreate * PRICE.cacheCreate +
    t.cacheRead * PRICE.cacheRead) /
  1_000_000;
 
const table = Object.entries(byJob)
  .map(([job, t]) => ({ job, runs: t.runs, weekUSD: +cost(t).toFixed(2), monthUSD: +(cost(t) * 30 / 7).toFixed(2) }))
  .sort((a, b) => b.monthUSD - a.monthUSD);
 
console.table(table);
console.log("Estimated monthly total $", table.reduce((s, r) => s + r.monthUSD, 0).toFixed(2));

monthUSD is just "the week's actuals scaled by 30/7." It isn't rigorous, but for lining up against a plan limit to judge "fits / doesn't fit," it was plenty. Sending the output to console.table lays every job out in a row, so the order in which you'd trim them is right there.
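The fits / doesn't-fit judgment can be sketched as a small helper. MONTHLY_CREDIT_USD and the sample weekly costs are hypothetical placeholders, not my real numbers; substitute your plan's actual allowance and the figures from your own table.

```javascript
// Hypothetical monthly allowance in dollars; replace with your plan's figure.
const MONTHLY_CREDIT_USD = 100;

// Scale a week of measured cost to a 30-day month (same 30/7 factor as above).
const toMonth = (weekUSD) => (weekUSD * 30) / 7;

// Made-up weekly costs per job, for illustration only.
const week = [
  { job: "nightly-build", weekUSD: 21.0 },
  { job: "integrity-check", weekUSD: 3.5 },
  { job: "article-generation", weekUSD: 0.7 },
];

function checkFit(rows, budgetUSD) {
  const sorted = rows
    .map((r) => ({ job: r.job, monthUSD: toMonth(r.weekUSD) }))
    .sort((a, b) => b.monthUSD - a.monthUSD);
  const total = sorted.reduce((s, r) => s + r.monthUSD, 0);
  // Walk jobs from heaviest to lightest, collecting candidates to cut
  // until what remains fits inside the budget.
  const trim = [];
  let remaining = total;
  for (const r of sorted) {
    if (remaining <= budgetUSD) break;
    trim.push(r.job);
    remaining -= r.monthUSD;
  }
  return { total, fits: total <= budgetUSD, trim };
}

const result = checkFit(week, MONTHLY_CREDIT_USD);
console.log(result.fits ? "fits" : "does not fit", "$" + result.total.toFixed(2), result.trim);
```

Dropping a whole job is the bluntest possible cut; in practice the trim list is better read as "look here first" than as a verdict.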

What only showed up once I measured

In my case, the heavy batch that runs overnight accounted for most of the monthly figure, while the daytime one-offs were rounding error even summed together. I'd assumed the frequent daytime runs were the cost driver, so this was the exact opposite of my hunch. It's the weight per run, not the frequency, that dominates — obvious in hindsight, but seeing it in my own numbers was the payoff.

The other surprise was that for one job the cacheRead token count was several times the input. That job feeds the same large preamble (repo conventions, a template) every run, and the cache kept it cheap. Which is also a warning: rewrite the prompt in a way that defeats the cache and those tokens move out of cacheRead into cacheCreate or input, which cost ten times as much or more under the example rates above. If you're going to touch prompt structure during the migration, watch for cacheRead falling while the other two climb.
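One way to keep an eye on it is to compute, per job, what share of prompt-side tokens was served from the cache. This is a minimal sketch over rows shaped like the ones usage-logger.mjs writes; the sample values and the 0.5 alert threshold are invented for illustration, and it assumes input does not already include the cached tokens.

```javascript
// Sample rows in the shape usage-logger.mjs appends (values are made up).
const rows = [
  { job: "article-generation", input: 2_000, output: 4_000, cacheCreate: 0, cacheRead: 18_000 },
  { job: "article-generation", input: 2_000, output: 4_000, cacheCreate: 0, cacheRead: 18_000 },
  { job: "nightly-build", input: 9_000, output: 6_000, cacheCreate: 12_000, cacheRead: 3_000 },
];

// Share of prompt-side tokens (input + cache writes + cache reads) served from cache.
function cacheReadShare(rows) {
  const byJob = {};
  for (const r of rows) {
    const j = (byJob[r.job] ??= { prompt: 0, cacheRead: 0 });
    j.prompt += r.input + r.cacheCreate + r.cacheRead;
    j.cacheRead += r.cacheRead;
  }
  return Object.fromEntries(
    Object.entries(byJob).map(([job, t]) => [job, t.prompt ? t.cacheRead / t.prompt : 0])
  );
}

const ALERT_BELOW = 0.5; // hypothetical threshold
const shares = cacheReadShare(rows);
for (const [job, share] of Object.entries(shares)) {
  const flag = share < ALERT_BELOW ? "  <- cache mostly missing" : "";
  console.log(job.padEnd(20), (share * 100).toFixed(0) + "%" + flag);
}
```

A job whose share drops sharply after a prompt edit is the one to inspect first, since its cost per run will have risen even if its total token count has not.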

What I decided with the time left

A week isn't much, but once the breakdown was in numbers the decision was anticlimactic. I trimmed the heavy nightly batch down to two jobs, left the frequent light runs alone for now to stay inside the monthly-credit envelope, and paired it with a three-tier Claude Code fallbackModel so an overloaded morning doesn't stall the run. The plan mechanics themselves are in "How the June 15 Claude Code billing change affects headless runs," and the method for reading ahead from early-month data is in "Forecasting Claude API token cost from the first three days."

There isn't much you can do the night before a billing change, but try dropping usage-logger.mjs over your runs and capturing even a single batch tonight. One logged line turns the plan conversation from "probably fine" into "this job costs $X a month." Thanks for reading.