API & SDK / 2026-06-15 / Advanced

On the day the billing change took effect, I added per-stage cost metering to my headless runs

The June 15 billing change moved headless runs and agent delegation onto monthly credits. Here is a thin metering layer that records token usage per stage tag from response.usage and emits a daily cost report, with working code.



The move to monthly credits took effect today, and my automated publishing pipeline's headless runs went from "flat inside the subscription" to "every call eats credits." I had already revised which stages run where before the cutover, but the thing I realized on day one was that I had never actually measured how much each stage was spending.

As an indie developer running four sites, I had estimates. But an estimate is an estimate, not a measurement. If I'm going to allocate non-rolling credits across a month, I need a ledger that says "this stage actually burned this much last week," not a guess. So on the first day I slipped in a thin metering layer that records each API call's token usage by stage tag and converts it to a cost figure daily. Here is what's inside it.

This isn't a story about a fancy observability stack. The goal is the cheapest possible path to a state where I can later interrogate cost by stage, without rewriting the calling code.

Why the Console billing screen alone isn't enough

The Anthropic Console usage view shows account-wide and per-API-key consumption. But in my setup, a single API key runs a mix of stages with very different characteristics: article generation, quality-gate checks, news fetching, translation sync. Looking at the Console, I can't tell whether this week's biggest credit sink was generation or a quality gate that kept retrying.

Monthly credits don't roll over. That means if you can't identify in advance which stage will run short at month's end, a low-priority stage can quietly consume a high-priority one's credits without you noticing. At the Console's granularity, this "per-stage contention" is invisible — and that was the real problem.

What I needed was a ledger that records usage with a stage name on every call, so I can later ask "what's the running monthly total for stage=quality-gate." Only the application can build that.

Don't drop tokens from response.usage

The foundation of metering is the usage object included in every response. Here's the first trap I hit: usage is not just two fields for input and output. If you use prompt caching, you actually get back four kinds of tokens.

// The actual shape of usage in the Anthropic SDK response
// (cache_* become non-zero when caching is in play)
type RawUsage = {
  input_tokens: number;                  // non-cached input
  output_tokens: number;                 // generated output
  cache_creation_input_tokens?: number;  // writes to cache (premium-priced)
  cache_read_input_tokens?: number;      // reads from cache (heavily discounted)
};

If you sum only input_tokens + output_tokens, the cache-write and cache-read tokens fall out of the ledger entirely. Cache writes carry a premium over normal input and cache reads are much cheaper — an asymmetric pricing structure — so unless you keep all four separately, your cost conversion won't reconcile. I first aggregated only two fields, and my measured numbers stubbornly refused to match the Console bill. That cost me half a day.

So I put a normalizer up front that always keeps the four kinds in separate fields.

// Fill missing fields with 0 and always produce all four kinds
export function normalizeUsage(raw: Partial<RawUsage> | undefined) {
  return {
    input: raw?.input_tokens ?? 0,
    output: raw?.output_tokens ?? 0,
    cacheWrite: raw?.cache_creation_input_tokens ?? 0,
    cacheRead: raw?.cache_read_input_tokens ?? 0,
  };
}
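To make the size of that gap concrete, here is a small worked sketch of the two-field sum versus the four-field sum. The token counts are made up, and the per-MTok rates are placeholder values rather than real prices; `Usage4`, `twoFieldCost`, and `fourFieldCost` are hypothetical names used only for this illustration.

```typescript
// Four token kinds, already normalized. Names are hypothetical.
type Usage4 = { input: number; output: number; cacheWrite: number; cacheRead: number };

// Placeholder per-million-token rates, NOT real prices.
const exampleRate: Usage4 = { input: 3.0, output: 15.0, cacheWrite: 3.75, cacheRead: 0.3 };

const costPerMTok = (tokens: number, rate: number): number => (tokens / 1_000_000) * rate;

// Cost when only input and output are counted (the two-field mistake).
function twoFieldCost(u: Usage4, r: Usage4): number {
  return costPerMTok(u.input, r.input) + costPerMTok(u.output, r.output);
}

// Cost when all four token kinds are counted, each at its own rate.
function fourFieldCost(u: Usage4, r: Usage4): number {
  return (
    twoFieldCost(u, r) +
    costPerMTok(u.cacheWrite, r.cacheWrite) +
    costPerMTok(u.cacheRead, r.cacheRead)
  );
}

// Made-up usage for one cache-heavy call.
const sampleUsage: Usage4 = { input: 1_000, output: 2_000, cacheWrite: 20_000, cacheRead: 100_000 };
const partialCost = twoFieldCost(sampleUsage, exampleRate);
const fullCost = fourFieldCost(sampleUsage, exampleRate);
console.log({ partialCost, fullCost, missing: fullCost - partialCost });
```

With these illustrative numbers the two-field sum comes to roughly 0.033 while the four-field sum is roughly 0.138, so most of the call's cost would never reach the ledger.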


Keep the rate table out of the logic

Converting to cost needs rates, and rates change. They change when a model is updated, and they change with exchange rates. Hard-coding rates into the logic means touching the aggregation code on every revision, so I separate them as configuration from the start.

The important part here is giving each of the four token kinds its own rate. If you price cache reads at the same rate as input, you'll overstate the cost of cache-heavy stages.

// Rates are held as "currency per million tokens" (per-MTok).
// Replace the values with your contract, the latest price sheet, and FX.
// Keyed by model name so you can hold a separate table per model.
type Rate = { input: number; output: number; cacheWrite: number; cacheRead: number };
 
const RATE_TABLE: Record<string, Rate> = {
  // Example: placeholder values. Swap in your real rates.
  "default": { input: 3.0, output: 15.0, cacheWrite: 3.75, cacheRead: 0.3 },
};
 
function rateFor(model: string): Rate {
  // Fall back to default if there's no exact match
  return RATE_TABLE[model] ?? RATE_TABLE["default"];
}
 
// Convert normalized usage into a cost figure
export function usageToCost(model: string, u: ReturnType<typeof normalizeUsage>) {
  const r = rateFor(model);
  const perMTok = (tokens: number, rate: number) => (tokens / 1_000_000) * rate;
  return (
    perMTok(u.input, r.input) +
    perMTok(u.output, r.output) +
    perMTok(u.cacheWrite, r.cacheWrite) +
    perMTok(u.cacheRead, r.cacheRead)
  );
}

Keying rates by model name means calls that fell back to a different model are still priced at the correct rate. If you run several models in a fallback chain and price every call at one rate, you will never notice after the fact that a run you assumed was cheap actually ran on the expensive model.
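One possible way to take "rates as configuration" a step further is to load the table from a JSON string (read from a file or an environment variable, say) and validate it at startup. This is a minimal sketch under that assumption, not part of the layer described here; `parseRateTable` and the JSON shape are hypothetical, and the values are placeholders.

```typescript
type Rate = { input: number; output: number; cacheWrite: number; cacheRead: number };

const RATE_KEYS = ["input", "output", "cacheWrite", "cacheRead"] as const;

// Hypothetical helper: parse and validate a JSON rate table keyed by model name.
// Throws on a malformed table so a bad config fails at startup, not mid-month.
function parseRateTable(json: string): Record<string, Rate> {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("rate table must be a JSON object keyed by model name");
  }
  const table: Record<string, Rate> = {};
  for (const [model, value] of Object.entries(parsed as Record<string, unknown>)) {
    const v = value as Record<string, unknown> | null;
    const rate = {} as Rate;
    for (const key of RATE_KEYS) {
      const n = v?.[key];
      if (typeof n !== "number" || !Number.isFinite(n) || n < 0) {
        throw new Error(`rate "${key}" for model "${model}" must be a non-negative number`);
      }
      rate[key] = n;
    }
    table[model] = rate;
  }
  // The fallback lookup relies on a "default" entry, so require it up front.
  if (!("default" in table)) {
    throw new Error('rate table needs a "default" entry');
  }
  return table;
}

// Example config with placeholder values (not real prices).
const loadedRates = parseRateTable(
  '{"default": {"input": 3.0, "output": 15.0, "cacheWrite": 3.75, "cacheRead": 0.3}}',
);
console.log(Object.keys(loadedRates));
```

Failing loudly on a missing or negative rate is a deliberate choice in this sketch: a silently wrong rate would corrupt every cost figure computed from it.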

The metering wrapper — drop it in with one line

With the foundation in place, wrap the API call in a thin layer. The goal is to add a record to the ledger by passing only a stage name, without rewriting the calling code.

import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic();
 
// One record in the ledger
type CostRecord = {
  ts: string;     // ISO 8601
  stage: string;  // stage tag ("generate" / "quality-gate" etc.)
  model: string;
  cost: number;
  tokens: ReturnType<typeof normalizeUsage>;
};
 
// Keep the destination swappable (see below)
export interface CostSink {
  write(rec: CostRecord): Promise<void>;
}
 
// A wrapper that simply wraps messages.create
export async function meteredCreate(
  sink: CostSink,
  stage: string,
  params: Anthropic.MessageCreateParamsNonStreaming,
) {
  const res = await client.messages.create(params);
  const usage = normalizeUsage(res.usage);
  const cost = usageToCost(params.model, usage);
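  // Recording is best-effort: guard here as well, so a sink that forgets to
  // swallow its own errors can never fail the real call.
  try {
    await sink.write({
      ts: new Date().toISOString(),
      stage,
      model: params.model,
      cost,
      tokens: usage,
    });
  } catch (e) {
    console.warn("[cost-meter] sink failed:", (e as Error).message);
  }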
  return res;
}

The call site changes like this:

// before:
// const res = await client.messages.create({ model, max_tokens, messages });
 
// after: just add a stage
const res = await meteredCreate(sink, "generate", { model, max_tokens, messages });

The stage tags can be coarse. At first I tried to tag finely, per function, and gave up. Tagging at the granularity where credit contention actually happens is enough. In my case that settled into four: generate / quality-gate / news-fetch / translate.

One operational caution: it defeats the purpose if a failure in sink.write drags the actual generation down with it. Keep recording best-effort, swallow write exceptions inside, and just log them. A missing ledger row doesn't hurt; a stalled article generation does.
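One way to make the best-effort rule hold for every destination, not just the ones that remember to catch their own errors, is a small decorator around any sink. The sketch below assumes that approach; `BestEffortSink` and its `dropped` counter are hypothetical names, and the `CostRecord` / `CostSink` types are trimmed stand-ins so the block runs on its own.

```typescript
// Trimmed stand-ins for the ledger types, so this block is self-contained.
type CostRecord = { ts: string; stage: string; model: string; cost: number };
interface CostSink {
  write(rec: CostRecord): Promise<void>;
}

// Hypothetical decorator: wraps any sink so that write() never rejects.
class BestEffortSink implements CostSink {
  dropped = 0; // how many records were lost to sink failures
  private inner: CostSink;

  constructor(inner: CostSink) {
    this.inner = inner;
  }

  async write(rec: CostRecord): Promise<void> {
    try {
      await this.inner.write(rec);
    } catch (e) {
      // Count and log the loss, but never propagate it to the caller.
      this.dropped += 1;
      console.warn("[cost-meter] dropped record:", e instanceof Error ? e.message : String(e));
    }
  }
}

// A deliberately broken sink, for demonstration only.
const failingSink: CostSink = {
  async write(): Promise<void> {
    throw new Error("disk full");
  },
};

const safeSink = new BestEffortSink(failingSink);
```

Keeping a `dropped` count is optional, but it gives a cheap signal for how incomplete the ledger is without turning a metering failure into a pipeline failure.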

Where to record — start with one append-only file per day

I kept CostSink abstract so the destination can be swapped later. But you don't need a sophisticated backend on day one. I started with "append to one JSON Lines file per day." Aggregation can then be handled later with cat and a small script.

import { appendFile } from "node:fs/promises";
 
// A sink that appends JSON Lines (one record per line) to a dated file
export class JsonlFileSink implements CostSink {
  constructor(private dir: string) {}
  async write(rec: CostRecord): Promise<void> {
    const day = rec.ts.slice(0, 10); // YYYY-MM-DD
    try {
      await appendFile(`${this.dir}/cost-${day}.jsonl`, JSON.stringify(rec) + "\n");
    } catch (e) {
      // Don't stop the real work on a metering failure
      console.warn("[cost-meter] write failed:", (e as Error).message);
    }
  }
}

JSON Lines keeps each line independent, so if the process dies mid-append the existing lines stay intact. A single array-JSON file would become unparseable if interrupted, and you'd lose the whole ledger. The plainness of append-only is, here, what makes it robust.
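The flip side is that the reader has to tolerate a truncated final line, since a plain JSON.parse on every line would throw on it. A minimal sketch of a tolerant reader, assuming it is acceptable to skip unparseable lines and count them; `parseJsonl` is a hypothetical helper name.

```typescript
// Hypothetical helper: parse JSON Lines text, skipping any line that fails to
// parse (for example a final line cut off by a crash mid-append).
function parseJsonl<T>(text: string): { records: T[]; skipped: number } {
  const records: T[] = [];
  let skipped = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // ignore blank lines and the trailing newline
    try {
      records.push(JSON.parse(line) as T);
    } catch {
      skipped += 1; // keep going: one bad line must not cost the whole ledger
    }
  }
  return { records, skipped };
}

// Two complete rows followed by a row truncated mid-write.
const sampleLedgerText =
  '{"stage":"generate","cost":0.01}\n' +
  '{"stage":"quality-gate","cost":0.02}\n' +
  '{"stage":"gen';

const parsedLedger = parseJsonl<{ stage: string; cost: number }>(sampleLedgerText);
console.log(parsedLedger.records.length, parsedLedger.skipped);
```

Returning the skipped count rather than silently dropping lines makes it possible to report how many rows were lost when the totals look off.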

When you want to move to KV or Durable Objects, you only reimplement CostSink; the call sites stay untouched. If you want to go as far as hard-stopping at a budget cap, pair this ledger with the stop strategies in my budget circuit breaker design for the Claude API, and "measure, then stop when exceeded" becomes a single connected flow.

Daily report — per-stage cost at a glance

Once the ledger accumulates, fold a day's worth into something readable per stage. What matters for monthly-credit allocation is per-stage total cost, call count, and the cache-read ratio. A stage with a low cache-read ratio is a sign there's still room to revisit how caching is applied.

import { readFile } from "node:fs/promises";
 
type StageAgg = { cost: number; calls: number; cacheReadTokens: number; totalInputTokens: number };
 
export async function dailyReport(dir: string, day: string) {
  const lines = (await readFile(`${dir}/cost-${day}.jsonl`, "utf8"))
    .split("\n").filter(Boolean);
 
  const byStage = new Map<string, StageAgg>();
  for (const line of lines) {
    const r: CostRecord = JSON.parse(line);
    const a = byStage.get(r.stage) ?? { cost: 0, calls: 0, cacheReadTokens: 0, totalInputTokens: 0 };
    a.cost += r.cost;
    a.calls += 1;
    a.cacheReadTokens += r.tokens.cacheRead;
    a.totalInputTokens += r.tokens.input + r.tokens.cacheRead + r.tokens.cacheWrite;
    byStage.set(r.stage, a);
  }
 
  // Sort by cost, highest first
  const rows = [...byStage.entries()]
    .map(([stage, a]) => ({
      stage,
      cost: Math.round(a.cost * 10000) / 10000,
      calls: a.calls,
      // Share of input that came from cache reads (higher means caching is working)
      cacheReadRatio: a.totalInputTokens
        ? Math.round((a.cacheReadTokens / a.totalInputTokens) * 100)
        : 0,
    }))
    .sort((x, y) => y.cost - x.cost);
 
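  // Sum the unrounded per-stage costs and round once, so the total does not
  // accumulate per-row rounding error.
  const rawTotal = [...byStage.values()].reduce((s, a) => s + a.cost, 0);
  return { day, total: Math.round(rawTotal * 10000) / 10000, rows };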
}

When I ran the first day's report, the stages that diverged from my estimate stood out clearly. In my case it was quality-gate retries, not generation, that ate more credits than I expected. Each time a rejected article was regenerated, both generation and the gate ran a second time. Staring at the Console total, I'd probably never have spotted this skew. Only after splitting by stage tag did the obvious truth (stages with more retries have worse credit efficiency) show up in my own numbers.
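The daily report answers "what happened today." For allocating non-rolling monthly credits, the same ledger rows can also be folded into a month-to-date total per stage and compared against a per-stage budget. The following is a sketch under assumptions: `monthToDateByStage`, `overBudget`, and the budget figures are hypothetical, and the rows are made-up data in the ledger's shape.

```typescript
// Only the ledger fields this fold needs.
type LedgerRow = { ts: string; stage: string; cost: number };

// Hypothetical helper: month-to-date cost per stage from already-parsed rows.
// `month` is "YYYY-MM"; rows whose ISO timestamp falls in another month are ignored.
function monthToDateByStage(rows: LedgerRow[], month: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    if (!row.ts.startsWith(month)) continue;
    totals.set(row.stage, (totals.get(row.stage) ?? 0) + row.cost);
  }
  return totals;
}

// Hypothetical helper: stages whose month-to-date cost exceeds their budget.
// Stages without a budget entry are never reported.
function overBudget(totals: Map<string, number>, budgets: Record<string, number>): string[] {
  return Array.from(totals.entries())
    .filter(([stage, cost]) => {
      const cap = budgets[stage];
      return cap !== undefined && cost > cap;
    })
    .map(([stage]) => stage)
    .sort();
}

// Made-up rows: one from May that must be ignored, three from June.
const sampleRows: LedgerRow[] = [
  { ts: "2026-05-31T23:59:00.000Z", stage: "generate", cost: 9 },
  { ts: "2026-06-15T01:00:00.000Z", stage: "generate", cost: 0.25 },
  { ts: "2026-06-15T01:05:00.000Z", stage: "quality-gate", cost: 0.5 },
  { ts: "2026-06-16T01:05:00.000Z", stage: "quality-gate", cost: 0.25 },
];

const juneTotals = monthToDateByStage(sampleRows, "2026-06");
const flaggedStages = overBudget(juneTotals, { generate: 1.0, "quality-gate": 0.5 });
console.log(Object.fromEntries(juneTotals), flaggedStages);
```

Because the timestamps are ISO 8601 strings in UTC, a prefix match on "YYYY-MM" is enough to select a month; a ledger that stored local times would need real date handling instead.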

Don't let metering itself become a load

Finally, a few things I watch so the metering itself doesn't add cost or latency. Aggregating tokens and converting to cost is plain arithmetic, so it's negligible against an API call. The heavy part is I/O to the destination, so for high-frequency stages, decide up front whether to buffer records in memory and flush every few dozen, or whether a per-call file append is fine. My workload is at most a few dozen to a couple hundred calls a day, so a plain append hasn't troubled me.
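For a high-frequency stage, the buffering option could look something like the sketch below: records collect in memory and go to the destination one batch at a time. This is an illustration under assumptions, not the setup described here; `BufferedSink`, `flushBatch`, and the batch size are hypothetical, and `CostRecord` is a trimmed stand-in so the block runs alone.

```typescript
// Trimmed stand-in for the ledger record type.
type CostRecord = { ts: string; stage: string; model: string; cost: number };

// Hypothetical buffering sink: collects records in memory and hands them to
// `flushBatch` in groups, so the destination sees one write per batch.
class BufferedSink {
  private buffer: CostRecord[] = [];
  private flushBatch: (batch: CostRecord[]) => Promise<void>;
  private batchSize: number;

  constructor(flushBatch: (batch: CostRecord[]) => Promise<void>, batchSize = 50) {
    this.flushBatch = flushBatch;
    this.batchSize = batchSize;
  }

  async write(rec: CostRecord): Promise<void> {
    this.buffer.push(rec);
    if (this.buffer.length >= this.batchSize) await this.flush();
  }

  // Call this before the process exits; unflushed records are otherwise lost.
  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    try {
      await this.flushBatch(batch);
    } catch (e) {
      // Best-effort, like the rest of the metering: log and move on.
      console.warn("[cost-meter] batch flush failed:", e instanceof Error ? e.message : String(e));
    }
  }
}

// Demo destination that only records how large each batch was.
const observedBatchSizes: number[] = [];
const bufferedSink = new BufferedSink(async (batch) => {
  observedBatchSizes.push(batch.length);
}, 3);
```

With a file destination, `flushBatch` could join the batch into one string of JSON Lines and append it in a single call. The trade-off is durability: a crash loses whatever is still in the buffer, which a per-call append would have kept.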

One more: never write prompt bodies into the ledger. Record only token counts, model name, stage tag, and cost. Keeping bodies bloats the ledger and becomes a place where sensitive content scatters in plaintext. You don't need the body to know the cost.

What to do next

Start by wrapping just your single most-frequent API call in meteredCreate and giving it one stage. A JSON Lines append is a fine destination for today. Run it for a day and you'll have your first real number: what that stage actually cost. You won't want to go back to watching only the Console total.

If you're at the stage of revisiting the allocation itself, read this alongside my article on revising stage allocation for the monthly-credit move; together they cover both halves, measuring and allocating. I only added this layer on day one myself, and I'll be tuning it over the next few days as the measurements come in. Thank you for reading.
