CLAUDE LAB
API & SDK / 2026-06-15 / Advanced

On the day the billing change took effect, I added per-stage cost metering to my headless runs

The June 15 billing change moved headless runs and agent delegation onto monthly credits. Here is a thin metering layer that records token usage per stage tag from response.usage and emits a daily cost report, with working code.

Tags: claude-agent-sdk, headless, cost-control, metering, billing-change, observability

The move to monthly credits took effect today, and the headless runs in my automated publishing pipeline went from "flat inside the subscription" to "every call eats credits." I had already revised which stages run where before the cutover, but on day one I realized that I had never actually measured how much each stage was spending.

As an indie developer running four sites, I had estimates. But an estimate is not a measurement. If I'm going to allocate credits that don't roll over across a month, I need a ledger that says "this stage actually burned this much last week," not a guess. So on the first day I slipped in a thin metering layer that records each API call's token usage by stage tag and converts it to a cost figure once a day. Here is what's inside it.

This isn't a story about a fancy observability stack. The goal is the cheapest possible path to a state where I can later interrogate cost by stage, without rewriting the calling code.

Why the Console billing screen alone isn't enough

The Anthropic Console usage view shows account-wide and per-API-key consumption. But in my setup, a single API key runs a mix of stages with very different usage profiles: article generation, quality-gate checks, news fetching, translation sync. Looking at the Console, I can't tell whether this week's biggest credit sink was generation or a quality gate that kept retrying.

Monthly credits don't roll over. That means if you can't identify in advance which stage will run short at month's end, a low-priority stage can quietly consume the credits a high-priority stage needs. At the Console's granularity, this "per-stage contention" is invisible, and that was the real problem.

What I needed was a ledger that records usage with a stage name on every call, so I can later ask "what's the running monthly total for stage=quality-gate." Only the application can build that.
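To make that ledger idea concrete, here is a minimal sketch of one row per call and the "running monthly total for a stage" query. The `LedgerEntry` shape, the `monthlyTotal` helper, and the sample rows are all hypothetical names and values chosen for illustration, not the layout the article's paid section uses.

```typescript
// Hypothetical ledger row: one entry per API call, tagged with its stage.
type LedgerEntry = {
  stage: string;      // e.g. "generation", "quality-gate"
  at: string;         // ISO 8601 timestamp of the call (UTC)
  input: number;      // non-cached input tokens
  output: number;     // generated output tokens
  cacheWrite: number; // tokens written to the prompt cache
  cacheRead: number;  // tokens read from the prompt cache
};

type TokenTotals = Omit<LedgerEntry, "stage" | "at">;

// Sum the tokens one stage used in a given month, written as "YYYY-MM".
function monthlyTotal(entries: LedgerEntry[], stage: string, month: string): TokenTotals {
  const totals: TokenTotals = { input: 0, output: 0, cacheWrite: 0, cacheRead: 0 };
  for (const e of entries) {
    // ISO timestamps start with "YYYY-MM", so a prefix check selects the month.
    if (e.stage !== stage || !e.at.startsWith(month)) continue;
    totals.input += e.input;
    totals.output += e.output;
    totals.cacheWrite += e.cacheWrite;
    totals.cacheRead += e.cacheRead;
  }
  return totals;
}

// Made-up sample rows, only to show the query.
const ledger: LedgerEntry[] = [
  { stage: "generation", at: "2026-06-15T01:00:00Z", input: 1200, output: 900, cacheWrite: 0, cacheRead: 0 },
  { stage: "quality-gate", at: "2026-06-15T01:05:00Z", input: 300, output: 40, cacheWrite: 2000, cacheRead: 0 },
  { stage: "quality-gate", at: "2026-06-16T01:05:00Z", input: 300, output: 60, cacheWrite: 0, cacheRead: 2000 },
  { stage: "quality-gate", at: "2026-07-01T00:10:00Z", input: 999, output: 999, cacheWrite: 0, cacheRead: 0 },
];

const qualityGateJune = monthlyTotal(ledger, "quality-gate", "2026-06");
console.log(qualityGateJune);
```

Keeping the rows append-only and doing the aggregation at read time means new questions (per day, per site, per model) can be asked later without changing what gets written.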

Don't drop tokens from response.usage

The foundation of metering is the usage object included in every response. Here's the first trap I hit: usage is not just two fields for input and output. If you use prompt caching, you actually get back four kinds of tokens.

// The token-count fields of usage in the Anthropic SDK response
// (cache_* become non-zero when caching is in play; they can be absent or null otherwise)
type RawUsage = {
  input_tokens: number;                         // non-cached input
  output_tokens: number;                        // generated output
  cache_creation_input_tokens?: number | null;  // writes to cache (premium-priced)
  cache_read_input_tokens?: number | null;      // reads from cache (heavily discounted)
};

If you sum only input_tokens + output_tokens, the cache-write and cache-read tokens fall out of the ledger entirely. Cache writes carry a premium over normal input, and cache reads are much cheaper. Because the pricing is asymmetric, your cost conversion won't reconcile unless you keep all four kinds separately. I first aggregated only two fields, and my measured numbers stubbornly refused to match the Console bill. That cost me half a day.
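A small sketch of why the two-field sum fails to reconcile. The `toCostUsd` helper and the `PLACEHOLDER_RATES` values are assumptions for illustration only; real rates depend on the model and on current pricing, so substitute the published figures before trusting any number this produces.

```typescript
// Token counts for one call, with all four kinds kept separate.
type UsageCounts = { input: number; output: number; cacheWrite: number; cacheRead: number };

// USD per million tokens, one rate per token kind.
type RatesPerMTok = UsageCounts;

// PLACEHOLDER values, not real pricing. They only preserve the asymmetry:
// cache writes cost more than plain input, cache reads cost much less.
const PLACEHOLDER_RATES: RatesPerMTok = {
  input: 3,
  output: 15,
  cacheWrite: 3.75,
  cacheRead: 0.3,
};

// Convert one call's token counts to a dollar figure under the given rates.
function toCostUsd(u: UsageCounts, r: RatesPerMTok): number {
  const microDollars =
    u.input * r.input +
    u.output * r.output +
    u.cacheWrite * r.cacheWrite +
    u.cacheRead * r.cacheRead;
  return microDollars / 1_000_000;
}

// A cache-heavy call: most of the prompt is written to or read from cache.
const call: UsageCounts = { input: 1000, output: 500, cacheWrite: 2000, cacheRead: 10000 };

const fullCost = toCostUsd(call, PLACEHOLDER_RATES);
// What a ledger that drops the cache fields would report for the same call.
const twoFieldCost = toCostUsd({ ...call, cacheWrite: 0, cacheRead: 0 }, PLACEHOLDER_RATES);

console.log({ fullCost, twoFieldCost });
```

Under these placeholder rates the two-field figure comes out at about half of the full figure for this call, which is the kind of gap that shows up as a mismatch against the bill.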

So I put a normalizer up front that always keeps the four kinds in separate fields.

// Fill missing fields with 0 and always produce all four kinds
export function normalizeUsage(raw: Partial<RawUsage> | undefined) {
  return {
    input: raw?.input_tokens ?? 0,
    output: raw?.output_tokens ?? 0,
    cacheWrite: raw?.cache_creation_input_tokens ?? 0,
    cacheRead: raw?.cache_read_input_tokens ?? 0,
  };
}
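One way the normalizer could be attached to calls without rewriting the calling code is a thin wrapper that takes a stage tag and the call itself. This is a sketch under assumptions: `metered`, `LedgerRow`, and the `sink` callback are hypothetical names, and the stubbed `fakeCall` stands in for a real request such as `client.messages.create(...)`.

```typescript
type RawUsage = {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
};

// Same normalizer as in the article, repeated so this block stands alone.
function normalizeUsage(raw: Partial<RawUsage> | undefined) {
  return {
    input: raw?.input_tokens ?? 0,
    output: raw?.output_tokens ?? 0,
    cacheWrite: raw?.cache_creation_input_tokens ?? 0,
    cacheRead: raw?.cache_read_input_tokens ?? 0,
  };
}

// Hypothetical ledger row: the four token kinds plus a stage tag and timestamp.
type LedgerRow = ReturnType<typeof normalizeUsage> & { stage: string; at: string };

// Where rows go (a file, SQLite, an in-memory array); left to the caller.
type Sink = (row: LedgerRow) => void;

// Run `call`, record its usage under `stage`, and hand the response back unchanged.
// If `call` throws, nothing is recorded and the error propagates to the caller.
async function metered<T extends { usage?: Partial<RawUsage> }>(
  stage: string,
  sink: Sink,
  call: () => Promise<T>,
): Promise<T> {
  const res = await call();
  sink({ stage, at: new Date().toISOString(), ...normalizeUsage(res.usage) });
  return res;
}

// Usage with a stubbed response; a real caller would pass the SDK request instead.
const rows: LedgerRow[] = [];
const fakeCall = async () => ({
  id: "stub",
  usage: { input_tokens: 120, output_tokens: 30, cache_read_input_tokens: 800 },
});

const pending = metered("quality-gate", (row) => { rows.push(row); }, fakeCall);
pending.then(() => console.log(rows));
```

Because the wrapper returns the response untouched, existing code that reads the result keeps working, and the only change at each call site is the stage tag.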
