We Collected Plenty of Claude Code Usage Data — Then Budgeting Still Didn't Work. Field Notes on Cost Attribution and Idle Seats

The Claude Code Analytics API gives you data, but data alone doesn't run a budget. Here are field notes on turning raw usage logs into cost attribution, idle-seat detection, and alerts that don't cry wolf — with working code and the thresholds we actually use.

claude-code¹²³ analytics³ cost monitoring⁷ team budget²

✦ Premium Article

The data was there, but the meeting couldn't use it

When we rolled out the Claude Code Analytics API, the first thing I built was a dashboard of daily token consumption and cost. The numbers came out cleanly. But every monthly budget meeting stalled on the same question: "This figure — which work did it pay for?"

You can see per-seat consumption. The trouble is that seats and work aren't one-to-one. One person spans several repositories, spends one week firefighting an incident and the next writing a feature in a sprint. Per-seat numbers answer "who used it"; what budgeting actually wants to know is "what it was used on." Running several apps and blogs in parallel as an indie developer myself, I'd already learned that cost has to be tied to work, not to people, before it becomes something you can act on.

This article walks through the three stages it took to turn the API's raw data into something usable in a budget meeting — attributing cost to workstreams, detecting idle seats, and building alerts that don't go stale — with the code we actually run. The focus isn't setup; it's the part where you get stuck after the data is already flowing.

First, pin down the granularity of the raw data

The Analytics API returns metrics per day. The first job is to understand exactly what granularity comes back. Get vague here and every downstream cost attribution drifts.

import os
import httpx
 
BASE = "https://api.anthropic.com/v1/organizations/usage_report/claude_code"
 
def fetch_daily(starting_at: str, ending_at: str) -> list[dict]:
    """Fetch daily Claude Code usage records.
    Note: one record = one day x one user x one model."""
    headers = {
        "x-api-key": os.environ["ANTHROPIC_ADMIN_API_KEY"],
        "anthropic-version": "2023-06-01",
    }
    records, page = [], None
    with httpx.Client(timeout=30) as client:
        while True:
            params = {"starting_at": starting_at, "ending_at": ending_at, "limit": 1000}
            if page:
                params["page"] = page
            r = client.get(BASE, headers=headers, params=params)
            r.raise_for_status()
            body = r.json()
            records.extend(body["data"])
            page = body.get("next_page")
            if not page:
                break
    return records

Two things matter here. First, each record is a "day x user x model" tuple. If the same person uses both Sonnet and Opus on the same day, their single day splits into two records. Aggregate over the wrong key and a mixed-model seat double-counts or quietly drops cost.

Second, always page all the way through. A larger limit won't return everything; you have to loop until next_page is null. On a large team near month-end, skipping this silently loses a few days of data. I missed this early on and once argued a budget off the first-of-month numbers alone — and was wrong.

✦

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN

✦An aggregation that attributes cost to workstreams instead of seats, and why that framing changes the conversation

✦Idle-seat logic that surfaces the seats you pay for but nobody actually uses, every week

✦A three-part alert threshold (week-over-week, moving average, absolute floor) that stays meaningful instead of going stale

Secure payment via Stripe · Cancel anytime

✦

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

Unlock all articles with Membership →

Stage 1: attribute cost to work, not to seats

What worked in the budget meeting was re-pointing cost at "repository groups = workstreams" rather than at people. The Analytics API alone doesn't expose fine-grained repository data, so we join it against an internal ledger of who primarily owns which project. The ledger doesn't need to be perfect. If 80% of the primary owners are right, the discussion moves.

from collections import defaultdict
 
# Ledger: user -> workstream (their primary project)
SEAT_TO_LINE = {
    "alice@example.com": "Payments platform",
    "bob@example.com": "Mobile app",
    "carol@example.com": "Mobile app",
    "dave@example.com": "Internal tools",
}
 
# Approximate unit price per model (USD per 1M tokens; update from your billing actuals)
PRICE_PER_MTOK = {
    "claude-opus-4-8": {"input": 15.0, "output": 75.0},
    "claude-sonnet-4-6": {"input": 3.0, "output": 15.0},
}
 
def cost_of(record: dict) -> float:
    model = record.get("model", "")
    price = PRICE_PER_MTOK.get(model)
    if not price:
        return 0.0
    inp = record.get("input_tokens", 0) / 1_000_000
    out = record.get("output_tokens", 0) / 1_000_000
    return inp * price["input"] + out * price["output"]
 
def cost_by_line(records: list[dict]) -> dict[str, float]:
    totals = defaultdict(float)
    for rec in records:
        line = SEAT_TO_LINE.get(rec.get("actor_email"), "Unassigned")
        totals[line] += cost_of(rec)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

Once this aggregation was in place, the meeting's question shifted from "who's expensive" to "which workstream are we investing in." The latter isn't an accusation; it's a decision. Being able to say "the payments platform carried 40% of last month — is that intentional?" turns the cost conversation forward-looking.

When "Unassigned" starts growing, that's your signal to refresh the ledger. I use a loose rule: if unassigned exceeds 15% of the total, revisit it. Keeping a way to detect the drift lasts longer than chasing a clean 0%.

Stage 2: surface the seats you pay for but nobody uses

The biggest cost win wasn't trimming expensive seats — it was returning seats nobody used. Because Claude Code involves per-seat billing, an inactive seat is just fixed cost. Session counts and lines-changed from the Analytics API let you flag idle seats mechanically.

def idle_seats(records: list[dict], roster: set[str], days: int = 14) -> list[dict]:
    """Return seats with little or no activity over the last `days`.
    Rule: zero sessions, or (sessions exist but very few lines added)."""
    activity = defaultdict(lambda: {"sessions": 0, "lines_added": 0})
    for rec in records:
        a = activity[rec.get("actor_email")]
        a["sessions"] += rec.get("num_sessions", 0)
        a["lines_added"] += rec.get("lines_added", 0)
 
    flagged = []
    for email in roster:
        a = activity.get(email, {"sessions": 0, "lines_added": 0})
        if a["sessions"] == 0:
            flagged.append({"email": email, "reason": "completely idle", **a})
        elif a["lines_added"] < 20:
            flagged.append({"email": email, "reason": "barely active", **a})
    return sorted(flagged, key=lambda x: (x["sessions"], x["lines_added"]))

The two-stage rule is deliberate, because session count alone misleads you. A seat that only logs in and does no real work looks "used" by session count. Adding lines-added as a second angle surfaces the "opened it but didn't write" seats.

One caution: don't use the idle list for performance reviews. Keep it strictly as material for reallocating seats. Parental leave, a long design phase, a review-only role — there are plenty of legitimate reasons not to write code. I always add a "one-line reason" column to this list so a human's context can override the machine's verdict. The numbers start a conversation; they don't end one.

Stage 3: design alerts that don't go stale from false positives

My first cost alert was "notify when a day's spend crosses a threshold." Within two weeks nobody looked at it, because the natural dip on weekends and rise on weekdays set it off constantly. An alert is worthless unless you can keep a state where firing always means acting.

So I combined three conditions and switched to notifying only when several fire at once — not any single one.

Condition	What it catches	Threshold we use
vs. same weekday last week	A spike with the weekday cycle removed	+60% or more
Deviation from 7-day moving average	A slow upward trend	1.5x the average or more
Absolute floor	Suppresses noise while amounts are small	Ignore days under $50

def should_alert(today_cost: float, same_dow_last_week: float, ma7: float) -> bool:
    if today_cost < 50:                       # stay quiet while amounts are small
        return False
    spike_vs_lastweek = same_dow_last_week > 0 and today_cost >= same_dow_last_week * 1.6
    spike_vs_trend = ma7 > 0 and today_cost >= ma7 * 1.5
    return spike_vs_lastweek and spike_vs_trend

The absolute floor made a real difference. Early in a rollout, or on a small team, a ratio doubles easily but the dollar amount is rounding error. With ratio conditions alone, you'd fire constantly during ramp-up and lose trust.

Joining the two ratio conditions with and is also deliberate. Week-over-week alone fires after a holiday; the moving average alone fires from a pre-long-weekend rush. Both firing together only happens for "a trend-level spike that isn't a weekday effect or a temporary wave." In practice, the fewer alerts you emit, the more each one is worth.

Looking back

The Analytics API returns raw numbers, and on their own they don't solve budgeting. What worked was: (1) attributing cost to workstreams instead of people to keep the discussion forward-looking, (2) surfacing unused seats weekly rather than chasing expensive ones, and (3) narrowing alerts to multiple conditions firing together to kill false positives. None of these are flashy features — they're a thin translation layer between raw data and the context on the ground.

As a first step, I'd suggest putting a single spreadsheet ledger of workstreams together and running just cost_by_line. The moment you can see "how much each kind of work costs," you'll naturally know whether you need the other two. I hope these notes help if you're stuck on cost management for a team rollout too.

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.