API & SDK / 2026-06-14 / Advanced

Record Which Model Actually Answered — Attestation Logging for Headless Pipelines

Persist the model field and usage from every API response so you can detect when the served model differs from the one you requested, and reconcile per-model cost ahead of the usage credits change.

Last month I reconciled my automated content pipeline's API bill against my estimate and found a few hundred yen I couldn't account for. The call count matched my logs. The token counts matched. Only the total was off. When I dug in, the culprit was that the model I had requested and the model that actually answered diverged on a small slice of requests. I was logging the output text and the token counts, but not which model produced the response — so pinpointing where the gap came from took me half a day.

If you run Claude headless, you may have hit something similar. I had assumed that because I pin model on every call, the response naturally comes from that pinned model. In reality, the model field inside the response is the one that gets billed, and it is not guaranteed to match the string you sent. This article walks through recording the served model on every call and reconciling it against both cost and quality, using the code I shipped into my own pipeline.

When the bill didn't match the estimate

My pipeline generates content for four sites and fires roughly 480 requests a day — about 14,000 calls a month. I was already storing each call's prompt, output, and input/output token counts as JSON Lines. Estimating "sum of input tokens × price + sum of output tokens × price" should have landed close to the invoice.

For June 2026 it didn't. The total ran higher than my estimate. When I broke the total back down to individual calls, a small fraction looked like they had been billed at a higher rate than the model I thought I was using. My logs only held the requested model name, so I had no way to prove, after the fact, which model's rate each charge belonged to. That was the starting point.

The lesson is blunt: the model the response declares — not the one you requested — is the truth about cost. And starting June 15, the move to usage credits makes per-model rate differences flow straight into the bill. Being able to explain drift after the fact matters more now than it ever did.

The response already tells you which model answered

The Messages API response body has always included a model field and a usage object. This is not an echo of your request; it is the server declaring which model produced this response. Most implementations pull out the text and throw the rest away — but that is exactly where cost reconciliation lives.

import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
 
const res = await client.messages.create({
  model: "claude-fable-5",          // the model I requested
  max_tokens: 4096,
  messages: [{ role: "user", content: "Draft an article for me" }],
});
 
console.log(res.model);             // the model that actually answered (billing basis)
console.log(res.usage);             // { input_tokens, output_tokens, ... }
console.log(res.id);                // a unique ID per request

res.model does not always equal the "claude-fable-5" I asked for. That is the point. When it matches, you're fine; when it differs, it becomes the entry point for asking why. Keep res.id too — it's the correlation key for support tickets and reproduction work.

Why request and reality diverge — fallback is normal

Drift is rarely a malfunction; most of it is behavior working as designed. Three common sources stand out.

First, the model's own safety fallback. Fable 5 (the Mythos class) is documented to block responses in high-risk domains such as cybersecurity or biology and chemistry, falling back to Claude Opus 4.8. It is said to trigger on under 5% of sessions — not zero. Even harmless article generation like mine can have a response come back from a different model if any term in the prompt brushes the boundary.

Second, client-side fallback configuration. Claude Code offers fallbackModel (up to three models tried in order), which switches to a lower tier automatically under load. If you have that enabled in headless runs, the served model changes at the moment of overload.

Third, aliases and version resolution. If you use an alias like claude-3-5-sonnet-latest, the resolved concrete version lands in res.model. The request and response strings not matching character-for-character is, in this case, perfectly normal.

None of these mean something is broken. The problem isn't that drift happened — it's that not recording it leaves you unable to explain it later.

Before / After — a log that records the served model

Here's the naive log I started with. The output text and the output token count survive, but the served model, the requested model, and the request ID do not.

// Before: only the output survives
async function generate(prompt: string) {
  const res = await client.messages.create({
    model: "claude-fable-5",
    max_tokens: 4096,
    messages: [{ role: "user", content: prompt }],
  });
  const text = res.content[0].type === "text" ? res.content[0].text : "";
  appendJsonl("gen.log", { text, tokens: res.usage.output_tokens });
  return text;
}

This log can't prove which model was billed at month end. Here's the version that records the served model on every call — requested model, served model, usage, ID, and timestamp, one line each. I call this the attestation log.

// After: keep both the requested and the served model
import { appendFileSync } from "node:fs";
 
type Attestation = {
  at: string;            // ISO8601
  requestId: string;     // res.id
  requestedModel: string;
  servedModel: string;   // res.model
  inputTokens: number;
  outputTokens: number;
  drift: boolean;        // requested !== served
};
 
async function generate(prompt: string, requestedModel = "claude-fable-5") {
  const res = await client.messages.create({
    model: requestedModel,
    max_tokens: 4096,
    messages: [{ role: "user", content: prompt }],
  });
 
  const rec: Attestation = {
    at: new Date().toISOString(),
    requestId: res.id,
    requestedModel,
    servedModel: res.model,
    inputTokens: res.usage.input_tokens,
    outputTokens: res.usage.output_tokens,
    drift: res.model !== requestedModel,
  };
  appendFileSync("attestation.jsonl", JSON.stringify(rec) + "\n");
 
  return res.content[0].type === "text" ? res.content[0].text : "";
}

The essential additions are just two fields, servedModel and drift, with requestId kept alongside for correlation, but the explanatory power changes dramatically. Filter to the rows where drift is set and you can locate, in seconds, which requests produced the month-end gap. If you request an alias, the exact-match drift flag will be set on every call, so in that case compare by model family rather than by exact string.
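One way to compare by family is sketched below. It assumes that an alias and its resolved version differ only by a trailing `-latest` or an eight-digit date suffix, as with claude-3-5-sonnet-latest; if your model IDs follow another pattern, the regular expression needs adjusting. The helper names modelFamily and isFamilyDrift are hypothetical, not part of the SDK.

```typescript
// Hypothetical helper: strip a trailing "-latest" or "-YYYYMMDD" suffix so that
// an alias and its resolved snapshot compare equal.
// Assumption: IDs look like "<family>-latest" or "<family>-<8 digits>".
function modelFamily(model: string): string {
  return model.replace(/-(latest|\d{8})$/, "");
}

// True only when the served model belongs to a different family than requested.
function isFamilyDrift(requestedModel: string, servedModel: string): boolean {
  return modelFamily(requestedModel) !== modelFamily(servedModel);
}

// Alias resolved to a dated snapshot: same family, so not real drift.
console.log(isFamilyDrift("claude-3-5-sonnet-latest", "claude-3-5-sonnet-20241022")); // false
// Fallback to another model: real drift.
console.log(isFamilyDrift("claude-fable-5", "claude-opus-4-8")); // true
```

Storing this as a separate flag next to the exact-match drift field keeps the raw comparison available while giving the gate and the reconciliation a signal that ignores alias resolution.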

Reconcile cost by the served model's rate

With the attestation log, you can put the requested-model estimate and the served-model actual cost side by side. Rates differ sharply by model. Fable 5's API rate, for example, is published at $10 per million input tokens and $50 per million output. A fallback target has a different rate, so even a sub-5% trigger shows up in the total.

Keep the rate table as constants in code and refresh them from the pricing page — reconciling against stale numbers causes its own incidents, so I check at the start of each month.

// Prices are USD per million tokens. Always confirm current values on the pricing page.
const PRICE: Record<string, { in: number; out: number }> = {
  "claude-fable-5": { in: 10, out: 50 },
  "claude-opus-4-8": { in: 0, out: 0 },   // fill from the pricing page
};
 
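function costUsd(model: string, inTok: number, outTok: number): number {
  const p = PRICE[model];
  // Fail loudly: pricing an unknown served model at 0 would silently hide the gap.
  if (!p) throw new Error(`no price entry for model: ${model}`);
  return (inTok / 1_000_000) * p.in + (outTok / 1_000_000) * p.out;
}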
 
function reconcile(records: Attestation[]) {
  let estimated = 0;   // by requested model (pre-estimate)
  let actual = 0;      // by served model (real cost)
  for (const r of records) {
    estimated += costUsd(r.requestedModel, r.inputTokens, r.outputTokens);
    actual += costUsd(r.servedModel, r.inputTokens, r.outputTokens);
  }
  return { estimated, actual, gap: actual - estimated };
}

A positive gap is the portion that drifted to a higher-rate model, via fallback or otherwise. In my June data, drift was set on about 1.3% of the 14,000 calls — yet that difference explained almost the entire gap against my estimate. Being able to say "1% of calls caused this" with a number behind it was the real relief.
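For a sense of scale, a single call with 2,000 input and 1,500 output tokens at the Fable 5 rates above costs 2,000 / 1,000,000 × $10 + 1,500 / 1,000,000 × $50 = $0.02 + $0.075 = $0.095. To see where the gap comes from, it can help to split it by served model. The following is a minimal sketch, not the code from my pipeline: parseAttestations and gapByServedModel are hypothetical names, and the rate table is passed in as a parameter so that no price is hard-coded here.

```typescript
type Attestation = {
  at: string;
  requestId: string;
  requestedModel: string;
  servedModel: string;
  inputTokens: number;
  outputTokens: number;
  drift: boolean;
};

type Rate = { in: number; out: number }; // USD per million tokens

// Parse JSON Lines text into records, skipping blank lines.
function parseAttestations(jsonl: string): Attestation[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Attestation);
}

function rateCostUsd(rate: Rate, inTok: number, outTok: number): number {
  return (inTok / 1_000_000) * rate.in + (outTok / 1_000_000) * rate.out;
}

// For drifted rows only, sum (served cost - requested cost) per served model.
function gapByServedModel(
  records: Attestation[],
  prices: Record<string, Rate>,
): Map<string, number> {
  const gaps = new Map<string, number>();
  for (const r of records) {
    if (!r.drift) continue;
    const requested = prices[r.requestedModel];
    const served = prices[r.servedModel];
    if (!requested || !served) {
      throw new Error(`no rate for ${r.requestedModel} or ${r.servedModel}`);
    }
    const gap =
      rateCostUsd(served, r.inputTokens, r.outputTokens) -
      rateCostUsd(requested, r.inputTokens, r.outputTokens);
    gaps.set(r.servedModel, (gaps.get(r.servedModel) ?? 0) + gap);
  }
  return gaps;
}
```

In practice the input would be the contents of attestation.jsonl read as UTF-8 text (for example with readFileSync from node:fs), and the price table would be the PRICE constant once every served model has a real rate filled in.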

Build drift rate into your quality gate

Leave reconciliation until month end and you'll notice a spike in abnormal fallbacks too late. I added one threshold check on drift rate to the quality gates I run before push. If the drift rate over a recent window exceeds expectations, I don't halt generation — I emit a warning and prompt a human look.

function driftGate(records: Attestation[], windowSize = 500, threshold = 0.05) {
  const recent = records.slice(-windowSize);
  if (recent.length === 0) return { ok: true, rate: 0 };
  const drifted = recent.filter((r) => r.drift).length;
  const rate = drifted / recent.length;
  if (rate > threshold) {
    console.warn(`drift rate ${(rate * 100).toFixed(1)}% > ${threshold * 100}%`);
    const byModel = new Map<string, number>();
    for (const r of recent.filter((x) => x.drift)) {
      byModel.set(r.servedModel, (byModel.get(r.servedModel) ?? 0) + 1);
    }
    console.warn("served breakdown:", Object.fromEntries(byModel));
  }
  return { ok: rate <= threshold, rate };
}

Not force-stopping generation here is deliberate. Fable 5's safety fallback is normal behavior, and sometimes you just hit more boundary-adjacent terms than usual. The gate's job is awareness, not a kill switch. I start the threshold at 5% in my environment and tune it against the steady-state reading (about 1.3%).
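To find that steady-state reading in the first place, one option is to compute the drift rate per day from the attestation log and look at how it varies over a week or two. This is a minimal sketch under the assumption that the at field is an ISO 8601 UTC timestamp, as written by toISOString(); dailyDriftRates is a hypothetical helper name.

```typescript
type DriftRecord = { at: string; drift: boolean };

// Group records by UTC calendar day (the first 10 characters of the ISO 8601
// timestamp, "YYYY-MM-DD") and return the fraction of drifted calls per day.
function dailyDriftRates(records: DriftRecord[]): Map<string, number> {
  const totals = new Map<string, { calls: number; drifted: number }>();
  for (const r of records) {
    const day = r.at.slice(0, 10);
    const t = totals.get(day) ?? { calls: 0, drifted: 0 };
    t.calls += 1;
    if (r.drift) t.drifted += 1;
    totals.set(day, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, day) => rates.set(day, t.drifted / t.calls));
  return rates;
}
```

A threshold set comfortably above the highest ordinary day keeps the gate quiet in normal operation while still catching a real jump.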

Where it hooks into headless / the Agent SDK

The key implementation detail is to write the attestation in exactly one place: the outermost wrapper around your API client. If you call messages.create directly from scattered call sites, you will inevitably miss records. I route every client call through a single function that always writes the attestation.

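// every generation goes through this one function
// (non-streaming params, so the result is a complete Message with id, model, and usage)
export async function callClaude(params: Anthropic.MessageCreateParamsNonStreaming) {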
  const res = await client.messages.create(params);
  appendFileSync("attestation.jsonl", JSON.stringify({
    at: new Date().toISOString(),
    requestId: res.id,
    requestedModel: typeof params.model === "string" ? params.model : "",
    servedModel: res.model,
    inputTokens: res.usage.input_tokens,
    outputTokens: res.usage.output_tokens,
    drift: res.model !== params.model,
  }) + "\n");
  return res;
}

With the Agent SDK the idea is the same: route the requests that each tool execution or delegation step issues internally through one shared measurement layer. When you nest subagents, it gets hard to see which level answered with which model, so storing the requestId alongside a level ID makes the follow-up investigation far easier.
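A level ID can be as simple as a slash-separated path that each delegation step extends before handing work down. The sketch below passes that context explicitly; CallContext, childContext, and tagAttestation are hypothetical names for illustration and are not part of the Agent SDK, and the request IDs are placeholders.

```typescript
// Hypothetical context that travels with each delegation step.
type CallContext = { levelId: string; depth: number };

const ROOT: CallContext = { levelId: "root", depth: 0 };

// Derive the context for a nested subagent or tool step.
function childContext(parent: CallContext, name: string): CallContext {
  return { levelId: `${parent.levelId}/${name}`, depth: parent.depth + 1 };
}

type LeveledAttestation = {
  requestId: string;
  servedModel: string;
  levelId: string;
  depth: number;
};

// Attach the level to the fields already taken from the response.
function tagAttestation(
  ctx: CallContext,
  requestId: string,
  servedModel: string,
): LeveledAttestation {
  return { requestId, servedModel, levelId: ctx.levelId, depth: ctx.depth };
}

// Usage: a "research" subagent that delegates to a "summarize" step.
const research = childContext(ROOT, "research");
const summarize = childContext(research, "summarize");
console.log(tagAttestation(summarize, "msg_placeholder_1", "claude-fable-5").levelId);
// root/research/summarize
```

Passing the context explicitly is more verbose than an ambient store, but it makes it obvious at every call site which level a request belongs to.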

Your next step

I recommend rolling this out in three steps.

1. Add two fields to your existing generation log: servedModel (res.model) and requestId (res.id). The code change is tiny, but it turns your month-end bill from a mystery into something you can explain.
2. Accumulate a week of attestations, run reconcile, and learn your environment's steady-state drift rate alongside the requested-versus-served cost gap.
3. Add driftGate to your pre-push gates and tune the threshold against your measured steady state.

Those three steps alone prepare you for per-model billing from June 15 onward. Even at the scale of an indie developer running the whole pipeline solo, the measurement layer collapses into a single place, so the cost to adopt stays small.

Just record, quietly, the model the response declared. That one habit turns cost mismatches from a puzzle into an identified line item. Since adding these two fields, my own month-end reconciliation became a quick confirmation rather than an investigation. I hope it helps if you're running headless too.
